# [Various] Ashes of the Singularity DX12 Benchmarks



## p4inkill3r

A relevant blog post from the game's creators, Oxide: The birth of a new API

PCPerspective

ExtremeTech

EuroGamer

Legit Reviews

Computerbase.de


Thanks to @Mahigan for the insights and legwork!

An Oxide rep responds to address the various discrepancies seen in the benchmark:
Quote:


> Originally Posted by *Kollock*
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might think that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel (and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code check-ins, you'd draw the conclusion we were working closer with Nvidia rather than AMD.
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with GameWorks), you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic.)
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only 'vendor'-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose it. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is far, far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is the min frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application-controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance, rather than software variance. We'll have to dig into this one.
> 
> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
> 
> Our use of Async Compute, however, pales in comparison to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and -optimized engines start coming to the PC. I don't think Unreal titles will show this very much, though, so likely we'll have to wait and see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so. (Complain, anyway; we would have still done it.)
> 
> --
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion arose because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.
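Kollock's description of "opportunistic" async compute (taking compute passes the engine already runs and letting them overlap graphics work on a separate queue) can be sketched with a toy timing model. This is illustrative Python, not real D3D12 code; the frame costs and the `overlap` fraction are invented assumptions:

```python
# Toy model of async compute: under DX11 the compute passes run inline
# with graphics work; under DX12 a second queue lets some fraction of
# them hide behind the graphics workload on otherwise-idle GPU units.
# All numbers here are made up for illustration.

def frame_time_serial(graphics_ms, compute_ms):
    """DX11-style frame: graphics plus every compute pass, back to back."""
    return graphics_ms + sum(compute_ms)

def frame_time_async(graphics_ms, compute_ms, overlap=1.0):
    """DX12-style frame: `overlap` is the fraction of compute work that
    executes concurrently with graphics instead of after it."""
    total_compute = sum(compute_ms)
    hidden = overlap * total_compute     # runs alongside graphics
    visible = total_compute - hidden     # still serialized
    return max(graphics_ms, hidden) + visible

graphics = 20.0               # ms of graphics work per frame
compute = [2.0, 1.5, 1.5]     # a few compute passes (e.g. lighting, AO)

print(frame_time_serial(graphics, compute))   # 25.0 ms
print(frame_time_async(graphics, compute))    # 20.0 ms when fully hidden
```

With these made-up numbers the frame drops from 25 ms to 20 ms, a gain of roughly 20% that comes "for free" as long as the GPU has idle execution units, consistent with the modest but noticeable improvement Kollock describes.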


----------



## Redzo

ExtremeTech FPS charts for DX11 vs DX12, Fury X vs 980Ti.

You could put this link up in the OP as well.


----------



## DampMonkey

Those gains from the 390x are insane


----------



## EVGA-JacobF

I did a quick test with a TITAN X; results below. DX12 was about 10 FPS faster in my test. The performance seems related to the number of CPU cores, though. Going to do some more testing.

PS: Latest version of PrecisionX has full OSD support

DX12:


DX11:


----------



## CasualCat

Is there a stand alone benchmark or do you have to buy the game?

edit: thanks JacobF


----------



## Ha-Nocri

Up to more than 2x faster compared to DX11 on the 390X. Very nice.


----------



## DFroN

Fury X and 980Ti neck and neck in the DX12 scores, compared to the Ti being 1.4x faster at 4K and 1.8x faster at 1080p in DX11.


----------



## Robenger

Still seems like it relies on strong single core performance. Wonder if that's just the engine?


----------



## Redzo

Insane gains, I'll tell you that.
This looks promising.


----------



## PostalTwinkie

I am almost suspicious of the gains for AMD. They bring up more questions than they answer, at least for me.

Why are they so massive? Is AMD just that bad at writing drivers for DX11? Is it intentional? Or is it that lower-level APIs happen to suit AMD's particular architecture design? Are the results even valid?


----------



## Robenger

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I am almost suspect of the gains for AMD. They bring up more questions than they answer, at least for myself.
> 
> Why are they so massive? Is AMD just that bad at writing drivers for DX 11? Is it intentional? Or does it happen to be that lower level APIs like the particular AMD architecture design? Are the results even valid?


Well, they're all pretty similar across the benchmarks. ExtremeTech had the best write-up. They attribute it to AMD having a lot of driver overhead in DX11. I think in part it may also have to do with AMD being a little more integrated into DX12 than Nvidia, thanks to Mantle code.


----------



## mutantmagnet

Quote:


> Quite frankly, the notion of DX12 running slower than DX11 in some scenarios isn't what we expected to see. Whether it's a game-specific issue, or a driver-related one, you can be sure that Nvidia's engineering team are digging deep into this benchmark now in an effort to figure out what's going on. We'll update with any news.


I would bet money Nvidia's problems are related to the 0.5 GB of slow RAM on the GTX 970.

It would serve them right for gambling on that design, but we won't know until someone does a test with a 960 and a 980.


----------



## CasualCat

Quote:


> Originally Posted by *Robenger*
> 
> Well they're all pretty similar across the benchmarks. ExtremeTech had the best write up. They attribute it to AMD having a lot of driver overhead in DX11. I think in part it may also have to do with AMD being a little more integrated into DX12 then Nvidia with Mantle code.


Well I hope this is a poor example of DX12 then otherwise it seems pretty underwhelming.

Net result: Nvidia gains little over DX11. AMD gains a lot over DX11, but basically comes up to Nvidia's DX11/DX12 level. So great if you have AMD hardware, but still not the knock-your-socks-off overall improvement that was hinted at with DX12.

edit: Basically it'd have been nice to see both neck and neck on DX11, with appreciable gains then on DX12.


----------



## chuy409

Quote:


> Originally Posted by *DampMonkey*
> 
> Those gains from the 390x are insane


Those 390X DX11 results aren't comparable to modern DX11 games. When have you seen a 390X being 30 FPS slower than a 980 in a modern game? Never. The DX12 results show how both GPUs currently perform in DX11 games. I don't know what's up with their AMD DX11 results. We didn't get much more information from this benchmark than what we already know. The 390X and 980 are on par; we already knew this. Did DX12 help AMD CPUs in any worthwhile way? Nope. Better thread usage was probably the #1 concern here, and it doesn't appear to deliver that in this case.

We did get an FPS boost, but we also got decreases. So everything isn't fine and dandy.


----------



## Prophet4NO1

What I gather from this is that most of AMD's performance issues have been software related. They could not, for whatever reason, get their drivers and DX11 to work well together. But DX12 fixes that and gives them massive gains, while Nvidia seems to see smaller gains.

That or Nvidia is getting smaller gains from poor drivers for DX12?

Either way, they are pretty much neck and neck now. Might see some price wars coming.


----------



## Themisseble

Yet again, reviews from different sites contradict each other.
PCPer shows poor performance from the i3 and the FX line in DX12 (the i7 destroys them all, DX12 or DX11, which is very hard to believe)... while Eurogamer shows the i3 in DX12 mode beating the i7 in DX11 mode.


----------



## Ganf

And today we have some pudding for everyone to play in. Over here we have the red pudding, and over there we have the green pudding. Everyone is allowed to play but please don't throw pudding across the table at the other players.


----------



## Robenger

Quote:


> Originally Posted by *Ganf*
> 
> And today we have some pudding for everyone to play in. Over here we have the red pudding, and over there we have the green pudding. Everyone is allowed to play but please don't throw pudding across the table at the other players.


Beautiful analogy


----------



## PostalTwinkie

Quote:


> Originally Posted by *mutantmagnet*
> 
> I would bet money Nvidia's problems are related to their 0.5 GB slow ram on the GTX 970.
> 
> It would serve them right for gambling on this idea but we won't know until someone does a test with a 960 and 980.


Based on what? That "slow" 0.5 GB didn't cause performance issues in DX11 below 5K resolution.

EDIT:

Also, your theory doesn't explain the 980 Ti seeing marginal gains while AMD still sees massive gains. This is software, not hardware, I am betting. For all we know, AMD has been sandbagging DX11 support for a while now, to show "massive" gains on DX12 and other upcoming APIs.

Well, maybe "sandbagging" is a bit harsh. Maybe this is the culmination of AMD focusing on newer APIs and less on the "outgoing" (it really isn't) DX11. This could all be what dumping money into one thing, and not the other, looks like.
Quote:


> Originally Posted by *Ganf*
> 
> And today we have some pudding for everyone to play in. Over here we have the red pudding, and over there we have the green pudding. Everyone is allowed to play but please don't throw pudding across the table at the other players.


Too late! Shots fired!


----------



## Robenger

Quote:


> Consider Nvidia. One of the fundamental differences between Nvidia and AMD is that Nvidia has a far more hands-on approach to game development. Nvidia often dedicates engineering resources and personnel to improving performance in specific titles. In many cases, this includes embedding engineers on-site, where they work with the developer directly for weeks or months. Features like multi-GPU support, for instance, require specific support from the IHV (Independent Hardware Vendor). Because DirectX 11 is a high level API that doesn't map cleanly to any single GPU architecture, there's a great deal that Nvidia can do to optimize its performance from within their own drivers. That's even before we get to GameWorks, which licenses GeForce-optimized libraries for direct integration as middleware (GameWorks, as a program, will continue and expand under DirectX 12).
> 
> DirectX 12, in contrast, gives the developer far more control over how resources are used and allocated. It offers vastly superior tools for monitoring CPU and GPU workloads, and allows for fine-tuning in ways that were simply impossible under DX11. It also puts Nvidia at a relative disadvantage. For a decade or more, Nvidia has done enormous amounts of work to improve performance in-driver. DirectX 12 makes much of that work obsolete. That doesn't mean Nvidia won't work with developers to improve performance or that the company can't optimize its drivers for DX12, but the very nature of DirectX 12 precludes certain kinds of optimization and requires different techniques.


- From the last page on the ExtremeTech article.


----------



## geoxile

Quote:


> Originally Posted by *CasualCat*
> 
> Well I hope this is a poor example of DX12 then otherwise it seems pretty underwhelming.
> 
> Net result is Nvidia gains little over DX11. AMD gains a lot over DX11, but basically comes up to Nvidia's DX11/DX12 level. So great if you have AMD hardware, but still not knock your socks off overall improvement that was hinted at with DX12.
> 
> edit: Basically it'd have been nice to see both neck and neck on DX11 with appreciable gains then on DX12.


Well that's really the point of DX12. It reduces CPU overhead and distributes the CPU workload across multiple threads. At the most basic level it's designed to alleviate any CPU bottlenecks, not improve GPU performance.

I'm sure once we start to see renderers implementing Async compute we will see GPU-side performance improvements.
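geoxile's point that DX12 helps the CPU side rather than raw GPU throughput can be made concrete with a toy model of draw-call submission cost. A minimal sketch in Python, with every cost figure invented for illustration:

```python
# Toy model: CPU time to issue a frame's draw calls when submission is
# single-threaded (DX11-style) versus recorded on several threads via
# command lists (DX12-style). All costs are invented.

def submit_ms(draw_calls, cost_per_call_ms, cores=1):
    """Recording time when calls are split evenly across `cores` threads."""
    per_core = -(-draw_calls // cores)    # ceiling division
    return per_core * cost_per_call_ms

dx11_cpu = submit_ms(10_000, 0.002)            # one thread: 20.0 ms
dx12_cpu = submit_ms(10_000, 0.002, cores=4)   # four threads: 5.0 ms

gpu_ms = 16.0  # hypothetical GPU cost of the same frame
print(max(dx11_cpu, gpu_ms))   # 20.0 ms: CPU-bound under DX11
print(max(dx12_cpu, gpu_ms))   # 16.0 ms: GPU-bound under DX12
```

The GPU work itself is unchanged in this model; all it shows is the CPU ceiling being lifted, which is why a card that was already GPU-bound under DX11 (as the high-end Nvidia cards apparently were) gains little.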


----------



## dieanotherday

Quote:


> Originally Posted by *Ganf*
> 
> And today we have some pudding for everyone to play in. Over here we have the red pudding, and over there we have the green pudding. Everyone is allowed to play but please don't throw pudding across the table at the other players.


lol that's funneh


----------



## Heuchler

Not like software is needed to make hardware run.

Computer Base has a nice review
http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/2/


----------



## p4inkill3r

Quote:


> Originally Posted by *Heuchler*
> 
> not like software is need to make hardware run.
> 
> Computer Base has a nice review
> http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/2/


Thanks, added it to the OP.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Robenger*
> 
> - From the last page on the ExtremeTech article.


Basically, AMD sucks at developing drivers within the DX11 environment, whether through lack of ability or of finances to fund it. Whereas Nvidia is capable of providing, for whatever reason, the resources to really maximize DX11.

EDIT:

Not sure anyone should be surprised by this.


----------



## Robenger

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Basically AMD sucks at developing drivers within the DX 11 environment; either by lack of ability or finances to fund it. Where as Nvidia is capable of providing, for whatever reason, the resources to really maximize DX 11.


That's one way of interpreting that....


----------



## geoxile

Quote:


> Originally Posted by *Robenger*
> 
> - From the last page on the ExtremeTech article.


http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/

Another interesting commentary on drivers by a former Nvidia driver developer intern.

The most relevant part
Quote:


> *The first lesson is: Nearly every game ships broken.* We're talking major AAA titles from vendors who are everyday names in the industry. In some cases, we're talking about blatant violations of API rules - one D3D9 game never even called BeginFrame/EndFrame. Some are mistakes or oversights - one shipped bad shaders that heavily impacted performance on NV drivers. These things were day to day occurrences that went into a bug tracker. *Then somebody would go in, find out what the game screwed up, and patch the driver to deal with it. There are lots of optional patches already in the driver that are simply toggled on or off as per-game settings, and then hacks that are more specific to games - up to and including total replacement of the shipping shaders with custom versions by the driver team.* Ever wondered why nearly every major game release is accompanied by a matching driver release from AMD and/or NVIDIA? There you go.


In short, the current API infrastructure and methodology are FUBAR. Because the APIs obfuscate so much, game devs write crappy (game) renderers, and then the driver teams have to try to fix those problems on their (driver) side.

Clearly, Nvidia's been more hands-on with developers, and most likely their driver-side fixes are just more comprehensive and game-specific. AMD traditionally hasn't been as hands-on, with the exception of GE titles.

With DX12, game devs will need greater knowledge and will handle more of the work themselves. But that means the hand-holding by the driver team becomes less necessary and less effective, since DX12 won't hide things away like earlier APIs did. The implication is that the driver teams at AMD and Nvidia will only be doing the bare minimum to support DX12 renderers.
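The per-game driver patching the quote describes, optional workarounds toggled on or off per title, can be pictured as a profile lookup keyed by the game. This is a hypothetical sketch in Python; the profile names and flags are invented, not any real driver's internals:

```python
# Hypothetical sketch of per-game driver workarounds: a table of optional
# patches, keyed by executable name, consulted when a game starts.
# Everything here is invented for illustration.

GAME_PROFILES = {
    "broken_d3d9_game.exe": {"tolerate_missing_beginframe": True},
    "slow_shader_game.exe": {"replace_shipping_shaders": True},
}

DEFAULTS = {"tolerate_missing_beginframe": False,
            "replace_shipping_shaders": False}

def active_workarounds(exe_name):
    """Merge the per-game toggles over the default, spec-conformant path."""
    return {**DEFAULTS, **GAME_PROFILES.get(exe_name, {})}

print(active_workarounds("slow_shader_game.exe"))
print(active_workarounds("well_behaved_game.exe"))  # defaults only
```

Under DX12 the driver has far fewer places to intercept and rewrite work like this, which is the basis of the conclusion above that driver-side hand-holding becomes less effective.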


----------



## PostalTwinkie

Quote:


> Originally Posted by *Robenger*
> 
> That's one way of interpreting that....


Feel free to share the other way(s).


----------



## aDyerSituation

Not sure how to take this. I have a month or two to decide on at least one of a Fury X, 980 Ti, or Titan X.

Hopefully more reviews like this come out in the meantime. Really rooting for AMD to catch up with DX12 but not holding my breath..


----------



## Ha-Nocri

I think the best example is Fury X vs. 980 Ti. Even at 1080p they are neck and neck under DX12.


----------



## CasualCat

Quote:


> Originally Posted by *geoxile*
> 
> Well that's really the point of DX12. It reduces CPU overhead and distributes the CPU workload across multiple threads. At the most basic level it's designed to alleviate any CPU bottlenecks, not improve GPU performance.
> 
> I'm sure once we start to see renderers implementing Async compute we will see GPU-side performance improvements.


I was just under the impression the bottlenecks were holding back the GPUs more. Maybe the problem is that the testers are using higher-end GPUs that can "brute force" their way through despite the overhead? I wonder what the results would look like for some of the lower-end cards.


----------



## Robenger

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Feel free to share the other way(s).


That Nvidia put all of their eggs in one basket called DX11. Now that developers have more control they need less of Nvidia and Game(sDon't)work.


----------



## chuy409

Quote:


> Originally Posted by *Robenger*
> 
> That Nvidia put all of their eggs in one basket called DX11. Now that developers have more control they need less of Nvidia and Game(sDon't)work.


I prefer to call it Gamerekts.


----------



## Kand

Quote:


> Originally Posted by *Robenger*
> 
> That Nvidia put all of their eggs in one basket called DX11. Now that developers have more control they need less of Nvidia and Game(sDon't)work.


Current gen consoles will never be able to support dx12.

Look at how slow dx11 adoption was and is. I expect the same rate for dx12.

Nothing to see here, folks. Move along.


----------



## Ganf

On the one hand I want a pair of 980 TI Lightnings, because after this 290x I now have an extreme fetish for the brand....

On the other hand, my scales of nerdish justice we shall call them, I want to pick up at least one Fury X to experiment with.

On the third hand (don't ask where I keep it) DX12 is supposed to be introducing cross-platform GPU compatibility, and my scientific curiosity itches furiously whenever that topic is mentioned....

I'm already waiting for the 980 TI's, what're the odds we'll see some cross-platform tests before those cards drop so that I can decide whether to pick up one of each or not?


----------



## ENTERPRISE

Interesting read, but I will hold off forming a full opinion until another DX12 benchmark is available as a comparison between the two, to see if there is a trend. Obviously it would seem that DX12 benefits AMD a fair bit, in that it puts them where they should have been all along: fairly neck and neck. One has to assume that AMD's drivers/integration with the DX11 API were bad. We all knew they were bad... but I did not think it was to that extreme.


----------



## Forceman

Quote:


> Originally Posted by *Robenger*
> 
> That Nvidia put all of their eggs in one basket called DX11. Now that developers have more control they need less of Nvidia and Game(sDon't)work.


Those resources Nvidia were allocating to DX11 development don't just go away, so even if that was the case I think you can reasonably expect Nvidia to start putting some of their DX11 effort into DX12 development. And let's not get too carried away based on one benchmark from one developer.


----------



## Ganf

Quote:


> Originally Posted by *ENTERPRISE*
> 
> Interesting read but I will hold off getting a full opinion until there is another DX12 Benchmark available as a comparison between the two to see if there is an average. Obviously it would see that DX12 benefits AMD a fair bit in the fact it is where it should have been all a long....fairly neck and neck. One has to assume that AMD's drivers/integration with the DX11 API was bad. We all knew it was bad.....but I did not think it was to that extreme.


I want to know what was bad about them. What changed between 11 and 12 that AMD is suddenly competent? Are the APIs that fundamentally different, and AMD just had the right staff on hand to tackle DX12 appropriately? Is AMD's hardware fundamentally borked at an architectural scale, with nothing they can do to work around it? Did AMD just not care?

There's a story there, I'd like to hear it.


----------



## Themisseble

I still wonder how well the FX-8350 will do against the i5-6600K in DX12.


----------



## Superplush

Quote:


> Originally Posted by *Ganf*
> 
> I want to know what was bad about them. What changed between 11 and 12 that AMD is suddenly competent? Are the API's that fundamentally different and AMD just had the right staff at hand to tackle DX12 appropriately? Is AMD's hardware fundamentally borked on an architecture scale and there's nothing they can do to work around it? Did AMD just not care?
> 
> There's a story there, I'd like to hear it.


Well, just off the bat, something you might've known: AMD were working with intel over DX12, almost partnered with them to get it to work, so I guess AMD had insider knowledge as to what was happening at least.

personally I'm saddened that Intel hold DX12 to ransom with Win10


----------



## bluewr

Quote:


> Originally Posted by *Kand*
> 
> Current gen consoles will never be able to support dx12.
> 
> Look at how slow dx11 adoption was and is. I expect the same rate for dx12.
> 
> Nothing to see here, folks. Move along.


Microsoft is already expected to bring DX12 to the Xbone. The PS4 uses a form of OpenGL, but Sony has their own optimization method.


----------



## chuy409

Quote:


> Originally Posted by *Themisseble*
> 
> I still wonder how well will Fx 8350 do against i5 6600K in DX12.


http://www.pcper.com/reviews/Graphics-Cards/DX12-GPU-and-CPU-Performance-Tested-Ashes-Singularity-Benchmark/Results-Heavy

Not so well... even the i3 is beating it, let alone a 6600K.


----------



## geoxile

Quote:


> Originally Posted by *bluewr*
> 
> Microsoft is already expected to put DX12 onto the Xbone, PS4 uses a form of Open GL, but Sony have their own optimization method.


I've never heard of GNM being a variation of OpenGL.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Forceman*
> 
> Those resources Nvidia were allocating to DX11 development don't just go away, so even if that was the case I think you can reasonably expect Nvidia to start putting some of their DX11 effort into DX12 development. And let's not get too carried away based on one benchmark from one developer.


DX12 is too new. Maybe in 2 to 3 generations we might get GPUs fast enough to push DX12 to the limit. DX11 was never really a problem for AMD with the HD 5870, but as GPUs got faster it started to show at lower resolutions. Also, their work on Mantle meant their DX11 performance was really left behind. I think they knew there was no point in spending R&D on DX11 with DX12 doing the work for them.


----------



## Serandur

Quote:


> Originally Posted by *Ganf*
> 
> I want to know what was bad about them. What changed between 11 and 12 that AMD is suddenly competent? Are the API's that fundamentally different and AMD just had the right staff at hand to tackle DX12 appropriately? Is AMD's hardware fundamentally borked on an architecture scale and there's nothing they can do to work around it? Did AMD just not care?
> 
> There's a story there, I'd like to hear it.


I want to know too.

Regardless of the results, DX12's presence doesn't alleviate problems with all existing DX11 games and I hope AMD haven't given up entirely on catching up to Nvidia in that area.


----------



## PontiacGTX

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I am almost suspect of the gains for AMD. They bring up more questions than they answer, at least for myself.
> 
> Why are they so massive? Is AMD just that bad at writing drivers for DX 11? Is it intentional? Or does it happen to be that lower level APIs like the particular AMD architecture design? Are the results even valid?


AMD already knew how to improve their drivers for a low-level API, something Nvidia is still trying to understand.
Quote:


> Originally Posted by *PostalTwinkie*
> 
> Basically AMD sucks at developing drivers within the DX 11 environment; either by lack of ability or finances to fund it. Where as Nvidia is capable of providing, for whatever reason, the resources to really maximize DX 11.
> 
> EDIT:
> 
> Not sure anyone should be surprised by this.


Or Nvidia is bad at doing DirectX 12 drivers.

Indeed, AMD didn't optimize their DX11 drivers because they plan to push DirectX 12; it is about better performance in new AAA games. But in the meanwhile they could optimize the drivers to match Maxwell.
Quote:


> Originally Posted by *Kand*
> 
> Current gen consoles will never be able to support dx12.
> 
> Look at how slow dx11 adoption was and is. I expect the same rate for dx12.
> 
> Nothing to see here, folks. Move along.


The Xbox One console is able to.


----------



## Ganf

Quote:


> Originally Posted by *ZealotKi11er*
> 
> DX12 is too new. Maybe in 2 to 3 generation we might get GPUs fast enough to push DX12 to the limit. DX11 was never really a problem for AMD with HD 5870 but as GPUs got faster it started to show off at lower resolutions. Also them working in Mantle made it so their DX11 performance was really left behind. I think they knew there was no point on spending R&D on DX11 with DX12 doing the work for them.


No point or no budget. It's been clear for a while that all of AMD's eggs are in the 2016 basket...

2016 may end up being the year of the Omelet....


----------



## ZealotKi11er

Quote:


> Originally Posted by *Serandur*
> 
> I want to know too.
> 
> Regardless of the results, DX12's presence doesn't alleviate problems with all existing DX11 games and I hope AMD haven't given up entirely on catching up to Nvidia in that area.


Forget DX11 from AMD. Yes, AMD does worse in DX11, but in actual games the gap is far smaller, so there is little reason for them to even bother at this point.


----------



## Serandur

CPU scaling from PC Perspective:


----------



## KyadCK

Quote:


> Originally Posted by *Superplush*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Ganf*
> 
> I want to know what was bad about them. What changed between 11 and 12 that AMD is suddenly competent? Are the API's that fundamentally different and AMD just had the right staff at hand to tackle DX12 appropriately? Is AMD's hardware fundamentally borked on an architecture scale and there's nothing they can do to work around it? Did AMD just not care?
> 
> There's a story there, I'd like to hear it.
> 
> 
> 
> Well just off the bat, something you might've known. AMD were working with *intel* over DX12, almost partnered with them to get it to work so I guess AMD had insider knowlage as to what was happening at least.
> 
> personally I'm saddened that *Intel* hold DX12 to ransom with Win10

You want to give that another go?


----------



## Heuchler

Quote:


> Originally Posted by *chuy409*
> 
> http://www.pcper.com/reviews/Graphics-Cards/DX12-GPU-and-CPU-Performance-Tested-Ashes-Singularity-Benchmark/Results-Heavy
> 
> Not so well...even the i3 is beating it let alone a 6600k


Quote:


> Good news for AMD
> In the Ashes of the Singularity benchmark, clock speeds have far more impact than they did in 3DMark's feature test, while Hyper-Threading has a minimal performance impact. Oxide developers told me the reason they suspect Hyper-Threading didn't knock it out of the ball park on their new game engine is the shared L1 cache design of Intel's CPUs.
> 
> With Hyper-Threading a yawner and high clock speeds a big bonus, AMD's budget-priced chips are pretty much set up as the dark horse CPUs for DirectX 12 - if this single benchmark test is indicative of what we can expect to see from DirectX 12 overall, of course.
> 
> In fact, Oxide's developers said their internal testing showed AMD's APUs and CPUs having an edge since they give you more cores than Intel for the money. AMD's design also doesn't share L1 cache the way Intel's chips do.
> 
> AMD gives you more cores for your money
> The numbers really add up when you factor in the cost-per-core from AMD. An AMD FX-8350 gives you 8-cores (with some shared resources) for $165. That doesn't even net you a quad-core from Intel CPUs. The cheapest quad-core from Intel is the 3.2GHz Core i5-4460 for $180-and that quad-core Haswell CPU doesn't even have Hyper-Threading turned on. Nor can it be overclocked.


Oxide developers told me their internal testing with the Ashes of the Singularity benchmark showed 8-core AMD CPUs giving even the high-end Core i7-4770K a tough time. [PCWorld]
Windows 10's DirectX 12 graphics performance tested: More CPU cores, more oomph





Nvidia's lab ran the new benchmark on a six-core Intel Core i7-5820K @ 2.0 GHz



Nvidia saw very hefty performance increases of up to 82 percent on DirectX 12 over DirectX 11 on a multi-core low-clock speed chip. The reason? At lower clock speeds, DirectX 11's inability to use more of the cores on the Core i7-5820K held performance back since it's mostly single-threaded. DirectX 12 spreads the load across those cores so that even at low clock speeds, you see a significant performance increase.
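The effect Nvidia describes can be sketched with a toy model, assuming per-draw-call driver overhead splits perfectly across submission threads (all numbers below are invented for illustration; this is not real D3D11/D3D12 code):

```python
# Toy model: why a low-clocked many-core chip gains so much from
# DX12-style multithreaded submission. Numbers are made up.

DRAW_CALLS = 12_000
CYCLES_PER_CALL = 2_500          # hypothetical CPU cost per draw call
CLOCK_HZ = 2_000_000_000         # a low-clocked 2.0 GHz core

def submission_time(num_threads: int) -> float:
    """Seconds of CPU time to submit one frame's draw calls, assuming
    the work splits perfectly across num_threads cores."""
    total_cycles = DRAW_CALLS * CYCLES_PER_CALL
    return total_cycles / (CLOCK_HZ * num_threads)

dx11_like = submission_time(1)   # one submission thread, DX11-style
dx12_like = submission_time(6)   # six cores of a 5820K, DX12-style

print(f"1 thread : {dx11_like * 1000:.1f} ms per frame")
print(f"6 threads: {dx12_like * 1000:.1f} ms per frame")
```

In the single-threaded case the submission cost caps the frame rate no matter how many cores sit idle; spreading it across six cores at the same clock cuts that cost proportionally, which is the kind of headroom the 82 percent figure points at.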


----------



## PontiacGTX

Quote:


> Originally Posted by *Heuchler*
> 
> Oxide developers told me their internal testing with the Ashes of the Singularity benchmark showed 8-core AMD CPUs giving even the high-end Core i7-4770K a tough time. [PCWorld]
> Windows 10's DirectX 12 graphics performance tested: More CPU cores, more oomph
> 
> 
> 
> 
> 
> Nvidia's lab ran the new benchmark on a six-core Intel Core i7-5820K @ 2.0 GHz
> 
> 
> 
> Nvidia saw very hefty performance increases of up to 82 percent on DirectX 12 over DirectX 11 on a multi-core low-clock speed chip. The reason? At lower clock speeds, DirectX 11's inability to use more of the cores on the Core i7-5820K held performance back since it's mostly single-threaded. DirectX 12 spreads the load across those cores so that even at low clock speeds, you see a significant performance increase.


The first chart comes from 3DMark, which isn't a real-world measurement of a game... but Nvidia claims that an actual game isn't a real-world example.


----------



## Twist86

Shame Windows 10 is what it is... looks like DX12 might not be as much fluff as DX10 was.
Quote:


> Originally Posted by *Superplush*
> 
> Well just off the bat, something you might've known. AMD were working with intel over DX12, almost partnered with them to get it to work so I guess AMD had insider knowlage as to what was happening at least.
> 
> personally I'm saddened that Intel hold DX12 to ransom with Win10


Yeah no kidding, I would pay $50 to get it on Windows 7


----------



## Xuper

Ok, this benchmark shows a massive boost from DX12 over DX11 on both AMD and Nvidia. Question: what is all this noise about Nvidia? Why did Nvidia say "not good"? I don't see any problem; they are both a match. So what's the problem? Does that mean Nvidia didn't expect a massive boost for AMD? Really?


----------



## jmcosta

FX CPUs forever forgotten lol

I hope AMD puts those Bulldozers aside and makes something new, good enough to compete with Intel's dual cores.


----------



## chuy409

Quote:


> Originally Posted by *Xuper*
> 
> Ok this benchmark show massive boost over DX11 on both AMD and Nvidia , Question : What is this all noise about Nvidia ? Why did Nvidia say "Not Good" ? i don't see any problem.they are both match , So what's Problem ? does that mean Nvidia didn't expect massive boost for AMD ? Really?


The graphs show a massive benefit for AMD cards despite being misleading. The 390X and 980 were never that far apart. 30 fps apart? No way; not a single DX11 game today shows that. The DX12 results reflect how these two GPUs perform in modern DX11 games. So in essence, DX12 didn't bring any real benefit to either GPU camp or CPU camp, just an fps boost. The graphs definitely look misleading to me.


----------



## Heuchler

From the ComputerBase review of the i7-4770K and A10-7870K in Ashes of the Singularity:
DirectX-12-Benchmarks zeigen Unterschiede zwischen AMD und Nvidia ("DirectX 12 benchmarks show differences between AMD and Nvidia")

High-end GPUs ($600+) should be paired with expensive components, but most gamers don't have those cards.
But DX12 looks good for a $150 MSRP APU paired with more common gamer cards like the R9 280X/HD 7970 and GTX 770.

We'll see how well this game scales with cores and megahertz across different architectures. Still, an alpha is an alpha.


----------



## Themisseble

Quote:


> Originally Posted by *Serandur*
> 
> CPU scaling from PC Perspective:


Kinda hard to believe these benchmarks...
Even Eurogamer shows that an i3 under DX12 does better than an i7 under DX11 on a GTX 980.
Also, going by Mantle testing, I am sure that under DX12 the i3 will be slower than the FX-6300 by at least 20-30%.


----------



## pengs

Quote:


> Originally Posted by *Kand*
> 
> Current gen consoles will never be able to support dx12.
> 
> Look at how slow dx11 adoption was and is. I expect the same rate for dx12.
> 
> Nothing to see here, folks. Move along.


Dunno, the next gens are all capable of DX12.

And then there is Windows 10 which is creeping in close to W8's market share already and the fact that it's free. It's a far cry from purchasing a two hundred dollar version of Windows 7 when it dropped in 2009 to use a new version of DirectX, having little hardware to support it or legs to carry it as the consoles were DX9 driven.

Not the same.


----------



## chuy409

Quote:


> Originally Posted by *chuy409*
> 
> The graphs show a massive benefit for AMD cards despite being misleading. The 390X and 980 were never that far apart. 30 fps apart? No way; not a single DX11 game today shows that. The DX12 results reflect how these two GPUs perform in modern DX11 games. So in essence, DX12 didn't bring any real benefit to either GPU camp or CPU camp, just an fps boost.


Quote:


> Originally Posted by *Heuchler*
> 
> 
> 
> From the ComputerBase review of the i7-4770K and A10-7870K in Ashes of the Singularity:
> DirectX-12-Benchmarks zeigen Unterschiede zwischen AMD und Nvidia ("DirectX 12 benchmarks show differences between AMD and Nvidia")
> 
> High-end GPUs ($600+) should be paired with expensive components, but most gamers don't have those cards.
> But DX12 looks good for a $150 MSRP APU paired with more common gamer cards like the R9 280X/HD 7970 and GTX 770.
> 
> We'll see how well this game scales with cores and megahertz across different architectures. Still, an alpha is an alpha.


I wouldn't just stop at the 7870K. I'm still waiting for G3258 OC results. DX12 is supposed to take away CPU overhead. If it's done right, the G3258 might become a legendary CPU like the 2500K.


----------



## Ganf

Quote:


> Originally Posted by *Twist86*
> 
> Shame windows 10 is what it is....looks like DX12 might not be as much fluff as DX10 was.


Hopefully won't be long before someone makes a tool that blocks all of the spyware. There's gotta be at least a dozen groups working on it already, it's not a small issue.


----------



## SpeedyVT

So what you're saying is that an old AMD card is throwing its weight around with a new Nvidia GPU on DX12. Seems about right.


----------



## Kand

Quote:


> Originally Posted by *bluewr*
> 
> Microsoft is already expected to put DX12 onto the Xbone, PS4 uses a form of Open GL, but Sony have their own optimization method.


You want to believe, but it's limited by the hardware. These are still just glorified 7870s with GCN 1.0.


----------



## pengs

Quote:


> Originally Posted by *PontiacGTX*
> 
> The first chart comes from 3DMark, which isn't a real-world measurement of a game... but Nvidia claims that an actual game isn't a real-world example.


Yep, from the PCPer review. NV has everything and nothing to lose. GCN is planted into the next gens which means that they've lost the ability to have developers design engines and games around their architecture. And there's also that Mantle thing... which looks a lot like that DX12 thing.

Must be what fear smells like


----------



## iLeakStuff

LOL RIP NVIDIA









I think this review from ExtremeTech was the best. It tests the high end from both camps.

We went from 4K with everything maxed in *DX11* with Nvidia beating the crap out of AMD


To 4K with everything maxed in *DX12* with Fury X suddenly beating 980Ti by 10%










Same story with 1080P with everything maxed out
DX11


DX12


I hope for Nvidia's sake that this is the result of the developer working closely with AMD, and that Nvidia still has to share code with the developer to get Nvidia cards doing great in the game. Otherwise 2016 might be a bad year for Nvidia.


----------



## Themisseble

Quote:


> Originally Posted by *bluewr*
> 
> Microsoft is already expected to put DX12 onto the Xbone, PS4 uses a form of Open GL, but Sony have their own optimization method.


I hope that PS4 would use VULKAN some day...


----------



## Kand

Quote:


> Originally Posted by *iLeakStuff*
> 
> I hope for Nvidia`s sake that this is the result of the developer working closely with AMD


It is.


----------



## Kuivamaa

Quote:


> Originally Posted by *p4inkill3r*
> 
> A relevant blog post from the game's creators,Oxide: The birth of a new API
> 
> PCPerspective
> 
> ExtremeTech
> 
> EuroGamer
> 
> Legit Reviews
> 
> Computerbase.de


I was expecting good gains with DX12 and I am still impressed.


----------



## mtcn77

Nvidia's drivers, even excluding MSAA, got slower according to this test. At 'normal' settings, 1080p:


Spoiler: Computerbase.de








GTX 770 is 10% faster in DirectX 11,
_GTX 960_ is 12% faster in DirectX 11,
*GTX 970* is 15% faster in DirectX 11.
Whatever it is you are doing, Nvidia, turn back immediately!


----------



## Ganf

Quote:


> Originally Posted by *mtcn77*
> 
> Nvidia's drivers, even excluding MSAA, got slower according to this test. At 'normal' settings, 1080p:
> 
> 
> Spoiler: Computerbase.de
> 
> 
> 
> 
> 
> 
> 
> 
> GTX 770 is 10% faster in DirectX 11,
> _GTX 960_ is 12% faster in DirectX 11,
> *GTX 970* is 15% faster in DirectX 11.
> Whatever it is you are doing, Nvidia, turn back immediately!


Looks like they're polishing those flagships up for a nice parade to me. Shame about the other 99% of their market.


----------



## Themisseble

Quote:


> Originally Posted by *chuy409*
> 
> I wouldnt just stop at the 7870k. Im still waiting for g3258 oc results. Dx12 suppose to take away cpu overhead. If its done right, the g3258 might become a legendary cpu like the 2500k


Nah... as you can see, a quad-core i5 at 1.7GHz is as fast as a dual core at 3.9GHz.
http://www.pcworld.com/article/2971612/software-games/windows-10s-radical-directx-12-graphics-tech-tested-more-cpu-cores-more-performance.html

This might help mobile CPUs a lot; basically, you can now spend more of the power budget on the GPU.


----------



## BinaryDemon

To me it just looks like the Fury X is massively under-performing on DX11.


----------



## iLeakStuff

Quote:


> Originally Posted by *BinaryDemon*
> 
> To me it just looks like the Fury X is massively under-performing on DX11.


980Ti is underperforming in DX12

Fury X is underperforming in DX11

Both are working like they should in both DX11 and DX12

Place your bets...


----------



## Cyro999

Quote:


> Those 390X DX11 results aren't representative of modern DX11 games. When have you seen a 390X being 30fps slower than a 980 in modern games? Never.


Wrong, actually. There is a range of games where an Nvidia GPU gets ~1.5x more FPS when CPU-bound in DX11. WoW is the most popular example, but there are more.

This has been a huge deal since April of last year when the difference went from being a bit in Nvidia's favor to a lot in Nvidia's favor and AMD didn't respond properly to it.
Quote:


> I hope for Nvidia's sake that this is the result of the developer working closely with AMD, and that Nvidia still has to share code with the developer to get Nvidia cards doing great in the game. Otherwise 2016 might be a bad year for Nvidia


A 10% performance difference when CPU-bound means quite little; as said above, a ~1.5x performance gap has existed for over a year in some circumstances, and a lot of people and reviewers don't even notice it because they don't test in ways that expose it. This does NOT say anything about the performance of either GPU, just the capabilities of the software when working with a CPU of any given strength. If you're entirely GPU-bound, there would be no performance difference shown.

Here's a good pic showing how far behind AMD is in DX11 performance, relative to how far ahead they are in DX12.










The +% is the advantage for Nvidia, mostly on the green side (DX11).

The negative percentages show how much AMD is ahead, mostly on the blue side (DX12).

The DX11 advantage for Nvidia removed all AMD GPUs from the table when I was upgrading this gen. There was no way I could buy one when my most played game was CPU-bound most of the time and showing 1.5x the FPS on Nvidia; it was the same in more games too, and AMD did not address it at all.

In the current state of DX12, if AMD got 5-10% ahead, good for them. It'd be nice, but unlike the DX11 difference it would probably not force people to buy only AMD hardware for competitive performance; 5 or 10% just doesn't have the power that +50% FPS does.
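For reference, the +/- percentages in charts like this are usually just a simple FPS ratio; a small sketch with invented numbers:

```python
# How the +/- advantage percentages in such charts are typically
# derived (illustrative numbers, not actual benchmark data).

def advantage_pct(nvidia_fps: float, amd_fps: float) -> float:
    """Positive = Nvidia ahead, negative = AMD ahead."""
    return (nvidia_fps / amd_fps - 1.0) * 100.0

# A CPU-bound DX11 case: 90 fps vs 60 fps is a 1.5x gap, i.e. +50%.
print(f"{advantage_pct(90, 60):+.0f}%")   # +50%
# A DX12 case where AMD leads by a small margin.
print(f"{advantage_pct(55, 60):+.1f}%")
```

The point above drops straight out of the math: a 1.5x gap is +50%, while a small AMD lead works out to single digits, which is why the two situations carry such different weight for buyers.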


----------



## MonarchX

Some of these reviews expected Nvidia performance to stay the same. Nvidia claims that this benchmark is an alpha, but the developer dismisses that statement. This can mean several things, like:

1. Nvidia's drivers are highly optimized in general, already squeezing 90% of what their current high-end cards can do, and their DirectX 11 drivers utilize CPUs very efficiently. That results in a somewhat minor improvement when DirectX 12 is used.
2. Nvidia's current DirectX 12 drivers are sub-par.
3. This benchmark really IS in an alpha/beta state.
4. AMD's DirectX 11 drivers were unoptimized crap, while their DirectX 12 drivers are superb.
5. AMD cards are actually the better performers, but were held back by DirectX 11's inability to efficiently utilize CPUs (or bad drivers, like I said earlier).


----------



## ZealotKi11er

Quote:


> Originally Posted by *Cyro999*
> 
> Wrong, actually. There is a range of games where an Nvidia GPU gets ~1.5x more FPS when CPU-bound in DX11. WoW is the most popular example, but there are more.
> 
> This has been a huge deal since April of last year when the difference went from being a bit in Nvidia's favor to a lot in Nvidia's favor and AMD didn't respond properly to it.
> A 10% performance difference when CPU bound means quite little; as said above, a ~1.5x performance gap has existed for over a year in some circumstances and a lot of people and reviewers don't even notice it because they don't test in ways that expose it. This does NOT say anything about the performance of either GPU - just the capabilities of the software when working with a CPU of any given strength. If you're entirely GPU bound, there would be no performance difference shown.
> 
> Here's a good pic showing how far behind AMD is for dx11 performance, relative to how far ahead they are on dx12.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The +% is advantage for Nvidia - mostly on the green side (dx11)
> 
> negative %'s is how much AMD is ahead, mostly on the blue (dx12)
> 
> The DX11 advantage for Nvidia removed all AMD GPUs from the table when I was upgrading this gen. There was no way I could buy one when my most played game was CPU-bound most of the time and showing 1.5x the FPS on Nvidia; it was the same in more games too, and AMD did not address it at all.
> 
> In the current state of dx12, if AMD got 5-10% ahead, good for them - it'd be nice, but unlike the dx11 difference it would probably not force people to buy only AMD hardware for competitive performance. 5 or 10% just doesn't have the power that +50% FPS does.


People are not going to buy new cards in the next year to play DX11 games.


----------



## chuy409

Quote:


> Originally Posted by *BinaryDemon*
> 
> To me it just looks like the Fury X is massively under-performing on DX11.


All AMD cards are MASSIVELY under-performing in DX11.


----------



## dubldwn

Not sure where people are getting the 980Ti is only showing "marginal gains" from DX12 in this benchmark. It's showing marginal losses. It's slower. Which isn't really odd. What's odd are the massive gains for the Fury.


----------



## NuclearPeace

Quote:


> Originally Posted by *ZealotKi11er*
> 
> People are not going to buy new cards in the next year to play DX11 games.


Speak for yourself. DX12 at this point is still nothing more than a proof of concept. It's going to be hard to make definitive conclusions when the sample size is this low and the technology is this new. Remember how people swore that the 680 was faster than the 7970, and now the 780 is only a little better than a 280X?

We all know how slow DirectX adoption is. Look at how many games still in use today rely on DirectX 9 or 10. Not many games even today make full use of DX11; BF4 is the only title I can think of that does. All of this new tech is expensive to use, and at this point only AAA developers can be expected to use DirectX 12. This is why DirectX 11.3 was created: to offer a new DirectX 11 that smaller firms can use without having to completely retool and retrain people.

Eventually I am probably going to buy a variant of Hawaii Pro (R9 290 or 390), and DX11 performance and overhead is still a big concern for me because of my i3-4370. DirectX 12 is still a long way from mainstream support. Plus, all of my current DirectX 11 (or older) games won't suddenly convert to DirectX 12. It seems like AMD has completely left DirectX 11 driver performance on the side to focus all of their energy on DirectX 12. I want to get the most out of the next AMD GPU I buy with my current titles.


----------



## Ganf

Quote:


> Originally Posted by *chuy409*
> 
> all amd cards are MASSIVELY under-performing in dx11


To be honest, these benches are a bit lopsided toward AMD's increases. They were working with AMD; AMD probably sold them on how many GCN cards there are in the wild as opposed to non-GCN, and convinced them to focus on DX12 and not worry about poor DX11 optimization.


----------



## BiG StroOnZ

NVIDIA says that "We Don't Believe Ashes of the Singularity Benchmark To Be A Good Indicator Of DX12 Performance"
Quote:


> This title is in an early alpha stage according to the creator. It's hard to say what is going on with alpha software. It is still being finished and optimized. It still has bugs, such as the one that Oxide found where there is an issue on their side which negatively affects DX12 performance when MSAA is used. They are hoping to have a fix on their side shortly.
> 
> We think the game looks intriguing, but an alpha benchmark has limited usefulness. It will tell you how your system runs a series of preselected scenes from the alpha version of Ashes of the Singularity. We do not believe it is a good indicator of overall DirectX 12 gaming performance.
> 
> We've worked closely with Microsoft for years on DirectX 12 and have powered every major DirectX 12 public demo they have shown. We have the utmost confidence in DX12, our DX12 drivers and our architecture's ability to perform in DX12.
> 
> When accurate DX12 metrics arrive, the story will be the same as it was for DX11.


Quote:


> It should not be considered that because the game is not yet publicly out, it's not a legitimate test. While there are still optimizations to be had, Ashes of the Singularity in its pre-beta stage is as optimized as, or more optimized than, most released games. What's the point of optimizing code 6 months after a title is released, after all? Certainly, things will change a bit until release. But PC games with digital updates are always changing; we certainly won't hold back from making big changes post launch if we feel it makes the game better!


----------



## mtcn77

Game optimisation is game developers' field.
Quote:


> Originally Posted by *NuclearPeace*
> 
> Speak for yourself. DX12 at this point is still nothing more than a proof of concept. It's going to be hard to make definitive conclusions when the sample size is this low and the technology is this new. Remember how people swore that the 680 was faster than the 7970, and now the 780 is only a little better than a 280X?


Yet, the trend is the same.


----------



## p4inkill3r

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> NVIDIA says that "We Don't Believe Ashes of the Singularity Benchmark To Be A Good Indicator Of DX12 Performance"


It sounds like Oxide is responding to Nvidia's complaints in their blog post:
Quote:


> "There are incorrect statements regarding issues with MSAA. Specifically, that the application has a bug in it which precludes the validity of the test. We assure everyone that is absolutely not the case. Our code has been reviewed by Nvidia, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft specifically designed to validate against incorrect usages. All IHVs have had access to our source code for over year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months. Fundamentally, the MSAA path is essentially unchanged in DX11 and DX12. Any statement which says there is a bug in the application should be disregarded as inaccurate information.
> 
> "So what is going on then? Our analysis indicates that the any D3D12 problems are quite mundane. New API, new drivers. Some optimizations that that the drivers are doing in DX11 just aren't working in DX12 yet. Oxide believes it has identified some of the issues with MSAA and is working to implement work arounds on our code. This in no way affects the validity of a DX12 to DX12 test, as the same exact work load gets sent to everyone's GPUs. This type of optimizations is just the nature of brand new APIs with immature drivers."


----------



## mtcn77

Quote:


> Originally Posted by *Ganf*
> 
> To be honest these benches are a bit lopsided on AMD's increases. They were working with AMD, AMD probably sold them on how many GCN cards there are in the wild as opposed to non-GCN, and convinced them to focus on DX12 and not worry about poor DX11 optimization.


And Nvidia was ahead in Stardock's own demo. Looks like Stardock is one of their partners. It is amazing how much can change in 6 months.


Spoiler: Warning: Spoiler!


----------



## iLeakStuff

Oooops. Maybe AMD is truly better with DX12 after all....

I`m a bit conflicted here


----------



## chuy409

Quote:


> Originally Posted by *NuclearPeace*
> 
> Speak for yourself. DX12 at this point is still nothing more than a proof of concept. It's going to be hard to make definitive conclusions when the sample size is this low and the technology is this new. Remember how people swore that the 680 was faster than the 7970, and now the 780 is only a little better than a 280X?


Exactly. Why did AMD wait that long to bring out the potential? Yes, the 280X is slightly behind the 780, but so what? They needed the FULL power of the card at launch, not a year later, by which time Nvidia had dropped the 900 series and mopped the floor with them.


----------



## PontiacGTX

Quote:


> Originally Posted by *Kand*
> 
> You want to believe but its limited by the hardware. These are still just glorified 7870s with GCN 1.0.


They are GCN 1.1 and therefore have FL12_0.


----------



## Cyro999

Quote:


> Originally Posted by *ZealotKi11er*
> 
> People are not going to buy new cards in the next year to play DX11 games.


They absolutely will, but even if they didn't then it wouldn't matter. People bought plenty of cards in 2013, 2014, 2015 to play dx11 games and AMD failed them in that way. That failure will be less relevant in 2016 and 2017 but still there.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *p4inkill3r*
> 
> It sounds like Oxide is responsing to nvidia's complaints in their blog post:


There's also this from the Publisher of the game:
Quote:


> @NVIDIAGeForce @nvidia tread lightly. @OxideGames.
> 
> - Brad Wardell (@draginol) August 16, 2015


Seems pretty immature.


----------



## iLeakStuff

There is a *ton* of AMD mention in their twitter page...
https://twitter.com/oxidegames

Fishy...

EDIT: they are very old. Nevermind


----------



## Smokey the Bear

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> There's also this from the Publisher of the game:
> Seems pretty immature.


Extremely, and I don't even see the problem with Nvidia's original comments. They seemed accurate.

And yes, their twitter is littered in AMD stuff.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *iLeakStuff*
> 
> There is a *ton* of AMD mention in their twitter page...
> https://twitter.com/oxidegames
> 
> Fishy...
> 
> EDIT: they are very old. Nevermind


No, they are definitely affiliated with AMD in some way:



They have tweets from a couple months ago referencing AMD CPUs and Mantle.


----------



## p4inkill3r

And so it starts.


----------



## Ganf

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> There's also this from the Publisher of the game:
> Seems pretty immature.


Probably some drama there. These guys were very adamant about being one of the first DX12 titles on the market, and Nvidia seems to be taking their sweet time getting into it. I wouldn't be surprised if Nvidia sniggered a bit when they asked for help with DX12, tried to push Gameworks on them and the developer felt snubbed because they are using DX12 so heavily in their marketing.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Smokey the Bear*
> 
> Extremely, and I don't even see the problem with Nvidia's original comments. They seemed accurate.
> 
> And yes, their twitter is littered in AMD stuff.


What NVIDIA said seems pretty reasonable.

Their whole twitter is AMD, no doubt about that.
Quote:


> Originally Posted by *Ganf*
> 
> Probably some drama there. These guys were very adamant about being one of the first DX12 titles on the market, and Nvidia seems to be taking their sweet time getting into it. I wouldn't be surprised if Nvidia sniggered a bit when they asked for help with DX12, tried to push Gameworks on them and the developer felt snubbed because they are using DX12 so heavily in their marketing.


Yeah it doesn't help that the next game I know of with DX12 support is also an AMD Gaming Evolved title (Deus Ex: Mankind Divided).

Maybe AMD is taking their own "Gameworks" route here.


----------



## Ganf

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> What NVIDIA said seems pretty reasonable.
> 
> Their whole twitter is AMD, no doubt about that.
> Yeah it doesn't help that the next game I know of with DX12 support is also an AMD Gaming Evolved title (Deus Ex: Mankind Divided).
> 
> Maybe AMD is taking their own "Gameworks" route here.


Of course they are; they have to. Gameworks is turning out to be too successful for them to ignore, and they have to push back on that field.

I want to say that the first card-neutral DX12 game we'll see is Fable Legends but given AMD's involvement with Microsoft I can't guarantee that.

I've been saying for months that AMD wasn't losing the API war against Gameworks. People are about to see just how true that is over the next year, and it isn't going to be a pleasant year for us gamers. AMD's budget is now too tight to live up to their hype, so I wouldn't be surprised if all of these AMD supported DX12 titles ended up leaving a sour taste in the community's mouth for the new API when AMD's support is lacking.


----------



## KyadCK

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Smokey the Bear*
> 
> Extremely, and I don't even see the problem with Nvidia's original comments. They seemed accurate.
> 
> And yes, their twitter is littered in AMD stuff.
> 
> 
> 
> What NVIDIA said seems pretty reasonable.
> 
> Their whole twitter is AMD, no doubt about that.
> Quote:
> 
> 
> 
> Originally Posted by *Ganf*
> 
> Probably some drama there. These guys were very adamant about being one of the first DX12 titles on the market, and Nvidia seems to be taking their sweet time getting into it. I wouldn't be surprised if Nvidia sniggered a bit when they asked for help with DX12, tried to push Gameworks on them and the developer felt snubbed because they are using DX12 so heavily in their marketing.
> 
> Click to expand...
> 
> Yeah it doesn't help that the next game I know of with DX12 support is also an AMD Gaming Evolved title (Deus Ex: Mankind Divided).
> 
> Maybe AMD is taking their own "Gameworks" route here.
Click to expand...

Their whole twitter is from 2014. They literally do not have a tweet from 2015.

On top of that, the majority of it is from their Mantle demo, StarSwarm, again not from 2015. What would Nvidia have to do with Mantle?


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Ganf*
> 
> Of course they are, they have to. Gameworks is turning out to be too successful for them to ignore they have to push back on that field.
> 
> I want to say that the first card-neutral DX12 game we'll see is Fable Legends but given AMD's involvement with Microsoft I can't guarantee that.
> 
> I've been saying for months that AMD wasn't losing the API war against Gameworks. People are about to see just how true that is over the next year, and it isn't going to be a pleasant year for us gamers. AMD's budget is now too tight to live up to their hype, so I wouldn't be surprised if all of these AMD supported DX12 titles ended up leaving a sour taste in the community's mouth for the new API when AMD's support is lacking.


I agree; looks like it's going to be Gameworks vs. AMD Gaming Evolved DirectX 12 supported games for 2016.
Quote:


> Originally Posted by *KyadCK*
> 
> Their whole twitter is 2014. They literally do not have a tweet from 2015.
> 
> On top of that, the majority of it is from their Mantle demo, StarSwarm, again not from 2015. What would nVidia have to do with Mantle?


So if your whole twitter is littered with AMD, even if it is from a year ago, then you still have no affiliation with AMD a year later because there is nothing recent? Not to mention DX12 benchmarks launch from your game and show a sudden miraculous improvement in AMD's card performance, and NVIDIA comes out directly and says this isn't an accurate representation of performance. You see no connection?

Looks at your sig rigs, OK never mind don't answer that.


----------



## Ganf

Quote:


> Originally Posted by *KyadCK*
> 
> Their whole twitter is 2014. They literally do not have a tweet from 2015.
> 
> On top of that, the majority of it is from their Mantle demo, StarSwarm, again not from 2015. What would nVidia have to do with Mantle?


You're looking at the wrong twitter account. A little critical thinking would've also told you that twitter isn't the end-all be-all of social marketing, and to check their webpage and Facebook.

https://twitter.com/ashesgame


----------



## p4inkill3r

Quote:


> Originally Posted by *Ganf*
> 
> You're looking at the wrong twitter account. A little critical thinking would've also told you that twitter isn't the end-all be-all of social marketing also and to check their webpage and facebook.
> 
> https://twitter.com/ashesgame


AMD is orchestrating a huge coup and Oxide is in on it, but they just so happened to forget to clear out their incriminating tweets.

Makes sense to me.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Ganf*
> 
> You're looking at the wrong twitter account. A little critical thinking would've also told you that twitter isn't the end-all be-all of social marketing also and to check their webpage and facebook.
> 
> https://twitter.com/ashesgame


Good find, here's a dead giveaway "Our Friends:"



From 5 months ago.


----------



## OneB1t

For some reason only 4 threads are used by this benchmark, which is kind of sad. Hopefully they will allow better scaling in the release version.

(That's why the 6700 is faster than the 5960.)
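The point above can be sketched with a toy model: if the engine caps out at 4 threads, per-core speed decides the result and extra cores sit idle (core counts and clocks below are approximate, work units invented):

```python
# Toy model (made-up work units): when the engine only ever uses
# 4 threads, frame time depends on per-core speed and extra cores
# go to waste.

WORK_UNITS = 1_000_000           # hypothetical per-frame CPU work
ENGINE_THREADS = 4               # what the benchmark appears to use

def frame_time_ms(cores: int, clock_ghz: float) -> float:
    usable = min(cores, ENGINE_THREADS)      # extra cores sit idle
    return WORK_UNITS / (usable * clock_ghz * 1e6)

fewer_faster = frame_time_ms(cores=4, clock_ghz=4.0)   # 6700-style
more_slower = frame_time_ms(cores=8, clock_ghz=3.0)    # 5960-style

print(f"4 cores @ 4.0 GHz: {fewer_faster:.3f} ms")
print(f"8 cores @ 3.0 GHz: {more_slower:.3f} ms")
```

Under this cap, the 8-core chip performs exactly like a 4-core chip at the same clock, so the higher-clocked quad wins; if the release version scaled past 4 threads, the ordering could flip.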


----------



## p4inkill3r

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Good find, here's a dead giveaway "Our Friends:"
> 
> 
> 
> From 5 months ago.


"Our enemies" just doesn't have the same sound.


----------



## Ganf

Quote:


> Originally Posted by *p4inkill3r*
> 
> AMD is orchestrating a huge coup and Oxide is in on it, but they just so happened to forget to clear out their incriminating tweets.
> 
> Makes sense to me.


Not about a coup or anything dramatic, it's just the same crap Nvidia pulls with helping out a developer that otherwise wouldn't get the time of day and then allowing them to hype up the cross-marketing and branding on their own. AOTS was even part of AMD's demonstration at E3, their affiliation with AMD isn't a mystery.

Edit: Nevermind, I'm the one misreading crap. I apologize.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *p4inkill3r*
> 
> "Our enemies" just doesn't have the same sound.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Ganf*
> 
> I think you're a little bit confused about the angle I'm taking, might be a good idea to take a deep breath, stop looking at my sig rig, and check what I've been posting again.


When I brought up a sig rig, I wasn't responding to you. Best go back and read what I was saying and who I was saying it to.


----------



## KyadCK

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Ganf*
> 
> Of course they are, they have to. Gameworks is turning out to be too successful for them to ignore; they have to push back on that front.
> 
> I want to say that the first card-neutral DX12 game we'll see is Fable Legends but given AMD's involvement with Microsoft I can't guarantee that.
> 
> I've been saying for months that AMD wasn't losing the API war against Gameworks. People are about to see just how true that is over the next year, and it isn't going to be a pleasant year for us gamers. AMD's budget is now too tight to live up to their hype, so I wouldn't be surprised if all of these AMD supported DX12 titles ended up leaving a sour taste in the community's mouth for the new API when AMD's support is lacking.
> 
> 
> 
> I agree, looks like it's going to be GameWorks vs AMD Gaming Evolved DirectX 12 supported games for 2016.
> Quote:
> 
> 
> 
> Originally Posted by *KyadCK*
> 
> Their whole twitter is 2014. They literally do not have a tweet from 2015.
> 
> On top of that, the majority of it is from their Mantle demo, StarSwarm, again not from 2015. What would nVidia have to do with Mantle?
> 
> 
> So if your whole twitter is littered with AMD, even if it is from a year ago, then you still have no affiliation with AMD a year later because there is nothing recent? Not to mention DX12 benchmarks launch from your game and show a sudden miraculous improvement in AMD's cards' performance. NVIDIA comes out directly and says this isn't an accurate representation of performance. You see no connection?
> 
> Looks at your sig rigs... OK, never mind, don't answer that.

There is literally no reason to involve nVidia in anything Mantle. I'm sure you can figure out why.

They do however source both DX12 and Vulkan, and even mention nVidia several times as well. Guess you missed that.
Quote:


> Originally Posted by *Ganf*
> 
> Quote:
> 
> 
> 
> Originally Posted by *KyadCK*
> 
> Their whole twitter is 2014. They literally do not have a tweet from 2015.
> 
> On top of that, the majority of it is from their Mantle demo, StarSwarm, again not from 2015. What would nVidia have to do with Mantle?
> 
> 
> 
> You're looking at the wrong Twitter account. A little critical thinking would also have told you that Twitter isn't the be-all and end-all of social marketing, and to check their webpage and Facebook.
> 
> https://twitter.com/ashesgame

Eh, used the link that was provided and was being talked about. Modern or not, it's what BiG StroOnZ was trying to use to rationalize... whatever he thinks needs rationalizing.
Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Ganf*
> 
> You're looking at the wrong Twitter account. A little critical thinking would also have told you that Twitter isn't the be-all and end-all of social marketing, and to check their webpage and Facebook.
> 
> https://twitter.com/ashesgame
> 
> 
> 
> Good find, here's a dead giveaway "Our Friends:"
> 
> From 5 months ago.

Actual question for you.

Do you think it is impossible or immoral to be "friends" with both sides?

And a second question, is it impossible for you to see that they are re-tweeting just about everything that has to do with their game since it's, you know, a marketing thing? If nVidia decided to post "AotS is awesome!", they would probably do that too, as publicity is publicity. They're even re-tweeting reviews that we source in this very thread.


----------



## Ganf

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> When I brought up a sig rig, I wasn't responding to you. Best go back and read what I was saying and who I was saying it to.


Caught it, fixed it, check the edit.


----------



## ZealotKi11er

This is so amazing. I don't even know why OCN even bothers. Why speculate about Nvidia and AMD? Just observe, compare, and use a bit of critical thinking. People have been saying for months that DX12 is very important to AMD. With DX12, optimization of the game is almost entirely done by the game dev. That plays a big part for AMD compared to Nvidia. Nvidia does not have an ace in the DX12 hand.


----------



## KyadCK

Quote:


> Originally Posted by *ZealotKi11er*
> 
> This is so amazing. I don't even know why OCN even bothers. Why speculate about Nvidia and AMD? Just observe, compare, and use a bit of critical thinking. People have been saying for months that DX12 is very important to AMD. With DX12, optimization of the game is almost entirely done by the game dev. That plays a big part for AMD compared to Nvidia. Nvidia does not have an ace in the DX12 hand.


Because gossiping about who's sleeping with whom is more fun than looking at graphs.

Welcome to MTV, enjoy your stay.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *KyadCK*
> 
> There is literally no reason to involve nVidia in anything Mantle. I'm sure you can figure out why.
> 
> They do however source both DX12 and Vulkan, and even mention nVidia several times as well. Guess you missed that.
> 
> Actual question for you.
> 
> Do you think it is impossible or immoral to be "friends" with both sides?
> 
> And a second question, is it impossible for you to see that they are re-tweeting just about everything that has to do with their game since it's, you know, a marketing thing? If nVidia decided to post "AotS is awesome!", they would probably do that too, as publicity is publicity. They're even re-tweeting reviews that we source in this very thread.


I'm sure you can figure out why there is no reason to involve NVIDIA in anything Mantle related, because all games that had Mantle support were Gaming Evolved Titles or Never Settle Bundles.

The majority of what they reference in the twitter is from AMD, for AMD, or Pro AMD. I'm sure you can figure out why.

Doesn't seem like they are friends, based on the Twitter and blog war they are having, going back and forth attempting to discredit each other.

Why are you deflecting from the fact that the majority of their re-tweets are talking about "their partners from Oxide"?



They aren't even denying that they are affiliated.


----------



## Klocek001

Pascal is supposed to be 10x faster than Maxwell. How are they gonna deliver on that promise with Maxwell getting a huge boost from DX12?


----------



## KyadCK

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Quote:
> 
> 
> 
> Originally Posted by *KyadCK*
> 
> There is literally no reason to involve nVidia in anything Mantle. I'm sure you can figure out why.
> 
> They do however source both DX12 and Vulkan, and even mention nVidia several times as well. Guess you missed that.
> 
> Actual question for you.
> 
> Do you think it is impossible or immoral to be "friends" with both sides?
> 
> And a second question, is it impossible for you to see that they are re-tweeting just about everything that has to do with their game since it's, you know, a marketing thing? If nVidia decided to post "AotS is awesome!", they would probably do that too, as publicity is publicity. They're even re-tweeting reviews that we source in this very thread.
> 
> 
> 
> I'm sure you can figure out why there is no reason to involve NVIDIA in anything Mantle related, because all games that had Mantle support were Gaming Evolved Titles or Never Settle Bundles.
> 
> The majority of what they reference in the twitter is from AMD, for AMD, or Pro AMD. I'm sure you can figure out why.
> 
> Doesn't seem like they are friends, based on the Twitter and blog war they are having, going back and forth attempting to discredit each other.
> 
> Why are you deflecting from the fact that the majority of their re-tweets are talking about "their partners from Oxide"?

Because what do I care what AMD says? They are irrelevant, and would obviously talk up their own products and connections because what else would they do.

Everything re-tweeted on that account is relevant to their game and/or an event featuring it or them. It is marketing.
Quote:


> Originally Posted by *Klocek001*
> 
> Pascal is supposed to be 10x faster than Maxwell. How are they gonna deliver on that promise with Maxwell getting a huge boost from DX12?


Because the only ones who actually think a 10x overall jump is going to happen in a single generation are high on 'shrooms?


----------



## Cyro999

Quote:


> Originally Posted by *Klocek001*
> 
> Pascal is supposed to be 10x faster than Maxwell. How are they gonna deliver on that promise with Maxwell getting a huge boost from DX12?


DX11 and DX12 are not about how fast the GPU itself is; they're about how fast instructions are handed to the GPU. They don't make the GPU faster, they just stop performance from dropping when the rest of the system and software can't keep up.
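That claim can be sketched with a toy model where frame time is set by whichever side is slower, CPU submission or GPU rendering. All numbers below are invented for illustration, not measurements:

```python
# Toy model of API submission overhead. Frame time is limited by the slower of:
# the CPU preparing/submitting draw calls, or the GPU actually rendering them.

def fps(draw_calls, cpu_us_per_call, gpu_frame_ms):
    """Frames per second when CPU submission and GPU work overlap."""
    cpu_ms = draw_calls * cpu_us_per_call / 1000.0
    return 1000.0 / max(cpu_ms, gpu_frame_ms)

# Heavy scene: 10,000 draw calls, GPU needs 16 ms per frame.
dx11_style = fps(10_000, 5.0, 16.0)  # ~5 us of CPU time per call -> CPU-bound, 20 fps
dx12_style = fps(10_000, 1.0, 16.0)  # ~1 us per call -> GPU-bound again, 62.5 fps

# Light scene: 1,000 draw calls -- the API barely matters.
light_dx11 = fps(1_000, 5.0, 16.0)
light_dx12 = fps(1_000, 1.0, 16.0)
```

On these made-up numbers the API only changes the result when the CPU side is the bottleneck, which is exactly the point above.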


----------



## Kuivamaa

Quote:


> Originally Posted by *OneB1t*
> 
> For some reason only 4 threads are used for this benchmark, which is kind of sad. Hope they allow better scaling in the release version.
> 
> (that's why the 6700 is faster than the 5960)


The 6700 has a bit higher IPC and a big clock advantage (probably around 1GHz), yet it only wins by a hair. That suggests the 5960X partly compensates for the lack of IPC and clocks through its extra threads.
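As a back-of-envelope check (the clocks and the ~5% IPC edge below are assumed ballpark figures, not measured values), a strictly 4-thread-bound benchmark should favor the 6700K by far more than a hair:

```python
# Crude throughput proxy for a benchmark that only loads 4 threads:
# clock * relative IPC * threads actually used.

def four_thread_score(ghz, relative_ipc, threads=4):
    return ghz * relative_ipc * threads

i7_6700k = four_thread_score(4.0, 1.05)  # Skylake: higher clock, ~5% more IPC (assumed)
i7_5960x = four_thread_score(3.0, 1.00)  # Haswell-E: lower clock (assumed)

ratio = i7_6700k / i7_5960x  # ~1.4x expected if only 4 threads mattered
```

Since the observed gap is much smaller than this ~1.4x, the 5960X's extra cores are plausibly contributing beyond the main four threads.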


----------



## BiG StroOnZ

Quote:


> Originally Posted by *KyadCK*
> 
> Because what do I care what AMD says? They are irrelevant, and would obviously talk up their own products and connections because what else would they do.
> 
> Everything re-tweeted on that account is relevant to their game and/or an event featuring it or them. It is marketing.


Re-read my previous post, they are affiliated and are even stating this:



"partners" = affiliation


----------



## PontiacGTX

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> I agree, looks like it's going to be GameWorks vs AMD Gaming Evolved DirectX 12 supported games for 2016.


Only that Gaming Evolved titles don't destroy the competition's performance, or release broken games and blame the competition.








Quote:


> So if your whole twitter is littered with AMD, even if it is from a year ago, then you still have no affiliation with AMD a year later because there is nothing recent? Not to mention DX12 benchmarks launch from your game and show a sudden miraculous improvement in AMD's cards' performance. NVIDIA comes out directly and says this isn't an accurate representation of performance. You see no connection?


They are the same dev team that made the first early benchmark, Star Swarm, where Nvidia had better performance. Stop the conspiracy theory.







Quote:


> Looks at your sig rigs, OK never mind don't answer that.


...
Quote:


> Originally Posted by *OneB1t*
> 
> For some reason only 4 threads are used for this benchmark, which is kind of sad. Hope they allow better scaling in the release version.
> 
> (that's why the 6700 is faster than the 5960)


3GHz vs 4GHz plus ~5% IPC.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *PontiacGTX*
> 
> Only that Gaming Evolved titles don't destroy the competition's performance, or release broken games and blame the competition.
> 
> 
> 
> 
> 
> 
> 
> 
> They are the same dev team that made the first early benchmark, Star Swarm, where Nvidia had better performance. Stop the conspiracy theory.


Apparently this game does, as they are affiliated with AMD, and looking at their Twitter they even say they are partners.

Sorry there is no conspiracy here:



They are partners.


----------



## error-id10t

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I am almost suspect of the gains for AMD. They bring up more questions than they answer, at least for myself.
> 
> Why are they so massive? Is AMD just that bad at writing drivers for DX11? Is it intentional? Or do lower-level APIs happen to suit AMD's particular architecture design? Are the results even valid?


Haven't read further, but don't we all know AMD drivers suck at DX11? It was proven with Mantle, and now apparently with DX12...


----------



## PontiacGTX

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Apparently this game does, as they are affiliated with AMD and looking at their twitter they even say they are partners.
> 
> Sorry there is no conspiracy here:
> 
> 
> 
> They are partners. Maybe this wasn't the case in the early Star Swarm benches, but it is now.


What's wrong with it? The first DX12 tests skewed towards Nvidia, which could mean Nvidia's drivers were better back in 2014. Now in 2015 AMD has decided to improve performance where they are betting on bigger market share, given that new AAA titles may have DirectX 12 support.


----------



## mav451

1) http://www.overclock3d.net/articles/gpu_displays/amd_explains_dx12_multi_gpu_benefits/1
It even states, "Games Optimized for AMD." So yeah - seeing it highlighted in the literature, I can't see how this result is a surprise to anyone. We need to see this play out.

2) As for referencing Brad Wardell's twitter account:
I find even the most 'respected' accounts will have gaffes - but personal ones often blur that line. I can't say I'm surprised to see him antagonizing nVidia, especially with how close he is with AMD. It's understandable.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *PontiacGTX*
> 
> What's wrong with it? The first DX12 tests skewed towards Nvidia, which could mean Nvidia's drivers were better back in 2014. Now in 2015 AMD has decided to improve performance where they are betting on bigger market share, given that new AAA titles may have DirectX 12 support.


What's wrong with it? NVIDIA came out directly and said it isn't a representation of DX12 performance. Oxide fired back and basically dismissed everything they said.

Sorry but I'm not basing my feelings on DX12 performance off of one game basically sponsored by AMD.


----------



## Faithh

Quote:


> Originally Posted by *PontiacGTX*
> 
> What's wrong with it? The first DX12 tests skewed towards Nvidia, which could mean Nvidia's drivers were better back in 2014. Now in 2015 AMD has decided to improve performance where they are betting on bigger market share, given that new AAA titles may have DirectX 12 support.


Those were Star Swarm tests, written specifically for AMD's PR. How logical is it that they were Nvidia-biased? If Nvidia had better drivers, of course they'd perform better; AMD has only itself to blame. How else do you win? You're essentially saying that as long as Nvidia wins, it's Nvidia-biased.


----------



## Ganf

Quote:


> Originally Posted by *Klocek001*
> 
> pascal is supposed to be 10x faster than maxwell, how are they gonna deliver this promise with maxwell getting a huge boost from dx12 ?


Because it's not 10x faster, that was some slick fast-talk by Jen.

His self-proclaimed "CEO Math" starts at roughly 4:45. When you follow what he's saying and break it down, it boils down to this: 8 Pascal cards in SLI are 10x faster than 4 Maxwell cards in SLI, but only when you use NVLink. In other words, it scales well when you use their proprietary system that will only be available to big enterprise.


----------



## SuchOverclock

Anyone remember when the dev on "Ashes of the Singularity" said that DX12 would give a 200% performance increase?

I don't see it, lol
Quote:


> Originally Posted by *Ganf*
> 
> Because it's not 10x faster, that was some slick fast-talk by Jen.
> 
> His self-proclaimed "CEO Math" starts at roughly 4:45. When you follow what he's saying and break it down, it boils down to this: 8 Pascal cards in SLI are 10x faster than 4 Maxwell cards in SLI, but only when you use NVLink. In other words, it scales well when you use their proprietary system that will only be available to big enterprise.


Pascal is going to be a game changer for sure! I honestly can't wait! I am just worried about AMD; will they have something to hit back with?

I like the wars, as they drive a massive performance increase yearly.


----------



## Ganf

Quote:


> Originally Posted by *SuchOverclock*
> 
> Anyone remember when the dev on "Ashes of the Singularity" said that DX12 would give a 200% performance increase?
> 
> I don't see it, lol


I think that was before they went and beefed up the game to handle up to 20,000 units on screen, to be honest.


----------



## lacrossewacker

Quote:


> Originally Posted by *SuchOverclock*
> 
> Anyone remember when the dev on "Ashes of the Singularity" said that DX12 would give a 200% performance increase?
> 
> I don't see it, lol
> Pascal is going to be a game changer for sure! I honestly can't wait! I am just worried about AMD; will they have something to hit back with?
> 
> I like the wars, as they drive a massive performance increase yearly.


source?

Eurogamer....
Quote:


> Average frame-rates with the Core i3 processor rise from 24fps to 40fps - a 67 per cent boost to performance. On top of that, even the i7 sees a big increase - rising from an average of just 28fps up to 49fps, a 75 per cent uplift. In both cases, lowest recorded frame-rates double - from 12 to 29fps on the i3 and from 15 to 32fps on the i7. But probably the biggest takeaway from the graph is that the i3 running DX12 is significantly outperforming the i7 on DX11 with the same GPU running the same game at the same settings.


Seems pretty significant to me
Quote:


> Originally Posted by *wccftech*
> This was the real surprise. With a less powerful PC, the DirectX 12 benchmark registered a gigantic 35.6 FPS increase, which *equates to +180% performance.* Also, for the first time we're not marked as GPU bound even though GTX 770 is obviously inferior to the GPUs we previously used.
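The uplifts quoted above can be sanity-checked in a few lines (the fps figures are taken from the Eurogamer quote):

```python
# Sanity-checking the quoted DX11 -> DX12 uplifts.

def uplift_pct(before_fps, after_fps):
    """Percentage performance increase going from before_fps to after_fps."""
    return (after_fps / before_fps - 1) * 100

i3_avg = uplift_pct(24, 40)  # ~67%, matching the quoted "67 per cent boost"
i7_avg = uplift_pct(28, 49)  # 75%, matching the quoted "75 per cent uplift"
i3_min = uplift_pct(12, 29)  # lowest fps more than doubles
i7_min = uplift_pct(15, 32)  # likewise
```

So the quoted percentages are internally consistent; the wccftech "+180%" figure comes from a different, unstated baseline and can't be checked the same way.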


----------



## SuchOverclock

Quote:


> Originally Posted by *Ganf*
> 
> I think that was before they went and beefed up the game to handle up to 20,000 units on screen, to be honest.


Damn, this could be a reason why DX11 performance isn't as high; I'm sure DX11 can only handle a certain number of triangles.


----------



## Ganf

Quote:


> Originally Posted by *lacrossewacker*
> 
> source?
> 
> Eurogamer....
> Seems pretty significant to me


Another source at Gamingbolt. For 70%.

http://gamingbolt.com/interview-with-brad-wardell-ps4xbox-one-differences-directx-12-ashes-of-the-singularity-and-more
Quote:


> Kurtis Simpson: With the public perception of Xbox One and since the announcement of DirectX12 and its impact on the Xbox One, Microsoft has seemed pretty silent. They appear to have taken a silent approach as to how DirectX12 will affect the Xbox One. What are your thoughts on this as to why they're so silent? Whereas other people and developers such as you have been quite outspoken about its benefits and what kind of impact it's going to have.
> Brad Wardell: With the Xbox One we're being pretty speculative right because there isn't a game that's using DirectX 12 on the console at this point in time, so I can't even do a side by side comparison. Whereas on the PC we have Ashes of the Singularity. It is a game that's been optimized for DirectX 11 and updated for DirectX 12, and you can run them side by side on the same hardware and get a 70% boost on DirectX 12 over DirectX 11.
> So it's pretty easy for me to say yes you'll get a huge impact on PC, but on the console it's all a theory. They have nothing, they don't even know. I mean I've talked to the development team there on this subject for a while and it basically boils down to, we don't know how much of an effect it will have because so much of it is in the hands of the developer.


That's an interesting little tidbit I hadn't heard about the XB1. Seems like it may not be seeing as significant an increase as people are hoping if he's playing it off like that.


----------



## SuchOverclock

Quote:


> Originally Posted by *lacrossewacker*
> 
> source?
> 
> Eurogamer....
> Seems pretty significant to me


This was while in dev; I'm too lazy to search for it.







But the quote explains why they didn't reach the 200% performance boost; maybe they spent it on the extra units.

Quote:


> Originally Posted by *Ganf*
> 
> I think that was before they went and beefed up the game to handle up to 20,000 units on screen, to be honest.


----------



## Master__Shake

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> NVIDIA says that "We Don't Believe Ashes of the Singularity Benchmark To Be A Good Indicator Of DX12 Performance"


terrible news for nvidia.

it's the only representation of dx12 we have


----------



## Ganf

Quote:


> Originally Posted by *Master__Shake*
> 
> terrible news for nvidia.
> 
> it's the only representation of dx12 we have


Yeah, thing is, even with the AMD bias I think it is a good indicator of Nvidia's performance increase. This RTS has some silly processor requirements going on. They "Recommend" an i7 and a 4GB GPU.

http://www.ashesofthesingularity.com/game/faq
Quote:


> Q. What are the system requirements for the Ashes of the Singularity Alpha?
> A. These are the absolute MINIMUM System Requirements for the Alpha.
> 
> Absolute Minimum:
> •64-bit Windows 7 / 8 / 10 OS
> •Quad Core CPU
> •8 GB Memory
> •2 GB DirectX 11 Video Card
> •1920x1080 Display Resolution
> •High-speed Internet Connection
> 
> We Recommend:
> •64-bit Windows 7 / 8 / 10 OS
> •i7 (or equivalent) CPU
> •8 GB Memory
> •4 GB DirectX 11 Video Card
> •1920x1080 Display Resolution
> •High-speed Internet Connection


----------



## Master__Shake

Quote:


> Originally Posted by *Ganf*
> 
> Yeah, thing is even with the AMD bias I think that it is a good indicator of Nvidia's performance increase. This RTS has some silly processor requirements going on. They "Recommend" an i7 and a 4gb GPU.
> 
> http://www.ashesofthesingularity.com/game/faq


well with the new generations of cards, having a 4gb card is simple and affordable.

i'm sure an i7 isn't necessary.

an i5 or 8xxx amd would be fine.


----------



## mtcn77

Quote:


> Originally Posted by *SuchOverclock*
> 
> Anyone remember when the dev on "Ashes of the Singularity" said that DX12 would give a 200% performance increase?
> 
> I don't see it, lol


I'll bite.


Spoiler: Warning: Spoiler!


----------



## p4inkill3r

Quote:


> Originally Posted by *Ganf*
> 
> Yeah, thing is even with the AMD bias I think that it is a good indicator of Nvidia's performance increase. This RTS has some silly processor requirements going on. They "Recommend" an i7 and a 4gb GPU.
> 
> http://www.ashesofthesingularity.com/game/faq


I think it is good that they're going to push the envelope some, barring the game being terrible, of course.

Absolute minimum of a quad core CPU is beautiful, IMO.


----------



## Ganf

Quote:


> Originally Posted by *Master__Shake*
> 
> well with the new generations of cards, having a 4gb card is simple and affordable.
> 
> i'm sure an i7 isn't necessary.
> 
> an i5 or 8xxx amd would be fine.


Not my point.

If an i7 is recommended, that lends credence to the theory that the game is a CPU smasher, which isn't a big leap of logic to make: it's a new RTS with a huge number of units on the field that takes multithreading to its full potential. DX12's claim to fame is the extra draw calls; it's supposed to boost performance most significantly for CPU-bound applications, and this is one of those scenarios. If there is a game that can choke a CPU on DX11, it should be this one, and with it being targeted at DX12 and apparently receiving the most optimization there, the extra draw calls should show up more in this game than in any other.

So in other words, if you want to represent the performance boost between DX11 and DX12, AotS is a perfect case scenario. Some lower-end CPUs should really bring it out.
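A rough sketch of this reasoning, with invented per-draw costs (nothing below is measured), shows how a lower-end CPU exaggerates the DX11-to-DX12 gap:

```python
# Draw-call-heavy RTS frame on a slow vs fast CPU, under a high-overhead vs
# low-overhead API. All per-draw costs and the GPU time are invented.

def frame_fps(units, us_per_draw, gpu_ms):
    cpu_ms = units * us_per_draw / 1000.0  # say, one draw call per unit
    return 1000.0 / max(cpu_ms, gpu_ms)

UNITS, GPU_MS = 20_000, 25.0  # "up to 20,000 units on screen"

slow_cpu_dx11 = frame_fps(UNITS, 8.0, GPU_MS)   # 160 ms of submission: 6.25 fps
fast_cpu_dx11 = frame_fps(UNITS, 4.0, GPU_MS)   # 80 ms: 12.5 fps
slow_cpu_dx12 = frame_fps(UNITS, 1.5, GPU_MS)   # 30 ms: ~33 fps
fast_cpu_dx12 = frame_fps(UNITS, 0.75, GPU_MS)  # 15 ms < 25 ms: GPU-bound, 40 fps
```

On these made-up numbers the fast CPU is 2x the slow one under DX11 but only ~1.2x under DX12, which is why a unit-heavy game like this makes the API difference so visible.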


----------



## Master__Shake

Quote:


> Originally Posted by *Ganf*
> 
> Not my point.
> 
> *If an i7 is recommended, that lends credence to the theory that the game is a CPU smasher, which isn't a big leap of logic to make: it's a new RTS with a huge number of units on the field that takes multithreading to its full potential.* DX12's claim to fame is the extra draw calls; it's supposed to boost performance most significantly for CPU-bound applications, and this is one of those scenarios. If there is a game that can choke a CPU on DX11, it should be this one, and with it being targeted at DX12 and apparently receiving the most optimization there, the extra draw calls should show up more in this game than in any other.
> 
> So in other words, if you want to represent the performance boost between DX11 and DX12, AotS is a perfect case scenario. Some lower-end CPUs should really bring it out.


after i posted i realised that you meant the EE i7.


----------



## Mas

Sorry, haven't had time to go through the links and read up (posting from work)

My biggest question: does DX12 justify an i7 5930K build over an i7 6700K build? Gaming isn't the ONLY reason I'm eyeing the 5930K build, mind you; I also want it for things like transcoding, streaming/recording, etc. But for pure gaming, how does DX12 change things up in reality? Will those extra cores finally be justified (for those who can afford them) for gamers with DX12 games?


----------



## azanimefan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I am almost suspect of the gains for AMD. They bring up more questions than they answer, at least for myself.
> 
> Why are they so massive? Is AMD just that bad at writing drivers for DX11? Is it intentional? Or do lower-level APIs happen to suit AMD's particular architecture design? Are the results even valid?


It's because DirectX 12 is almost half the code of Mantle; literally, half of the DX12 code is line-for-line Mantle. That, and Nvidia has pretty much rewritten DX11 in their drivers, so it almost behaves like DX12 for them. This is the difference between the AMD and Nvidia driver teams. Well, those driver teams will have a whole lot less to do with DX12.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Robenger*
> 
> That Nvidia put all of their eggs in one basket called DX11. Now that developers have more control they need less of Nvidia and Game(sDon't)work.


But GameWorks makes up a small fraction of games, and in those games it is a small fraction of optional content/features. So that doesn't make sense.

Quote:


> Originally Posted by *ZealotKi11er*
> 
> ...... Also them working in Mantle made it so their DX11 performance was really left behind. I think they knew there was no point on spending R&D on DX11 with DX12 doing the work for them.


The concern with this is that not all games will be DX12; DX11 is going to be around for a long time, I am sure, unless there is some massive planned shift where the whole industry picks up and moves as one. Which I am all for!

Quote:


> Originally Posted by *PontiacGTX*
> 
> AMD already knew how to improve their drivers for a low-level API, something Nvidia is still trying to understand.
> Or Nvidia is bad at doing DirectX 12 drivers.
> 
> 
> 
> 
> 
> 
> 
> 
> Indeed, AMD didn't optimize the DX11 drivers because they plan to push DirectX 12; it is about better performance in new AAA games. But in the meanwhile they can optimize the drivers to match Maxwell.


Maybe it does boil down to Nvidia just not throwing enough resources or talent at DX12 yet. If that is the case, that is really scary for AMD, actually. Even ignoring Pascal, what happens when Maxwell gets proper DX12 drivers, using your scenario? AMD gets absolutely destroyed if that is the case.

Something tells me these benchmark results might have to do with the developer's extremely close relationship with AMD. That this is going to be one of the cherry-picked ones that come out; at least I hope so, for AMD's sake. Because if this is because Nvidia just hasn't done a proper DX12 driver, AMD is flatly screwed when they do.
Quote:


> Originally Posted by *Ganf*
> 
> I think that was before they went and beefed up the game to handle up to 20,000 units on screen, to be honest.


Starcraft II DX 12 Edition!

Quote:


> Originally Posted by *Mas*
> 
> Sorry, haven't had time to go through the links and read up (posting from work)
> 
> My biggest question: does DX12 justify i7 5930K build over i7 6700K build? Gaming isn't the ONLY reason I'm eyeing the 5930k build mind you, I also want it for things like transcoding, streaming/recording, etc... but for pure gaming, how does DX12 change things up in reality? Will those extra cores finally be justified (for those who can afford it) for gamers with DX12 games?


The question is whether you should do the 5820K or the 5930K: do you need the extra PCI-E lanes the 5930K provides over a 5820K? Are you going beyond SLI (Tri or Quad?) AND a PCI-E SSD?


----------



## Clocknut

This is another reason why buying AMD sucks, why AMD's drivers suck. If you don't play DX9-DX11 and only play DX12/Mantle games, you could buy AMD...

AMD's decision to NOT bring DX9-DX11 up to Nvidia's level is the reason my next GPU is Nvidia, no matter how high the price premium gets. I so hate my 7790 now; the minimum fps just sucks on this GPU.


----------



## Mas

Quote:


> Originally Posted by *PostalTwinkie*
> 
> The question is whether you should do the 5820K or the 5930K: do you need the extra PCI-E lanes the 5930K provides over a 5820K? Are you going beyond SLI (Tri or Quad?) AND a PCI-E SSD?


Well, only 1-2 cards to start, but I'd like to keep my options open. I try and upgrade my CPU/mobo/ram as seldom as possible (have been running i7 930 for over 5 years) and just swap out the GPU every year or two.


----------



## delboy67

Has any site done power consumption dx11 vs 12?


----------



## Cyro999

Quote:


> Originally Posted by *delboy67*
> 
> Has any site done power consumption dx11 vs 12?


It generally rises with DX12 in situations where there's any difference, because DX11 might be getting 50fps with the GPU at 50% load while DX12 gets 100fps with the GPU at 100% load.

The reduced cost of CPU work per frame is offset by preparing far more frames and the GPU no longer sitting idle, so power improvements mostly show up when you have a locked framerate and your hardware can now idle for long periods that it couldn't previously (i.e., not fully utilizing it, which is something people want to avoid unless you're running on a battery).
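That trade-off can be written as a simple power model (the joules-per-frame and idle-watt values are invented for illustration):

```python
# Toy system power model: baseline idle draw plus energy spent per frame.
# DX12 cuts CPU energy per frame; uncapped, the extra frames eat the savings.

def system_power(fps, gpu_j_per_frame, cpu_j_per_frame, idle_w):
    return idle_w + fps * (gpu_j_per_frame + cpu_j_per_frame)

dx11_uncapped = system_power(50, 2.0, 1.0, 40)   # 190 W at 50 fps
dx12_uncapped = system_power(100, 2.0, 0.3, 40)  # ~270 W at 100 fps: higher draw
dx11_capped45 = system_power(45, 2.0, 1.0, 40)   # 175 W at a 45 fps cap
dx12_capped45 = system_power(45, 2.0, 0.3, 40)   # ~144 W: savings show up capped
```

Uncapped, DX12 draws more total power despite doing less CPU work per frame; with a framerate cap both APIs render the same number of frames, and the DX12 savings finally appear.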


----------



## Ganf

Quote:


> Originally Posted by *Clocknut*
> 
> This is another reason why buying AMD sucks, why AMD's drivers suck. If you don't play DX9-DX11 and only play DX12/Mantle games, you could buy AMD...
> 
> AMD's decision to NOT bring DX9-DX11 up to Nvidia's level is the reason my next GPU is Nvidia, no matter how high the price premium gets. I so hate my 7790 now; the minimum fps just sucks on this GPU.


Well, you can have one manufacturer that moves too quickly to new platforms and abandons the old a little too early but keeps optimizing their hardware, or you can have another manufacturer that focuses more on the platforms but likes to abandon their hardware to encourage you to upgrade. Your choice.

They can both slag off as far as I'm concerned. If I had moneybux I'd angel invest in a 3rd manufacturer startup just on sheer principles, I wouldn't care if it was some guys in a garage in Kansas, they'd suddenly have billions in equity and there'd be a lot of publicity with my face on TV using both AMD and Nvidia's name in between a lot of bleeped out words, followed by both my and the startup's bankruptcy months later.


----------



## Derp

Quote:


> Originally Posted by *Clocknut*
> 
> This is another reason to prove why buying AMD sucks, why AMD driver sucks. If u are not playing DX9-DX11, only play DX12/mantle games u could buy AMD...........
> 
> AMD's decision to NOT bringing DX9-DX11 up to Nvidia level is the reason why my next GPU is Nvidia no matter how price premium it gets. I so hate my 7790 now. the minimum fps just sucks on this GPU.


You're not alone.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Clocknut*
> 
> This is another reason to prove why buying AMD sucks, why AMD driver sucks. If u are not playing DX9-DX11, only play DX12/mantle games u could buy AMD...........
> 
> AMD's decision to NOT bringing DX9-DX11 up to Nvidia level is the reason why my next GPU is Nvidia no matter how price premium it gets. I so hate my 7790 now. the minimum fps just sucks on this GPU.


I have seen you complain about your HD 7790 a lot. I don't think you understand how CPU overhead works. People with high end GPUs and CF get a slap in the face but HD 7790....


----------



## Slomo4shO

Quote:


> Originally Posted by *Ha-Nocri*
> 
> ut to more than 2x faster compared to dx11, 390x. Very nice


Speaks more of the inferiority of the drivers from AMD than the capacity of Hawaii...
Quote:


> Originally Posted by *mtcn77*
> 
> I'll bite.
> 
> 
> Spoiler: Warning: Spoiler!


Number of draw calls and FPS are two very different benchmarks...


----------



## m0n4rch

Quote:


> Originally Posted by *Ganf*
> 
> Not my point.
> 
> If an i7 is recommended that just lends credence to the theory that the game is a CPU smasher, which isn't a big leap of logic to make, it's a new RTS with a huge number of units on field and takes multithreading to it's full potential. DX12's claim to fame is the extra draw calls, it's supposed to boost performance most significantly for CPU bound applications and this is one of those scenarios. If there is a game that can choke a CPU on DX11, it should be this game, and with it being targeted to DX12 and apparently receiving the most optimization in DX12, the extra draw calls should show the most on this game above all others.
> 
> So in other words, if you want to represent the performance boost between DX11 and DX12, AotS is a perfect case scenario. Some lower end CPU's should really bring it out.


But how does DX12 increase the number of draw calls? By letting multiple cores talk to the GPU, right? Multi-threaded rendering. DX12 is supposed to actually utilize high-end CPUs, not make lower-end CPUs perform amazingly. If you have a slow CPU that's choking on game logic rather than on issuing draw calls, then a low-level API is useless in that case.
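The multi-threaded submission idea can be sketched in a few lines - this is a toy illustration only, with plain Python lists standing in for real command lists (none of these names are the actual D3D12 API):

```python
# Toy model of DX12-style multi-threaded submission: each worker thread
# records its own private "command list" with no shared lock, and the
# main thread submits the lists in a fixed order at the end.
import threading

def record_commands(units, out):
    """Record one draw command per unit into this thread's own list."""
    for u in units:
        out.append(("draw", u))

def submit_parallel(units, workers=4):
    chunks = [units[i::workers] for i in range(workers)]  # split work
    lists = [[] for _ in range(workers)]                  # one list/thread
    threads = [threading.Thread(target=record_commands, args=(c, l))
               for c, l in zip(chunks, lists)]
    for t in threads: t.start()
    for t in threads: t.join()
    # Submission order is fixed by the main thread, so recording in
    # parallel never changes what the "GPU" ends up seeing.
    return [cmd for l in lists for cmd in l]

print(len(submit_parallel(list(range(1000)))))  # 1000 draws recorded
```

The design point is the per-thread list: under DX11 all submission funnels through one driver thread, whereas here no core ever waits on another while recording.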


----------



## Clocknut

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I have seen you complain about your HD 7790 a lot. I don't think you understand how CPU overhead works. People with high end GPUs and CF get a slap in the face but HD 7790....


Try playing a game that only uses a single or dual core but requires a lot of draw calls, then you'll get what I mean. Most indies and MMOs fall into this category.

As soon as the price gap between a used 7790 and a used 750 Ti gets smaller, I'm switching back to Nvidia (I'm not going to do a side-grade by topping up too much $). I'm hoping the release of the GTX 950 could make this happen.


----------



## fewness

This is fun, I need to try it myself... What's the way to get this alpha build? It's not exclusive to those review sites, I hope...


----------



## SuchOverclock

Quote:


> Originally Posted by *mtcn77*
> 
> I'll bite.
> 
> 
> Spoiler: Warning: Spoiler!


That is not relevant, as the 980 Ti, Fury X, and 390X weren't even released when he said that.


----------



## mtcn77

Quote:


> Originally Posted by *SuchOverclock*
> 
> That is ill-relevant as the 980ti or the Fury X or 390X wasnt even released when he said that.


If you trace the review, they say they can emulate an infinitely fast 'virtual GPU' to test out the CPU. That makes me think they knew what they would get all along.
Quote:


> Ashes of the Singularity also includes a CPU benchmark that can be used to simulate an infinitely fast GPU - useful for measuring how GPU-bound any given segment of the game actually is.


[ExtremeTech]
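What the "infinitely fast GPU" mode buys you is a clean way to estimate how GPU-bound a scene is - a rough sketch with hypothetical framerates (the function and numbers below are illustrative, not from the benchmark itself):

```python
# With GPU cost removed, the remaining frame time is all CPU work,
# so comparing the two runs estimates how GPU-bound a scene is.

def gpu_bound_fraction(fps_real, fps_cpu_only):
    """Share of real frame time attributable to waiting on the GPU."""
    t_real = 1.0 / fps_real      # measured frame time
    t_cpu = 1.0 / fps_cpu_only   # frame time with the 'virtual' instant GPU
    return (t_real - t_cpu) / t_real

# e.g. 40 FPS for real vs 80 FPS against the virtual GPU
# means half the frame time was spent waiting on the GPU.
print(gpu_bound_fraction(40, 80))
```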


----------



## PostalTwinkie

Quote:


> Originally Posted by *Clocknut*
> 
> try playing a game that is only use single or dual core that requires a lot of draw calls then u get what I mean. Most of the indies, MMO falls into this category.
> 
> As soon as the price gap between a used 7790 and a used 750ti get smaller, I am switching back to Nvidia.(I not gonna do side-grade by topping up too much $), I am hoping the release of GTX950 could make this happen


I would expect the release of a 950 Ti to shift the used 750 Ti market, even though the 950 should be slightly faster than the 750 Ti. Mainly because people like that "Ti"; based on my purely anecdotal data, the non-Ti cards aren't as popular. In other words, I see a lot more people discussing and purchasing the 650 Ti, 750 Ti, etc. than their non-Ti siblings.

Then you have the other argument that people in that segment only switch every several years, and it takes failures or major changes to get them to move. That means a much more static used market that isn't as likely to shift generation to generation, compared to the higher-tier cards and their market.


----------



## Liranan

Can't wait for this game to be released and hope this gets ported to Vulkan too, would be a shame if it's DX only.

The most important thing that becomes apparent reading these benchmarks is that AMD's DX11 drivers are terrible (their DX12 ones are good) and their CPUs are weak, though that weakness is made up for by DX12 being threaded.

Now I can't wait for the SC2 brigade to come in and claim that RTS can't be multi threaded and they need to use x87.


----------



## ChronoBodi

Quote:


> Originally Posted by *Liranan*
> 
> Can't wait for this game to be released and hope this gets ported to Vulkan too, would be a shame if it's DX only.
> 
> The most important things that become apparent reading these benchmarks is that AMD's DX11 drivers are terrible (DX12 are good) and their CPU's are weak though that weakness is being made up by DX12 being threaded.
> 
> Now I can't wait for the SC2 brigade to come in and claim that RTS can't be multi threaded and they need to use x87.


Why would SC2 brigade claim so? Isn't RTS the genre that has the most potential for multi threading, considering all the units and everything you normally see in RTS gameplay?


----------



## biz1

Quote:


> Originally Posted by *ChronoBodi*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Liranan*
> 
> Can't wait for this game to be released and hope this gets ported to Vulkan too, would be a shame if it's DX only.
> 
> The most important things that become apparent reading these benchmarks is that AMD's DX11 drivers are terrible (DX12 are good) and their CPU's are weak though that weakness is being made up by DX12 being threaded.
> 
> Now I can't wait for the SC2 brigade to come in and claim that RTS can't be multi threaded and they need to use x87.
> 
> 
> 
> Why would SC2 brigade claim so? Isn't RTS the genre that has the most potential for multi threading, considering all the units and everything you normally see in RTS gameplay?

I think the argument is that if two threads want to move a unit to the same spot, whichever thread executes first determines which unit actually gets to go there, and that order of execution can't be guaranteed to be the same across all machines in a multiplayer setting.

The real problem is that there's a primitive pathfinding algorithm instead of an actual AI deciding which unit moves where... that's an SC2 problem, not an RTS problem.
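For what it's worth, the standard fix for that race is to not let thread timing decide at all: gather each tick's orders first, then resolve them in a deterministic order. A minimal sketch (the function and tiebreak rule here are illustrative, not anything from an actual RTS engine):

```python
# Deterministic conflict resolution for a lockstep RTS: orders can be
# produced by any number of threads in any order, but they are resolved
# by a fixed key (unit id), so every machine agrees on the outcome.

def resolve_moves(orders, occupied):
    """orders: list of (unit_id, target) pairs gathered from any threads."""
    placed = {}
    for unit_id, target in sorted(orders):  # deterministic tiebreak
        if target not in occupied and target not in placed.values():
            placed[unit_id] = target        # first by id wins the tile
    return placed

# Two units racing for the same tile: unit 3 wins on every machine,
# no matter which thread produced its order first.
print(resolve_moves([(7, (2, 2)), (3, (2, 2))], occupied=set()))
```

This is why multi-threading the simulation doesn't inherently break lockstep multiplayer: only the resolution order has to be deterministic, not the thread scheduling.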


----------



## Defoler

Pre-beta, without fully supported drivers. I'll wait until it's out before judging anything regarding performance and DX12's actual performance on these cards, as the game seems to be barely CPU-bound, and 4 cores perform as well as 12.


----------



## Klocek001

While this probably won't stop the 980 Ti from holding the fastest-card spot in early DX12 games until Pascal or the next Radeon with HBM2, I suspect the 960/970 will start trailing the 380/390 noticeably. It would actually be good if the Fury X caught up with the 980 Ti: you'd have a choice between staying at current performance with great noise/temp levels on the Fury X, or pushing for 5-10% more performance at either greater cost or higher noise by overclocking a 980 Ti. A choice is always a good thing.


----------



## Liranan

Quote:


> Originally Posted by *ChronoBodi*
> 
> Why would SC2 brigade claim so? Isn't RTS the genre that has the most potential for multi threading, considering all the units and everything you normally see in RTS gameplay?


Quote:


> Originally Posted by *biz1*
> 
> i think the argument is that if there are 2 threads that want to move a unit to the same spot, then whichever thread executes first determines which unit actually gets to go there
> and that order of execution can't be guaranteed to be the same across all machines in a multiplayer setting
> 
> the real problem is that there's a primitive pathfinding algorithm instead of an actual AI deciding which unit moves where... that's a SC2 problem. not a RTS problem


There are some who claim that an RTS simply can't be multi-threaded due to the nature of the genre. They then point to Supreme Commander's desync problems as proof that an RTS needs to be single-threaded in nature. But finally we have a game that, even in early beta, shows RTS games need to be multi-threaded: there simply is no CPU fast enough to render hundreds of units on screen on a single thread while maintaining smooth gameplay. Even Supreme Commander sees its framerate plummet when 500 units start going at each other, and that's an ancient game.

All hail our new Mantle/Vulkan/DX12 overlord.


----------



## epic1337

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I am almost suspect of the gains for AMD. They bring up more questions than they answer, at least for myself.
> 
> Why are they so massive? Is AMD just that bad at writing drivers for DX 11? Is it intentional? Or does it happen to be that lower level APIs like the particular AMD architecture design? Are the results even valid?


If you give the GPU a way to access lower-level code paths for far higher utilization and efficiency, of course it will end up showing its full potential.
Both AMD and Nvidia get this benefit; in particular, I'm pretty sure Fiji's shader cores aren't being fully utilized either.

Plus, Nvidia's architecture is much more massive than it looks at first glance.
If you look at how it does in GPGPU, Nvidia is under-utilized.


----------



## Wishmaker

Surprised to see so few with 'I told you so, AMD roxxxxx'!


----------



## Silent Scone

Quote:


> Originally Posted by *BinaryDemon*
> 
> To me it just looks like the Fury X is massively under-performing on DX11.


I have to agree. The gains seen here are promising; obviously this is the very early birth of something very good. Wide adoption by developers and updating existing toolchains/engines is still very much a long-haul endeavour, and it will be a year or two before users really reap these benefits. Right now, given the limited viewing window we have into the API, this test really shows just how much overhead there is in AMD's DX11 drivers, and it's seriously crippling Fiji at lower resolutions. Arguably this is a non-issue now, which works conveniently in AMD's favour. However, you can bet NVIDIA will have been working hard to reverse-engineer a lot of that work to favour the new API.

The results certainly put the Fury X in a much better light at UHD as a single-card solution than before. I would be interested to see both XDMA and SLI results, both in the Oxide benchmark and in other DX12 applications. We still need to see how developers cope with supporting this natively.


----------



## error-id10t

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Maybe it does boil down to Nvidia just not throwing enough resource or talent at DX 12 yet. If that is the case, that is really scary for AMD actually. Even ignoring Pascal, what happens when Maxwell gets proper DX 12 drivers - using your scenario? AMD gets absolutely destroyed if that is the case.


Dude, pull your head out.

How can you twist an Nvidia loss into a win by ignoring facts and twisting logic? This shows AMD winning the competition by a mile in the very first test. Why are they winning? Because their DX11 performance was so poor they couldn't compete; enter DX12 and suddenly it's not their HW that's poor anymore - it's equal and/or better. Plenty more to come, but right now Nvidia is losing.


----------



## Klocek001

Quote:


> Originally Posted by *error-id10t*
> 
> Plenty more to come but right now Nvidia is losing.


define "right now".


----------



## Silent Scone

Quote:


> Originally Posted by *error-id10t*
> 
> Dude, pull your head out.
> 
> How can you twist Nvidia lose to a win, by ignoring fact and twisting logic. This shows AMD winning the competition by a mile, the first test. Why are they winning? Because DX11 performance was so poor they couldn't compete, enter DX12 and suddenly it's not their HW that's poor anymore, it's equal and/or better. Plenty more to come but right now Nvidia is losing.


Actually this statement is equally twisting (as you put it), and likely out of ignorance. AMD's CPU overhead has been an issue for a very long time; your statement discredits all of NVIDIA's work over the generations in favour of what is, at the present moment, a remedy for this facer - at least until uptake and development improve over the next couple of years.


----------



## Klocek001

I'm puzzled by the number of people actually believing this is a trustworthy indication of anything. DX11 runs better than DX12 on Nvidia cards while DX12 gives AMD a 60% boost. DX12 isn't a turbo mode for AMD or any card for that matter; it's an API, like DX11.
What sort of crazy morons must the developers be if they want me to believe this indicates DX12 performance? And *do* they actually say it's an indication of DX12 performance? That's the first question that should be asked. In the same benchmark results the Fury X runs 50% slower than the 980 Ti in DX11, but people see nothing wrong with that.


----------



## Clocknut

Quote:


> Originally Posted by *error-id10t*
> 
> Dude, pull your head out.
> 
> How can you twist Nvidia lose to a win, by ignoring fact and twisting logic. This shows AMD winning the competition by a mile, the first test. Why are they winning? Because DX11 performance was so poor they couldn't compete, enter DX12 and suddenly it's not their HW that's poor anymore, it's equal and/or better. Plenty more to come but right now Nvidia is losing.


Well... he could be right too.

AMD hasn't put many resources into fixing their DX9-11 drivers. Had they even bothered to do that, each and every GCN card would get a hefty performance bump.


----------



## Wishmaker

[at work so cannot really make the meme]

*Success Kid* : Won one random DX 12 test, AMD is better than NVIDIA!


----------



## iLeakStuff

The really important questions here that I don't see anyone asking (if AMD does better on DX12) are:
Why do AMD GPUs do better in DX12?
Why do AMD GPUs do worse in DX11?

....


----------



## Silent Scone

Quote:


> Originally Posted by *iLeakStuff*
> 
> The really important question here that I don`t see anyone asking is (if AMD does better on DX12):
> Why does the AMD GPUs do better in DX12?
> Why does the AMD GPUs do worse in DX11?
> 
> ....


This has been addressed; it's almost a legacy-type affair, so certain members may not even feel the need to mention it. AMD's drivers have sizeable CPU overhead under DX11. DX12 removes this overhead, as Mantle did before it.


----------



## error-id10t

Quote:


> Originally Posted by *Klocek001*
> 
> define "right now".


As in "right now".

When there are more benches, games, whatever we revisit the situation and see where everything stands. Nothing more or less.

The problem with this site is continuous bashing of AMD. It's never good news for them, everything is always twisted. It's like they stole everyone's baby and the nerds are up in arms trying to bring their baby back.

If you've lived under a rock and didn't know AMD drivers sucked under DX11, well shame on you, I can't help you. Right now I'm happy to see we may have competition.


----------



## MonarchX

Quote:


> Originally Posted by *Silent Scone*
> 
> Actually this statement is equally as twisting (as you put it) only likely out of ignorance. AMD's CPU overhead has been an issue for a very long time, your statement is discrediting all of that work from NVIDIA over the generations in favour of what is at the present moment a remedy for this facer - least until uptake and development improves in the next couple of years.


That makes a lot of sense. NVidia is technically losing in DirectX 12, BUT their DirectX 11 performance beating AMD shows how incredibly optimized their DirectX 11 drivers are when it comes to CPU utilization. Not only that, but their cards were also well engineered to perform efficiently in DirectX 11. Don't get me wrong - I am more than certain *NVidia's future drivers will improve their DirectX 12 situation*, but their current drivers were already optimized/profiled for this benchmark, if you read their release notes.

AMD's DirectX 11 drivers and cards sucked, apparently due to poor CPU utilization by both the drivers and the cards. DirectX 12 fixed that utilization, and now AMD cards perform much better. *Have you forgotten Mantle?* Why do you think AMD decided to work hard to create their own API that is VERY CLOSE to DirectX 12 when it comes to CPU utilization? They obviously saw that their DirectX 11 performance was crap, but with Mantle or DirectX 12, their cards and drivers would shed that CPU utilization bottleneck. *It all makes perfect sense.*

This is a GOOD thing because now NVidia will have to work harder at driver optimization, work closer with game developers, COMPETE-COMPETE-COMPETE and do so aggressively. Competition is great for us = better performance, lower costs, faster future hardware.


----------



## Pen_Cap_Chew

Quote:


> Originally Posted by *Forceman*
> 
> Those resources Nvidia were allocating to DX11 development don't just go away, so even if that was the case I think you can reasonably expect Nvidia to start putting some of their DX11 effort into DX12 development. And let's not get too carried away based on one benchmark from one developer.


You seem to be fundamentally confused here. This is overclock.net... where wild claims are born... where concrete, unmoving opinions/conclusions and hard facts are formed based on the slightest amount of rumor and hearsay, because someone's blind, deaf and mute great-uncle MIGHT have read something on The Onion. Reminding people to "not get too carried away based on one benchmark from one developer" is craziness! Sound ideas like that do not belong here, sir!


----------



## MonarchX

Quote:


> Originally Posted by *Silent Scone*
> 
> This has been addressed, it's almost a legacy type affair so certain members may not even feel the need to mention it. AMD's DX11 drivers have sizeable CPU overhead under DX11. DX12 removes this overhead, as Mantle did before it.


It may not be the drivers, but the way the cards were engineered, or a combination of both drivers and engineering... It is just hard to fathom that after so many years of hard work, AMD could not even optimize CPU utilization for DirectX 11 on their cards.


----------



## Olivon

Quote:


> Originally Posted by *p4inkill3r*
> 
> And so it starts.


What did you expect? Oxide works closely with AMD and was behind the Mantle demos too.
For the moment, this DX12 demo means absolutely nothing.


----------



## sugarhell

AMD never optimized DX11 for the Oxide engine - at first to claim a big Mantle increase over DX11, and now the same for DX12.

They have more to gain from DX12 than Nvidia does.

Also, a big shader array + DX11 is not optimal.


----------



## GekzOverlord

Any benchmarks on where DX12 may actually shine like in the lower/mid range market? It's nice to see that the big dogs are getting more gains, but what about the mass market where it would make a difference?  (haven't checked all the links yet, there is just too much recycled data everywhere)


----------



## Klocek001

Quote:


> Originally Posted by *error-id10t*
> 
> As in "right now".
> 
> If you've lived under a rock and didn't know AMD drivers sucked under DX11, well shame on you, I can't help you. Right now I'm happy to see we may have competition.


Right now the 980 Ti is destroying the Fury X at my resolution (1440p), so I can't see why you're calling this benchmark "right now". Does it reflect the performance status of the Fury and 980 Ti as of right now?
If you just clicked my sig once you'd see I've owned a 7870 and 290(s) too. I know what the situation is in DX11.


----------



## Ganf

Quote:


> Originally Posted by *m0n4rch*
> 
> But how does DX12 increase amount of draw calls? By enabling multiple cores to talk to the GPU, right? Multi-threaded rendering. DX12 is supposed to actually utilize high-end CPUs, not to make lower end CPU's perform amazing. If you have a slow CPU that's choking on game logic and not on making draw calls, then low-level API is useless in that case.


No, because one of the biggest problems is how the APIs prioritize draw calls. Even low-end CPUs aren't getting "too many" draw calls on DX11; the draw calls are just lower priority, too few in number, and disorganized. In DX12 even single-threaded performance sees a 50% increase in draw calls, and the way the processor handles them changes significantly.

It's been known for a while now that DX12 will benefit low-end rigs the most and the enthusiast level the least. That's why some of these tests turn off hyperthreading, to simulate an older quad core.


----------



## error-id10t

I don't get what part is hard to understand, especially once you read the thread title.

Just so we're clear, I don't think I was putting Nvidia down here so not sure why anyone's pants should be in a twist. Anyway, this arguing is boring me so I'll leave it at that. Like I said, for me this is promising news but early days.


----------



## Serios

Quote:


> Originally Posted by *Kand*
> 
> You want to believe but its limited by the hardware. These are still just glorified 7870s with GCN 1.0.


What are you talking about, man?? The HD 7870 and 7850 officially support DX12.
You are simply wrong, deal with it.


----------



## Klocek001

Quote:


> Originally Posted by *Serios*
> 
> What are you taking about man?? The HD 7870 or 7850 officialy support DX12.
> You are simply wrong, deal whit it.


They do support DX12, at feature level 11_1.


----------



## Xuper

The game is at the alpha stage, but the engine is mature, so those benchmarks are valid; there is little room for Nvidia to improve FPS. Funny that AMD is silent. Nvidia, meanwhile, is not happy - wonder why? Because Nvidia is effectively saying: why are AMD's cards close to ours? That shouldn't happen, because we're God!


----------



## adogg23

If this is any indication of what DX13 will be, then Intel HD graphics are gonna beat AMD and NVIDIA.


----------



## poii

Quote:


> Originally Posted by *sugarhell*
> 
> Amd never optimized dx11 for oxide engine. At first to claim big mantle increase over dx11. Now the same for dx12.
> 
> They have more to win over dx12 than nvidia.
> 
> Also a big shader array + dx11 its not optimal.


Amen - they did the same in Civ: BE too.
http://www.computerbase.de/2014-10/civilization-beyond-earth-benchmarks-grafikkarten-mantle/#diagramm-grafikkarten-benchmarks-in-1920-1080
See "AMD Mantle vs. DirectX under CPU load, 1280x768":

AMD Mantle > AMD DX11,
but it's 50/50 on AMD Mantle vs Nvidia DX11.

And the results shown in these Ashes of the Singularity DX12 benchmarks should all be about AMD DX12 vs Nvidia DX12,
not about the DX11-to-DX12 gains on AMD/Nvidia.


----------



## Defoler

Quote:


> Originally Posted by *error-id10t*
> 
> Dude, pull your head out.
> 
> How can you twist Nvidia lose to a win, by ignoring fact and twisting logic. This shows AMD winning the competition by a mile, the first test. Why are they winning? Because DX11 performance was so poor they couldn't compete, enter DX12 and suddenly it's not their HW that's poor anymore, it's equal and/or better. Plenty more to come but right now Nvidia is losing.


Competition? By a mile?
From a game with no high-core-count support? From a game still in _pre-beta_, with no full driver support?
Showing similar performance results is "a mile" for you?

This was even marked by both AMD _and_ Nvidia:
Quote:


> We talked with NVIDIA and AMD about the benchmark and both noted that this is alpha software and we took that as they felt it *might not be an accurate measurement* of DX12 at this point in time.


Dude... talk about pulling heads out...

DX12 drivers with full API support and full optimisation aren't even out yet. This game is by far the worst basis you could pick for declaring either a win or even a "competition".

When games with better CPU utilisation and better GPU drivers come out, then we can talk about a competition, let alone a "win". Until then, this is just a "maybe" indication, and that's if we're being super generous.

Talking about pulling heads out... you should get yours back in.


----------



## Silent Scone

Why even rise to it.


----------



## Defoler

Quote:


> Originally Posted by *Serios*
> 
> What are you taking about man?? The HD 7870 or 7850 officialy support DX12.
> You are simply wrong, deal whit it.


Both of these cards only support DX12 in legacy mode, not the full API. That means a partial lower-level API and none of the new features, such as faster textures and better visuals.
Their gain in DX12 will be minimal at best compared to DX11.


----------



## Superplush

Quote:


> Originally Posted by *KyadCK*
> 
> You want to give that another go?


Heh, thanks for pointing that out.

That's what happens when looking after a puppy whilst working, on 5 hours of sleep.
I'm sure people know what I meant though, Kyad.


----------



## Ganf

Quote:


> Originally Posted by *Defoler*
> 
> When games with better CPU utilisation and better GPU drivers come out, than we can talk about a competition, let alone a "win". Until then, this is just a "maybe" indication if we are super generous to it.


Quote:


> Curiously, the Ashes of the Singularity benchmark also has a DX12 benchmark purely for the CPU - a frame-rate average that completely eliminates the GPU from contention, giving an idea of theoretical top-end performance.


http://www.eurogamer.net/articles/digitalfoundry-2015-ashes-of-the-singularity-dx12-benchmark-tested

Just sayin'... every reviewer so far has scrutinized the CPU usage pretty thoroughly, no one has found it lacking, and many have praised its optimization.

You won't find a better example of CPU optimization than a multithreaded DX12 uber-scale RTS with thousands of units on screen. They simply don't make them.


----------



## Xuper

AMDMatt and Wccftech posted results with different settings - the settings used for the Fury X are heavier than the Titan X's:

Temporal AA Duration: 12

Multisample Anti-Aliasing: 4x

Terrain Shading Samples: 16 Million

AMDMatt also confirmed that this game engine loads all cores to 100%.


----------



## sugarhell

Quote:


> Originally Posted by *Defoler*
> 
> Both of these cards only support DX12 legacy mode, and not the full API. That means partial lower level API, and none of the new features, as in faster textures and better visuals.
> The gain in them in DX12 will be minimal at best compared to DX11.


DX12 is not even out and we already have a "legacy mode". lol

Feature levels are mostly for new, more efficient rendering methods.

The most important hardware (you see the difference?) feature for DX12 is resource binding, something that GCN and Maxwell 2 support fully.

Anything with a dx11_x feature level supports DX12 100%.


----------



## Ganf

Thinking about buying the founder's pack just for the benchmarking... It's been so long since I've enjoyed an RTS I doubt I could get into this but guys.... DX12 benchmarks and epeen.... Amirite?


----------



## p4inkill3r

Quote:


> Originally Posted by *Ganf*
> 
> Thinking about buying the founder's pack just for the benchmarking... It's been so long since I've enjoyed an RTS I doubt I could get into this but guys.... DX12 benchmarks and epeen.... Amirite?


I've seen worse logic trying to justify preordering, go for it!


----------



## Silent Scone

Quote:


> Originally Posted by *sugarhell*
> 
> Dx12 is not even out and we have legacy mode.lol
> 
> Feature levels are mostly for new more efficient rendering methods.
> 
> The most important hardware(you see the difference?) feature for dx12 is resource binding something that GCN and maxwell 2 support fully.
> 
> Anything with dx11_x feature level support 100% the dx12.


Yeah, to be honest though, you can see where the confusion comes from. It's almost deliberately confusing, but the media have instigated that. The feature level support is of practically no importance as long as the card is 11.1 compliant. It's difficult to fathom at this point how certain 12_0 features will affect performance on non-compliant cards (GCN 1.0, Fermi etc.). For instance, tiled resources could have a large detrimental impact on things like geometry streaming if the performance hit is heavy enough without a specific tier level - or we don't yet know what that implies, or whether the game will even run. That may sound like a slightly extreme case, but as we've not seen these features implemented yet, it's hard to say.

Whatever the outcome, I guess users should be grateful these changes are finally here, and man up and upgrade.


----------



## sugarhell

Quote:


> Originally Posted by *Silent Scone*
> 
> Yeah, to be honest though you can see where the confusion comes from. It's almost deliberately confusing but the media have instigated that. The feature level support is practically of no importance as long as the card is 11.1 compliant. It's difficult to fathom at this point how certain 12.0 features will effect performance on non compliant cards (GCN 1.0, fermi etc). For instance tiled resources could have a large detrimental impact on things like geometry streaming if the performance is hard hitting enough without a specific tier level or what this implies, or whether the game will even run. Sounds maybe slightly extreme case but as we've not seen these features implemented yet it's hard to say.
> 
> Whatever the outcome, I guess users should be grateful these changes are finally here and man up and upgrade


We will see. Tiled resources kind of failed with DX11.1; hopefully they get better with DX12.

By the way, if we wait, say, 2 years for DX12 to mature (drivers, engines, logic, etc.), the 7970 will be 5+ years old. I won't even care about feature levels by then, because my 7970 won't be able to play a demanding game anyway.


----------



## Forceman

Quote:


> Originally Posted by *Klocek001*
> 
> I'm puzzled with the number of people actually believing that this is a trustworthy indication of anything. I mean dx11 runs better than dx12 for nvidia cards while dx12 gives amd a 60% boost. DX12 isn't a turbo mode for AMD or any card for that matter, it's an API, like dx11.
> What sort of crazy morons must the developers be if they want me to believe that this indicates dx12 performance. And *do* they actually say it's an indication of dx12 performance, cause this is the first question that should be asked. In the same benchmarks results Fury X runs 50% slower than 980ti in dx11, but people see nothing wrong with that.


I think this is an important point. There is no reason why a 390X should suddenly gain that much FPS just from going to DX12, and really no reason why a 980 is 90% faster in DX11. They have overhead problems, sure, but that seems extreme. If you compare just the DX12 performance numbers themselves, things are not that much different than they are with DX11, some improvement for AMD but nothing earth-shattering. The huge DX11-DX12 gains for AMD are coloring the issue, and a conspiracy theorist might even come to believe AMD purposely borked their DX11 numbers just to provide some splashy news. It's pretty convenient how this AMD affiliated benchmark fits so neatly into their rhetoric about big DX12 gains for their cards.

And for anyone about to post "no, no, it's just their DX11 overhead", are you really expecting people to believe that AMD could have doubled 290X/390X performance in some DX11 situations and they never bothered to throw an engineer or two at it? And if the answer to that is "well, it's only 1 or 2 cases where it makes that much difference", then we are back full-circle to let's not read too much into one benchmark from one developer.


----------



## Kuivamaa

Nothing surprising, really. I seriously doubt AMD optimized their DX11 path in games where Mantle was available; why would they? The same goes for DX11 here. Since early 2014 they have been preparing for post-DX11 APIs. They see gains in this engine because they obviously didn't put any effort into their DX11 path.


----------



## Gunderman456

I'm not sure why all the fuss?

It is clear that NVIDIA is always up to the task when it comes to their DX11 and DX12 drivers and that is why we see the cards perform at their max potential.

AMD fumbled on DX11, which is unfair to everyone who owns AMD, and did a good job on DX12, which shows in the results.

We got an early preview with Mantle, and since DX12 functions like Mantle, this also gets reflected in Ashes of the Singularity.

I'm sure that, despite what NVIDIA and Oxide say, these results will be typical of DX12 games.


----------



## mav451

Quote:


> Originally Posted by *Gunderman456*
> 
> I'm not sure why all the fuss?
> 
> It is clear that NVIDIA is always up to the task when it comes to their DX11 and DX12 drivers and that is why we see the cards perform at their max potential.
> 
> AMD fumbled on DX11, which is unfair to everyone who owns AMD, and did a good job on DX12, which shows in the results.
> 
> We got an early preview with Mantle, and since DX12 functions like Mantle, this also gets reflected in Ashes of the Singularity.
> 
> I'm sure despite what NVIDIA and Oxide says, *these results will be typical in DX12 games.*


Hmm, I'm not as sure about that.
If you read Oxide's statement in full (especially setting aside the 'misinformation' segment), it's actually a good write-up on the state of benchmarking in 2015. When you see a developer touching on GPU/CPU-limited situations and then talking about weighted frame rates, I do have some confidence that there is developer awareness here, and that this represents something tangible.

A step forward is a step forward, despite the rhetoric we have seen, haha.

Not until we see more games, though, particularly ones *not* highlighted in AMD literature, will I really conclude that this is typical DX12 behavior.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Gunderman456*
> 
> I'm not sure why all the fuss?
> 
> It is clear that NVIDIA is always up to the task when it comes to their DX11 and DX12 drivers and that is why we see the cards perform at their max potential.
> 
> AMD fumbled on DX11, which is unfair to everyone who owns AMD, and did a good job on DX12, which shows in the results.
> 
> We got an early preview with Mantle, and since DX12 functions like Mantle, this also gets reflected in Ashes of the Singularity.
> 
> I'm sure despite what NVIDIA and Oxide says, these results will be typical in DX12 games.


In normal games you will not see much difference, but this time AMD will not fall behind Nvidia because of CPU overhead. Under DX12, AMD and Nvidia have to do a lot less optimization work than under DX11.


----------



## iLeakStuff

Quote:


> Originally Posted by *Silent Scone*
> 
> This has been addressed, it's almost a legacy type affair so certain members may not even feel the need to mention it. AMD's DX11 drivers have sizeable CPU overhead under DX11. DX12 removes this overhead, as Mantle did before it.


The hell does this mean? CPU overhead involves the CPU and not the GPU.
How is the CPU part of any video driver?


----------



## Olivon

Quote:


> Originally Posted by *Gunderman456*
> 
> AMD fumbled on DX11, which is unfair to everyone who owns AMD, and did a good job on DX12, which shows in the results.


AMD did a good job on DX12? How do you know that?
AMD spat at their users with lackluster DX11 support, yeah, we know that for sure, but about DX12 we know nothing.


----------



## sugarhell

Quote:


> Originally Posted by *iLeakStuff*
> 
> The hell does this mean? CPU overhead involves the CPU and not the GPU.
> How is the CPU part of any videodriver?


Search for draw calls and the DX11 driver thread.

In general, when the game submits rendering work to the GPU, the API and the driver do most of the CPU-side work. IIRC, AMD's driver can process one queue at a time; Nvidia's can do three or four.

That is a different bottleneck from a pure CPU bottleneck. In general you will rarely see a bottleneck simply because your CPU is slow.

Most of the time the DX11 driver thread is too slow, and it uses at most about one and a half cores. Nvidia optimizes this thread better than AMD, which is why they have less CPU overhead. One of the problems with AMD's drivers is the slow shader compiler.
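The single-overloaded-driver-thread argument can be sketched with a toy cost model. All the numbers below (per-call cost, call counts, thread counts) are made up purely for illustration, not measurements; the point is only that one hot submission thread caps the frame rate no matter how many cores sit idle, while DX12-style parallel command-list recording spreads that cost out.

```python
# Toy model of CPU-side draw-call submission cost (illustrative numbers,
# not measurements). Under DX11 nearly all submission work lands on one
# driver thread; DX12 command lists let the engine record on many threads.

def frame_time_ms(draw_calls, cost_us_per_call, submit_threads):
    """CPU-side frame time when submission work is spread over N threads."""
    return draw_calls * cost_us_per_call / submit_threads / 1000.0

DRAW_CALLS = 10_000   # a heavy scene
COST_US = 5           # assumed CPU cost per draw call, in microseconds

dx11_like = frame_time_ms(DRAW_CALLS, COST_US, submit_threads=1)  # one driver thread
dx12_like = frame_time_ms(DRAW_CALLS, COST_US, submit_threads=4)  # parallel recording

print(f"1 thread:  {dx11_like:.1f} ms/frame -> {1000 / dx11_like:.0f} fps cap")
print(f"4 threads: {dx12_like:.1f} ms/frame -> {1000 / dx12_like:.0f} fps cap")
```

With these invented numbers the single-threaded path caps out at 20 fps while the four-thread path caps at 80 fps, which is the shape of the gap people are arguing about in this thread.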


----------



## Cyro999

Quote:


> The most important thing that becomes apparent reading these benchmarks is that AMD's DX11 drivers are terrible (the DX12 ones are good) and their CPUs are weak, though that weakness is being made up for by DX12 being threaded.
> 
> Now I can't wait for the SC2 brigade to come in and claim that RTS can't be multi threaded


Typically there is a lot of difficulty in making an RTS multi-threaded. The game devs put a huge focus on it here, and a Haswell i3 is still faster than an FX-8370 under DX12.
Quote:


> fx6300 ~28fps
> fx8370 ~32fps
> i3 4330 ~37fps
> i7 6700k ~ 72fps


^ from the Ashes of the Singularity DX12 alpha.

SC2 is not well optimized; it has more than a few engine issues. It's only one example on a long list of RTS games, though.
Quote:


> That is a different bottleneck from a pure CPU bottleneck. In general you will rarely see a bottleneck simply because your CPU is slow.
> 
> Most of the time the DX11 driver thread is too slow, and it uses at most about one and a half cores. Nvidia optimizes this thread better than AMD, which is why they have less CPU overhead. One of the problems with AMD's drivers is the slow shader compiler.


If too much CPU time is being spent, you have a CPU bottleneck, no matter whether it comes from a slow CPU or from inefficient code/drivers/API. And on that note, I'm more concerned about why AMD never fixed or addressed the problem than about what the problem specifically is. Their DX11 drivers are clearly inferior, and not by a small amount, in a way that noticeably affects literally millions of gamers.


----------



## Kuivamaa

Quote:


> Originally Posted by *Cyro999*
> 
> If too much CPU time is being spent, you have a CPU bottleneck, no matter whether it comes from a slow CPU or from inefficient code/drivers/API. And on that note, I'm more concerned about why AMD never fixed or addressed the problem than about what the problem specifically is. Their DX11 drivers are clearly inferior, and not by a small amount, in a way that noticeably affects literally millions of gamers.


No, it is different when the CPU is weak and gets overwhelmed by the computational load than when there are software limitations or an I/O bottleneck.


----------



## Ganf

Quote:


> Originally Posted by *iLeakStuff*
> 
> The hell does this mean? CPU overhead involves the CPU and not the GPU.
> How is the CPU part of any videodriver?


I can't even fathom how this comment came to exist.

Short story?

Realllly short story?

The GPU doesn't fart without asking the CPU what it should smell like.


----------



## Cyro999

Quote:


> Originally Posted by *Kuivamaa*
> 
> No, it is different when the CPU is weak and gets overwhelmed by the computational load than when there are software limitations, or I/O bottleneck.


Everything is either waiting on the CPU because the CPU is weak, or waiting on the CPU because the software is bad and loading it too much; it doesn't make much difference to the end user in practice. If it's somebody else's game/code/drivers, then they can't fix it, and they'll get performance gains from a stronger CPU or losses from a weaker one.

There shouldn't be a scenario where one side has such a huge advantage for years. AMD is literally ignoring tweets and posts about the issue, in the hope that nobody else notices it and people continue buying their products, without them having to put in the work to fix the problem.


----------



## escksu

Quote:


> Originally Posted by *Olivon*
> 
> What did you expect ? Oxyde work closer with AMD and was beyond Mantle demos too.
> For the moment, this DX12 demo means absolutely nothing.


Who cares? All I care about is that the 390X is faster than the GTX 980 and the Fury X beats the 980 Ti. That's all that matters. I am expecting the CF vs SLI gap to be even wider for the Fury X, because the Fury X scales very well in CF under DX11. I expect to see the same trend in DX12.

I can't help but wonder why some Nvidia fans feel butthurt to see their favourite brand losing out in a benchmark battle... lol..


----------



## escksu

Btw one more thing to take note, since DX12 supports SFR, this means a pair of Fury X will have 8GB of RAM.


----------



## sugarhell

Quote:


> Originally Posted by *escksu*
> 
> Btw one more thing to take note, since DX12 supports SFR, this means a pair of Fury X will have 8GB of RAM.


SFR is older than AFR. And no, it doesn't stack VRAM.

Look at SFR with Mantle in Civ.
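The reason SFR doesn't stack VRAM can be sketched in a few lines: each card renders only part of the screen, but any texture or mesh can appear in either part, so both cards must keep essentially the full asset set resident. The sizes below are hypothetical, just to make the point concrete.

```python
# Toy sketch: split-frame rendering halves the pixel work per GPU, but the
# asset pool (textures, geometry) is mirrored on every card, so usable
# VRAM is still bounded by a single card, not the sum.

VRAM_PER_GPU_GB = 4.0   # e.g. one Fury X

def usable_vram_gb(vram_per_gpu, num_gpus):
    # Assets are duplicated across GPUs under SFR, so capacity doesn't scale
    # with the number of cards.
    return vram_per_gpu

print(usable_vram_gb(VRAM_PER_GPU_GB, 2))   # still 4.0, not 8.0
```

DX12's explicit multi-adapter does let developers manage each card's memory separately, so in principle a renderer could partition data across heaps, but that is manual work per game, not an automatic doubling.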


----------



## Cyro999

Quote:


> Who cares? All I care is that I see that 390x is faster than GTX980 and Fury X beats 980ti. Thats all it matters. I am expecting the gap in CF vs SLI to be even wider for Fury X because Fury X scales very well in CF under DX11. I expect to see the same trend in DX12.


Performance of the driver, API, and CPU under DX12 is almost entirely unrelated to multi-GPU scaling.

By your logic, a GTX 750 Ti beats a Fury X, because it does in those DX11 games when CPU bound, where the driver performance difference between Nvidia and AMD comes into play.


----------



## Kuivamaa

Quote:


> Originally Posted by *Cyro999*
> 
> Everything is either waiting for the CPU because the CPU is weak or waiting for the CPU because the software is bad and loading it too much, it doesn't make much difference to the end user in practice. If it's somebody elses game/code/drivers then they can't fix it and they'll get performance gains from a stronger CPU or losses from a weaker one.
> 
> There shouldn't be a scenario where one side has such a huge advantage for years. AMD is literally ignoring tweets and posts about the issue in the hope that nobody else notices it and people continue buying their products without them having to put in the work to fix the problem.


Still, these are different situations with different roots, and different conditions apply. That's why the DX11 overhead is not prominent everywhere (e.g. Radeons work fine in Witcher 3, get destroyed in WoW, and trounce GeForces in CS:GO).


----------



## ChronoBodi

Hm, just how close to the metal is DX12, compared to consoles? I'm curious.


----------



## Silent Scone

Quote:


> Originally Posted by *ChronoBodi*
> 
> Hm, just how close to the metal is DX12, compared to consoles? I'm curious.


Typically the comparison can't be made directly, due to the differences in hardware. The XBONE's development kit and API closely resemble D3D; however, the hardware is very different, being an APU with multiple different streaming capabilities, both visual and audible. Basically, dedicated platforms will likely always be closer 'to the metal', as the saying goes, because DirectX on the PC is hardware agnostic.


----------



## sugarhell

Quote:


> Originally Posted by *ChronoBodi*
> 
> Hm, just how close to the metal is DX12, compared to consoles? I'm curious.


Closer than DX11, but not as close as GNM (the PS4 API).

DX12 will never reach the level of optimization of a console, though. It's targeting a huge range of different specs, versus targeting one fixed spec.


----------



## ZealotKi11er

Quote:


> Originally Posted by *sugarhell*
> 
> Closer than DX11, but not as close as GNM (the PS4 API).
> 
> DX12 will never reach the level of optimization of a console, though. It's targeting a huge range of different specs, versus targeting one fixed spec.


And most console games will never reach the same level of optimization as in-house developers making console exclusives do.


----------



## escksu

Quote:


> Originally Posted by *sugarhell*
> 
> Closer than DX11, but not as close as GNM (the PS4 API).
> 
> DX12 will never reach the level of optimization of a console, though. It's targeting a huge range of different specs, versus targeting one fixed spec.


But I bet it will be enough for AMD cards to finally trounce Nvidia ones.


----------



## Wishmaker

Quote:


> Originally Posted by *escksu*
> 
> But I bet it will be enough for AMD cards to finally trounce Nvidia ones.


You been drinking the AMD kool-aid like a baws. NVIDIA will just sit down and do nothing in the DX12 era *rofl*. Funny how people make statements like the above.


----------



## Themisseble

Quote:


> Originally Posted by *Wishmaker*
> 
> You been drinking the AMD kool-aid like a baws. NVIDIA will just sit down and do nothing in the DX12 era *rofl*.


It will be good to see the R9 290X vs the GTX 780 Ti, or the 7970 vs the GTX 680.


----------



## Cyro999

Quote:


> Originally Posted by *Themisseble*
> 
> It will be good to see R9 290X vs GTX 780TI or 7970 vs GTX 680.


For a test like this, the tier of GPU matters little. For GPU-bound tests, there are already a hundred different ways to test the GPUs you listed.


----------



## orlfman

According to the Ashes of the Singularity website, they are planning a Vulkan / SteamOS port. It will be interesting to see how Vulkan compares to DX12 once it's complete.


----------



## Themisseble

Quote:


> Originally Posted by *Cyro999*
> 
> For a test like this, the tier of GPU matters little. For GPU bound tests, there's a hundred different ways to test those GPU's that you listed already


Not that. I would like to see performance in DX12 games, and DX12 only, with async shaders etc. I really want to know how well the GTX 780 Ti does against the R9 290X, which was cheaper.

pcper.com is one of the worst sites when it comes to reviews like these.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Wishmaker*
> 
> You been drinking the AMD kool-aid like a baws. NVIDIA will just sit down and do nothing in the DX12 era *rofl*. Funny how people make statements like the above.


AMD did just that for the entirety of DX11. So why not have Nvidia do the same for DX12? After all, we know this is a duopoly, so they should take turns and play nicely.


----------



## Redwoodz

Quote:


> Originally Posted by *DFroN*
> 
> Fury X and 980Ti neck and neck in the DX12 scores, compared to the Ti being 1.4x faster at 4K and 1.8x faster at 1080p in DX11.


1.4 to 1.8x faster?

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Basically AMD sucks at developing drivers within the DX 11 environment; either by lack of ability or finances to fund it. Where as Nvidia is capable of providing, for whatever reason, the resources to really maximize DX 11.
> 
> EDIT:
> 
> Not sure anyone should be surprised by this.


Basically, anyone who thinks AMD sucks at DX11 drivers had better also be complaining about Nvidia building such weak hardware, seeing as AMD is able to build a better GPU with 1/10th the R&D budget and still match Nvidia's "clearly" superior drivers.


----------



## PostalTwinkie

Quote:


> Originally Posted by *error-id10t*
> 
> Dude, pull your head out.
> 
> How can you twist Nvidia lose to a win, by ignoring fact and twisting logic. This shows AMD winning the competition by a mile, the first test. Why are they winning? Because DX11 performance was so poor they couldn't compete, enter DX12 and suddenly it's not their HW that's poor anymore, it's equal and/or better. Plenty more to come but right now Nvidia is losing.


Quote:


> Originally Posted by *error-id10t*
> 
> As in "right now".
> 
> When there are more benches, games, whatever we revisit the situation and see where everything stands. Nothing more or less.
> 
> The problem with this site is continuous bashing of AMD. It's never good news for them, everything is always twisted. It's like they stole everyone's baby and the nerds are up in arms trying to bring their baby back.
> 
> If you've lived under a rock and didn't know AMD drivers sucked under DX11, well shame on you, I can't help you. Right now I'm happy to see we may have competition.


Quote:


> Originally Posted by *error-id10t*
> 
> I don't get what part is hard to understand especially once you read the thread title?
> 
> Just so we're clear, I don't think I was putting Nvidia down here so not sure why anyone's pants should be in a twist. Anyway, this arguing is boring me so I'll leave it at that. Like I said, for me this is promising news but early days.


Yup, you barked up the wrong tree....

I would tell you to pull your head out, but it is apparently so lodged up there, no effort is going to remove it. Hopefully you have a window installed in your stomach so you can actually read and understand the words in front of you this time.

My entire conversation in this thread, until you decided to attack me, was an active discussion questioning everything about this entire article. It was an open-minded and perfectly fine conversation, that is, until you showed up. Myself and several others were simply discussing what could account for such extreme gains in certain scenarios, as some of these results are clearly questionable given the known relationship between the developer and AMD.

_Another_ member simply posited the theory that maybe Nvidia just didn't put any driver development effort in, and that is why in this one case AMD showed gains. My comment was simply that if the gains by AMD were purely driver related, Nvidia would crush AMD once again when Nvidia developed their drivers. Again, all of this was simple ideas and questions we had been tossing around.

Quote:


> Originally Posted by *Redwoodz*
> 
> 1.4 to 1.8x faster?
> 
> Basically anyone that thinks AMD sucks at DX11 drivers better be complaining about Nvidia building such weak hardware, seeing as how AMD is able to build a better GPU with 1/10th the R&D budget and match Nvidia's "clearly" superior drivers.


I complain about AMD drivers because I have AMD cards all over my house and in my machines, and their drivers generally suck. While they haven't pulled an Nvidia and flat-out killed a few cards in a long time, AMD drivers still generally suck.

But this conversation is overrated and has been hashed out a million times. They both could have far better drivers; neither one of them is perfect. The only reason drivers came up in this thread is via another member, but you can read the conversation above your quote.


----------



## DFroN

Quote:


> Originally Posted by *Redwoodz*
> 
> *1.4 to 1.8x faster?*
> 
> Basically anyone that thinks AMD sucks at DX11 drivers better be complaining about Nvidia building such weak hardware, seeing as how AMD is able to build a better GPU with 1/10th the R&D budget and match Nvidia's "clearly" superior drivers.


Yes from the ExtremeTech article in the OP, which was the only article linked to at the time of my post:

Quote:


> Without antialiasing enabled, Nvidia's GTX 980 Ti is 1.42x faster than AMD in 4K and 1.78x faster in 1080p


Please read the articles in question before using rolleyes


----------



## CrazyHeaven

So a game like Civ would do great with DX12, but for games like The Witcher there isn't going to be much difference? Even The Witcher still relies heavily on the CPU, though.

Is it true that Unity games are CPU dependent? I know that with Dreamfall Chapters I didn't get any noticeable difference going from a 580 Ti to a 970.


----------



## Ganf

Quote:


> Originally Posted by *CrazyHeaven*
> 
> So a game like civ would do great with dx12? But for games like witcher there isn't going to be much difference? Well even the Witcher still relies heavily on the cpu.
> 
> Is it true that unity games are cpu depended? I know with dreamworld chapters I didn't get any noticeable difference going from a 580 ti to a 970.


Not entirely. There are some new tricks, like making the light modeling more CPU-dependent and taking other workloads off the GPU to free up resources, that we'll start to see used more effectively in other games, and the rasterization tricks they've got should bring a fair bump in average poly counts and reduce the need for AA.

Lots of tricks come with DX12 that don't lend themselves to the development of an RTS.


----------



## delboy67

Haven't got into an RTS game in years; I might get this one for the amount of units it can use, and I hope it turns out to be a good game. The benchmarks are interesting but understandable: Nvidia had an advantage in driver CPU optimization, but now there's a lot less to optimize, with much lower driver overhead, so they won't have that advantage anymore. AMD have played a blinder. I don't think we would have seen this without GCN/x86 consoles. Well played, and thanks for not gimping my Nvidia card in the process, although I don't think this will put AMD ahead; they will trade blows as usual depending on the game.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Ganf*
> 
> Not entirely, there's some new tricks like making the light modeling more CPU-dependent and taking other workloads off of the GPU to free up resources that we'll start to see used more effectively in other games, and the rasterization tricks they've got should see a fair bump in the average poly counts and reduce the need for AA.
> 
> Lots of tricks come with DX12 that don't lend to the development of an RTS.


This is good news. Loving the big push on drivers and software for gains, rather than hardware.

Now if someone could find a way to get my CPU's integrated GPU to do something. PhysX or something related.


----------



## Themisseble

MMOs should also take huge advantage of DX12... can you imagine flawless 100 vs 100 player battles in an MMO? Or 200 vs 200? Or Planetside 2/3 on a new DX12 game engine...


----------



## pengs

Well, it's also doing simulation: gravity, AI, trajectory, all that calculative stuff, which most likely doesn't fall under the low-level and draw-call umbrella but is still affected by the multithreading enhancements (which can only go so far). In that case you're probably seeing an artificial limitation where the simulation can't do the math quickly enough to make use of the almost limitless draw calls, and where IPC still matters quite a bit (especially when backed up by a virtual core). From the low-end AMD CPUs to the high-end Intel ones there seems to be a hard stop: these benchmarks hit either a CPU limit or a GPU limit. I have no doubt that DX12's draw-call potential is almost limitless, however.

I doubt the physics part of the simulation is going to be affected by the API's overhead reduction and multithreading enhancements as much as the object-heavy, graphical, draw-call-intensive part of the scene. When you strip the game of being a... game, and just run simulations, you get the

which shows the potential of Vulkan and DX12 when it comes to the amount of objects they can display.
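The bottleneck shift described above can be sketched as the max of two per-frame costs. Every number here is invented purely for illustration: once API/driver submission cost drops below the simulation cost, faster submission stops improving the frame rate and single-thread CPU speed becomes the wall.

```python
# Toy model: frame time is gated by the slower of simulation (AI, physics,
# trajectories) and draw-call submission, assuming the two overlap on
# different cores. Shrinking API overhead only helps until the simulation
# itself becomes the limit.

SIM_MS = 14.0   # assumed per-frame simulation cost, milliseconds

def frame_ms(sim_ms, submit_ms):
    return max(sim_ms, submit_ms)

for submit_ms in (40.0, 20.0, 10.0, 5.0):   # progressively cheaper API
    print(f"submit {submit_ms:4.1f} ms -> frame {frame_ms(SIM_MS, submit_ms):4.1f} ms")
```

With these made-up numbers, cutting submission from 40 ms to 20 ms helps a lot, but going from 10 ms to 5 ms changes nothing: the 14 ms simulation has become the hard stop, which matches the flat line seen between mid-range and high-end CPUs in the benchmarks.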


----------



## Ganf

Quote:


> Originally Posted by *CrazyHeaven*
> 
> This is good news. Loving the big push on drivers and software for gains, rather than hardware.
> 
> Now if someone could find a way to get my CPU's integrated GPU to do something. PhysX or something related.


This is where you'll love DX12.

There has been talk of cross-GPU compatibility, or whatever you want to call it: basically using GPUs of different brands and varieties simultaneously. You can expect developers to take advantage of your iGPU with this, because it's one of the best tools they've been given in at least a decade for bringing up minimum frame rates on low-spec machines. Even if developers never touch Nvidia/AMD cross compatibility, there are sure to be tons of games that pair an AMD or Nvidia card with the iGPU.


----------



## Cyro999

Unfortunately, a lot of the beneficial stuff you could do by using a dedicated GPU alongside the iGPU comes with drastically increased latency that would make most tech enthusiasts not want to use it. If you had something like Iris Pro and a GTX 750 and didn't care about the latency, it would be much more helpful.


----------



## iLeakStuff

Quote:


> Originally Posted by *sugarhell*
> 
> Search for draw calls and the DX11 driver thread.
> 
> In general, when the game submits rendering work to the GPU, the API and the driver do most of the CPU-side work. IIRC, AMD's driver can process one queue at a time; Nvidia's can do three or four.
> 
> That is a different bottleneck from a pure CPU bottleneck. In general you will rarely see a bottleneck simply because your CPU is slow.
> 
> Most of the time the DX11 driver thread is too slow, and it uses at most about one and a half cores. Nvidia optimizes this thread better than AMD, which is why they have less CPU overhead. One of the problems with AMD's drivers is the slow shader compiler.


So several questions pop up in my head:

- Why haven't AMD made their DX11 driver much better, when DX12 makes it painfully obvious that AMD had a lackluster DX11 driver?
- Why shouldn't Nvidia surpass AMD in DX12 drivers, if Nvidia's DX11 drivers were better than AMD's?

Sorry for these stupid questions, but I find it very odd that Nvidia optimizes threads/drivers better, and yet suddenly everyone praises AMD for this test, when it seems that AMD had worse DX11 optimization all along. You know, with the DirectX that's been here for many years now and has been used by many AMD cards...


----------



## sugarhell

Quote:


> Originally Posted by *iLeakStuff*
> 
> So several questions pop up in my head:
> 
> - Why haven't AMD made their DX11 driver much better, when DX12 makes it painfully obvious that AMD had a lackluster DX11 driver?
> - Why shouldn't Nvidia surpass AMD in DX12 drivers, if Nvidia's DX11 drivers were better than AMD's?
> 
> Sorry for these stupid questions, but I find it very odd that Nvidia optimizes threads/drivers better, and yet suddenly everyone praises AMD for this test, when it seems that AMD had worse DX11 optimization all along. You know, with the DirectX that's been here for many years now and has been used by many AMD cards...


It's easy to say that something sucks, but AMD's and Nvidia's drivers are not even close to sucking.

For example, a game has millions of lines of code, and AMD and Nvidia need to check, analyze, and predict all of it. That is an insane amount of work for a single game, and a job only they can do, because only their software engineers have the proper knowledge. Factor in the number of people required, the expertise needed, and DX11 being a pain to debug, and it all adds up to a huge amount of money. Why do you think Intel never joined gaming? Because the cost and the software-side knowledge required are insane.

Both are good; Nvidia is just a lot better, more so after they dropped active driver support for older series so they could optimize better for Fermi and newer.

In the end it isn't even their own work: they are fixing broken games line by line, and it costs them money. It's good that both AMD and Nvidia have such good driver teams, because that is why we actually have working games.

Now, think about it for a moment: why do IHVs spend all this money fixing broken games instead of adding more features or providing more stable drivers? Hopefully with DX12, devs will actually manage to optimize their games without too much help from the IHVs.

1. Why can't AMD optimize their drivers more? Probably money, time, and a small driver team.
2. Because devs now optimize their own code, instead of each IHV optimizing the game inside their drivers, you get a reasonably hardware-agnostic API, at least for CPU performance.


----------



## geoxile

Quote:


> Originally Posted by *iLeakStuff*
> 
> So several questions pop up in my head:
> 
> - Why haven't AMD made their DX11 driver much better, when DX12 makes it painfully obvious that AMD had a lackluster DX11 driver?
> - Why shouldn't Nvidia surpass AMD in DX12 drivers, if Nvidia's DX11 drivers were better than AMD's?
> 
> Sorry for these stupid questions, but I find it very odd that Nvidia optimizes threads/drivers better, and yet suddenly everyone praises AMD for this test, when it seems that AMD had worse DX11 optimization all along. You know, with the DirectX that's been here for many years now and has been used by many AMD cards...


http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/#entry5215019

Read this post

tl;dr

DX12, because it is "closer to metal", gives more control to devs, but this also means it forces more work on them. That means AMD and Nvidia drivers have less of an impact on DX12 games

With DX11, most games were "shipped broken" and then fixed up by Nvidia and AMD via drivers. With DX12, Nvidia and AMD drivers have less control. Maybe Nvidia DX12 drivers will be better than AMD's DX12 drivers, but the total amount of influence the drivers have over DX12-based renderers is much smaller compared to DX11, and hence DX12 drivers should affect renderer performance less than with DX11.


----------



## Anarion

Quote:


> Originally Posted by *geoxile*
> 
> http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/#entry5215019
> 
> Read this post
> 
> tl;dr
> 
> DX12, because it is "closer to metal", gives more control to devs, but this also means it forces more work on them. That means AMD and Nvidia drivers have less of an impact on DX12 games
> 
> With DX11, most games were "shipped broken" and then fixed up by Nvidia and AMD via drivers. With DX12, Nvidia and AMD drivers have less control. Maybe Nvidia DX12 drivers will be better than AMD's DX12 drivers, but the total amount of influence the drivers have over DX12-based renderers is much smaller compared to DX11, and hence DX12 drivers should affect renderer performance less than with DX11.


Most games are broken ports already. I think we're going to have to wait to see some really good DX12 games, and there are going to be very few of them, judging from the current situation with stupid frame-capped or cut-down console ports.


----------



## geoxile

Quote:


> Originally Posted by *Anarion*
> 
> Most games are broken ports already. I think we gonna have to wait to see some really good DX12 games but they gonna be very few judging for the current situation with stupid frame capped or chopped console ports.


He's not talking about just broken in general, but specifically renderers not implementing the API properly, so AMD and Nvidia apply game-specific fixes to compensate.


----------



## Faithh

Quote:


> Originally Posted by *iLeakStuff*
> 
> So several questions pop up in my head:
> 
> - Why haven't AMD made their DX11 driver much better, when DX12 makes it painfully obvious that AMD had a lackluster DX11 driver?
> - Why shouldn't Nvidia surpass AMD in DX12 drivers, if Nvidia's DX11 drivers were better than AMD's?
> 
> Sorry for these stupid questions, but I find it very odd that Nvidia optimizes threads/drivers better, and yet suddenly everyone praises AMD for this test, when it seems that AMD had worse DX11 optimization all along. You know, with the DirectX that's been here for many years now and has been used by many AMD cards...


Obviously because: "why not do it now, while we are making our DX12 drivers?" It would take AMD at least a few months to fix their DX11 drivers, but we've been waiting almost two years for AMD to do this. The issue has been known since Nvidia responded to Mantle with their 337.50 drivers, which reduced CPU overhead by a significant amount. http://pclab.pl/art57235-3.html

Not a lot of people knew this, and it's not as if a YouTube celebrity or a reputable source like AnandTech bothered to test it.

We never saw AMD mention anything related to driver CPU overhead until they decided to show off their DX12 performance.



Anyways, seems like AMD has a serious DX12 CPU advantage;



Roughly 20%.


----------



## pengs

Quote:


> Originally Posted by *geoxile*
> 
> http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/#entry5215019
> 
> Read this post
> 
> tl;dr
> 
> DX12, because it is "closer to metal", gives more control to devs, but this also means it forces more work on them. That means AMD and Nvidia drivers have less of an impact on DX12 games
> 
> With DX11, most games were "shipped broken" and then fixed up by Nvidia and AMD via drivers. With DX12, Nvidia and AMD drivers have less control. Maybe Nvidia DX12 drivers will be better than AMD's DX12 drivers, but the total amount of influence the drivers have over DX12-based renderers is much smaller compared to DX11, and hence DX12 drivers should affect renderer performance less than with DX11.


Right, and then it becomes about the engine it's running on, or the developers' use of assets. There have been a few articles with developers who portray DX12 as quite an undertaking, but worth the effort, as it's a foundation-laying process at the engine level on top of which a game can be built. Epic has been working on implementing it for quite a while now.


----------



## Ganf

Quote:


> Originally Posted by *Cyro999*
> 
> Unfortunately a lot of the beneficial stuff that you could do by using dedicated GPU alongside iGPU come with drastically increased latency that would make most tech enthusiasts not want to use it. If you had something like iris pro and a gtx750 and don't care about the latency, it's much more helpful


No they don't. Benefits include things like using the iGPU to run physics as opposed to the CPU. Doing physics on the iGPU incurs no more latency than doing it with the CPU. AI can be shunted from the CPU to the iGPU, with the same result. Background simulations, certain types of post-processing, asset prep for storage in the dGPU's VRAM, etc...


----------



## Redwoodz

Quote:


> Originally Posted by *DFroN*
> 
> Yes from the ExtremeTech article in the OP, which was the only article linked to at the time of my post:
> 
> Please read the articles in question before using rolleyes


Please read the whole article and don't just cherry pick your comments to fit your agenda.

So CLEARLY Nvidia just has crappy drivers for DX12, while everybody screams "Oh my god, this proves AMD's DX11 drivers suck!" Quite the opposite: Nvidia just has crappy drivers; the dev even states so.


----------



## HeavyUser

Quote:


> Originally Posted by *Redwoodz*
> 
> So CLEARLY Nvidia just has crappy drivers for DX12, while everybody screams "Oh my god, this proves AMD's DX11 drivers suck!" Quite the opposite: Nvidia just has crappy drivers; the dev even states so.


Um, AMD drivers do suck though, where have you been hiding?


----------



## Ganf

Quote:


> Originally Posted by *HeavyUser*
> 
> Um, AMD drivers do suck though, where have you been hiding?


Not for a couple years now, where have you been hiding?


----------



## PostalTwinkie

Quote:


> Originally Posted by *Ganf*
> 
> *Not for a couple years now*, where have you been hiding?


My 7970 Tri-Fire from less than two years ago would like to have a word with you. FreeSync and CrossFire would love to have a word with you, but they were JUST fixed.

If anything, AMD has been doing better in the last several months, coming out of years of terrible drivers.


----------



## DFroN

Quote:


> Originally Posted by *Redwoodz*
> 
> Please read the whole article and don't just cherry pick your comments to fit your agenda.
> 
> So CLEARLY Nvidia just has crappy drivers for DX12, while everybody screams "Oh my god, this proves AMD's DX11 drivers suck!" Quite the opposite: Nvidia just has crappy drivers; the dev even states so.


I have no idea what 'agenda' you think I have. I submitted a single-sentence post with an observation that AMD is neck and neck in DX12 but behind in DX11, according to the article. That could be taken either way, as a plus or minus for either team. I have no opinion, slant, or bias as to why that is. Not every post has to have some underlying blinkered agenda. If you disagree with my observation and have a point to make, then make it without sarcasm and accusations of cherry-picking and agenda, which only serve to bait fanboyism.


----------



## Redwoodz

Quote:


> Originally Posted by *DFroN*
> 
> I have no idea what 'agenda' you think I have. I submitted a single-sentence post with an observation that AMD is neck and neck in *DX11* but *Nvidia is* behind in *DX12*, according to the article. That could be taken either way, as a plus or minus for either team. I have no opinion, slant, or bias as to why that is. Not every post has to have some underlying blinkered agenda. If you disagree with my observation and have a point to make, then make it without sarcasm and accusations of cherry-picking and agenda, which only serve to bait fanboyism.


See what I did there?


----------



## ToTheSun!

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Ganf*
> 
> *Not for a couple years now*, where have you been hiding?
> 
> 
> 
> My 7970 Tri-Fire from less than two years ago would like to have a word with you. FreeSync and CrossFire would love to have a word with you, but they were JUST fixed.
> 
> If anything, AMD has been doing better in the last several months, coming out of years of terrible drivers.

My 7970 has weird framerate dips in old games that should run better on a potato.


----------



## PontiacGTX

Quote:


> Originally Posted by *ToTheSun!*
> 
> My 7970 has weird framerate dips in old games that should run better on a potato.


example?


----------



## Ganf

Quote:


> Originally Posted by *PostalTwinkie*
> 
> My 7970 Tri-Fire from less than two years ago would like to have a word with you. FreeSync and CrossFire would love to have a word with you, but they were JUST fixed.
> 
> If anything, AMD has been doing better in the last several months, coming out of years of terrible drivers.


Tri-SLI owners have their own woes. 3-card setups have been the bane of enthusiasts since their inception; you either do 2 or 4, 3 is no-man's-land.
Quote:


> Originally Posted by *ToTheSun!*
> 
> My 7970 has weird framerate dips in old games that should run better on a potato.


Can't say I've experienced the same on my 7970, and I used to play all kinds of weird old crap on that.


----------



## Cyro999

Quote:


> Originally Posted by *Ganf*
> 
> No they don't. Benefits include things like using the iGPU to run physics as opposed to the CPU. Doing physics on the iGPU incurs no more latency than doing it with the CPU. AI can be shunted from the CPU to the iGPU, with the same result. Background simulations, certain types of post-processing, asset prep for storage in the dGPU's VRAM, etc...


The only stuff that's actually been talked about and demonstrated in a game is doing some part of the rendering (like post-processing or a HUD) on the iGPU. You can get FPS improvements that way, at the cost of a lot of latency.

There's a lot more you can do in theory, but it has stayed theoretical forever now.


----------



## Cyro999

Quote:


> So CLEARLY Nvidia just has crappy drivers for DX12, while everybody screams "Oh my god, this proves AMD's DX11 drivers suck!" Quite the opposite


Once again:










+% shows a performance lead for Nvidia. Green is DirectX 11. There's a clear trend of Nvidia having a HUGE performance lead under DX11 on any system.

-% shows a performance lead for AMD. There's a clear trend of AMD having somewhat better performance under DX12, particularly when using 5 threads or more.

The DX11 advantage for Nvidia is way bigger and affects orders of magnitude more popular games at the moment. You can't buy an AMD card to play those games with competitive performance, yet if you bought an Nvidia card to play this game, you'd be missing out on maybe 5 or 10% performance when CPU-bound. Not +50%, which has been the reality of Nvidia vs AMD graphics on DX11 in many games for the last 16 months.
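For anyone unsure how a +% or -% "performance lead" figure like the ones above is derived, here's a minimal sketch with made-up FPS numbers (nothing below comes from the actual benchmark):

```python
# Hypothetical FPS values, purely to illustrate how a percentage
# lead between two cards is computed from two benchmark averages.

def lead_pct(fps_a, fps_b):
    """Percentage lead of card A over card B (positive = A is ahead)."""
    return (fps_a / fps_b - 1.0) * 100.0

print(lead_pct(60.0, 40.0))  # 60 vs 40 FPS -> a +50% lead
print(lead_pct(44.0, 40.0))  # 44 vs 40 FPS -> a far smaller ~+10% lead
```

The point of the comparison: a +50% lead and a ~10% deficit are not symmetric outcomes; the first is a large chunk of your frame rate, the second is within early-driver noise.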

And before anybody else plays the "Nvidia fangirl" card: I'm using this GPU as a result of the performance difference there and the lack of response from AMD; it's not some made-up issue to justify a purchase. I'll happily use whatever performs best from either side. There are so many people who will get mad on forums, or downvote, because they don't like to see the truth.

On top of that, +50% for over 3 months is a hugely concerning issue; +5-10% when dealing with alpha software and early drivers really is not. It's a small lead that could be important in the future, but it won't make you sell your 980 Ti. An R9 270 giving 50% more FPS in that new game you play all the time, whenever your CPU is stressed - that is a huge, huge problem that is absolutely real right now.


----------



## maltamonk

^There's quite a price discrepancy between those cards.


----------



## Cyro999

Quote:


> Originally Posted by *maltamonk*
> 
> ^There's quite a price discrepancy between those cards.


Then use a 970? It doesn't matter; the differences are in API performance. They're seen primarily when your GPU isn't fully loaded, so performance differences between GPUs wouldn't show up. A 960 and a 980 Ti have the same min FPS in WoW, and it's still 1.5x higher than a Fury X's.

API/driver performance is a very different thing from the performance of the GPU itself, though the comparison is muddied because you're working with different brands that have completely different drivers and even varying levels of API support.


----------



## iRev_olution

i wonder if it's time to SLI my 980 and get a new monitor. DX12 looks promising. I wonder if existing AAA titles will get a DX12 patch.


----------



## iRev_olution

Quote:


> Originally Posted by *Ganf*
> 
> Tri-SLI owners have their own woes. 3 card setups have been the bane of enthusiasts since their inception, you either do 2 or 4, 3 is no-man's-land.
> Can't say I've experienced the same on my 7970, and I used to play all kinds of weird old crap on that.


I kinda disagree with that. 4-card SLI setups get massive bottlenecks from the CPU. I think you see more 2- and 3-card setups than 4, that's for sure.


----------



## Clocknut

No DX9-11 overhead optimization, no money from me to AMD. Even if you sold the Fury X at GTX 960 prices, I would still pick the 960. Pretty simple.

The majority of gamers are still playing DX9-11 games. What's wrong with AMD? Can't they see the importance of this?


----------



## p4inkill3r

Quote:


> Originally Posted by *Clocknut*
> 
> No DX9-11 overhead optimization, no money from me to AMD. Even if you sold the Fury X at GTX 960 prices, I would still pick the 960. Pretty simple.
> 
> The majority of gamers are still playing DX9-11 games. What's wrong with AMD? Can't they see the importance of this?


I think you've established that.


----------



## maltamonk

Quote:


> Originally Posted by *Cyro999*
> 
> Then use a 970? It doesn't matter; the differences are in API performance, they're seen primarily when your GPU isn't fully loaded so performance differences between GPU's wouldn't exist. 960 and 980ti have the same min FPS in WoW, and it's still 1.5x higher than a Fury X.
> 
> API/driver performance is a very different thing to the performance of the GPU itself, though it's different because you're working with different brands that have completely different drivers and even varying levels of API support.


I understand there's a difference between APIs, but I also understand there's a difference between cards at certain prices. Comparing cards at significantly different prices will almost always produce the expected variance.

That performance level, which is associated with price, will be reflected in API/driver benchmarks. The only place it isn't is when one variable is limited such that all results come out the same.

You could bench a 960 vs a 390X on this and expect the 390X to be ahead on both DX11 and DX12, since they are different-tier cards, as reflected in their prices. That's one of the reasons we can't compare the two to support any conclusions about which brand does this or that.


----------



## Ganf

Quote:


> Originally Posted by *Cyro999*
> 
> The only stuff that's actually been talked about and demonstrated in a game is situations like doing X part of rendering (like post processing or a HUD) on the iGPU. As a result you can get FPS improvements at the cost of a lot of latency.
> 
> There's a lot more you can do in theory, but it's still in theory and it has been for forever now


That theory is being put into practice by DX12?

Really though, it's going to come down to which is more cost-effective. Companies are going to want the best bling for their buck; if it takes more time to optimize than it does to implement iGPU utilization, we're going to get iGPU utilization.


----------



## error-id10t

Quote:


> Originally Posted by *PostalTwinkie*
> 
> My entire conversation in this thread, until you decided to attack me, was an active discussion questioning everything about this entire article. It was an open-minded and perfectly fine conversation - that is, until you showed up. Several others and I were simply discussing what would account for such extreme gains in certain scenarios, as some of these results are clearly questionable given the known relationship between the developer and AMD.


Nobody is attacking anyone, and if you feel I did, ping a mod. You'd better edit your post too if you're so sensitive, though.

I almost bothered to point out the initial post you made about how these findings were very scary for AMD, which was your "conversation starter", but I can't be bothered. Again, here, you've stated you're questioning everything about this topic because AMD has made a gain which you're not happy about for some reason.

Like I said previously, it's possible the situation is reversed in the next test, bench, or drivers. I'm not bleeding one camp's blood, so I don't really care beyond what I've said already, which has been vendor-agnostic. If you feel the need to respond to this, that's fine, but from my end I'll end it here.


----------



## Liranan

Quote:


> Originally Posted by *Themisseble*
> 
> Also MMOs should take huge advantages of DX12... can you image flawless battle 100vs100player in MMOs? or 200vs200? Or planetside 2/3 on new DX12 game engine...


EVE takes your 200 vs 200 and scoffs at it with the slaughter at B-R: 4,000 people battling for hours, with 250 thousand USD worth of items destroyed.

While client-side you might get a boost in FPS, server-side is a totally different story. It's so hard to do that CCP have had to use heavily overclocked single-core CPUs to be able to run massive hundred-man battles in EVE Online. When thousands of people start shooting at each other (not rare), no matter what they do, the server starts crawling. Hopefully in the future these kinds of loads can be made fully threaded, so that these battles run smoothly; it would definitely benefit the rest of software and gaming in general.

EVE is an amazing game, but sadly I no longer have time for it.

The bloodbath of B-R.


----------



## Cyro999

Quote:


> You could bench a 960 vs 390x on this and expect the 390x to be ahead on both dx11 and 12 since they are difference level cards reflected in their prices.


If the CPU frametime were always longer than the GPU frametime in both cases, there would be no difference between GPUs twice as powerful as each other.

The weak one would be idle for 10% of the time before working on a new frame, while the strong one would be idle for over half of the time.

There are plenty of examples out there. If the weaker GPU is always faster than the CPU frametime, it'll perform equally to the stronger one. API performance differences primarily show up when the CPU frametime is longer than the GPU frametime at least some of the time, but they show up the most when you're entirely bound by the CPU.
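The argument above can be sketched as a toy model where each frame costs max(CPU frametime, GPU frametime). This ignores the CPU/GPU pipelining real drivers do, and the millisecond figures are invented, so treat it as an illustration only:

```python
# Toy model: a frame is ready only when BOTH the CPU work (driver/API
# overhead included) and the GPU work for it are done. No pipelining.

def fps(cpu_ms, gpu_ms):
    """Effective FPS when each frame costs max(cpu_ms, gpu_ms) milliseconds."""
    return 1000.0 / max(cpu_ms, gpu_ms)

cpu_ms = 20.0        # hypothetical per-frame CPU + driver cost
weak_gpu = 18.0      # weaker GPU renders a frame in 18 ms
strong_gpu = 9.0     # a GPU twice as fast: 9 ms

print(fps(cpu_ms, weak_gpu))    # capped by the CPU
print(fps(cpu_ms, strong_gpu))  # same cap: the 2x GPU gains nothing
```

Lower the CPU cost (say, a leaner API path at 8 ms per frame) and the stronger GPU immediately pulls ahead, which is exactly where API overhead differences become visible.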


----------



## MonarchX

In the end, it will be halfway through 2016 before DirectX 12 utilization begins. Are there many upcoming games built on DirectX 12 scheduled for release in 2015? The next Assassin's Creed? No. The next StarCraft II? No. Any other games that will be released in 2015? No!

FORGET DirectX 12 for now! Benchmark it when all the drivers are RIPE and optimized, not these early drivers that support DirectX 12 JUST because they need to work on Windows 10!


----------



## Evil Penguin

Just something that has been irking me reading this thread...

It's not right to call Vulkan/DX12 low-level APIs.
They just happen to be a much better abstraction of modern GPU architectures than their predecessors.
Much of the overhead came from error checking and poor abstraction of modern hardware, along with being CPU-limited (single-threaded).

It should also be noted that with these new APIs, much less work goes into the drivers than into, say, the game engine.
Game devs have far more control over the GPU now, and they are much less dependent on driver optimizations (hacks, basically).


----------



## ZealotKi11er

Quote:


> Originally Posted by *Evil Penguin*
> 
> Just something that has been irking me reading this thread...
> 
> It's not right to call Vulkan/DX12 a low-level API.
> They just happen to be a much better abstraction of modern GPU architectures than their predecessors.
> Much of the GPU overhead came from error checking and poor abstraction of modern hardware along with being CPU limited (single-threaded).
> 
> It should also be noted that much less work goes into the drivers than say the game engine with these new APIs.
> Game devs have far more control over the GPU now and they are much less dependent on driver optimizations (hacks, basically).


They are lower-level APIs. Even consoles these days don't seem as close to the metal as they used to be, considering a PC with similar specs to a current console gets very close to its performance, while in the past you had to have a PC with 2-3x the specs to match a console.


----------



## Blameless

Quote:


> Originally Posted by *Ha-Nocri*
> 
> ut to more than 2x faster compared to dx11, 390x. Very nice


I'm 25% impressed with DX12's improvement, 75% disappointed in AMD's DX11 overhead.
Quote:


> Originally Posted by *CasualCat*
> 
> Well I hope this is a poor example of DX12 then otherwise it seems pretty underwhelming.
> 
> Net result is Nvidia gains little over DX11.


Why would NVIDIA's gains be as dramatic? They already have low enough overhead for it not to be a significant factor in most titles with capable CPUs.
Quote:


> Originally Posted by *Redwoodz*
> 
> So CLEARLY Nvidia just has crappy drivers for DX12, while everybody screams "Oh my god, this proves AMD's DX11 drivers suck!" Quite the opposite: Nvidia just has crappy drivers; the dev even states so.


Except that most DX11 tests suggest otherwise.

AMD typically has much greater DX11 overhead than NVIDIA.


----------



## GorillaSceptre

Considering that most of the gains are because of AMD's poor DX11 overhead, the performance difference between 11 and 12 is pretty underwhelming tbh.

Was this game built for DX12 or was it updated from 11?


----------



## escksu

Quote:


> Originally Posted by *HeavyUser*
> 
> Um, AMD drivers do suck though, where have you been hiding?


So Nvidia drivers have been perfect? Then why am I reading complaints about Nvidia drivers?


----------



## mtcn77

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Considering that most of the gains are because of AMD's poor DX11 overhead, the performance difference between 11 and 12 is pretty underwhelming tbh.
> 
> Was this game built for DX12 or was it updated from 11?


I can't find it right now, but he said it would have been faster if they had gone purely down the DirectX 12 asynchronous shaders path.


----------



## Liranan

Quote:


> Originally Posted by *Themisseble*
> 
> Also MMOs should take huge advantages of DX12... can you image flawless battle 100vs100player in MMOs? or 200vs200? Or planetside 2/3 on new DX12 game engine...


Both AMD and nVidia drivers have their bugs and problems. The main difference is that AMD have much higher overhead in DX11 than nVidia, something that can't be denied.
Quote:


> Originally Posted by *mtcn77*
> 
> I can't find it right now, but he said it would have been faster if they had gone purely down the DirectX 12 asynchronous shaders path.


Let's hope that DX12 and Vulkan get adopted much faster than DX10 and 11 have been. It's taken years for us to finally have DX10-only games, and DX11 adoption is still just a joke.


----------



## mtcn77

He says it here - I guess I mistakenly presumed it would run even faster, but maybe that was in another source:
Quote:


> Whereas on the PC we have Ashes of the Singularity. It is a game that's been optimized for DirectX 11 and updated for DirectX 12, and you can run them side by side on the same hardware and get a 70% boost on DirectX 12 over DirectX 11.
> Read more at http://gamingbolt.com/interview-with-brad-wardell-ps4xbox-one-differences-directx-12-ashes-of-the-singularity-and-more#4UrxAUZOZVdibDip.99
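As a quick sanity check on what a quoted 70% FPS boost means in frame-time terms (the DX11 baseline below is hypothetical, not a measured result):

```python
# Convert a 70% FPS gain into the equivalent per-frame time saving.

def frame_ms(fps):
    """Milliseconds spent per frame at a given FPS."""
    return 1000.0 / fps

dx11_fps = 40.0            # hypothetical DX11 baseline
dx12_fps = dx11_fps * 1.7  # the quoted 70% boost -> 68 FPS

saving = 1.0 - frame_ms(dx12_fps) / frame_ms(dx11_fps)
print(round(saving * 100, 1))  # ~41% less CPU+GPU time per frame
```

In other words, a 70% FPS boost corresponds to each frame taking roughly 41% less time, not 70% less.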


----------



## Clocknut

Quote:


> Originally Posted by *Liranan*
> 
> Let's hope that DX12 and Vulkan get adopted much faster than DX10 and 11 have been. It's taken years for us to finally have DX10 only games and DX11 is still just a joke.


We are still majority DX9. I think DX12 will need another 5-8 years to reach majority share.


----------



## Defoler

Quote:


> Originally Posted by *sugarhell*
> 
> Anything with dx11_x feature level support 100% the dx12.


It's like saying that everything with a wheel is considered an F1 competitor race car


----------



## Defoler

Quote:


> Originally Posted by *Themisseble*
> 
> Also MMOs should take huge advantages of DX12... can you image flawless battle 100vs100player in MMOs? or 200vs200? Or planetside 2/3 on new DX12 game engine...


MMOs are still server-side bound, not client-side. All the calculations - the movements, the actions people take - have to run on the server, and the client only gets the results. So the client will not be able to fully utilise the calculation ability of DX12.


----------



## Wishmaker

Quote:


> Originally Posted by *Defoler*
> 
> It's like saying that everything with a wheel is considered an F1 competitor race car

...but it is!!!!!!


----------



## Cyro999

Quote:


> Originally Posted by *Defoler*
> 
> MMO is still server side bound, not client side. Because all the calculations, the movements, the actions people do, has to operate under the server and the client only gets the results. So the client will not be able to fully utilise the calculation ability of DX12.


Planetside 2 and many other MMOs are still bound heavily by client-side CPU work. Hell, there were already very significant performance gains shown from Nvidia's DX11 CPU optimizations.


----------



## darealist

Meh. Dx12 or not this game looks hideous and boring.


----------



## jologskyblues

I wouldn't hesitate to switch over to the red team if they can keep up this advantage when DX12 games become the norm.

I don't think Nvidia will sit around and do nothing though. They're probably working on improving their DX12 performance as we speak.


----------



## sugarhell

Quote:


> Originally Posted by *Defoler*
> 
> It's like saying that everything with a wheel is considered an F1 competitor race car


That's a stupid analogy that adds nothing to my comment; you're just trying to prove your wrong idea.


----------



## Olivon

People seem to have really short memories.
When Mantle was out, Oxide provided the Star Swarm demo for Mantle.
At the start, nVidia performance was plain awful, and the demo was an advocacy piece (or an ad, if you like) for Mantle.
And what happened next? nVidia showed that they could beat Mantle with the DX11 API and a new driver, and they did the same for BF4 too.

http://www.tomshardware.com/news/nvidia-geforce-337.50-driver-benchmarks,26473.html

This Oxide DX12 benchmark means nothing.


----------



## Silent Scone

It's like saying rabbits eat lettuce... (am I doing this right?)

People seem to be deliberately not reading up on the differences between the 12_0 or 12_1 feature levels and basic 11_0 support. All cards that are quoted as supporting 11_0 will have full support; the 12_0 and 12_1 features are not mandatory. In fact, I'd be surprised if games utilise these feature levels at all for quite some time, until card adoption picks up.

It's not as clear-cut as previous DX iterations, because DX12 isn't an iteration; it's extensively reworked, in some cases from the ground up.


----------



## sugarhell

Quote:


> Originally Posted by *Olivon*
> 
> People seem to have really short memories.
> When Mantle was out, Oxide provided the Star Swarm demo for Mantle.
> At the start, nVidia performance was plain awful, and the demo was an advocacy piece (or an ad, if you like) for Mantle.
> And what happened next? nVidia showed that they could beat Mantle with the DX11 API and a new driver, and they did the same for BF4 too.
> 
> http://www.tomshardware.com/news/nvidia-geforce-337.50-driver-benchmarks,26473.html
> 
> This Oxide DX12 benchmark means nothing.


Wait, so because Nvidia released better DX11 drivers for Star Swarm, somehow this benchmark means nothing?

It's not even a fixed benchmark: it changes the workload each time.


----------



## Olivon

Quote:


> Originally Posted by *sugarhell*
> 
> Wait, so because Nvidia released better DX11 drivers for Star Swarm, somehow this benchmark means nothing?
> 
> It's not even a fixed benchmark: it changes the workload each time.


It just means that people need to wait before jumping to conclusions.
Last time, there were a lot of "Mantle will win" comments, and what happened? Mantle is no longer supported by AMD, and AMD's desktop AIB market share is around 20% right now.


----------



## sugarhell

Quote:


> Originally Posted by *Olivon*
> 
> It just means that people need to wait before jumping to conclusions.
> Last time, there were a lot of "Mantle will win" comments, and what happened? Mantle is no longer supported by AMD, and AMD's desktop AIB market share is around 20% right now.


So instead of saying "wait for more results", you say that this benchmark means nothing. I'm with you, just word it better.


----------



## epic1337

Quote:


> Originally Posted by *Clocknut*
> 
> because we play games in DX9-11, not DX12.


Well, true.

Though rather than comparing GPUs, DX12's primary benefit is making the CPU matter less, isn't it?
The gap between an i3 and an i5 pretty much shrinks, for example.


----------



## Themisseble

Quote:


> Originally Posted by *Clocknut*
> 
> because we play games in DX9-11, not DX12.


many games will use DX12, because of its potential.
- Ark Survival
- Also expect PlanetSide 2
...


----------



## Wishmaker

Quote:


> Originally Posted by *epic1337*
> 
> i must have missed it.


5-6 years living under a rock?


----------



## epic1337

Quote:


> Originally Posted by *Wishmaker*
> 
> 5-6 years living under a rock?


No, pretty much because there's so much other stuff that I hardly notice things like this.


----------



## mutantmagnet

Quote:


> Originally Posted by *epic1337*
> 
> i must have missed it.


Maybe because people rarely said outright that AMD has bad drivers, compared to how often they said Nvidia has better drivers. People defended AMD's drivers as being just as good most of the time, but the implication was still clear: AMD's drivers had been maligned for years.


----------



## HeavyUser

I'm happy the Fury X got a few extra frames over the Ti; AMD owners really need a bone thrown to them. It's not right to be kicked by AMD over and over after giving them your money, just to watch Nvidia users pull ahead in every game. Enjoy, AMD users: this is your golden cup of a game as of now, the game you have so long deserved.


----------



## Kand

Quote:


> Originally Posted by *Themisseble*
> 
> many games will use DX12, because of its potential.
> - Ark Survival
> - Also expect PlanetSide 2
> ...


We will all have new graphics cards by the time that happens.


----------



## p4inkill3r

Quote:


> Originally Posted by *Kand*
> 
> We will all have new graphics cards by the time that happens.


That's quite the projection.


----------



## Defoler

Quote:


> Originally Posted by *sugarhell*
> 
> Stupid analogy that adds nothing to my comment, just trying to prove your wrong idea.


The fact that DX12 is not just DX11_1 actually proves that you have no idea what DX12 is.


----------



## sugarhell

Quote:


> Originally Posted by *Defoler*
> 
> The fact that DX12 is not just DX11_1 actually proves that you have no idea what DX12 is.


lol,

You're kinda confused. DX feature levels are sets of rendering capabilities that can improve performance if the GPU supports them. That's why there's a standard feature level, and if the developer supports some of the optional features, some GPUs will use them. Key word: if the developer uses them. 11_x is the standard feature set that DX12 requires for rendering and other things.

Now, on the hardware side, all GPUs must support resource binding and some other things. That's the minimum DX12 requires. Combine that with the 11_1 feature level and you have DX12. Anything else is OPTIONAL.

It's not that hard to understand, I believe...

or check here : https://msdn.microsoft.com/en-us/library/windows/desktop/ff476876%28v=vs.85%29.aspx


----------



## CrazyHeaven

I did hear it mentioned that devs have to support it, so most older games will not work. The Witcher 3 devs said they may be interested in going back to add that support later.


----------



## CrazyHeaven

Quote:


> Originally Posted by *mutantmagnet*
> 
> Maybe because people rarely said outright that AMD has bad drivers, compared to how often they said Nvidia has better drivers. People defended AMD's drivers as being just as good most of the time, but the implication was still clear: AMD's drivers had been maligned for years.


AMD is like the Cleveland Browns: this year is going to be their year. People kept supporting them despite the losses, but wait and see the greatness of this year.

I'd still recommend the green team. The vast majority of current games will not support DX12, so you'd be stuck with the red team's DX11 performance. And just because AMD is doing better doesn't mean they are crushing Nvidia; right now I'd say they are competing. I, for one, am interested to see what this does to future prices. Assuming AMD is able to release their cards on time...


----------



## SpeedyVT

Quote:


> Originally Posted by *CrazyHeaven*
> 
> AMD is like the Cleveland Browns: this year is going to be their year. People kept supporting them despite the losses, but wait and see the greatness of this year.
> 
> I'd still recommend the green team. The vast majority of current games will not support DX12, so you'd be stuck with the red team's DX11 performance. And just because AMD is doing better doesn't mean they are crushing Nvidia; right now I'd say they are competing. I, for one, am interested to see what this does to future prices. Assuming AMD is able to release their cards on time...


All the best games will be DX12 this Christmas. Even some old games are getting an update to DX12.

There was another benchmark recently putting the stock 290X just below the stock 980 Ti, by one or two frames. I'd feel incredibly embarrassed as Nvidia right now. No offense.

I knew for a long time that Nvidia's tech was just not up to par. Everyone knew AMD's drivers were not up to par, but as soon as you let developers work with the graphics chips themselves... BAM! POWER!

Nvidia aims to strong-arm its customers; that's how it's always been. It's not that their tech is inferior, it's not! It's super futuristic. They are just anal about people writing directly to their hardware. They also shipped the 980 Ti with fewer cores than the Titan X. I somehow feel the 980 Ti should've just been a 970 Ti.

They are selling you something for more that should've been for less, and now AMD is going to profit from Nvidia's dishonesty.


----------



## CrazyHeaven

Quote:


> Originally Posted by *SpeedyVT*
> 
> All the best games will be DX12 this Christmas time. Even some of the old games are getting an update to DX12.
> 
> There was an another benchmark recently putting the stock 290x just below the stock 980ti, 1 or 2 frames. I'd feel incredibly embarrassed as NVidia right now. Not offense.
> 
> I knew for a long time NVidia's tech was just not up to par. Everyone knew AMD's drivers were not up to par, but as soon as you let developers work with graphics chips themselves.... BAM! POWER!
> 
> NVidia is aiming to strong-arm it's customers, that's how it's always been. It's not that their tech is inferior it's not! It's super futuristic. They are just anal about people writing directly to their hardware. They are also giving a 980ti with fewer cores than the 980. I some how feel the 980 ti should've just been the 970 ti.
> 
> They are selling you something for more that should've been for less and now AMD is going to profit from NVidia's dishonesty.


When we have one clear winner, prices are whatever they set them at. That is not good for anyone besides the one on top. I'd leave Nvidia in an instant if AMD offered better performance across the board. I used to be a strong AMD guy until the Core 2 Duo showed up ready for battle.


----------



## EthanKing

Even though I just left AMD to buy a 970, I am glad for AMD. This should influence Nvidia's prices and encourage them to step up their game, which in turn should have AMD working hard to produce even better GPUs.

Can't wait to see what this leads to.

Sent from my GT-I8200N using Tapatalk


----------



## Defoler

Quote:


> Originally Posted by *sugarhell*
> 
> lol,
> 
> You are kinda confused. Dx feature levels are rendering methods that can improve performance if the gpu can support it. Thats why you have a standard feature level and if the developer support some of the optional features some gpus will use it. Key word:if the developer will use it. 11_x is the standard features that dx12 requires for rendering and other stuffs.
> 
> Now on the hardware side all the gpus must support resource binding and some other things. Thats the minimal that dx12 requires. Now combine 11_1 feature levels and you have dx12. Anything else is OPTIONAL
> 
> Its not that hard to understand i believe..
> 
> or check here : https://msdn.microsoft.com/en-us/library/windows/desktop/ff476876%28v=vs.85%29.aspx


Again, you are either having trouble reading or are on an unsuccessful trolling expedition.

You stated:
Quote:


> Originally Posted by *sugarhell*
> 
> Anything with dx11_x feature level support 100% the dx12.


Now regarding that "100%":
Tier 2 resource binding is part of 12_0.
Pixel-level collision detection and transparency rendering order (rasterizer ordered views) are part of 12_1.

The new resource binding allows tasks to be bound directly to the GPU without going through middle-man software (the "optional" path in 11_1, which requires the driver as a middle-man and is CPU-bound, i.e. not supported at the hardware level like DX12).
The new ordered views allow compute shaders to run in parallel while rendering transparent textures, which in 11_1 requires serial rendering, or a pre-ordering and breakdown done on the driver side to simulate it (hence "optional").
They also allow better estimation of pixel data, which, done correctly, can reduce the need for AA, or give better quality at lower upscaling when combined with upscaling AA.
11_1 cards will also not get tiled resources, which save memory on duplicated surfaces, because that is a 12_0-only feature.

So to conclude, your statement, "100%", is incorrect. Taking optional 11_3 features, calling them 11_1, and claiming "100%" shows little understanding of DX12 beyond buying what AMD is saying...

This just means that if you take two cards with the exact same GPU, but one supporting DX12 in full and one supporting only DX11_1, you will not get the same picture: the full DX12 card will work less hard overall, or can be fully utilised to give better FPS.

And Ashes of the Singularity, which uses tier 2 binding to offload parallel rendering to the GPU, won't be able to do that on cards which only support 11_1, like the 7870 referred to earlier. Those cards will not gain as much as the ones fully supporting 12_0 and 12_1, and will not see the same FPS benefit, because they simply can't do the same things.

So next time you say "100%", be sure it is 100% and not 99.9% or 90% or whatever you were "no, but I mean"-ing later, and don't link to a page which, if you'd bothered to read the details, shows a good difference between them.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Wishmaker*
> 
> The more AMD gets beaten to the ground, the more people on OCN defend it. I just hope you guys are right and DX 12 is the best thing since sliced bread. In the event that DX12 still keeps NVIDIA on top, I will be sitting here in the corner handing out tissues to all the AMD fanboys.


The big difference with DX12 is that IHVs have little to do to optimize a game, unlike DX11. This means Nvidia's manpower advantage will show far less.


----------



## Cherryblue

Quote:


> Originally Posted by *Wishmaker*
> 
> The more AMD gets beaten to the ground, the more people on OCN defend it. I just hope you guys are right and DX 12 is the best thing since sliced bread. In the event that DX12 still keeps NVIDIA on top, I will be sitting here in the corner handing out tissues to all the AMD fanboys.


Such meaningless hate. You are wasting your time arguing against one of the two teams, and I feel sorry for that.

But regarding the discussion about DX12 performance, I don't really care if NVIDIA is ahead of AMD, if it's only by a hair.

What is meaningful is that DX12 will probably make games need fewer optimizations in the drivers, since the aim of the API is to improve the use of the hardware.

And in that event, it should help AMD more, since they cannot invest as much time in driver optimization. It's a fact that AMD's performance is poor in games their drivers haven't been optimized for.


----------



## SpeedyVT

Quote:


> Originally Posted by *Cherryblue*
> 
> Such meaningless hate. You are wasting your time arguing against one of the two teams, and I feel sorry for that.
> 
> But well, regarding the discussion about performance with DX12, I don't really care if NVIDIA is ahead of AMD (if it's just by a few fingers).
> 
> What is meaningful, is that DX12 will probably make games need less optimizations in the drivers, since the aim of the api is to improve the use of the hardware.
> 
> And in this event, it should bring more help to AMD which can not invest time in driver optimization. It's a fact AMD performances are poor because of their drivers without optimization for games.


The real benefit goes to the game developers, who can exploit the API directly rather than creating something and hoping the vendors support it.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Wishmaker*
> 
> The more AMD gets beaten to the ground, the more people on OCN defend it. I just hope you guys are right and DX 12 is the best thing since sliced bread. In the event that DX12 still keeps NVIDIA on top, I will be sitting here in the corner handing out tissues to all the AMD fanboys.


I'll be listing deals on Nvidia cards and helping them make the change over. It isn't going to be easy going into the unknown but I have faith that they can do it.

At least right now dx12 is looking good for amd. 2016 will be an interesting year.


----------



## SpeedyVT

Quote:


> Originally Posted by *CrazyHeaven*
> 
> I'll be listing deals on Nvidia cards and helping them make the change over. It isn't going to be easy going into the unknown but I have faith that they can do it.
> 
> At least right now dx12 is looking good for amd. 2016 will be an interesting year.


That'll devalue the cards well before they would otherwise lose their value.

That's like selling MGSV for $60 on pre-order and then selling it for $30 the next day.


----------



## sugarhell

Quote:


> Originally Posted by *Defoler*
> 
> snip


Holy cow. You are something else.

Do you understand that what you are describing are optional features? If the devs support them, then cards with those features will be faster, because their algorithms will be faster.

We are saying the same thing, except I understand the optional part. All of these are rendering methods that devs need to support in their engines.

So, to put it in a way even you can understand: a card with feature level 11_1 will support DX12 (new pipeline, new resource binding, etc.) but not all of the mostly optional features.

Even if a card supports the 12_0 feature level or whatever, it's up to the devs whether they use, for example, tiled resources. And think for a second: will a dev spend time developing something that applies only to Maxwell or Fury? Did anyone even use the new DX11.1 features? Nah.

These things will start to matter in 2-3 years, when every single GPU supports all these features.
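The "optional" part described above usually ends up as a runtime capability check with a fallback path in the engine. A minimal sketch of that pattern, with purely hypothetical feature and path names (not real API identifiers):

```python
def pick_render_paths(caps: set) -> dict:
    """Map each effect to the fastest algorithm the hardware exposes.
    Capability and path names here are illustrative, not real APIs."""
    return {
        # With ROVs (a 12_1 optional feature) transparency can be
        # order-independent; otherwise fall back to sorted, serial blending.
        "transparency": "rov_oit" if "rasterizer_ordered_views" in caps
                        else "sorted_blend",
        # Tiled resources (12_0) save memory on duplicated surfaces;
        # older cards allocate the full texture.
        "terrain_tex": "tiled" if "tiled_resources" in caps
                       else "full_allocation",
    }
```

An 11_1 card still runs the game; it just gets the slower fallback algorithms, which is why optional features only pay off once devs bother to write both paths.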


----------



## CrazyHeaven

Quote:


> Originally Posted by *SpeedyVT*
> 
> That'll devalue the cards far before they maintain their value.
> 
> That's like selling MGSV on pre-order for $60 and then selling it for $30 the next day.


Deals get posted on new games all the time; I don't understand your point. The Witcher 3 didn't take any time at all before its first sale.

Are you saying that putting new GPUs on sale somehow devalues them? Sales happen on new tech all the time.

My main point was that the 2016 cards are supposed to be many times better than our current ones, so I was looking forward to them. This is why I'm waiting to go SLI. If the new cards aren't so great, I'll grab another 980 Ti. If they are, I'll sell mine and pick up the new one.


----------



## the9quad

Quote:


> Originally Posted by *Wishmaker*
> 
> The more AMD gets beaten to the ground, the more people on OCN defend it. I just hope you guys are right and DX 12 is the best thing since sliced bread. In the event that DX12 still keeps NVIDIA on top, I will be sitting here in the corner handing out tissues to all the AMD fanboys.


I'd be embarrassed to be a fanboy of either camp unless I owned stock. I hope you own stock...


----------



## Themisseble

Quote:


> Originally Posted by *epic1337*
> 
> does it matter which card is superior? even fanboys knows best that perf/$ or raw performance means jack-shet if it isn't their favorite brand.
> so long as the card can satisfy the customer, is there even anything else to reconsider?


Huh... not long ago I recommended an R9 290 (Turbo Duo) to someone for $235... yet someone else convinced him to buy a GTX 970 for $340. NVIDIA fanboys are trolling...


----------



## epic1337

Quote:


> Originally Posted by *Themisseble*
> 
> Huh... not long time ago I recommend someone R9 290 for 235$ (turbo duo)... yet someone convinced him to buy GTX 970 for 340$. NVIDIA fanboys are trolling...


Well, that's a whole other issue.


----------



## Kand

Quote:


> Originally Posted by *p4inkill3r*
> 
> That's quite the projection.


Let's say you had a 5870 back when Unigine wowed us with DX11. There were no DX11 games at the time, but AMD was on top due to supporting DX11 before NV.

Skip ahead 6 years. There are hardly any games out that truly make use of DX11, and if you're still on the 5870 and happen across a game that does, it's a slideshow.


----------



## p4inkill3r

Quote:


> Originally Posted by *Kand*
> 
> Let's say you had a 5870 when Unigine wowed us with Dx11. There were no Dx11 games at the time but AMD was on top due to dx11 support over NV.
> 
> Skip 6 years later. There are hardly any games out that truly make use of DX11 and if you're still on the 5870 and happen across a game that does. It's a slideshow.


How many gamers on this website are still using a 58**? Some, I'm sure, but there are a lot more GTX 7**/9** and R9 2** cards than anything else.


----------



## Kand

Quote:


> Originally Posted by *p4inkill3r*
> 
> How many gamers on this website are still using 58**? Some, I'm sure, but there are a lot more gtx 7**/9** or r9 2** than anything else.


Exactly. You will not be on a 980 ti or Fury X by the time dx12 becomes truly relevant.


----------



## Themisseble

Quote:


> Originally Posted by *Kand*
> 
> Exactly. You will not be on a 980 ti or Fury X by the time dx12 becomes truly relevant.


Yeah, but you don't want to be on a GTX 780 Ti anymore...


----------



## p4inkill3r

Quote:


> Originally Posted by *Kand*
> 
> Exactly. You will not be on a 980 ti or Fury X by the time dx12 becomes truly relevant.


That I do not believe.
DX12 will be relevant within a couple of months and there will be no better cards than the Ti or the Fury for longer than that.


----------



## ku4eto

What AMD needs to do now is direct its driver team toward DX12 and Vulkan improvements. Why those two? Because they are the future, and if AMD shows better DX12 results in benchmarks, its cards will sell better. DX12 is already here; new games will start leaving DX11 behind, and within a year the majority will be starting on DX12.


----------



## Kand

Quote:


> Originally Posted by *p4inkill3r*
> 
> That I do not believe.
> DX12 will be relevant within a couple of months and there will be no better cards than the Ti or the Fury for longer than that.


That's the same tune they had with DX11.


----------



## Themisseble

Quote:


> Originally Posted by *Kand*
> 
> That's the same tune they had with DX11.


I just can't remember... but did the DICE director say the same thing about DX11?

Project CARS - DX12
ARK: Survival Evolved - DX12
WoW - may get DX12
The BF series will be on DX12
Battlefront may support DX12 (Mantle)
Crytek - making a VR DX12 game....


----------



## bluewr

Quote:


> Originally Posted by *Kand*
> 
> That's the same tune they had with DX11.


Perhaps, but DX11 didn't have a lot of support, even on consoles. DX12 is different: multiple games are planning to or will support it, and one of the current-gen consoles, the Xbone, will support it.

So you're comparing apples to oranges; the situations are different.


----------



## Kand

Quote:


> Originally Posted by *Themisseble*
> 
> I just cant remember .. but did DICE director said same thing for DX11?
> 
> Project cars - DX12 - bad game
> Ark Survival - Dx12 - not another survival game
> WoW - may get DX12 - dx11 did nothing for it
> BF series will be on DX12 - still had hexagonal tires despite dx11
> battlefront may support DX12 (mantle) - BF4 Mantle didn't benefit much. Not holding my breath for this.
> Crytek - making VR DX12 game - crytek makes benchmarks, not games


Fixed that for you.


----------



## Kand

Quote:


> Originally Posted by *bluewr*
> 
> Perhaps, but DX11 didn't have alot of support, even among console, unlike DX12, in which there are multiple games that are planning or will support it, and one of the current gen console Xbone will support it.
> 
> So you're comparing apple to orange, the situation for both is different.


Consoles are on Pitcairn, GCN 1.0, which supports DX12 feature level 11_1.

All the Mantle-like improvements are at DX12 feature level 12_1, so consoles will not benefit from any of the improvements you're seeing here.

If anything, we should be seeing a rise in DX11 games now that the current-gen consoles are here... yet we aren't.


----------



## Blameless

Quote:


> Originally Posted by *mutantmagnet*
> 
> Maybe because people rarely said AMD has bad drivers compared to saying Nvidia has better drivers. People were defending AMD having as good drivers most of the time but still the implication was clear that AMD drivers were maligned for years.


Overhead is not the be all and end all of driver quality.
Quote:


> Originally Posted by *Kand*
> 
> Let's say you had a 5870 when Unigine wowed us with Dx11. There were no Dx11 games at the time but AMD was on top due to dx11 support over NV.
> 
> Skip 6 years later. There are hardly any games out that truly make use of DX11 and if you're still on the 5870 and happen across a game that does. It's a slideshow.


Yes.
Quote:


> Originally Posted by *p4inkill3r*
> 
> How many gamers on this website are still using 58**? Some, I'm sure, but there are a lot more gtx 7**/9** or r9 2** than anything else.


I have a lot of systems that occasionally run games... a few of them have GPUs slower than a 5870.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Themisseble*
> 
> Huh... not long time ago I recommend someone R9 290 for 235$ (turbo duo)... yet someone convinced him to buy GTX 970 for 340$. NVIDIA fanboys are trolling...


I did something similar, and it was a good move for me. The games I play are heavily sided with Nvidia, so I made my choice based on the games I wanted to play.


----------



## p4inkill3r

Quote:


> Originally Posted by *Kand*
> 
> Fixed that for you.


Fixed it by adding your opinions?


----------



## Themisseble

Quote:


> Originally Posted by *Kand*
> 
> Fixed that for you.


BF4's Mantle did benefit a lot on the CPU side.


----------



## p4inkill3r

Quote:


> I have a lot of systems that occasionally run games...a few have slower gpus than a 5870 in it.


Sure, but your _primary_ rig certainly isn't running one.


----------



## Kand

Quote:


> Originally Posted by *p4inkill3r*
> 
> Fixed it by adding your facts?


It has been fixed.


----------



## Kand

Quote:


> Originally Posted by *Themisseble*
> 
> BF4 mantle did benefit much on CPU side.


But not enough. People hyped a 2x increase, but the reality was 1.1x.


----------



## Kuivamaa

Quote:


> Originally Posted by *Themisseble*
> 
> I just cant remember .. but did DICE director said same thing for DX11?
> 
> Project cars - DX12
> Ark Survival - Dx12
> WoW - may get DX12
> BF series will be on DX12
> battlefront may support DX12(mantle)
> Crytek - making VR DX12 game....


My most anticipated game of the year, Deus Ex: Mankind Divided, will support DX12. I would even upgrade for that game, but I am sure there will be no need.


----------



## Themisseble

Quote:


> Originally Posted by *Kuivamaa*
> 
> My most anticipated game of the year will support DX12. Deus Ex Mankind Divided. I would even upgrade for this game but I am sure there will be no need.


That title may even support async shaders.

Tomb Raider uses them.


----------



## Kand

Quote:


> Originally Posted by *Kuivamaa*
> 
> My most anticipated game of the year will support DX12. Deus Ex Mankind Divided. I would even upgrade for this game but I am sure there will be no need.


Game of next year. Release date, 2016.


----------



## Kuivamaa

Still only a few months away.


----------



## p4inkill3r

Quote:


> Originally Posted by *Kuivamaa*
> 
> Still only a few months away.


An eternity for him, evidently.


----------



## Kand

Quote:


> Originally Posted by *Kuivamaa*
> 
> Still only a few months away.


Deus Ex may be a launch bundle with NV Pascal.


----------



## Kuivamaa

Quote:


> Originally Posted by *Kand*
> 
> Deus Ex may be a launch bundle wirh Nv pascal.


Good try, but no. It is actually a Gaming Evolved title, AMD sponsored, and it will probably be out 7-9 months before Pascal.

http://www.amd.com/en-us/markets/game/featured/deus-ex#


----------



## Kand

Quote:


> Originally Posted by *Kuivamaa*
> 
> Good try but no. It is actually a gaming evolved title. AMD sponsored,and it will be out like 7-9 months before Pascal probably.
> 
> http://www.amd.com/en-us/markets/game/featured/deus-ex#


I'm expecting it to drop around Q2.


----------



## Kuivamaa

Quote:


> Originally Posted by *Kand*
> 
> Im expecting it to drop around Q2.


I wouldn't be that optimistic. 14nm, HBM, I see July next year.


----------



## Kand

Quote:


> Originally Posted by *Kuivamaa*
> 
> I wouldn't be that optimistic. 14nm, HBM, I see July next year.


Yeah, sounds about right for Deus Ex to release.


----------



## Kuivamaa

Quote:


> Originally Posted by *Kand*
> 
> Yeah, sounds about right for Deus Ex to release.


You can preorder it with Arctic Islands, I bet.


----------



## Kand

Quote:


> Originally Posted by *Kuivamaa*
> 
> You can preorder it with arctic islands I bet


Putting Deus Ex at a Q4 release.


----------



## Kuivamaa

Quote:


> Originally Posted by *Kand*
> 
> Putting Deus ex at a q4 release.


Well, there is a good chance Pascal will come after Arctic Islands no matter what, so take your pick.


----------



## Redwoodz

This can all be summed up as follows: from the beginning of GCN, AMD developed an architecture that is DESIGNED to rely on a high number of draw calls. To combat the latency effects of DX11, they designed a new low-level API to implement that strategy. They forced the implementation of DX12 by creating Mantle and demonstrating that the clear future lies in APIs which lean heavily on multi-threaded hardware. AMD didn't waste time focusing on DX11 because they knew what was coming, and could alleviate the draw-call bottleneck in DX11 with Mantle. There is nothing new to see here except the results of AMD's efforts being realized in DX12, period.
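The draw-call argument boils down to simple arithmetic: if recording a draw call costs a fixed amount of CPU time, then spreading the recording across threads (as Mantle/DX12-style command lists allow, and DX11's mostly single-threaded submission does not) divides the per-frame cost. A back-of-the-envelope model, with made-up numbers:

```python
import math

def frame_cpu_ms(draw_calls: int, cost_per_call_ms: float, threads: int) -> float:
    """CPU time spent recording one frame's draw calls.
    threads=1 models DX11-style single-threaded submission;
    threads>1 models DX12-style parallel command-list recording."""
    calls_per_thread = math.ceil(draw_calls / threads)
    return calls_per_thread * cost_per_call_ms
```

With 10,000 draws at a hypothetical 1 µs (0.001 ms) each, one thread spends 10 ms per frame on submission alone, capping you near 100 fps before the GPU even matters; four threads cut that to 2.5 ms.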


----------



## CrazyHeaven

Quote:


> Originally Posted by *Redwoodz*
> 
> This can all be summed by the statements that AMD has from the beginning of GCN developed an architecture that is DESIGNED to rely on a high number of draw calls. In order to combat the latency effects in DX11, they designed a new low level API to implement their strategy. They forced they implementation of DX12 by creating Mantle and demonstrating the clear future in API's which rely heavily in multi-threaded hardware. AMD didn't waste time focusing on DX11 because they knew what was coming,and could alleviate the draw call effect in DX11 with Mantle. There is nothing new to see here except the results of AMD's efforts being realized in DX12,period.


Hoping this is true. I never cared for DX11; DX12 is another story altogether. I've been looking forward to this for a long time. Come on, AMD, knock Nvidia off their horse. In Q2 I'll have about 700 USD with someone's name on it. Who will be the one to take it from me?

I want to see a close race, with threads popping up all over OCN asking which brand to get. But I'm not doing anything until I see Pascal's gains in The Witcher 3. As funny as this may sound, I'm not looking forward to anything else yet. I could be wrong, but I doubt Shenmue 3 is going to push the envelope on graphics. Most of my time is spent playing games like Life Is Strange.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Ganf*
> 
> Tri-SLI owners have their own woes. 3 card setups have been the bane of enthusiasts since their inception, you either do 2 or 4, 3 is no-man's-land.
> Can't say I've experienced the same on my 7970, and I used to play all kinds of weird old crap on that.


My issues didn't go away when I went from Tri-SLI to CrossFire, and then eventually down to one card. I literally moved back to Nvidia from AMD in my own rig because of the AMD driver issues.

EDIT: For clarity and disclosure, as I said several pages back, AMD has gotten a lot better with drivers over the last several months. Even if it's only catching up, improvement is still improvement.

Oh, and Nvidia still might blow up your GPU with any given driver update.

Quote:


> Originally Posted by *escksu*
> 
> so Nvidia drivers have been perfect? Then why am I reading complains about Nvidia drivers?


So in order for AMD's drivers to suck, Nvidia's have to be perfect?

What in the Hell kind of logic is that?

Quote:


> Originally Posted by *Redwoodz*
> 
> This can all be summed by the statements that AMD has from the beginning of GCN developed an architecture that is DESIGNED to rely on a high number of draw calls. In order to combat the latency effects in DX11, *they designed a new low level API to implement their strategy. They forced they implementation of DX12 by creating Mantle and demonstrating the clear future in API's* which rely heavily in multi-threaded hardware. AMD didn't waste time focusing on DX11 because they knew what was coming,and could alleviate the draw call effect in DX11 with Mantle. There is nothing new to see here except the results of AMD's efforts being realized in DX12,period.


Hold on, let me catch my breath....

Oh, that was good....

Now, you might want to update your understanding of the word "future", because AMD was about 16+ years LATE to the "low level" party. They were beaten to it by Microsoft with DirectX back in 1998/1999 on the original Xbox, as well as by 3DFX with Glide back in the mid-90s.

AMD is literally copying what others did long ago. They didn't force or push anything new, because frankly they don't have the market share or dominance to do that. They don't even have the financial resources to drive a major industry standard these days. Why do you think Mantle died and was handed off as Vulkan? Because AMD doesn't have the resources to handle it. They struggle enough with the hardware and software sides as it is.


----------



## SoloCamo

*290x beating a 980ti?*

Didn't see this posted on the first page...

http://arstechnica.co.uk/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/

Sorry if it's a repost, didn't see it in this thread


----------



## mtcn77

Quote:


> Originally Posted by *SoloCamo*
> 
> *290x beating a 980ti?*
> 
> Didn't see this posted on the first page...
> 
> http://arstechnica.co.uk/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/
> 
> Sorry if it's a repost, didn't see it in this thread


Unreal! The 980 Ti gets wrecked in frame times. Next to the 290X in DX12, it's almost a full resolution tier lower.


----------



## Xuper

Nvidia released a driver for Ashes. Did AMD release a new driver just for Ashes too?


----------



## ZealotKi11er

Quote:


> Originally Posted by *Xuper*
> 
> Nvidia Released Driver For Ashes, Now Did AMD release new driver for Only Ashes?


I don't think AMD did.


----------



## SlackerITGuy

Quote:


> Originally Posted by *PostalTwinkie*
> 
> 
> Hold on, let me catch my breath....
> 
> Oh, that was good....
> 
> Now; You might want to update your understanding of the word "Future". Because AMD was only about 16+ years LATE to the "low level" party. They were beaten out by Microsoft with Direct X back in 1998/1999 on the original XBox. As well as 3DFX with 3DFX Glide back in the mid 90s as well.
> 
> AMD, literally, is copying what others have done long ago. They didn't force or push anything new, because frankly they don't have the Marketshare or dominance to do any of that. They don't even have the financial resources to drive forward a major industry standard these days. Why do you think Mantle died, and was dumped off as Vulkan? Because AMD doesn't have the resources to handle it. They struggle enough as it is with the hardware and software side of things as it is.



Microsoft develops and announces DirectX 11.2, claiming it would only be available on Windows 8.1 and the Xbox One - June 2013
AMD announces Mantle - September 2013
Microsoft ships its Xbox One console with DirectX 11.x - November 2013
Microsoft announces both DirectX 12 and DirectX 11.3, saying it wouldn't have been possible without "an industry-wide collaboration" behind it - March 2014

But somehow @PostalTwinkie wants us to think that Microsoft started developing DirectX 12 as the low-level API it is now long before AMD announced Mantle, and that Mantle had no effect on it, even though they couldn't ship DirectX 12 on their next-generation console *(one fixed hardware setup..... ONE)*.

Of course Microsoft developed a low-level DirectX API for their original Xbox console; what were they supposed to do, ship a mass-market console with a high-level API? Jeez....


----------



## SpeedyVT

Quote:


> Originally Posted by *PostalTwinkie*
> 
> My issues didn't go away when I went from Tri, to Cross, and then eventually down to one. I literally moved back to Nvidia, from AMD, in my own rig because of the AMD driver issues.
> 
> EDIT: For clarity and disclosure, as I said several pages back, AMD has gotten a lot better with drivers the last several months. If only to be catching up, improvement is still improvement.
> 
> Oh, and Nvidia still will possibly blow up your GPU with each driver update as it is.
> 
> So in order for AMD's drivers to suck, Nvidia's have to be perfect?
> 
> What in the Hell kind of logic is that?
> 
> Hold on, let me catch my breath....
> 
> 
> Oh, that was good....
> 
> Now; You might want to update your understanding of the word "Future". Because AMD was only about 16+ years LATE to the "low level" party. They were beaten out by Microsoft with Direct X back in 1998/1999 on the original XBox. As well as 3DFX with 3DFX Glide back in the mid 90s as well.
> 
> AMD, literally, is copying what others have done long ago. They didn't force or push anything new, because frankly they don't have the Marketshare or dominance to do any of that. They don't even have the financial resources to drive forward a major industry standard these days. Why do you think Mantle died, and was dumped off as Vulkan? Because AMD doesn't have the resources to handle it. They struggle enough as it is with the hardware and software side of things as it is.


I don't want to call you stupid, but the GameCube launched the same year the Xbox did, the same day even, and the GameCube had better graphics. ATI existed long before AMD owned them, and they had hardware: ATI since 1989, 3DFX since 1992. Get your facts straight, because your fanboy is showing.


----------



## Cyro999

Quote:


> AMD didn't waste time focusing on DX11 because they knew what was coming


It's been coming for a long time. You're saying that they screwed over everyone who bought their hardware in 2012, 2013, 2014 and most of 2015 (about 4 years from gcn release to dx12 adoption) on purpose?


----------



## ZealotKi11er

Quote:


> Originally Posted by *Cyro999*
> 
> It's been coming for a long time. You're saying that they screwed over everyone who bought their hardware in 2012, 2013, 2014 and most of 2015 (about 4 years from gcn release to dx12 adoption) on purpose?


Lucky DX12 does not need new cards.


----------



## SpeedyVT

Quote:


> Originally Posted by *Cyro999*
> 
> It's been coming for a long time. You're saying that they screwed over everyone who bought their hardware in 2012, 2013, 2014 and most of 2015 (about 4 years from gcn release to dx12 adoption) on purpose?


I wouldn't call it screwing over if the past hardware matched what current NVidia hardware was capable of putting out. We knew the teraflop power of the GPUs, and we knew AMD's DX11 drivers sucked. NVidia never provided a more-than-competent product worthy of its price until now, and it's just had its socks rocked by older hardware. Legit enough statement.


----------



## Kpjoslee

Quote:


> Originally Posted by *SpeedyVT*
> 
> Don't want to call you stupid but Gamecube launched the same year Xbox did, same day too. Gamecube had better graphics. ATI existed long before AMD owned them and they had hardware, 1989 ATI and 3DFX 1992. Get your facts straight because your fanboy is showing.


Um, ATI didn't develop the GameCube GPU. ArtX actually developed it before being acquired by ATI back in 2000. ATI was the supplier, but the GPU design was done before the ATI acquisition.


----------



## mcg75

*Thread closed temporarily.

I strongly suggest that all involved in the discussion here stop making personal comments about other users in an effort to get them riled up. It's not going to be tolerated period.

Debate with facts and respect for other posters or you will be removed from discussion in this thread.*


----------



## mcg75

Unlocked.


----------



## Cyro999

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Luckily, DX12 does not need new cards.


No; but they can't leave DX11 performance like that forever... or can they? People don't really seem to care. I just hate seeing people on my forums who got recommended the wrong hardware for some situations for lack of knowledge. A 5 or 10% gap is fine, but 50% is just not; it should never be acceptable. Where is the outrage?


----------



## Woundingchaney

Quote:


> Originally Posted by *SpeedyVT*
> 
> I wouldn't call it screwing over if the past hardware matched what current NVidia hardware was capable of putting out. We knew the teraflop power of the GPUs, and we knew AMD's DX11 drivers sucked. NVidia never provided a more-than-competent product worthy of its price until now, and it's just had its socks rocked by older hardware. Legit enough statement.


I'm not sure what the subject matter is, but during that generation the Gamecube didn't have better graphics than the Xbox. The Xbox was the most impressive graphical console during that time frame.


----------



## Meta|Gear

gains are nice but is it worth putting up with w10 spying etc


----------



## Cherryblue

Quote:


> Originally Posted by *Meta|Gear*
> 
> gains are nice but is it worth putting up with w10 spying etc


Completely agreed.

Back in the 2000s, I remember guys "hacking" DX10 so it would run on XP.

How I'd like to see this on Seven...


----------



## Woundingchaney

Quote:


> Originally Posted by *Meta|Gear*
> 
> gains are nice but is it worth putting up with w10 spying etc


For me it is.

I used to go to great lengths with my devices and software to stop "spying" and data collection. That is, until I noticed just how common these things are, not only in the hardware and software market but in all facets of life. Nearly everything I do is recorded in some way if I am interacting with some type of device. I can completely understand how people are outraged by such things; it's simply that one would almost need to live a hermit-like existence to avoid them.

I personally think that consumers should receive more benefits if their usage history and other forms of data are so valuable.


----------



## sugarhell

Quote:


> Originally Posted by *Cyro999*
> 
> No; but they can't leave DX11 performance like that forever... or can they? People don't really seem to care. I just hate seeing people on my forums who got recommended the wrong hardware for some situations for lack of knowledge. A 5 or 10% gap is fine, but 50% is just not; it should never be acceptable. Where is the outrage?


DX11 performance is per-game. Don't compare the performance here against all DX11 games. They are behind on CPU overhead in DX11 games, but not by the 80% difference seen here.


----------



## diggiddi

Quote:


> Originally Posted by *Themisseble*
> 
> I just cant remember .. but did DICE director said same thing for DX11?
> 
> *Project cars - DX12*
> Ark Survival - Dx12
> WoW - may get DX12
> *BF series will be on DX12*
> battlefront may support DX12(mantle)
> *Crytek* - making VR DX12 game....


Are these going to be patches for current titles?


----------



## diggiddi

Quote:


> Originally Posted by *Themisseble*
> 
> *That title may even support asynch. shaders*
> 
> TOMB RAIDER uses it.


What exactly does that mean? Even more perf for GCN?

Quote:


> Originally Posted by *Cherryblue*
> 
> Completely agreed.
> 
> Back in the 2000s, I remember guys "hacking" DX10 so it would run on XP.
> 
> How I'd like to see this on Seven...


Amen to that


----------



## Ganf

Quote:


> Originally Posted by *diggiddi*
> 
> Are these going to be patches for current titles?


Nope. From Software taught us that you can lock DX features behind a paywall in your DLC and fanboys will buy it anyway, so everyone is going to be charging $20 for new lighting and post-processing effects.


----------



## Mrzev

Sorry, I didn't feel like reading through 3,000,000 messages, but why did AMD get a HUGE improvement and nVidia did not? I am glad to see the Fury X on par with nVidia... but at the same time, I thought they were close as is... so I'm a bit confused about this now.


----------



## Cherryblue

Quote:


> Originally Posted by *Mrzev*
> 
> Sorry, I didn't feel like reading through 3,000,000 messages, but why did AMD get a HUGE improvement and nVidia did not? I am glad to see the Fury X on par with nVidia... but at the same time, I thought they were close as is... so I'm a bit confused about this now.


- Game optimizations made by Nvidia in their driver for DX11 do not work on DX12. The same goes for AMD, but since they do not do much optimization in their driver compared to NVIDIA, this gives a new advantage to AMD.
- AMD's GCN architecture is oriented toward draw calls, which were under-used on DX11. DX12 gives them air to breathe.


----------



## ToTheSun!

Quote:


> Originally Posted by *Cherryblue*
> 
> - AMD's GCN architecture is oriented toward draw calls, which were under-used on DX11. DX12 gives them air to breathe.


I doubt that's the case. On early DX12 draw calls benchmarks, most modern cards were scoring above 10 million draw calls per second.
Unless there's a game out there called "how many non-complex units can we fit inside this tiny box?", that won't be the actual bottleneck.


----------



## Ganf

Quote:


> Originally Posted by *ToTheSun!*
> 
> I doubt that's the case. On early DX12 draw calls benchmarks, most modern cards were scoring above 10 million draw calls per second.
> Unless there's a game out there called "how many non-complex units can we fit inside this tiny box?", that won't be the actual bottleneck.


Isn't that exactly what a large scale RTS with AI is?


----------



## ToTheSun!

Quote:


> Originally Posted by *Ganf*
> 
> Isn't that exactly what a large scale RTS with AI is?


Yes, it is!
But is each unit as simple as the ones in the DX12 draw call benchmark?


----------



## Serios

Quote:


> Originally Posted by *Defoler*
> 
> Both of these cards only support DX12 legacy mode, and not the full API. That means partial lower level API, and none of the new features, as in faster textures and better visuals.
> The gain in them in DX12 will be minimal at best compared to DX11.


GCN also supports Resource Binding Tier 3, so DX12's biggest advantages will be there for all GCN cards.
And the Xbox One is also based on GCN; there is no doubt DX12 will work great with GCN.

The gains look more minimal on Nvidia's side to be honest.


----------



## Ganf

Quote:


> Originally Posted by *ToTheSun!*
> 
> Yes, it is!
> But is each unit as simple as the ones in the DX12 draw call benchmark?


The draw call benchmark is a mess and doesn't accurately represent what can be done with that number of draw calls; it's just a brute-force method of forcing as many draw calls as possible.


----------



## ToTheSun!

Quote:


> Originally Posted by *Ganf*
> 
> it's just a brute force method of forcing as many draw calls as possible.


That's the point, though. In actual games, the difference between Nvidia and AMD cards in maximum draw calls per unit of time won't be dictating which performs better any time soon; optimizations and hardware will.

We've been doing amazing things with sub-million draw calls per second in DX11 titles. It's a little early to expect actual rendering from current hardware to be bottlenecked by 10x that amount in DX12.


----------



## Ganf

Quote:


> Originally Posted by *ToTheSun!*
> 
> That's the point, though. In actual games, the difference between Nvidia and AMD cards in maximum draw calls per unit of time won't be dictating which performs better any time soon; optimizations and hardware will.
> 
> We've been doing amazing things with sub-million draw calls per second in DX11 titles. It's a little early to expect actual rendering from current hardware to be bottlenecked by 10x that amount in DX12.


Don't be so sure; it looks like this game is trying damn hard, and nobody has even removed the plastic wrap from DX12 yet. Having a stupid number of draw calls available will, to most developers, mean that they can spend less time optimizing them, not that they'll keep using them as efficiently as they have and spend that money and time on the rest of the game.


----------



## ToTheSun!

Quote:


> Originally Posted by *Ganf*
> 
> Don't be so sure; it looks like this game is trying damn hard, and nobody has even removed the plastic wrap from DX12 yet. Having a stupid number of draw calls available will, to most developers, mean that they can spend less time optimizing them, not that they'll keep using them as efficiently as they have and spend that money and time on the rest of the game.


Perhaps, but optimizing batches always frees up CPU time, regardless of how little is being used with DX12, which, in turn, can be used for other things. I think the compromise will always be there.


----------



## Ganf

Quote:


> Originally Posted by *ToTheSun!*
> 
> Perhaps, but optimizing batches always frees up CPU time, regardless of how little is being used with DX12, which, in turn, can be used for other things. I think the compromise will always be there.


Now that we've got proper multithreading and 16 core CPUs coming out next year? Naaaah....


----------



## epic1337

If only all games could linearly scale to "moar coars!", right? We'd be seeing a much more massive boost, as draw-call overhead isn't the only CPU bottleneck.

On a side note, is there any announcement from Nvidia about their DX12 performance?


----------



## Kuivamaa

Quote:


> Originally Posted by *sugarhell*
> 
> DX11 performance is per-game. Don't compare the performance here against all DX11 games. They are behind on CPU overhead in DX11 games, but not by the 80% difference seen here.


This. Overhead isn't paramount; there are DX11 games in which Radeons perform perfectly. Typically the games with the worst overhead problem are those that AMD didn't optimize their drivers for. These are usually Mantle games (and now DX12 games; in both cases the DX11 path becomes obsolete) and GameWorks games, where AMD simply can't do it. There are outliers like WoW, where Radeons suck (I have no idea why AMD just won't approach Blizzard to resolve this; they may have been Nvidia partners for years, but there is no excuse for not improving your performance), and CS:GO, in which, although it is CPU-intensive, Radeons simply scream with performance.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Cyro999*
> 
> No; but they can't leave DX11 performance like that forever... or can they? People don't really seem to care. I just hate seeing people on my forums who got recommended the wrong hardware for some situations for lack of knowledge. A 5 or 10% gap is fine, but 50% is just not; it should never be acceptable. Where is the outrage?


If they have not done anything by now, they will never fix it. Also, as bad as the DX11 CPU overhead is with AMD, not all games are CPU-limited. And 50% is just this game engine.


----------



## Kuivamaa

Quote:


> Originally Posted by *Ganf*
> 
> Don't be so sure; it looks like this game is trying damn hard, and nobody has even removed the plastic wrap from DX12 yet. Having a stupid number of draw calls available will, to most developers, mean that they can spend less time optimizing them, not that they'll keep using them as efficiently as they have and spend that money and time on the rest of the game.


You know what we say in the games industry: coders optimize so artists can add more things, then coders have to optimize again, and so on. Talented studios will take better advantage of the tools at their disposal than less talented studios, regardless of API, but the overall quality should rise anyway.


----------



## pengs

Spoiler: Warning: Spoiler!



Quote:


> Originally Posted by *Mrzev*
> 
> Sorry, I didn't feel like reading through 3,000,000 messages, but why did AMD get a HUGE improvement and nVidia did not? I am glad to see the Fury X on par with nVidia... but at the same time, I thought they were close as is... so I'm a bit confused about this now.


Quote:


> Originally Posted by *Ganf*
> 
> Don't be so sure; it looks like this game is trying damn hard, and nobody has even removed the plastic wrap from DX12 yet. Having a stupid number of draw calls available will, to most developers, mean that they can spend less time optimizing them, not that they'll keep using them as efficiently as they have and spend that money and time on the rest of the game.


Quote:


> Originally Posted by *ToTheSun!*
> 
> Perhaps, but optimizing batches always frees up CPU time, regardless of how little is being used with DX12, which, in turn, can be used for other things. I think the compromise will always be there.






Of course, once you can add multitudes of extra assets without running out of CPU resources, you're still adding triangles for the GPU to process in the end. So removing the draw call cap isn't going to instantly yield tons of results, because the GPU is still the guy at the end of the assembly line, now dealing with as many assets as a developer can throw at it; and under DX12 the CPU has no hesitation in passing along as much as possible and solidly saturating the GPU with work. A GPU limitation will be seen more quickly in a proper game running on DX12. The main advantage is that this specific limitation has been removed, and the CPU can relax, with the multi-threading enhancements and the overhead reduction keeping all of its workloads in sync while ridding humanity of the stutter fest PC gaming has become over the last few years.

It's best to put the load on the GPU because frame pacing works much better when the GPU is saturated and not reliant on the CPU; it's also better for frame times when the CPU is generally 'relaxed'.

I think the asynchronous compute engines from both camps will probably help saturate the GPU with a rendering workload beyond what was seen with DX11 and yield quite a strong enhancement (which is probably what we are seeing when a 290X necks up to a 980 Ti), but the CPU side of this is really what matters. Nvidia may be having difficulty with their implementation, or it may simply be that the architecture was not as pre-emptively engineered to accommodate this type of low-level access, which would make sense considering the similarities between Mantle, Vulkan, and DX12.
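The assembly-line picture above can be reduced to a toy model (my own sketch; the per-draw and GPU costs are invented numbers, not measurements from Ashes): frame time is roughly the slower of the CPU's submission work and the GPU's rendering work, so shrinking per-draw CPU overhead only helps until the GPU becomes the limit.

```python
# Toy model: frame time ~ max(CPU submission time, GPU render time).
# All costs are illustrative, not measured from any real API or game.

def frame_ms(draw_calls, cpu_us_per_draw, gpu_ms):
    cpu_ms = draw_calls * cpu_us_per_draw / 1000.0  # CPU cost of submitting draws
    return max(cpu_ms, gpu_ms)                      # slower side sets the frame time

draws = 20_000
gpu_ms = 16.0  # GPU needs 16 ms of work regardless of API

dx11 = frame_ms(draws, cpu_us_per_draw=2.0, gpu_ms=gpu_ms)   # fat driver overhead
dx12 = frame_ms(draws, cpu_us_per_draw=0.25, gpu_ms=gpu_ms)  # thin runtime

print(dx11)  # 40.0 -> CPU-bound under the fat API
print(dx12)  # 16.0 -> GPU becomes the limit once the CPU relaxes
```

In these made-up numbers, the fat API leaves the frame CPU-bound at 40 ms; once submission cost drops below the GPU's 16 ms, only the GPU matters, which is the "guy at the end of the assembly line" point.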







was aiming for 3 sentences


----------



## mtcn77

I think the interesting bit is the totally flexible execution of batches of draw calls. The developer could issue normal calls continually and aim for fps and latency, or render everything in big batches and free up CPU time for maximum throughput, I suppose. This type of performance optimisation might not have been on the table before, and it offers an extra layer of improvements. Looking forward to the RTS spectacles coming forth.
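The latency-versus-throughput choice described here can be sketched with made-up costs (a toy model of mine, not engine code): each submission carries a fixed cost, so big batches amortize it, while the first batch finishing later costs latency.

```python
# Toy sketch: fixed cost per submission vs per-draw cost.
# Bigger batches amortize the fixed cost (throughput win) but the
# first batch completes later (latency loss). Numbers are invented.

SUBMIT_US = 50.0   # fixed cost paid once per submission
DRAW_US = 0.5      # cost per individual draw call

def total_us(draws, batch_size):
    batches = -(-draws // batch_size)          # ceiling division
    return batches * SUBMIT_US + draws * DRAW_US

def first_batch_done_us(batch_size):
    return SUBMIT_US + batch_size * DRAW_US    # when the first batch lands

draws = 10_000
small = total_us(draws, batch_size=10)    # 1000 submissions
large = total_us(draws, batch_size=1000)  # 10 submissions

print(small, large)  # 55000.0 5500.0 -> batching wins on total throughput
print(first_batch_done_us(10), first_batch_done_us(1000))  # 55.0 550.0 -> latency grows
```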


----------



## ZealotKi11er

Quote:


> Originally Posted by *mtcn77*
> 
> I think the interesting bit is the totally flexible execution of batches of draw calls. The developer could issue normal calls continually and aim for fps and latency, or render everything in big batches and free up CPU time for maximum throughput, I suppose. This type of performance optimisation might not have been on the table before, and it offers an extra layer of improvements. Looking forward to the RTS spectacles coming forth.


As far as I know, the biggest RTS right now, SC2, is still using DX9. Really hard to see Blizzard supporting a new DX unless it's something like WoW, which gets updates and new expansions very often.


----------



## escksu

Quote:


> Originally Posted by *ZealotKi11er*
> 
> As far as I know, the biggest RTS right now, SC2, is still using DX9. Really hard to see Blizzard supporting a new DX unless it's something like WoW, which gets updates and new expansions very often.


I am not sure how many people are still playing SC2.....


----------



## escksu

Quote:


> Originally Posted by *epic1337*
> 
> If only all games could linearly scale to "moar coars!", right? We'd be seeing a much more massive boost, as draw-call overhead isn't the only CPU bottleneck.
> 
> On a side note, is there any announcement from Nvidia about their DX12 performance?


Yes, they were shell-shocked by the performance of their products and blamed the benchmark instead.


----------



## zipper17

Nvidia Physx Vs AMD APIs Optimization


----------



## PostalTwinkie

Quote:


> Originally Posted by *SpeedyVT*
> 
> Don't want to call you stupid but Gamecube launched the same year Xbox did, same day too. Gamecube had better graphics. ATI existed long before AMD owned them and they had hardware, 1989 ATI and 3DFX 1992. Get your facts straight because your fanboy is showing.


What does this have to do with the price of tea in China?

Literally. What do your dates of the companies you are talking about (Which I am well aware of, as I was well and alive during their founding) have to do with my statements and the accuracy of them? The first time DX went "low level" was on the original Xbox back in 1998/1999.

I highly recommend that YOU know what YOU are talking about, before you butt in. Nothing about what you said makes what I said incorrect or changes it in any other way.

It is high time people stop screaming like a teenage girl at a Backstreet Boys concert, or more extreme an Elvis concert, about how AMD is first to low level. About how AMD pushed low level, blah blah blah. When AMD was beaten to the job by almost two decades!


----------



## Xuper

Nvidia says there is an MSAA bug, but what about this bench? Is the benchmark flawed?

http://arstechnica.co.uk/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/3/

Quote:


> To help things along, the benchmark was run in three different resolutions: 1080p, 1440p, and 2160p (4K). All were run at the same "high" preset with MSAA *disabled*


The GeForce 980 Ti gets completely pwned by the Radeon 290X in 99th-percentile frame rate, and the Radeon is very close to the 980 Ti overall. Mind that Nvidia released their latest driver for Ashes. So I think it's because of this:

http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading

NVIDIA Maxwell 2 (900 series) has 1 graphics + 31 compute = 32 queues

AMD GCN Gen 2 has 1 graphics + 8 ACEs × 8 queues each = 1 + 64 queues
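Spelling out the queue arithmetic (simple bookkeeping using the counts as the AnandTech article tallies them: Maxwell 2 exposes 1 graphics queue plus one engine with 31 compute queues, GCN gen 2 exposes 1 graphics queue plus 8 ACEs of 8 queues each; the class below is illustrative, not an API):

```python
# Bookkeeping for the queue counts cited in the thread; not an API.
from dataclasses import dataclass

@dataclass
class QueueLayout:
    name: str
    graphics_queues: int
    compute_engines: int
    queues_per_engine: int

    @property
    def compute_queues(self) -> int:
        # Total compute queues = engines x queues each
        return self.compute_engines * self.queues_per_engine

    @property
    def total_queues(self) -> int:
        return self.graphics_queues + self.compute_queues

maxwell2 = QueueLayout("NVIDIA Maxwell 2", 1, 1, 31)  # one engine, 31-deep
gcn2 = QueueLayout("AMD GCN Gen 2", 1, 8, 8)          # 8 ACEs x 8 queues

print(maxwell2.total_queues)  # 32
print(gcn2.total_queues)      # 65
```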


----------



## SpeedyVT

Quote:


> Originally Posted by *PostalTwinkie*
> 
> What does this have to do with the price of tea in China?
> 
> Literally. What do your dates of the companies you are talking about (Which I am well aware of, as I was well and alive during their founding) have to do with my statements and the accuracy of them? The first time DX went "low level" was on the original Xbox back in 1998/1999.
> 
> I highly recommend that YOU know what YOU are talking about, before you butt in. Nothing about what you said makes what I said incorrect or changes it in any other way.
> 
> It is high time people stop screaming like a teenage girl at a Backstreet Boys concert, or more extreme an Elvis concert, about how AMD is first to low level. About how AMD pushed low level, blah blah blah. When AMD was beaten to the job by almost two decades!


All consoles are low level; the Xbox's low level was specifically designed for the Xbox. If anyone knew about low level before Microsoft, it's Nintendo and Sony. Low-level DX was irrelevant to the PC world at the time because you'd have had to rebuild the operating system from the ground up, and that was far more money than Microsoft could invest. Only after unifying their platforms to the point they're at today, and rebuilding the operating system, is it possible to deploy low-level access on an OS tailored for productivity and general users.

The Xbox ran a heavily modified Windows NT as its OS, and even its low-level access was limited in how low it went.

If you want to talk about companies pushing low level on the PC platform, you'd be best to discuss the Commodore 64. A lot of the older machines had very rudimentary but still to-the-metal coding. That was lost in the transition to the GUI.

Note that I have not mentioned AMD once.

However, I will now. AMD is not responsible for low-level hardware access, as it has been around since the dawn of computers. AMD's Mantle was also limited in its low-level effectiveness, although Mantle was much lower than DX11. Currently the lowest level of hardware access for rendering ever is DX12, period. It's why any FPS gains from DX12 will occur with the aid of WDDM 2.0.


----------



## CrazyHeaven

Quote:


> Originally Posted by *PostalTwinkie*
> 
> What does this have to do with the price of tea in China?
> 
> Literally. What do your dates of the companies you are talking about (Which I am well aware of, as I was well and alive during their founding) have to do with my statements and the accuracy of them? The first time DX went "low level" was on the original Xbox back in 1998/1999.
> 
> I highly recommend that YOU know what YOU are talking about, before you butt in. Nothing about what you said makes what I said incorrect or changes it in any other way.
> 
> It is high time people stop screaming like a teenage girl at a Backstreet Boys concert, or more extreme an Elvis concert, about how AMD is first to low level. About how AMD pushed low level, blah blah blah. When AMD was beaten to the job by almost two decades!


They are trying to grab onto whatever hope they can about the future of AMD. We either need them to step it up or sell or license their patents to others so hopefully we can get another player. These benchmarks here are a step in the right direction. Later on in the year we will see what is really going on with AMD and dx12. My wallet loves competition.

Longjing tea is thought of as being the best. My wallet hates this since it keeps the tea prices high.


----------



## mtcn77

Hey, look! There's more: http://www.g-truc.net/post-0666.html
260X, 290X and Fury have the same number of ACEs, though I don't know Fury's triangle throughput; the 260X's is 2 per cycle and the 290X's is 4.


----------



## Kand

Quote:


> Originally Posted by *escksu*
> 
> I am not sure how many people are still playing SC2.....


It's still the biggest RTS out there next to the various MOBAs, which still use DX9.


----------



## Wishmaker

Could DX12 be the saviour of MMOs when you play with AMD cards??


----------



## maltamonk

Quote:


> Originally Posted by *Wishmaker*
> 
> Could DX12 be the saviour of MMOs when you play with AMD cards??


They are mostly CPU-bound, so not really an issue for most GPUs, AMD or Nvidia.


----------



## PostalTwinkie

Quote:


> Originally Posted by *CrazyHeaven*
> 
> They are trying to grab onto whatever hope they can about the future of AMD. We either need them to step it up or sell or license their patents to others so hopefully we can get another player. These benchmarks here are a step in the right direction. Later on in the year we will see what is really going on with AMD and dx12. My wallet loves competition.
> 
> Longjing tea is thought of as being the best. My wallet hates this since it keeps the tea prices high.


Your answer is the best, about the tea.

I have managed to find myself at Tea of the Month. Which is pretty fantastic it seems, as I drink a lot of tea!

+Rep.


----------



## revro

Is there a link to download the benchmark, or is it just something they gave the news reporters to test out? Thank you.


----------



## Ganf

Quote:


> Originally Posted by *revro*
> 
> Is there a link to download the benchmark, or is it just something they gave the news reporters to test out? Thank you.


You can buy into the alpha for 50 bucks. That's likely where they got it.


----------



## knightsilver

DX12 won't mean a dern thing unless Microsoft drops the bogusness with Win10...


----------



## Dudewitbow

Quote:


> Originally Posted by *maltamonk*
> 
> They are mostly CPU-bound, so not really an issue for most GPUs, AMD or Nvidia.


The point of lower-level access is to decrease CPU-bound situations and make a game more GPU-bound. MMOs could make the most of the situation, but they cannot be DX12-only, simply because MMOs aim for compatibility with the largest number of users (which is why many MMOs are DX9).


----------



## revro

Quote:


> Originally Posted by *knightsilver*
> 
> DX12 won't mean a dern thing unless Microsoft drops the bogusness with Win10...


Yep, I am waiting for the market share data report from September to see how it is faring.

https://www.netmarketshare.com/operating-system-market-share.aspx?qprid=10&qpcustomd=0


----------



## Mahigan

Well, I figured I'd create an account in order to explain what you're all seeing in the Ashes of the Singularity DX12 benchmarks. I won't divulge too much of my background information, but suffice it to say that I'm an old veteran who used to go by the handle ElMoIsEviL.

First off, nVidia is posting their true DirectX 12 performance figures in these tests. Ashes of the Singularity is all about parallelism, and although Maxwell 2 does better in that area than previous nVIDIA architectures, it is still inferior in this department compared to the likes of AMD's GCN 1.1/1.2 architectures. Here's why...

Maxwell's Asynchronous Thread Warp can queue up 31 compute tasks and 1 graphics task. Now compare this with AMD GCN 1.1/1.2, which is composed of 8 Asynchronous Compute Engines, each able to queue 8 compute tasks, for a total of 64, coupled with 1 graphics task from the Graphics Command Processor. See below:



Each ACE can also apply certain post-processing effects without incurring much of a performance penalty. This feature is heavily used for lighting in Ashes of the Singularity. Think of all the simultaneous light sources firing off as each unit in the game fires a shot, or the various explosions which ensue, as examples.



This means that AMD's GCN 1.1/1.2 is best adapted to handling the increase in draw calls now being made by the multi-core CPU under DirectX 12.

Therefore, in game titles which rely heavily on parallelism, likely most DirectX 12 titles, AMD's GCN 1.1/1.2 should do very well, provided they do not hit a geometry or rasterizer bottleneck before nVIDIA hits its draw call/parallelism bottleneck. The picture below highlights the draw call/parallelism superiority of GCN 1.1/1.2 over Maxwell 2:



A more efficient queueing of workloads, through better thread parallelism, also enables the R9 290X to come closer to its theoretical compute figures, which just happen to be ever so slightly shy of those of the GTX 980 Ti (5.8 TFLOPS vs 6.1 TFLOPS respectively), as seen below:



What you will notice is that Ashes of the Singularity is also quite hard on the rasterizer operators, highlighting a rather peculiar behavior: an R9 290X, with its 64 ROPs, ends up performing nearly the same as a Fury X, also with 64 ROPs. A great way of picturing this in action is the graph below (courtesy of Beyond3D):



As for the folks claiming a conspiracy theory: not in the least. The reason AMD's DX11 performance is so poor under Ashes of the Singularity is that AMD did literally zero optimizations for that path. AMD is clearly looking to sell Asynchronous Shading as a feature to developers, because their architecture is well suited to the task. It doesn't hurt that it also costs less in terms of driver research and development. Asynchronous Shading allows GCN to hit near-full efficiency without requiring any driver work whatsoever.

nVIDIA, on the other hand, does much better at serial scheduling of workloads (consider that anything prior to Maxwell 2 is limited to serial rather than parallel scheduling). DirectX 11 is suited to serial scheduling, so naturally nVIDIA has an advantage under DirectX 11. In this graph, provided by AnandTech, you have the correct figures for nVIDIA's architectures (from Kepler to Maxwell 2), though the figures for GCN are incorrect (they did not multiply the number of Asynchronous Compute Engines by 8):



People are wondering why Nvidia is doing a bit better in DX11 than DX12. That's because Nvidia optimized their DX11 path in their drivers for Ashes of the Singularity. With DX12 there are no tangible driver optimizations, because the game engine speaks almost directly to the graphics hardware, so none were made. Nvidia is at the mercy of the programmers' talents as well as their own Maxwell architecture's thread-parallelism performance under DX12. The developers programmed for thread parallelism in Ashes of the Singularity in order to better draw all those objects on the screen. Therefore what we're seeing with the Nvidia numbers is the Nvidia draw call bottleneck showing up under DX12. Nvidia works around this in DX11 with its own optimizations, by prioritizing workloads and replacing shaders. Yes, the nVIDIA driver contains a compiler which recompiles and replaces, on a per-game basis, shaders which are not fine-tuned to their architecture. nVIDIA's driver is also multi-threaded, making use of idling CPU cores to recompile/replace shaders. The work nVIDIA does in software under DX11 is the work AMD does in hardware, under DX12, with their Asynchronous Compute Engines.

But what about the poor AMD DX11 performance? Simple. AMD's GCN 1.1/1.2 architecture is suited towards parallelism. It requires the CPU to feed the graphics card work. This creates a CPU bottleneck on AMD hardware under DX11 at low resolutions (say 1080p, and even 1600p for the Fury X), as DX11 is limited to 1-2 cores for the graphics pipeline (which also needs to take care of AI, physics, etc.). Replacing or recompiling shaders is not a solution for GCN 1.1/1.2, because AMD's Asynchronous Compute Engines are built to break down complex workloads into smaller, easier-to-process workloads. The only way around this issue, if you want to maximize the use of all available compute resources under GCN 1.1/1.2, is to feed the GPU in parallel... in come Mantle, Vulkan and DirectX 12.

People wondering why the Fury X did so poorly at 1080p under DirectX 11 titles? That's your answer.
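The "feed the GPU in parallel" idea can be caricatured in a few lines (a sketch only: Python threads stand in for DX12's per-thread command lists, and `record` is a hypothetical helper, not a real D3D12 call). Under a DX11-style model one thread records every command; a DX12-style model lets each core record its own list before everything is submitted together.

```python
# Caricature of serial (DX11-style) vs parallel (DX12-style) command
# recording. Python threads stand in for worker threads each filling
# their own command list; this is not real D3D12 code.
from concurrent.futures import ThreadPoolExecutor

def record(commands):
    # Pretend to encode draw commands into a command list.
    return [f"draw:{c}" for c in commands]

work = list(range(8000))

# DX11-style: one thread records everything, then submits.
serial_lists = [record(work)]

# DX12-style: each "core" records its own command list in parallel,
# then all lists are submitted to the queue together.
chunks = [work[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_lists = list(pool.map(record, chunks))

submitted = [cmd for cl in parallel_lists for cmd in cl]
print(len(serial_lists[0]), len(submitted))  # 8000 8000: same work, split across cores
```

The point of the caricature: the total work is identical, but in the parallel case no single core has to shoulder the whole submission, which is exactly the DX11 single-thread bottleneck being removed.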

A video which talks about Ashes of the Singularity in depth: 




PS: Don't count on better DirectX 12 drivers from nVIDIA. DirectX 12 is closer to the metal, and it's all on the developer to make efficient use of both nVIDIA's and AMD's architectures.


----------



## JunkoXan

^^^^ someone give him a Cookie, seems logical to me.


----------



## p4inkill3r

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Well, I figured I'd create an account in order to explain what you're all seeing in the Ashes of the Singularity DX12 benchmarks. I won't divulge too much of my background information, but suffice it to say that I'm an old veteran who used to go by the handle ElMoIsEviL.
> 
> First off, Nvidia is posting its true DirectX 12 performance figures in these tests. Ashes of the Singularity is all about parallelism, and although Maxwell 2 does better in that department than previous Nvidia architectures, it is still inferior to the likes of AMD's GCN 1.1/1.2. Here's why...
> 
> Maxwell's Asynchronous Thread Warp can queue up 31 compute tasks and 1 graphics task. Compare this with AMD GCN 1.1/1.2, which is composed of 8 Asynchronous Compute Engines, each able to queue 8 compute tasks for a total of 64, coupled with 1 graphics task from the Graphics Command Processor. See below:
> 
> 
> 
> This means that AMD's GCN 1.1/1.2 is better adapted to handling the increase in draw calls now being made by the multi-core CPU under DirectX 12.
> 
> Therefore, in titles which rely heavily on parallelism, likely most DirectX 12 titles, AMD GCN 1.1/1.2 should do very well, provided they do not hit a geometry or rasterizer-operator bottleneck before Nvidia hits its draw-call/parallelism bottleneck. The picture below highlights the draw-call/parallelism superiority of GCN 1.1/1.2 over Maxwell 2:
> 
> 
> 
> A more efficient queueing of workloads, through better thread parallelism, also enables the R9 290X to come closer to its theoretical compute figures, which happen to be just shy of those of the GTX 980 Ti (5.8 TFLOPS vs 6.1 TFLOPS respectively), as seen below:
> 
> 
> 
> You will also notice that Ashes of the Singularity is quite hard on the rasterizer operators, highlighting rather peculiar behavior: an R9 290X, with its 64 ROPs, ends up performing nearly the same as a Fury X, also with 64 ROPs. A great way of picturing this in action is the graph below (courtesy of Beyond3D):
> 
> 
> 
> As for the folks claiming a conspiracy: not in the least. The reason AMD's DX11 performance is so poor under Ashes of the Singularity is that AMD did literally zero optimizations for that path. AMD is clearly looking to sell asynchronous shading as a feature to developers, because its architecture is well suited to the task. It doesn't hurt that it also costs less in driver research and development: asynchronous shading allows GCN to hit near-full efficiency without requiring any driver work whatsoever.
> 
> Nvidia, on the other hand, does much better at serial scheduling of workloads (consider that anything prior to Maxwell 2 is limited to serial rather than parallel scheduling). DirectX 11 is suited to serial scheduling, so Nvidia naturally has an advantage under DirectX 11. In this graph, provided by AnandTech, you have the correct figures for Nvidia's architectures (from Kepler to Maxwell 2), though the figures for GCN are incorrect (they did not multiply the number of Asynchronous Compute Engines by 8):
> 
> 
> 
> People wonder why Nvidia does a bit better in DX11 than DX12. That's because Nvidia optimized the DX11 path in its drivers for Ashes of the Singularity. Under DX12 there are no tangible driver optimizations to make, because the game engine speaks almost directly to the graphics hardware, so none were made. Nvidia is at the mercy of the programmers' talents as well as its own Maxwell architecture's thread-parallelism performance under DX12. The developers programmed for thread parallelism in Ashes of the Singularity in order to draw all those objects on screen. What we're seeing in the Nvidia numbers, therefore, is Nvidia's draw-call bottleneck showing up under DX12. Nvidia works around this in DX11 with its own optimizations, prioritizing workloads and replacing shaders. Yes, the Nvidia driver contains a compiler which recompiles and replaces shaders that are not fine-tuned to the architecture, on a per-game basis.
> 
> PS: Don't count on better DirectX 12 drivers from Nvidia. DirectX 12 is closer to the metal, and it's all on the developer to make efficient use of either Nvidia's or AMD's architecture.


Great post, very informative.


----------



## Glottis

www.ashesofthesingularity.com

notice AMD logo at the bottom of that page









Like the post above says, "Nvidia is at the mercy of the programmers' talents", and since this game is sponsored/partnered with AMD... let's just say optimizing for Nvidia probably wasn't high on their priority list. I think we'll have a clearer picture of DX12 performance when we have a much larger sample of DX12 games and benchmarks to choose from.


----------



## mav451

If you're trying to make me question my 980Ti purchase, it's working great haha


----------



## mtcn77

Quote:


> Originally Posted by *Glottis*
> 
> www.ashesofthesingularity.com
> 
> notice AMD logo at the bottom of that page
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Like the post above says, "Nvidia is at the mercy of the programmers' talents", and since this game is sponsored/partnered with AMD... let's just say optimizing for Nvidia probably wasn't high on their priority list. I think we'll have a clearer picture of DX12 performance when we have a much larger sample of DX12 games and benchmarks to choose from.


What about the correlating theoretical benchmarks part?


----------



## zipper17

Nvidia, you'd better release Pascal cards soon.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> www.ashesofthesingularity.com
> 
> notice AMD logo at the bottom of that page
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Like the post above says, "Nvidia is at the mercy of the programmers' talents", and since this game is sponsored/partnered with AMD... let's just say optimizing for Nvidia probably wasn't high on their priority list. I think we'll have a clearer picture of DX12 performance when we have a much larger sample of DX12 games and benchmarks to choose from.


The AMD logo is there because the developers first started programming the game on AMD's Mantle API. The game they wanted to build was pretty much impossible without Mantle. They built it on Mantle, then ported it over to DirectX 12 afterwards (Mantle and DirectX 12 being remarkably similar).

The developers also worked closely with both Nvidia and AMD; that's why you see Nvidia's rather impressive DX11 performance. Nvidia has had access to the code for over a year now (as has AMD). All of this is verifiable on the developers' blog: http://oxidegames.com/2015/08/16/the-birth-of-a-new-api/
Quote:


> Unfortunately, we have to make some corrections, because as always there is misinformation. There are incorrect statements regarding issues with MSAA, specifically that the application has a bug in it which precludes the validity of the test. We assure everyone that is absolutely not the case. Our code has been reviewed by Nvidia, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft, specifically designed to validate against incorrect usage. All IHVs have had access to our source code for over a year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months.


----------



## CrazyElf

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Well I figured I'd create an account in order to explain away what you're all seeing in the Ashes of the Singularity DX12 Benchmarks. I won't divulge too much of my background information but suffice to say that I'm an old veteran who used to go by the handle ElMoIsEviL.
> 
> First off, Nvidia is posting its true DirectX 12 performance figures in these tests. Ashes of the Singularity is all about parallelism, and although Maxwell 2 does better in that department than previous Nvidia architectures, it is still inferior to the likes of AMD's GCN 1.1/1.2. Here's why...
> 
> Maxwell's Asynchronous Thread Warp can queue up 31 compute tasks and 1 graphics task. Compare this with AMD GCN 1.1/1.2, which is composed of 8 Asynchronous Compute Engines, each able to queue 8 compute tasks for a total of 64, coupled with 1 graphics task from the Graphics Command Processor. See below:
> 
> 
> 
> This means that AMD's GCN 1.1/1.2 is better adapted to handling the increase in draw calls now being made by the multi-core CPU under DirectX 12.
> 
> Therefore, in titles which rely heavily on parallelism, likely most DirectX 12 titles, AMD GCN 1.1/1.2 should do very well, provided they do not hit a geometry or rasterizer-operator bottleneck before Nvidia hits its draw-call/parallelism bottleneck. The picture below highlights the draw-call/parallelism superiority of GCN 1.1/1.2 over Maxwell 2:
> 
> 
> 
> A more efficient queueing of workloads, through better thread parallelism, also enables the R9 290X to come closer to its theoretical compute figures, which happen to be just shy of those of the GTX 980 Ti (5.8 TFLOPS vs 6.1 TFLOPS respectively), as seen below:
> 
> 
> 
> You will also notice that Ashes of the Singularity is quite hard on the rasterizer operators, highlighting rather peculiar behavior: an R9 290X, with its 64 ROPs, ends up performing nearly the same as a Fury X, also with 64 ROPs. A great way of picturing this in action is the graph below (courtesy of Beyond3D):
> 
> 
> 
> As for the folks claiming a conspiracy: not in the least. The reason AMD's DX11 performance is so poor under Ashes of the Singularity is that AMD did literally zero optimizations for that path. AMD is clearly looking to sell asynchronous shading as a feature to developers, because its architecture is well suited to the task. It doesn't hurt that it also costs less in driver research and development: asynchronous shading allows GCN to hit near-full efficiency without requiring any driver work whatsoever.
> 
> Nvidia, on the other hand, does much better at serial scheduling of workloads (consider that anything prior to Maxwell 2 is limited to serial rather than parallel scheduling). DirectX 11 is suited to serial scheduling, so Nvidia naturally has an advantage under DirectX 11. In this graph, provided by AnandTech, you have the correct figures for Nvidia's architectures (from Kepler to Maxwell 2), though the figures for GCN are incorrect (they did not multiply the number of Asynchronous Compute Engines by 8):
> 
> 
> 
> 
> 
> People wonder why Nvidia does a bit better in DX11 than DX12. That's because Nvidia optimized the DX11 path in its drivers for Ashes of the Singularity. Under DX12 there are no tangible driver optimizations to make, because the game engine speaks almost directly to the graphics hardware, so none were made. Nvidia is at the mercy of the programmers' talents as well as its own Maxwell architecture's thread-parallelism performance under DX12. The developers programmed for thread parallelism in Ashes of the Singularity in order to draw all those objects on screen. What we're seeing in the Nvidia numbers, therefore, is Nvidia's draw-call bottleneck showing up under DX12. Nvidia works around this in DX11 with its own optimizations, prioritizing workloads and replacing shaders. Yes, the Nvidia driver contains a compiler which recompiles and replaces shaders that are not fine-tuned to the architecture, on a per-game basis. Nvidia's driver is also multi-threaded, making use of idle CPU cores to recompile and replace shaders. The work Nvidia does in software, under DX11, is the work AMD does in hardware with its Asynchronous Compute Engines.
> 
> But what about AMD's poor DX11 performance? Simple. AMD's GCN 1.1/1.2 architecture is built for parallelism: it relies on the CPU to feed the graphics card work. This creates a CPU bottleneck on AMD hardware under DX11 at low resolutions (say 1080p, and even 1600p for the Fury X), as DX11 limits the graphics pipeline to one or two cores (which also need to handle AI, physics, etc.). Replacing or recompiling shaders is not a solution for GCN 1.1/1.2, because AMD's Asynchronous Compute Engines are built to break complex workloads down into smaller, easier-to-schedule ones. The only way around the issue, if you want to maximize the use of all available compute resources under GCN 1.1/1.2, is to feed the GPU in parallel... in come Mantle, Vulkan, and DirectX 12.
> 
> Wondering why the Fury X does so poorly at 1080p in DirectX 11 titles? That's your answer.
> 
> PS: Don't count on better DirectX 12 drivers from Nvidia. DirectX 12 is closer to the metal, and it's all on the developer to make efficient use of either Nvidia's or AMD's architecture.


Judging by what you are saying, this would be a hardware limitation for Nvidia by nature. There is probably very limited room, then, for DX12 optimization in Maxwell or any of the previous architectures. I presume they will address this in Pascal, along with introducing HBM2. Their main saving grace will be the fact that it will be years before DX12 proliferates, and by then most people will have swapped their GPUs for something newer.

As for AMD, I have always wondered whether AMD should have included 12 ACEs and 96 ROPs (versus 8 ACEs and 64 ROPs) on the Fury X. I think it would have been worth cutting the shaders to, say, 3840 to make it happen; these figures were kept unchanged from the 290X to the Fury X. Can you comment on whether this was possible or not? I would hope that for the next architecture they can make this happen on 16nm. Combined with HBM2, this would make for an impressive gain in DX12.

Barring major architectural changes to Nvidia, it would seem that by nature, DX12 (which shares numerous similarities with Mantle and if the claims here are correct, are at times, outright copied over) favor AMD. So too does Vulkan.

Edit:
I will note that the 290X had 2816 shaders on 64 ROPs (so 44 shaders per ROP) and, I believe, 8 ACEs as well (can someone confirm?).

Edit 2:
Would Nvidia have known about this before? Because if the other rumor is true, Pascal has already taped out. A serious change like this could take months.


----------



## sugarhell

Quote:


> Originally Posted by *CrazyElf*
> 
> Judging by what you are saying, this would be a hardware limitation for Nvidia by nature. There is probably very limited room, then, for DX12 optimization in Maxwell or any of the previous architectures. I presume they will address this in Pascal, along with introducing HBM2. Their main saving grace will be the fact that it will be years before DX12 proliferates, and by then most people will have swapped their GPUs for something newer.
> 
> As for AMD, I have always wondered whether AMD should have included 12 ACEs and 96 ROPs (versus 8 ACEs and 64 ROPs) on the Fury X. I think it would have been worth cutting the shaders to, say, 3840 to make it happen; these figures were kept unchanged from the 290X to the Fury X. Can you comment on whether this was possible or not? I would hope that for the next architecture they can make this happen on 16nm. Combined with HBM2, this would make for an impressive gain in DX12.


They can't. GCN is currently limited to 16 ROPs per shader engine, IIRC, so 64 ROPs max.


----------



## mtcn77

I want Sanitarium with Tornado hallucinations atm. I hope new RPG and RTS games pop up with isometric viewpoint.


----------



## Mahigan

Quote:


> Originally Posted by *sugarhell*
> 
> They can't. GCN is currently limited to 16 ROPs per shader engine, IIRC, so 64 ROPs max.


This.

There are 4 Render Back-Ends per shader engine, and 4 shader engines in the R9 290X/Fury X. That means a total of 16 Render Back-Ends, each of which can output 4 color ROPs per clock cycle, for a total of 64 ROPs.

AMD would need to create a lot of redundant hardware in GCN in order to scale up the number of Render Back-Ends, which would result in a rather large GPU. I don't doubt that this is what we're going to see in 2016 with AMD's next update to GCN; I'd guess at 32 Render Back-Ends, for a total of 128 ROPs. That is not feasible on the current process node on which Fiji and Hawaii were built.
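The back-end arithmetic described above is simple enough to check directly, using the per-engine figures stated in the post:

```python
# GCN back-end arithmetic from the post above:
# 4 shader engines x 4 Render Back-Ends, each emitting 4 color ROPs/clock.
SHADER_ENGINES = 4
RBES_PER_ENGINE = 4
ROPS_PER_RBE = 4

total_rbes = SHADER_ENGINES * RBES_PER_ENGINE   # 16 Render Back-Ends
total_rops = total_rbes * ROPS_PER_RBE          # 64 ROPs (Hawaii and Fiji alike)
print(total_rbes, total_rops)                   # prints: 16 64
```

The hypothetical 32-RBE part would land at 32 × 4 = 128 ROPs, matching the guess above.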


----------



## Clocknut

Quote:


> Originally Posted by *Dudewitbow*
> 
> The point of the lower-level access is to decrease CPU-bound situations and make a game more GPU-bound. MMOs can make the most of the situation, but they cannot be DX12-only, simply because MMOs aim for compatibility with the largest number of users (which is why many MMOs are DX9).


The last, fastest DX9-era card was the 7900, and that is not even close to the 8800 GT that most games list as a minimum requirement nowadays. I wonder why they choose to stick with DX9 if the minimum requirement is already an 8800 GT.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Clocknut*
> 
> The last, fastest DX9-era card was the 7900, and that is not even close to the 8800 GT that most games list as a minimum requirement nowadays. I wonder why they choose to stick with DX9 if the minimum requirement is already an 8800 GT.


DX10 was only a stepping stone to DX11: it appeared in a few high-end games and that was it. It was either DX9 or DX11. In the future they will make games that list a GTX 780 minimum and DX11.


----------



## Mahigan

Quote:


> Originally Posted by *CrazyElf*
> 
> Edit:
> I will note that the 290X had 2816 shaders on 64 ROPs (so 44 shaders per ROP) and, I believe, 8 ACEs as well (can someone confirm?).


Not exactly: each shader engine has 4 Render Back-Ends, and each Render Back-End can process 4 color ROPs per cycle, for a total of 16 color ROPs per cycle per shader engine. Each shader engine comprises 704 ALUs (shaders), split among 11 CUs (Compute Units). There are 4 shader engines in total in Hawaii.

As for the Asynchronous Compute Engines, there are 8 of them in Hawaii, each able to queue up 8 workloads (64 queues total).
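Hawaii's totals fall straight out of these per-engine figures. A quick check (the 64-lane CU width is the standard GCN figure, assumed here since the post doesn't state it):

```python
# Hawaii totals from the per-engine figures above.
SHADER_ENGINES = 4
CUS_PER_ENGINE = 11
LANES_PER_CU = 64        # standard GCN CU width (assumed, not from the post)

alus_per_engine = CUS_PER_ENGINE * LANES_PER_CU      # 704 shaders per engine
total_shaders = SHADER_ENGINES * alus_per_engine     # 2816, the R9 290X figure

ACES = 8                 # Asynchronous Compute Engines
QUEUES_PER_ACE = 8
compute_queues = ACES * QUEUES_PER_ACE               # 64 compute queues total
print(total_shaders, compute_queues)                 # prints: 2816 64
```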



Quote:


> Edit 2:
> Would Nvidia have known about this before? Because if the other rumor is true, Pascal has already taped out. A serious change like this could take months.


I have little doubt that Nvidia's Pascal will behave more like GCN (on steroids, mind you) than like Maxwell 2. Maxwell 2, like Kepler, was patched up in haste to keep up with AMD and the market by allowing incremental degrees of parallelism. Nvidia is akin to a conservative, taking few risks since the GeForce FX debacle, whereas AMD is akin to a liberal, always taking risks which sometimes pan out well and other times don't (in the form of unused features, or architectures that are too forward-thinking, such as R600). That's the analogy I like to use.


----------



## ZealotKi11er

I know Nvidia can beat AMD in DX12, and I know they will. You will just need Pascal, I'm afraid.


----------



## WheelZ0713

After reading this entire thread and understanding at least 50% of it, I'm pretty happy that I have a Fury on its way to me right now.

Of late I have favored AMD cards, with no real justification. This time around I was legitimately considering either and settled on the Fury purely for its price point.

Pretty happy that I seem to have lucked out a little further.


----------



## error-id10t

Who is this person who comes here and suddenly starts making sense? You don't see that too often. Nicely explained.


----------



## Exilon

If driver optimizations don't do much for DX12 games and GCN is inherently better than Maxwell V2 at parallel games, what's happening here?


----------



## SpeedyVT

Quote:


> Originally Posted by *Exilon*
> 
> If driver optimizations don't do much for DX12 games and GCN is inherently better than Maxwell V2 at parallel games, what's happening here?


DX12 was still a WIP when that was released. Current DX12 scales significantly better than Mantle did relative to the card's FLOPS.


----------



## Kpjoslee

While the benchmark results definitely show that AMD's GCN architecture is well suited to DirectX 12, it seems too early to make a conclusive judgement based on just one game in beta. DirectX 12 is closer to the metal than DirectX 11 was, but I don't think it will render driver-side optimization completely obsolete just yet.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Kpjoslee*
> 
> While the benchmark results definitely show that AMD's GCN architecture is well suited to DirectX 12, it seems too early to make a conclusive judgement based on just one game in beta. DirectX 12 is closer to the metal than DirectX 11 was, but I don't think it will render driver-side optimization completely obsolete just yet.


True, but any disadvantage AMD had because of CPU overhead will be gone. AMD cards will get their 1080p performance back.


----------



## CrazyElf

We need more games to judge, but I will say this, if it is indicative of other games: the Fury might be a bit more appealing now, although not by much. It still lacks some important ports (like DL-DVI for me, although some custom PCBs have it).

- More future-proof with DX12
- Crossfire Fury will scale better than SLI 980 Ti, even in DX11
- Single-GPU performance averaged across games at 4K doesn't make the Fury look that bad in DX11, and with DX12 it should even lead

The big limitation, I guess, is that the Fury has only 4GB of HBM VRAM. I'm not sure how much VRAM DX12 games will need, but I certainly like 6GB more than 4GB. The other is OC headroom, of course; that would eliminate any single-GPU gains and leave AMD with just the superior CF Fury scaling compared to 980 Ti SLI.

The other of course is that there's only going to be a handful of DX12 games. The problem I see is that by the time DX12 proliferates, all current GPUs will be obsolete. It may be a good time to wait.

Quote:


> Originally Posted by *Mahigan*
> 
> Not exactly: each shader engine has 4 Render Back-Ends, and each Render Back-End can process 4 color ROPs per cycle, for a total of 16 color ROPs per cycle per shader engine. Each shader engine comprises 704 ALUs (shaders), split among 11 CUs (Compute Units). There are 4 shader engines in total in Hawaii.
> 
> As for the Asynchronous Compute Engines, there are 8 of them in Hawaii, each able to queue up 8 workloads (64 queues total).
> 
> 
> I have little doubt that Nvidia's Pascal will behave more like GCN (on steroids, mind you) than like Maxwell 2. Maxwell 2, like Kepler, was patched up in haste to keep up with AMD and the market by allowing incremental degrees of parallelism. Nvidia is akin to a conservative, taking few risks since the GeForce FX debacle, whereas AMD is akin to a liberal, always taking risks which sometimes pan out well and other times don't (in the form of unused features, or architectures that are too forward-thinking, such as R600). That's the analogy I like to use.


Thanks for the explanation. +Rep

So for comparison, then: Fiji would have 1024 ALUs per shader engine (4x, so 4096 shaders total), split among 16 CUs per engine?


Looking at the Anandtech article:
Quote:


> Meanwhile the command processor/ACE structure remains unchanged from Hawaii. We're still looking at a single graphics command processor paired up with 8 Asynchronous Compute Engines here, and if AMD has made any changes to this beyond what is necessary to support the GCN 1.2 feature set (e.g. context switching, virtualization, and FP16), then they have not disclosed it. AMD is expecting asynchronous shading to be increasingly popular in the coming years, especially in the case of VR, so Fiji's front-end is well-geared towards the future AMD is planning for.
> 
> ...
> 
> Starting with the ROPs, the ROP situation for Fiji remains more or less unchanged from Hawaii. Hawaii shipped with 64 ROPs grouped in to 16 Render Backends (RBs), which at the time AMD told us was the most a 4 shader engine GCN GPU could support. And I suspect that limit is still in play here, leading to Fiji continuing to pack 64 ROPs. Given that AMD just went from 32 to 64 a generation ago, another jump seemed unlikely anyhow (despite earlier rumors to the contrary), but in the end I suspect that AMD had to consider architectural limits just as much as they had to consider performance tradeoffs of more ROPs versus more shaders.
> 
> ...
> Speaking of caches, Fiji's L2 cache has been upgraded as well. With Hawaii AMD shipped a 1MB cache, and now with Fiji that cache has been upgraded again to 2MB. Even with the increase in memory bandwidth, going to VRAM is still a relatively expensive operation, so trying to stay on-cache is beneficial up to a point, which is why AMD spent the additional transistors here to double the L2 cache. Both AMD and NVIDIA have gone with relatively large L2 caches in this latest round, and with their latest generation color compression technologies it makes a lot of sense; since the L2 cache can store color-compressed tiles, all of a sudden L2 caches are a good deal more useful and worth the space they consume.


This corroborates your hypothesis that AMD is the more parallel-oriented of the two. The other interesting point here is the color compression (which drove the increase in cache size); I will note that Nvidia did exactly the same with Maxwell 2: it increased the cache size. Finally, of course, compared to Hawaii the addition of HBM is the real game changer here.

If you think about it, though, it's amazing that AMD has managed to keep up with Nvidia to the extent it has, especially when you consider the vast disparity in R&D spending between the two companies. Perhaps, due to its finances, AMD had no choice but to try new technologies? Then again, they have often been at the forefront; GDDR5 comes to mind. HBM is not the first time they've introduced something drastically new.

Next generation then for Nvidia:

- Nvidia introduces HBM2 on its GPUs, presumably increasing bandwidth a great deal
- The front end becomes more parallel, as you've described; perhaps 128 compute units, like AMD?
- Another architectural revision? Will we see gains comparable to the jump from SMX (Kepler) to SMM (Maxwell)? Those were huge, combined with the cache increase.
- NVLink will be introduced?

If all this comes to fruition, I think it could be one of the biggest gains in GPUs since the 8800 GTX launched.

Meanwhile at AMD:

- Like Nvidia, AMD will release HBM2. Whether their experience with HBM will lead to a better memory controller than Nvidia's remains to be seen, though.
- A major revision of GCN is expected. See here (http://www.kitguru.net/components/graphic-cards/anton-shilov/amd-readies-three-new-gpus-for-2016-greenland-baffin-and-ellesmere/)
- As you've said, I presume they'll add at least 128 ROPs (given the higher transistor budget)

The big issue I see at AMD is that they lack the R&D money to keep up at this point. I hope AMD bounces back spectacularly though - we need 2 vendors.


----------



## rtikphox

Did they just necro Supreme Commander 2 from 2010? The graphics seem poorly optimized, too. If they're going to showcase DX12, do it with a car game or a space fight, like Mantle did.


----------



## Clocknut

Quote:


> Originally Posted by *ZealotKi11er*
> 
> DX10 was only a stepping stone to DX11: it appeared in a few high-end games and that was it. It was either DX9 or DX11. In the future they will make games that list a GTX 780 minimum and DX11.


There's still no reason to use DX9 when your customers have an 8800 GT and above.


----------



## SpeedyVT

Quote:


> Originally Posted by *rtikphox*
> 
> Did they just necro Supreme Commander 2 from 2010? The graphics seem poorly optimized, too. If they're going to showcase DX12, do it with a car game or a space fight, like Mantle did.


It's up to the developers, not Microsoft or their platform. Developers have to support DX12.


----------



## Dudewitbow

Quote:


> Originally Posted by *Clocknut*
> 
> There's still no reason to use DX9 when your customers have an 8800 GT and above.


The DX9 requirement that exists in some games today was primarily there to include computers still running XP and Vista, which is just above 10% of the market. It's less a hardware issue and more about maximizing the target audience.


----------



## Exilon

Quote:


> Originally Posted by *SpeedyVT*
> 
> DX12 was still a WIP while that was released. Current DX12 scales significantly better than Mantle against the card's flops.


So if AMD loses in a tech demo using a beta API, it's the beta API's fault, but if AMD wins in an alpha version of a game, it's AMD's inherently superior hardware. LMAO.

Well, let's ask Oxide Games for their opinion on driver optimizations for DX12:
Quote:


> So what is going on then? Our analysis indicates that any D3D12 problems are quite mundane. New API, new drivers. Some optimizations that the drivers are doing in DX11 just aren't working in DX12 yet. Oxide believes it has identified some of the issues with MSAA and is working to implement workarounds on our code. This in no way affects the validity of a DX12 to DX12 test, as the same exact workload gets sent to everyone's GPUs. This type of optimization is just the nature of brand new APIs with immature drivers.
> 
> Immature drivers are nothing to be concerned about. This is simply the fact that DirectX 12 is brand new, and it will take time for developers and graphics vendors to optimize their use of it. We remember the first days of DX11: nothing worked, it was slower than DX9, buggy, and so forth. It took years for it to be solidly better than the previous technology. DirectX 12, by contrast, is in far better shape than DX11 was at launch. Regardless of the hardware, DirectX 12 is a big win for PC gamers. It allows games to make full use of their graphics card and CPU by eliminating the serialization of graphics commands between the processor and the graphics card.



Or what Intel says about driver optimizations in DX12:



In other words, reduction in API overhead and lower-level API give more opportunities for driver optimizations.
Quote:


> Originally Posted by *rtikphox*
> 
> Did they just necro'd Supreme Commander 2 from 2010. The graphics seem poorly optimized too. If they gonna showcase DX12 do it with a car game or space fight like Mantle.


And yes, this is TA reboot #4. Supreme Commander was awesome but after Supreme Commander 2 and PA I'm going to wait until release to buy in.


----------



## SpeedyVT

Quote:


> Originally Posted by *Exilon*
> 
> So if AMD loses in a tech demo using beta API, it's the beta API's fault. If AMD wins in an alpha version of a game, it's AMD's inherently superior hardware. LMAO.
> 
> Well, let's ask Oxide games on their opinion about driver optimizations for DX12:
> 
> Or what Intel says about driver optimizations in DX12:
> 
> 
> 
> In other words, reduction in API overhead and lower-level API give more opportunities for driver optimizations.
> And yes, this is TA reboot #4. Supreme Commander was awesome but after Supreme Commander 2 and PA I'm going to wait until release to buy in.


Don't spin it, man. DX12 from over a year ago, with weak drivers, was weak, no matter the card; APIs, drivers, and DX12 go hand in hand. I'm not saying Nvidia or AMD has anything superior, just that current DX12 is better than it was at the time of the Star Swarm comparison. I would add that both companies did an amazing job getting their GPUs ready for DX12.


----------



## Silent Scone

Quote:


> Originally Posted by *SpeedyVT*
> 
> Don't spin it, man. DX12 from over a year ago, with weak drivers, was weak, no matter the card; APIs, drivers, and DX12 go hand in hand. I'm not saying Nvidia or AMD has anything superior, just that current DX12 is better than it was at the time of the Star Swarm comparison. I would add that both companies did an amazing job getting their GPUs ready for DX12.


Personal preferences and 'fanboy' remarks aside, over 80% market share says something is superior.


----------



## mtcn77

Quote:


> Originally Posted by *Silent Scone*
> 
> Personal preferences and 'fanboy' remarks aside, over 80% market share says something is superior.


It could also mean victims cannot shake off their Stockholm Syndrome, but it is just my guess, really.


----------



## Silent Scone

Quote:


> Originally Posted by *mtcn77*
> 
> It could also mean victims cannot shake off their Stockholm Syndrome, but it is just my guess, really.


Yeah, I suppose it depends on your perspective.

Mine's more realistic.


----------



## SpeedyVT

Quote:


> Originally Posted by *Silent Scone*
> 
> Personal preferences and 'fanboy' remarks aside, over 80% market share says something is superior.


Obviously you've never heard of a company called Apple. Market share says little about why consumers buy something. Green is a very beautiful color; that alone can sway the mindless masses into buying.

http://www.livescience.com/34105-favorite-colors.html

A great example: before AMD purchased ATI, AMD's branding used to be green. It still wasn't beating Intel in share, but it held significantly more than it does now! After switching to red for the FX series, it lost tons of share. I'm deliberately leaving performance and performance issues out of the discussion.

Color Theory is really interesting.

I guarantee that if Intel changed its logo color to red, or heck, even yellow, it'd lose plenty of share too.


----------



## mtcn77

Quote:


> Originally Posted by *Silent Scone*
> 
> Yeah, I suppose it depends on your perspective.
> 
> Mine's more *realistic*.


More likely relativistic, imo. You select a brand hoping it does what you expect it to. That creates a favoured opinion, methinks, one that doesn't get disenchanted until you're forced to re-evaluate it, which doesn't happen often once you've dismissed the possibility of being wrong in the first place (your expectation was that it was an absolute). Acting on your beliefs gives you solidarity and resolves the ambiguity.


----------



## ku4eto

Quote:


> Originally Posted by *Silent Scone*
> 
> Personal preferences and 'fanboy' remarks aside, over 80% market share says something is superior.


Over 80%? Get real. The dGPU split is 76/24.


----------



## Exilon

Quote:


> Originally Posted by *mtcn77*
> 
> More likely relativistic, imo. You select a brand hoping it does what you expect it to. That creates a favoured opinion, methinks, one that doesn't get disenchanted until you're forced to re-evaluate it, which doesn't happen often once you've dismissed the possibility of being wrong in the first place (your expectation was that it was an absolute). Acting on your beliefs gives you solidarity and resolves the ambiguity.


Makes sense. Also explains why a bunch of people latched on to the idea that DX12 somehow doesn't and can't get driver optimizations (and therefore Nvidia is screwed), despite the Oxide game dev and Intel driver dev saying DX12 is full of opportunities for driver optimization.


----------



## Olivon

Quote:


> Originally Posted by *ku4eto*
> 
> Over 80%? Get real. The dGPU split is 76/24.


The latest numbers give 82/18 for dGPU according to Mercury Research:

http://www.techspot.com/news/61832-amd-market-share-continues-collapse.html


----------



## Mahigan

Quote:


> Originally Posted by *Exilon*
> 
> If driver optimizations don't do much for DX12 games and GCN is inherently better than Maxwell V2 at parallel games, what's happening here?


The Star Swarm test makes use of 100,000 draw calls, which does not bottleneck either nVIDIA's Maxwell or AMD's GCN 1.1/1.2. That is just about the only DirectX 12 feature it uses (the ability to draw more units on the screen). More units means more triangles. Star Swarm does not make use of Asynchronous Shading (parallel shading) or the subsequent Post Processing effects seen under Ashes of the Singularity.

As I mentioned in my first post, both the R9 290X and the Fury X are bottlenecked at the Raster Operations units (ROPs). They're both Triangle rate limited as a result of having fewer Render Back Ends than nVIDIA's Maxwell parts.



What you're seeing in this test is that Triangle rate limitation. Odds are that a Fury X would perform nearly identically to a 290X in this Star Swarm test, as it is not indicative of DX12 performance (it makes no use of any new DX12 features). It is basically a large-scale Render Back End stress test.
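The ROP/triangle-rate argument can be sanity-checked with back-of-envelope arithmetic. A minimal sketch (the ROP counts, geometry rates, and clocks below are approximate public spec-sheet figures I'm assuming, not numbers taken from this thread):

```python
# Rough peak-rate comparison. Peak pixel fillrate = ROPs x core clock;
# peak rasterization = primitives/clock from the geometry front end x clock.
def peak_pixel_fill(rops, clock_ghz):
    """Peak pixel fillrate in GPixels/s."""
    return rops * clock_ghz

def peak_raster(geom_engines, clock_ghz):
    """Peak rasterization rate in GTris/s."""
    return geom_engines * clock_ghz

gpus = {
    # name: (ROPs, geometry engines (tris/clock), core clock in GHz) -- approximate
    "R9 290X":    (64, 4, 1.00),
    "Fury X":     (64, 4, 1.05),
    "GTX 980 Ti": (96, 6, 1.00),
}

for name, (rops, geom, clk) in gpus.items():
    print(f"{name}: {peak_pixel_fill(rops, clk):.1f} GPix/s, "
          f"{peak_raster(geom, clk):.1f} GTris/s")
```

With these assumed figures the 290X and Fury X land on nearly the same triangle rate, which is consistent with the claim that a triangle-bound test would score them almost identically.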

Once you throw in Asynchronous Shading, nVIDIA's Serial architectures take a noticeable dive in performance while AMD's Parallel architectures take little to no performance hit. This is expressed in this slide provided by AMD (also included in my original post):


AMD's GCN 1.1/1.2 inclusion of Asynchronous Compute Engines enables AMD to apply Post Processing effects to queued workloads without needing to tax any other area of the GPU. Basically, each ACE is not only a queue for parallel workloads but also a shader processor. nVIDIA lacks this feature; any Post Processing effects on nVIDIA hardware need to be processed by its arrays of SPUs. This incurs a rather noticeable performance penalty on nVIDIA hardware because of the lack of dedicated hardware. So while Asynchronous Timewarp is also a capability of nVIDIA's Maxwell-based GPUs, and has been detailed ever since the GTX 980 launch, nVIDIA, unlike AMD, did not dedicate compute resources to this task.
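To illustrate why the penalty shows up on hardware without spare async capacity, here is a toy frame-time model. It is purely illustrative: the millisecond costs and the overlap fraction are made-up assumptions, and real GPUs schedule work at far finer granularity than this.

```python
# Toy model: post-processing either waits behind the graphics pass (serial)
# or partially hides inside idle shader time (asynchronous shading).
def frame_time_serial(graphics_ms, postfx_ms):
    # No async queues: post-FX starts only after the graphics pass finishes.
    return graphics_ms + postfx_ms

def frame_time_async(graphics_ms, postfx_ms, overlap):
    # `overlap` is the fraction of the post-FX cost hidden during the
    # graphics pass by running it concurrently on otherwise-idle units.
    return graphics_ms + postfx_ms * (1.0 - overlap)

g, p = 14.0, 4.0  # hypothetical per-frame costs in milliseconds
print(frame_time_serial(g, p))                # 18.0
print(round(frame_time_async(g, p, 0.9), 2))  # 14.4 -- most of the post-FX hides
```

The point of the model: hardware that can overlap compute with graphics pays almost nothing for added post-processing, while hardware that cannot pays the full serial cost.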

In the image below you can see AMD's dedicated Asynchronous Compute Engines highlighted:


In this image AMD discuss the Performance benefits:


Since both the Xbox One and the PlayStation 4 include Asynchronous Compute Engines (2 for the Xbox One, like those found in a Radeon 7970, and 8 for the PS4, like those found in Hawaii/Fiji), it is almost a given that developers will make use of this feature. Microsoft added it to DirectX 12 for this reason.

This is a case where AMD's close collaboration with the consoles will surely pay off. The feature should be welcomed by any R9 200 series owner, as it will breathe new life into their aging hardware.


----------



## Serios

Quote:


> Originally Posted by *Olivon*
> 
> The latest numbers give 82/18 for dGPU according to Mercury Research:
> 
> http://www.techspot.com/news/61832-amd-market-share-continues-collapse.html


Well then, read this.
76% is the realistic number; 82% was just a spike and doesn't show the general state of dGPU market share for Q2 2015.


----------



## sugarhell

You're 90% correct, Mahigan, but you forget that they changed the ROP structure a bit on Fury X. The ROPs can now use more bandwidth than Hawaii's to increase their output without increasing the count.

http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/4 -At the end of the page

Also :
The last bench
http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/23


----------



## Mahigan

Quote:


> Originally Posted by *sugarhell*
> 
> You're 90% correct, Mahigan, but you forget that they changed the ROP structure a bit on Fury X. The ROPs can now use more bandwidth than Hawaii's to increase their output without increasing the count.
> 
> http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/4 -At the end of the page
> 
> Also :
> The last bench
> http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/23


It would seem that you're correct. In terms of Pixel Fillrate, the Color Compression appears to help under the 3DMark Vantage Pixel Fill test:


The likely explanation is that the Color Compression only derives benefits under certain scenarios, such as Pixel Fillrate (where the GPU is filling pixels with colors). It might be worth playing around with a Fury in order to obtain a more detailed analysis.

EDIT: I believe I understand what we're seeing. The Color Compression does not help the Peak Rasterization rate (expressed in GTris/s), but it does assist in filling pixels with colors (Pixel Fillrate, GPixel/s). This is probably why the feature brings little benefit under Ashes of the Singularity: the game renders many units, each made up of triangles, taxing both Hawaii's and Fiji's ROPs to the point where they become the bottleneck. I think we can deduce that Ashes of the Singularity is bottlenecked at the GTris/s Peak Rasterization rate on AMD's Hawaii and Fiji. Star Swarm likely suffers the same fate, seeing as it is also a test where many units are drawn onto the screen using triangles.
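A sketch of why compression can move one of these numbers and not the other: color compression only reduces the memory traffic each written pixel costs, so it helps when fill is bandwidth-bound, while triangle throughput is set by the geometry front end. The bandwidth, bytes-per-pixel, and compression ratio below are hypothetical values for illustration:

```python
# Achieved pixel fill is the lower of the ROP peak and what memory
# bandwidth can sustain; compression shrinks the bytes each pixel costs.
def achieved_fill(peak_fill_gpix, mem_bw_gbs, bytes_per_pixel, compression=1.0):
    bw_limited = mem_bw_gbs / (bytes_per_pixel / compression)
    return min(peak_fill_gpix, bw_limited)

# Hypothetical Fiji-like card: ~67.2 GPix/s ROP peak, 512 GB/s HBM,
# 8 bytes per blended pixel.
print(achieved_fill(67.2, 512, 8, compression=1.0))  # bandwidth-limited: 64.0
print(achieved_fill(67.2, 512, 8, compression=1.4))  # lifted to the ROP peak: 67.2
# The triangle rate (GTris/s) never appears in this formula -- compression
# cannot help a geometry-bound workload.
```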


----------



## sugarhell

Quote:


> Originally Posted by *Mahigan*
> 
> It would seem that you're correct. Odd that this did not show up under the Beyond3D tests. It does, however, show up under 3D Mark Vantage Pixel Fill:
> 
> 
> The likely explanation is that the Color Compression can only derive benefits under certain scenarios. Might be worth playing around with a Fury in order to obtain a more detailed analysis.


It seems to be hit or miss. We need better synthetics, I think; not just an e-peen FPS meter that boosts your ego (Firestrike), but synthetics with advanced statistics.


----------



## Mahigan

Quote:


> Originally Posted by *sugarhell*
> 
> It seems to be hit or miss. We need better synthetics, I think; not just an e-peen FPS meter that boosts your ego (Firestrike), but synthetics with advanced statistics.


I edited my post to differentiate Pixel Fillrate from Peak Rasterization Rate. That would explain what we're seeing.

Thank you for catching that.









It is becoming increasingly clear that nVIDIA's Kepler and Maxwell architectures suffer from a parallel compute bottleneck, whereas AMD's GCN 1.1/1.2 architectures suffer from a Peak Rasterization Rate bottleneck.

Depending on how DirectX 12 games are programmed, and which features they utilize, we should see overall parity between nVIDIA and AMD in terms of average frames per second. Some titles might be programmed with less Compute in mind, benefitting nVIDIA, while others are programmed with fewer Triangles, benefitting AMD.

One thing is certain: nVIDIA will not be able to replace shaders in their drivers as they would under DX11, which means a longstanding nVIDIA advantage will vanish under DirectX 12.


----------



## Lantian

Seriously, why are people hyping async shaders so much? They have already been used on the PS4, and even on PC, Thief used them via Mantle; there was marginal benefit, if any. So again, where is the hype coming from? Or is this the same thing as the Fury X launch...


----------



## ku4eto

Quote:


> Originally Posted by *Lantian*
> 
> Seriously, why are people hyping async shaders so much? They have already been used on the PS4, and even on PC, Thief used them via Mantle; there was marginal benefit, if any. So again, where is the hype coming from? Or is this the same thing as the Fury X launch...


The possibilities are what matter: not what it currently does, but what it CAN do.


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> The AMD logo is there because the developers first started to program their game using the AMD Mantle API. The game they wanted to build was pretty much impossible without Mantle. They built their game on AMD Mantle only to port it over to Direct X 12 afterwards (Mantle and Direct X 12 being incredibly similar).
> 
> The developer also worked closely with both nVIDIA and AMD. That's why you see nVIDIA's rather impressive DX11 performance. nVIDIA has had access to the code for over a year now (as have AMD). All of this is verifiable on the Developers blog: http://oxidegames.com/2015/08/16/the-birth-of-a-new-api/


"impossible without Mantle", that's just blatant marketing speech (talks like that already sets off red flags for me). as we can see game has DX11 mode, so not impossible without Mantle after all. funny how things change. but what I really wanted to say is this game is heavily influenced by AMD, so obviously they used AMD's strengths while ignoring Nvidia's strengths when optimizing this game's DX12 mode.

Impressive DX11 performance? Well, of course; the 980 Ti is magnitudes faster than the 290X. But if we are to believe this benchmark and the developers, in DX12 the 290X is about equal to the 980 Ti. I'm sorry, but that's just absurd, I don't believe it for a second, and anyone who takes this one benchmark as the definitive DX12 performance standard is very naive.


----------



## ku4eto

Quote:


> Originally Posted by *Glottis*
> 
> "impossible without Mantle", that's just blatant marketing speech (talks like that already sets off red flags for me). as we can see game has DX11 mode, so not impossible without Mantle after all. funny how things change. but what I really wanted to say is this game is heavily influenced by AMD, so obviously they used AMD's strengths while ignoring Nvidia's strengths when optimizing this game's DX12 mode.
> 
> impressive DX11 performance? well ofcourse, 980Ti is magnitudes faster than 290X. be if we are to believe this benchmark and developers of this game that in DX12 290X is about equal performance to 980Ti. I'm sorry, but that's just absurd and I don't believe that for a second and anyone who believes this one benchmark as definitive DX12 performance standard is very naive.


Ehm, people laughed at some point when dual-core CPUs were becoming a thing; who would need a second core that clocks at 50% of a single core? Same now: nVidia uses an architecture that is good at a linear way of doing things, while AMD has built an architecture that is more parallel, splitting the load more effectively. DX12 is designed for exactly this: better load splitting and reduced CPU overhead. DX11 was built on top of DX9 and DX10, which left quite a lot to be desired, and extra driver effort was needed from both vendors to compensate. DX12 lessens that burden quite a lot, more so for AMD, as their hardware is better suited to it. nVidia will have to deal with this at the software level or totally revamp Pascal.
Quote:


> Originally Posted by *Olivon*
> 
> The latest numbers give 82/18 for dGPU according to Mercury Research:
> 
> http://www.techspot.com/news/61832-amd-market-share-continues-collapse.html


According to Mercury Research itself, the ratio is 75/25, not 82/18. Your link is from a secondary IT news site, and the slide is from nVidia. In another thread about AMD's Q2 share, the correct table from Mercury Research is shown.


----------



## Glottis

Quote:


> Originally Posted by *ku4eto*
> 
> Ehm, people laughed at some point when dual-core CPUs were becoming a thing; who would need a second core that clocks at 50% of a single core? Same now: nVidia uses an architecture that is good at a linear way of doing things, while AMD has built an architecture that is more parallel, splitting the load more effectively. DX12 is designed for exactly this: better load splitting and reduced CPU overhead. DX11 was built on top of DX9 and DX10, which left quite a lot to be desired, and extra driver effort was needed from both vendors to compensate. DX12 lessens that burden quite a lot, more so for AMD, as their hardware is better suited to it. nVidia will have to deal with this at the software level or totally revamp Pascal.


Your analogy doesn't work. Why is the Fury X almost twice as slow as the 980 Ti in this benchmark's DX11 mode, when we know the Fury X performs about the same as the 980 Ti in the games we play today, yet in DX12 mode Fury X = 980 Ti? The developers of this game were so fixated on AMD hardware and Mantle/DX12 that they even forgot to optimize the DX11 mode for AMD cards. What I'm saying is that we shouldn't use this one questionable benchmark as the definitive DX12 performance showcase between AMD and Nvidia.


----------



## HMBR

Quote:


> Originally Posted by *Glottis*
> 
> Your analogy doesn't work. Why is the Fury X almost twice as slow as the 980 Ti in this benchmark's DX11 mode, when we know the Fury X performs about the same as the 980 Ti in the games we play today, yet in DX12 mode Fury X = 980 Ti? The developers of this game were so fixated on AMD hardware and Mantle/DX12 that they even forgot to optimize the DX11 mode for AMD cards. What I'm saying is that we shouldn't use this one questionable benchmark as the definitive DX12 performance showcase between AMD and Nvidia.


Because this game has a lot more draw calls and so on than the typical DX11 game. It was made for this, and it highlights the work Nvidia has done with DX11 multithreading.
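The draw-call point reduces to simple CPU-side arithmetic. A rough submission-cost model (the per-draw microsecond overheads below are illustrative guesses, not measured values):

```python
# CPU time spent submitting draw calls per frame. DX12 lets multiple
# threads record command lists in parallel; classic DX11 submission is
# effectively serialized on one driver thread.
def cpu_submit_ms(draw_calls, us_per_draw, threads=1):
    return draw_calls * us_per_draw / 1000.0 / threads

draws = 100_000  # the figure quoted for Star Swarm in this thread
print(cpu_submit_ms(draws, 0.50, threads=1))  # DX11-style: 50.0 ms of CPU time
print(cpu_submit_ms(draws, 0.25, threads=4))  # DX12-style: 6.25 ms
```

Under these assumed overheads, single-threaded submission blows far past a 16.7 ms frame budget, which is why a benchmark built around extreme draw-call counts rewards both lower per-draw overhead and multithreaded submission.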


----------



## Serios

Quote:


> Originally Posted by *Glottis*
> 
> Your analogy doesn't work. Why is the Fury X almost twice as slow as the 980 Ti in this benchmark's DX11 mode, when we know the Fury X performs about the same as the 980 Ti in the games we play today, yet in DX12 mode Fury X = 980 Ti? The developers of this game were so fixated on AMD hardware and Mantle/DX12 that they even forgot to optimize the DX11 mode for AMD cards. What I'm saying is that we shouldn't use this one questionable benchmark as the definitive DX12 performance showcase between AMD and Nvidia.


Because AMD did zero optimizations for DX11.
They also didn't optimize for DX12, but the catch is that DX12 doesn't need much driver optimization.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> "impossible without Mantle", that's just blatant marketing speech (talks like that already sets off red flags for me). as we can see game has DX11 mode, so not impossible without Mantle after all. funny how things change. but what I really wanted to say is this game is heavily influenced by AMD, so obviously they used AMD's strengths while ignoring Nvidia's strengths when optimizing this game's DX12 mode.


DirectX 12 highlights AMD's strengths; DirectX 11 highlights nVIDIA's. You cannot optimize around a lack of dedicated hardware on a low-level API; that's only logical. You could perhaps disable Post Processing effects on nVIDIA hardware, but that makes no sense to do on AMD hardware, where they incur little to no penalty thanks to Asynchronous Shading.

Quote:


> Impressive DX11 performance? Well, of course; the 980 Ti is magnitudes faster than the 290X. But if we are to believe this benchmark and the developers, in DX12 the 290X is about equal to the 980 Ti. I'm sorry, but that's just absurd, I don't believe it for a second, and anyone who takes this one benchmark as the definitive DX12 performance standard is very naive.


First off, nobody believes that this game's benchmark is definitive of overall DirectX 12 performance. I am quite certain I have mentioned this several times. I don't like to repeat myself, but I understand that you feel emotionally compromised and therefore likely aren't taking the time to read all of the information I have shared. At the risk of sounding repetitive, I will quote myself:

"Varying on how DirectX12 games are programmed, and what features they utilize, we should see an overall parity between both nVIDIA and AMD in terms of average frames per second. Some titles might be programmed with less Compute in mind, benefitting nVIDIA, while others programmed with less Triangles, benefitting AMD."

The GTX 980 Ti is magnitudes faster in terms of Pixel Fillrate and Triangle rate. When it comes to theoretical Compute performance, the 290X and the GTX 980 Ti are similar (with the GTX 980 Ti a bit faster). What determines the overall compute output of either GPU is the method used to feed the Compute Units as well as the workload being fed to them. Some workloads perform better on the GTX 980 Ti, others on the 290X. This, however, is not what is causing the performance hit for nVIDIA hardware. Maxwell 2's compute resources, in Ashes of the Singularity, are likely not being used as efficiently as AMD's GCN 1.1/1.2's. This is not the developer's fault; it is the result of Maxwell 2 having 31 Compute Queues at its disposal versus 64 for AMD's GCN 1.1/1.2. The parallel nature of DirectX 12, in how the CPU cores talk to the graphics card, is therefore best able to queue and prioritize workloads on AMD's GCN 1.1/1.2.
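The queue counts in this argument reduce to simple arithmetic (the counts are as claimed in this thread; treat them as forum claims rather than vendor-confirmed specifications):

```python
# GCN 1.1/1.2 as described: 8 ACEs, each exposing 8 compute queues.
ACES, QUEUES_PER_ACE = 8, 8
gcn_queues = ACES * QUEUES_PER_ACE

# Maxwell 2 as described in the post: 31 compute queues alongside graphics.
maxwell2_queues = 31

print(gcn_queues, maxwell2_queues)              # 64 31
print(round(gcn_queues / maxwell2_queues, 2))   # roughly 2x the queue count
```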

That's one part of the equation.

The second part is likely the largest culprit: the lack of dedicated hardware, on nVIDIA's part, for Asynchronous Shading (a.k.a. parallelism). AMD's GCN 1.1/1.2 includes 8 Asynchronous Compute Engines. These are processors, each able to queue up 8 threads of work. Asynchronous Shading, a DirectX 12 feature, allows Post Processing effects (such as volumetric lighting, for example) to be added to games at minimal performance cost, provided you have the dedicated hardware for it.



Asynchronous Compute Engines do two important things:

1. They act like an 8-core, highly parallel processor, prioritizing shader workloads in their respective queues (8 queues per ACE, which is like having 8 logical Hyper-Threading cores per ACE) and feeding a steady stream of work to the Compute Units.
2. They can themselves process complex Post Processing effects, separate from the Compute Units (additional computational power).

The various light sources in Ashes of the Singularity are all handled by these ACEs under GCN 1.1/1.2. Because nVIDIA's Maxwell 2 has no dedicated Asynchronous Compute Engines, it must use some of its Compute resources for this task, something AMD's GCN 1.1/1.2 does not need to do, as the ACEs are separate from its Compute Units.

The GTX 980 Ti being "magnitudes faster" in terms of Pixel Fillrate and Triangle rate doesn't help it at all with the fact that it lacks dedicated Asynchronous Compute Engines. At the end of the day, any GPU is limited by its slowest component(s); its Achilles' heel, if you will.

This does not mean that all DirectX 12 titles will behave the same, I repeat. In titles which are dependent on the number of triangles drawn, more so than on compute performance, the GTX 980 Ti will walk away the victor.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> Your analogy doesn't work. Why is the Fury X almost twice as slow as the 980 Ti in this benchmark's DX11 mode, when we know the Fury X performs about the same as the 980 Ti in the games we play today, yet in DX12 mode Fury X = 980 Ti? The developers of this game were so fixated on AMD hardware and Mantle/DX12 that they even forgot to optimize the DX11 mode for AMD cards. What I'm saying is that we shouldn't use this one questionable benchmark as the definitive DX12 performance showcase between AMD and Nvidia.


This was also answered.

I will quote myself again:

"As for the folks claiming a conspiracy theory, not in the least. The reason AMDs DX11 performance is so poor under Ashes of the Singularity is because AMD literally did zero optimizations for the path. AMD is clearly looking on selling Asynchronous Shading as a feature to developers because their architecture is well suited for the task. It doesn't hurt that it also costs less in terms of Research and Development of drivers. Asynchronous Shading allows GCN to hit near full efficiency without even requiring any driver work whatsoever."

AMD likely didn't optimize their DX11 path in Ashes of the Singularity; they did so to prove a point: without any driver optimizations whatsoever, this is the performance of DirectX 11 vs DirectX 12. AMD is banking on selling DirectX 12 because DirectX 12 benefits their hardware. Simple as that.


----------



## ToTheSun!

Quote:


> Originally Posted by *HMBR*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Glottis*
> 
> Your analogy doesn't work. Why is the Fury X almost twice as slow as the 980 Ti in this benchmark's DX11 mode, when we know the Fury X performs about the same as the 980 Ti in the games we play today, yet in DX12 mode Fury X = 980 Ti? The developers of this game were so fixated on AMD hardware and Mantle/DX12 that they even forgot to optimize the DX11 mode for AMD cards. What I'm saying is that we shouldn't use this one questionable benchmark as the definitive DX12 performance showcase between AMD and Nvidia.
> 
> 
> 
> Because this game has a lot more draw calls and so on than the typical DX11 game. It was made for this, and it highlights the work Nvidia has done with DX11 multithreading.

Do you happen to know how many draw calls the benchmark issues per unit of time? Unless you can prove that this game, or any other coming in the near future, actually needs more than a 980/980 Ti can handle, this is all a myth used to justify the disparity in results.

Refer to Mahigan's posts for more sensible conclusions.


----------



## CrazyElf

Quote:


> Originally Posted by *Mahigan*
> 
> DirectX 12 highlights AMD's strengths; DirectX 11 highlights nVIDIA's. You cannot optimize around a lack of dedicated hardware on a low-level API; that's only logical. You could perhaps disable Post Processing effects on nVIDIA hardware, but that makes no sense to do on AMD hardware, where they incur little to no penalty thanks to Asynchronous Shading.
> First off, nobody believes that this game's benchmark is definitive of overall DirectX 12 performance. I am quite certain I have mentioned this several times. I don't like to repeat myself, but I understand that you feel emotionally compromised and therefore likely aren't taking the time to read all of the information I have shared. At the risk of sounding repetitive, I will quote myself:
> 
> "Depending on how DirectX 12 games are programmed, and which features they utilize, we should see overall parity between nVIDIA and AMD in terms of average frames per second. Some titles might be programmed with less Compute in mind, benefitting nVIDIA, while others are programmed with fewer Triangles, benefitting AMD."
> 
> The GTX 980 Ti is magnitudes faster in terms of Pixel Fillrate and Triangle rate. When it comes to theoretical Compute performance, the 290X and the GTX 980 Ti are similar (with the GTX 980 Ti a bit faster). What determines the overall compute output of either GPU is the method used to feed the Compute Units as well as the workload being fed to them. Some workloads perform better on the GTX 980 Ti, others on the 290X. This, however, is not what is causing the performance hit for nVIDIA hardware. Maxwell 2's compute resources, in Ashes of the Singularity, are likely not being used as efficiently as AMD's GCN 1.1/1.2's. This is not the developer's fault; it is the result of Maxwell 2 having 31 Compute Queues at its disposal versus 64 for AMD's GCN 1.1/1.2. The parallel nature of DirectX 12, in how the CPU cores talk to the graphics card, is therefore best able to queue and prioritize workloads on AMD's GCN 1.1/1.2.
> 
> That's one part of the equation.
> 
> The second part is likely the largest culprit: the lack of dedicated hardware, on nVIDIA's part, for Asynchronous Shading (a.k.a. parallelism). AMD's GCN 1.1/1.2 includes 8 Asynchronous Compute Engines. These are processors, each able to queue up 8 threads of work. Asynchronous Shading, a DirectX 12 feature, allows Post Processing effects (such as volumetric lighting, for example) to be added to games at minimal performance cost, provided you have the dedicated hardware for it.
> 
> nVIDIA's Maxwell 2 has no dedicated Asynchronous Compute Engines, so it must use some of its Compute resources for this task, something AMD's GCN 1.1/1.2 does not need to do, as the ACEs are separate from its Compute Units.
> 
> The GTX 980 Ti being "magnitudes faster" in terms of Pixel Fillrate and Triangle rate doesn't help it at all with the fact that it lacks dedicated Asynchronous Compute Engines. At the end of the day, any GPU is limited by its slowest component(s); its Achilles' heel, if you will.
> 
> This does not mean that all DirectX 12 titles will behave the same, I repeat. In titles which are dependent on the number of triangles drawn, the GTX 980 Ti will walk away the victor.


There do seem to be two different "visions" for the future of GPUs, I guess.

Nvidia went for tessellation and what you might call "geometrically complex" graphics (for lack of a better term), along with far heavier driver-oriented optimizations (possible under DX11).
AMD went with more memory bandwidth and, of course, asynchronous compute, although oddly they did not radically change that part of the GPU compared to Hawaii. That approach seems to be the better fit for DX12 now.
I think that in the coming architectures we will see a "convergence", with both sides trying to match the other.

Depending on how many games support DX12 in the future, though, the Fury may very well be the better card to buy, assuming it is even available. Right now we need more DX12 games to compare. I suspect the incentive only grows for Nvidia to try something like Hairworks again, or to push heavy tessellation (see here: https://techreport.com/review/21404/crysis-2-tessellation-too-much-of-a-good-thing).

Actually, what do you think about my thoughts earlier about the next generation?





Quote:


> Originally Posted by *CrazyElf*
> 
> Next generation then for Nvidia:
> 
> Nvidia introduces HBM2 on its GPUs, presumably increasing the bandwidth a great deal
> The front end becomes more parallel as you've described - so perhaps 128 compute units like AMD?
> Another architectural revision? Will we see gains comparable from SMX (Kepler) to SMM (Maxwell)? Those were huge, combined with the cache increase.
> Nvlink will be introduced?
> I think that this could be one of the biggest gains in GPUs since the 8800 GTX was launched then if this comes to fruition.
> 
> Meanwhile at AMD:
> 
> Like Nvidia, AMD will release HBM2. Whether or not their experience with HBM will lead to a better memory controller than Nvidia's remains to be seen though.
> A major revision of GCN is expected. See here (http://www.kitguru.net/components/graphic-cards/anton-shilov/amd-readies-three-new-gpus-for-2016-greenland-baffin-and-ellesmere/)
> As you've said, I presume that they'll add at least 128 ROPs (due to the higher transistor budget)
> 
> The big issue I see at AMD is that they lack the R&D money to keep up at this point. I hope AMD bounces back spectacularly though - we need 2 vendors.






As I said, I think it's pretty amazing that AMD has so far managed to keep up as well as it has, given the financial constraints it faces, and in some cases it has even led (HBM, Mantle, etc.). We're probably looking at ~300mm^2 dies from both companies next year, followed by 600mm^2 giants in 2017.


----------



## Horsemama1956

Quote:


> Originally Posted by *Ganf*
> 
> Hopefully won't be long before someone makes a tool that blocks all of the spyware. There's gotta be at least a dozen groups working on it already, it's not a small issue.


Everything you use that has some sort of connection is "spyware" of some kind.


----------



## Ganf

Quote:


> Originally Posted by *Horsemama1956*
> 
> Everything you use that has some sort of connection is "spyware" of some kind.


Good for that everything.

You'd be surprised how little of that Everything I have for that specific reason, and how much I modify what I do own to shut it up.


----------



## Clocknut

Quote:


> Originally Posted by *Mahigan*
> 
> AMD likely didn't optimize their DX11 path in Ashes of the Singularity; they did so to prove a point: without any driver optimizations whatsoever, this is the performance of DirectX 11 vs DirectX 12. *AMD is banking on selling DirectX 12 because DirectX 12 benefits their hardware. Simple as that.*


That is going to be their biggest mistake.

AMD banked on multicore support for Bulldozer; in the end it didn't really happen, and they lost.

I am very sure they are going to stay the underdog yet again, since their DX9-11 drivers suck. DX12 won't cover even 50% of the games on the market in another five years.


----------



## Mahigan

Quote:


> Next generation then for Nvidia:
> Nvidia introduces HBM2 on its GPUs, presumably increasing the bandwidth a great deal
> The front end becomes more parallel as you've described - so perhaps 128 compute units like AMD?
> Another architectural revision? Will we see gains comparable from SMX (Kepler) to SMM (Maxwell)? Those were huge, combined with the cache increase.
> Nvlink will be introduced?
> 
> I think that this could be one of the biggest gains in GPUs since the 8800 GTX was launched then if this comes to fruition.
> 
> Meanwhile at AMD:
> Like Nvidia, AMD will release HBM2. Whether or not their experience with HBM will lead to a better memory controller than Nvidia's remains to be seen though.
> A major revision of GCN is expected. See here (http://www.kitguru.net/components/graphic-cards/anton-shilov/amd-readies-three-new-gpus-for-2016-greenland-baffin-and-ellesmere/)
> As you've said, I presume that they'll add at least 128 ROPs (due to the higher transistor budget)


I think that the changes between Maxwell and Maxwell 2 highlight the direction in which nVIDIA is headed, and that's a more parallel architecture. Based on the inclusion of Asynchronous Shading in Xbox One and PlayStation 4 titles, as well as it making its way into the DirectX 12 primary feature list, I think that nVIDIA will likely dedicate hardware to this task. Based on how nVIDIA has reacted in the past (the Radeon HD 5000 series introduced Tessellation and then nVIDIA went crazy for it), I think that nVIDIA may in fact add at least twice the Asynchronous Compute resources of GCN 1.1/1.2.

nVIDIA also currently does a better job of handling its available memory bandwidth than AMD does. Therefore HBM2 will likely benefit nVIDIA more so than AMD. See graph below:



As for AMD, they'll likely double what is currently found in GCN 1.1/1.2 (though probably not double the number of Arithmetic Logic Units). We're likely going to see 16 Asynchronous Compute Engines (each 8 thread queues deep) for a total of 128 thread queues.

Pascal may find itself competing with an update to GCN. Contrary to popular belief, AMD doesn't need to make many changes to GCN compared to the amount of changes nVIDIA needs to make to Maxwell 2. That should suit AMD well, considering most of their Research and Development budget is focused on Zen anyway.


----------



## Mahigan

Quote:


> Originally Posted by *Clocknut*
> 
> That is going to be their biggest mistake.
> 
> AMD banked on multicore support for Bulldozer; in the end it didn't really happen, and they lost.
> 
> I am very sure they are going to stay the underdog yet again, since their DX9-11 drivers are poor. DX12 isn't going to cover even 50% of the games on the market in another 5 years.


The majority of people buy hardware based on new and upcoming titles. The majority of people don't buy a new graphics card with the intent of playing Quake 3 at 1,000 frames per second.

Since new titles will predominantly be DirectX 12 in nature, AMD doesn't need to bank on multi-core support; AMD just needs to follow the direction in which the market is headed. Hopefully they won't keep taking unnecessary risks.


----------



## Horsemama1956

Quote:


> Originally Posted by *Ganf*
> 
> Good for that everything.
> 
> You'd be surprised how little of that Everything I have for that specific reason, and how much I modify what I do own to shut it up.


Not anywhere near as much as you think. Unless you live in a jungle, there are plenty of people who know everything about you, and much crappier people than MS.


----------



## Ganf

Quote:


> Originally Posted by *Horsemama1956*
> 
> Not anywhere near as much as you think. Unless you live in a jungle there are plenty of people that know everything about you, and much crappier people than MS.


Given. That doesn't mean I've got to flip my skirt up for every passer-by.


----------



## Mahigan

This is likely why AMD are pushing DirectX 12. They pretty much built it.

Mantle and DirectX 12 are, well, extremely similar:


----------



## SpeedyVT

Quote:


> Originally Posted by *Glottis*
> 
> "impossible without Mantle", that's just blatant marketing speech (talks like that already sets off red flags for me). as we can see game has DX11 mode, so not impossible without Mantle after all. funny how things change. but what I really wanted to say is this game is heavily influenced by AMD, so obviously they used AMD's strengths while ignoring Nvidia's strengths when optimizing this game's DX12 mode.
> 
> impressive DX11 performance? well ofcourse, 980Ti is magnitudes faster than 290X. but if we are to believe this benchmark and developers of this game that in DX12 290X is about equal performance to 980Ti. I'm sorry, but that's just absurd and I don't believe that for a second and anyone who believes this one benchmark as definitive DX12 performance standard is very naive.


At pushing pixels.


----------



## Kpjoslee

Again, it seems like so many conclusions are being drawn from a single benchmark of one particular game that is still in its alpha stage. I can see AMD is a little ahead of Nvidia in terms of DX12 driver optimization, and that is the only thing I can say for sure from these results lol.


----------



## delboy67

Quote:


> Originally Posted by *Clocknut*
> 
> That is going to be their biggest mistake.
> 
> AMD banked on multicore support for Bulldozer; in the end it didn't really happen, and they lost.
> 
> I am very sure they are going to stay the underdog yet again, since their DX9-11 drivers are poor. DX12 isn't going to cover even 50% of the games on the market in another 5 years.


The world is a very different place now. We have some really good devs baking DX12 into their engines and then effectively giving their work away for free, and we also have the x86/GCN consoles. DX12's adoption rate will be like no other API's, imo.


----------



## epic1337

wouldn't the opposite also be possible?

the cards were designed in such a way that AMD's GCN was built around Mantle, so the DX12 shift was a success, while Maxwell was designed around DX11, which makes the DX12 shift a can of worms.


----------



## ZealotKi11er

Quote:


> Originally Posted by *epic1337*
> 
> wouldn't the opposite also be possible?
> 
> the cards were designed in such a way that AMD's GCN was built around Mantle, so the DX12 shift was a success, while Maxwell was designed around DX11, which makes the DX12 shift a can of worms.


I don't think it has to do with DX12. With DX12 the difference between Nvidia and AMD will be very small, and it will come down to which GPU is faster. You know how the Fury X is about as fast as a GTX 980 Ti @ 4K, because it's all GPU there, but gets crushed at 1080p and doesn't do as well @ 1440p. DX12 will bring the Fury X's 4K-level performance to 1080p and 1440p. Nvidia is not gaining much because, in reality, their current cards already get enough draw calls out of their DX11 driver. We have to wait for faster Nvidia GPUs to see how much faster DX12 is compared to DX11. The game engine is more of a determining factor in which architecture performs better, but on average right now most games favor Nvidia.


----------



## delboy67

Quote:


> Originally Posted by *epic1337*
> 
> wouldn't the opposite also be possible?
> 
> the cards were designed in such a way that AMD's GCN was built around Mantle, so the DX12 shift was a success, while Maxwell was designed around DX11, which makes the DX12 shift a can of worms.


Where Maxwell was designed for DX gaming, GCN was designed as a workstation GPU; we just get the leftovers for cheap to game on.


----------



## Xuper

@Mahigan, any idea why there is such a huge performance gap between Nvidia and AMD in BF4? Is it because of pixel fill rate, peak rasterization, or something like that?


----------



## SpeedyVT

Quote:


> Originally Posted by *Xuper*
> 
> @Mahigan, any idea why there is such a huge performance gap between Nvidia and AMD in BF4? Is it because of pixel fill rate, peak rasterization, or something like that?


Probably the same reason as in any other game. Mantle wasn't taking advantage of the ACEs at that point anyway. However, when you move over to the Mantle version of BF: Hardline, the FPS gap between AMD and NVidia gets smaller. That is due to the ACEs being used in later Mantle, but not earlier; Mantle didn't utilize them until just before AMD dropped it as a current project. DX12 is significantly better, at least for games; there could be various applications for Mantle outside of gaming.

Unless you're referring to DX benchmarks. Well, it is the Frostbite engine, for one thing...


----------



## PontiacGTX

Quote:


> Originally Posted by *Xuper*
> 
> @Mahigan, any idea why there is such a huge performance gap between Nvidia and AMD in BF4? Is it because of pixel fill rate, peak rasterization, or something like that?


Because the CPUs are a bottleneck under DirectX 11?


----------



## PostalTwinkie

@Mahigan

To sum up what you are saying about DX12;

We have, by a large degree, gone from being at the mercy of the hardware vendors Nvidia and AMD, to the mercy of the developers?

I might actually stop gaming.

EDIT:

I also must point out that you are using _a lot_ of AMD-provided material in your posts, which is a bit concerning. Your posts also raise more questions: what happens in situations where a software title isn't cherry-picked for one specific hardware vendor? How far in the "other" direction could you swing these results? Do we as consumers now need to start REALLY babysitting developers for vendor shenanigans?

Or will it really not matter that much, considering the arrival of Pascal and new products from AMD? Having been around a long time, it sounds like we are truly at another epoch in the software/hardware world. One where we simply cut ties to the old, in order to move to the new. Done with a lot less hand holding of the "older" hardware. Ripping off the band-aid fast!

Because if everything you are saying is true, at least for Nvidia and even older AMD, these hardware limitations really are just that. Software (Drivers) won't "fix" that, and they will forever be behind.


----------



## Cyro999

Quote:


> Originally Posted by *sugarhell*
> 
> DX11 performance is per game. Don't compare the performance here against all DX11 games. They are behind on CPU overhead in DX11 games, but not by as much as the 80% difference here.


50% on several games that I have played, both hugely popular. One of them spawned thousands of angry comments from AMD GPU users blaming the game developers for "poor optimization".

That's what people tend to do when performance isn't where it should be. They're far too slow to blame AMD and push them to fix their drivers - just a few guys at the front of the crowd saying HEY GAME DEVS FIX OPTIMIZATION and everybody else following, because they're not tech enthusiasts who know about basic problems like this.


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> @Mahigan, any idea why there is such a huge performance gap between Nvidia and AMD in BF4? Is it because of pixel fill rate, peak rasterization, or something like that?


I would assume that the AMD Mantle path being used is quite old and no longer optimized or worked on by the developer. As far as DirectX 11 goes, you're likely seeing only 1 to 2 cores utilized with any degree of frequency. Therefore, under BF4, I would think that the AMD GCN 1.1/1.2 architectures are not being fed enough data quickly enough, in parallel, to take advantage of the Asynchronous Compute Engines.

The Post Processing effects used in the game do not make use of Asynchronous Shading, so these effects are likely being left to the Compute Units to process. Seeing as nVIDIA has a superior DirectX 11 driver development team, there is probably a lot of shader replacement and optimization going on in the nVIDIA driver (driver optimizations of that sort are not as beneficial to AMD's hardware), which tilts the balance greatly in nVIDIA's favor.

Finally, I think that the degree of Tessellation likely favors nVIDIA's superior Geometry Engine.

I don't think that Pixel Fill Rate or Triangle Rate are the main bottlenecks, but I could be wrong.

Frostbite is moving towards DirectX 12 in 2016, and we're likely to see what difference(s) this move yields. Since the PS4 version of Battlefield 4 features Asynchronous Shading, it is likely that future games built on the Frostbite engine will feature it as well.


----------



## Kpjoslee

Quote:


> Originally Posted by *PostalTwinkie*
> 
> @Mahigan
> 
> To sum up what you are saying about DX12;
> 
> We have, by a large degree, gone from being at the mercy of the hardware vendors Nvidia and AMD, to the mercy of the developers?
> 
> I might actually stop gaming.
> 
> EDIT:
> 
> I also must point out that you are using _a lot_ of AMD-provided material in your posts, which is a bit concerning. Your posts also raise more questions: what happens in situations where a software title isn't cherry-picked for one specific hardware vendor? How far in the "other" direction could you swing these results? Do we as consumers now need to start REALLY babysitting developers for vendor shenanigans?
> 
> Or will it really not matter that much, considering the arrival of Pascal and new products from AMD? Having been around a long time, it sounds like we are truly at another epoch in the software/hardware world. One where we simply cut ties to the old, in order to move to the new. Done with a lot less hand holding of the "older" hardware. Ripping off the band-aid fast!
> 
> Because if everything you are saying is true, at least for Nvidia and even older AMD, these hardware limitations really are just that. Software (Drivers) won't "fix" that, and they will forever be behind.


While I enjoyed his analysis, I don't agree with his statement that driver optimization no longer matters. I believe driver-side optimization is going to be more important, since the driver becomes more of a potential bottleneck as developers get closer to the metal.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> I would assume that the AMD Mantle Path being used is quite old and not optimized, worked on, by the developer anymore. As far as DirectX 11 goes, you're likely seeing only 1 to 2 cores being utilized with any degree of frequency. Therefore under BF4 I would think that the AMD GCN 1.1/1.2 architectures are not being fed enough data quickly enough, in Parallel, in order to take advantage of the Asynchronous Compute Engines.
> 
> The Post Processing effects, used in the game, do not make use of Asynchronous Shading. Therefore these effects are likely being left to the Compute Units to process. Seeing as nVIDIA has a superior DirectX 11 driver development team, there is probably a lot of Shader replacements and optimizations going on in the nVIDIA Driver (AMD driver optimizations of the sorts are not as beneficial to AMDs Hardware) which tilts the balance greatly towards nVIDIAs favor.
> 
> Finally I think that the degree of Tessellation likely favors nVIDIAs superior Geometry Engine.
> 
> I don't think that Pixel Fill Rate or Triangle Rate are the main bottlenecks but I could be wrong.
> 
> Frostbite is moving towards DirectX 12 in 2016. We're likely to see what difference(s) this move yields. Since the PS4 version of Battlefield 4 features Asynchronous Shading, then it is likely that future games built on the Frostbite engine will feature Asynchronous Shading as well.


What about the incoming DirectX 12 games that are console ports from the Xbox One?

Don't you think many developers who are already too lazy to properly use the cores with their game engines will be the same under DirectX 12? In other words, won't DirectX 12 yield a smaller improvement than it could with an engine that has good multithread support?

Also, this game doesn't seem to scale well with more cores. How many of them are optimal: 4, 6, or 8? With HT?


----------



## Devnant

This benchmark is just pure BS TBH. Have a look at this reddit:

https://www.reddit.com/r/3hl5fj/13_gpus_tested_on_dx12_vs_dx11_performance/cu8axjd
b) Reduced cores (2-core performance - regardless of clock rate) http://i.imgur.com/rwHiLoI.jpg

This is just an alpha benchmark, and assuming Nvidia is gonna fail on Dx12 is just silly.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Devnant*
> 
> This benchmark is just pure BS TBH. Have a look at this reddit:
> 
> https://www.reddit.com/r/3hl5fj/13_gpus_tested_on_dx12_vs_dx11_performance/cu8axjd
> b) Reduced cores (2-core performance - regardless of clock rate) http://i.imgur.com/rwHiLoI.jpg
> 
> This is just an alpha benchmark, and assuming Nvidia is gonna fail on Dx12 is just silly.


So people can see without clicking the links:
Quote:


> NVIDIA did some testing with a GeForce GTX Titan X and an Intel Core i7-5930K processor and saw performance drop across the board when moving from DX11 to DX12 with high image quality settings. When they dropped the image quality down (even though they were averaging nearly 40 FPS on the 1080p Heavy test) they found they could get a performance improvement by moving to DX12 on the 1080p Heavy test. Instead of reducing the number of cores on the processor with this card, the team over at NVIDIA dropped the clock speed of the Intel Core i7-5930K from 3.5 GHz to 2.0 GHz and saw huge performance gains by going from DX11 to DX12. This is all fine and dandy and shows that DX12 is working, but it's not a realistic scenario for real-world gamers. How many desktop gamers do you know that are running an Intel six-core processor with clock speeds that low? It is really disappointing to see our results, then have NVIDIA basically confirm them, when the only test scenario that shows any benefit was to take a $1000 card, put it on a 6-core processor, and downclock the frequency to 2GHz in order to show that DX11 to DX12 performance gains can be had.
> 
> 
> 
> 
> 
> The only other test NVIDIA showed was dual-core performance, and they found that some nice performance gains could be seen on a dual-core platform
> 
> We talked with NVIDIA and AMD about the benchmark and both noted that this is alpha software and we took that as they felt it might not be an accurate measurement of DX12 at this point in time. The benchmark basically tells you how your system hardware will run a series of scenes from the alpha version of Ashes of the Singularity and that is about it. After talking with NVIDIA, they made it clear that they feel that there will be better examples of DirectX 12 performance coming out shortly that don't have as many issues as we experienced in this benchmark.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> @Mahigan
> 
> To sum up what you are saying about DX12;
> 
> We have, by a large degree, gone from being at the mercy of the hardware vendors Nvidia and AMD, to the mercy of the developers?
> 
> I might actually stop gaming.
> 
> EDIT:
> 
> I also must point out that you are using _a lot_ of AMD-provided material in your posts, which is a bit concerning. Your posts also raise more questions: what happens in situations where a software title isn't cherry-picked for one specific hardware vendor? How far in the "other" direction could you swing these results? Do we as consumers now need to start REALLY babysitting developers for vendor shenanigans?
> 
> Or will it really not matter that much, considering the arrival of Pascal and new products from AMD? Having been around a long time, it sounds like we are truly at another epoch in the software/hardware world. One where we simply cut ties to the old, in order to move to the new. Done with a lot less hand holding of the "older" hardware. Ripping off the band-aid fast!
> 
> Because if everything you are saying is true, at least for Nvidia and even older AMD, these hardware limitations really are just that. Software (Drivers) won't "fix" that, and they will forever be behind.


The DirectX 12 driver is basically what a driver ought to be: a piece of software which links the various GPU capabilities to the DirectX 12 API. You can optimize where DirectX 12 talks to the GPU, but you cannot insert shader replacements. Therefore if a game makes a request for a particular shader command to be executed, that command will be executed. The driver cannot come in between the API and the GPU and say "Wait a minute, I don't like this command, let's do this instead".

Therefore driver optimizations, though possible to some extent, are not what they have come to be. The onus is really on the developers to run the optimized commands for the right GPU architectures. It is more important to work with developers up front than to wait until a game releases and then ship a fix (shader replacements) in the driver for increased performance.
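One way to picture that shift: under an explicit API the per-architecture choice has to live in the game code rather than in the driver. A minimal sketch, with entirely hypothetical shader names and GPU family labels:

```python
# Hypothetical per-architecture shader table: the developer picks the
# variant up front; a DX12-style runtime executes whatever it is handed,
# with no driver-side shader replacement happening behind the scenes.
SHADER_VARIANTS = {
    "gcn":     "postfx_async.hlsl",     # leans on asynchronous compute queues
    "maxwell": "postfx_graphics.hlsl",  # keeps the work on the graphics queue
}

def pick_shader(gpu_family: str) -> str:
    """Developer-side selection, with a generic fallback for unknown GPUs."""
    return SHADER_VARIANTS.get(gpu_family, "postfx_generic.hlsl")
```

Under DX11 the equivalent swap could happen invisibly inside the driver after release; here it is visible, shipped game code, which is exactly why the onus moves to the developer.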


----------



## PostalTwinkie

Quote:


> Originally Posted by *Kpjoslee*
> 
> While I enjoyed his analysis, I don't agree with his statement that driver optimization no longer matters. I believe driver-side optimization is going to be more important, since the driver becomes more of a potential bottleneck as developers get closer to the metal.


Here is my thinking, sort of spurred by Mahigan's comments.....

With DX 11 the options for the developer in developing a game are;


The DX 11 way.
The DX 11 way.
Shenanigans with the DX 11 way.

However, with DX 12 it looks more like this;


Column A .
Column B.
Column C.
Some of All.
This other way.

Basically, the options for how the game is developed and ultimately runs are really up to the developer. While on the surface this sounds really great, how great is it? If developers can basically choose to _really_ show hardware bias, what stops that from being bad? What stops "Batman Arkham World: Nvidia Edition" from happening, if developers are now in a situation of being able to truly program for one hardware methodology or another?

_This is not a GameWorks or AMD's (whatever their version is) discussion!!_

Let me take it further! Please keep in mind the small little note above.......

AMD is literally on borrowed time and money (All you AMD people keep your panties on, it is just the truth) and Nvidia is sitting on significant cash. What in God's name is going to stop Nvidia from using their ~$4,500,000,000 in cash to just "sponsor" the Hell out of games?

Obviously this isn't as much of an issue if Nvidia and AMD both adopt similar hardware methodologies, but what happens if they take two separate paths entirely?

This whole thing reminds me of 3DFX honestly, at least in some ways.

Quote:


> Originally Posted by *Mahigan*
> 
> The DirectX 12 driver is basically what a driver ought to be: a piece of software which links the various GPU capabilities to the DirectX 12 API. You can optimize where DirectX 12 talks to the GPU, but you cannot insert shader replacements. Therefore if a game makes a request for a particular shader command to be executed, that command will be executed. The driver cannot come in between the API and the GPU and say "Wait a minute, I don't like this command, let's do this instead".
> 
> Therefore driver optimizations, though possible to some extent, are not what they have come to be. The onus is really on the developers to run the optimized commands for the right GPU architectures. It is more important to work with developers up front than to wait until a game releases and then ship a fix (shader replacements) in the driver for increased performance.


Yup, scary stuff actually. If true....

I see Nvidia muscle being an issue.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> This benchmark is just pure BS TBH. Have a look at this reddit:
> 
> https://www.reddit.com/r/3hl5fj/13_gpus_tested_on_dx12_vs_dx11_performance/cu8axjd
> b) Reduced cores (2-core performance - regardless of clock rate) http://i.imgur.com/rwHiLoI.jpg
> 
> This is just an alpha benchmark, and assuming Nvidia is gonna fail on Dx12 is just silly.


Exactly what you would expect from a Serial Architecture. The Faster the CPU (since you're only using 1-2 cores in DX11) the more information it can feed the GPU in a Serialized sequence. The Slower the CPU, the less.

When you move to DirectX 12 you're able to feed more information to the GPU, at a lower clock, because more cores are assisting in feeding the GPU.

The limitation comes in the form of Parallel queues. If your CPU is fast enough, under DirectX 11, it will be able to feed the GPU more information, in a serial manner, than the GPU could be fed under DirectX 12 if there aren't enough Parallel queues available.

As for GCN vs Maxwell. Under DirectX 12 you're using Maxwell 2's 31 Compute Queues for Parallelism vs GCN 1.1/1.2's 64 queues. Placing Maxwell 2 at a disadvantage to GCN 1.1/1.2 in that respect. This is not that surprising given that Maxwell 2 is a patched up Maxwell in this respect. Maxwell was never intended to be a Parallel architecture.

The conclusion is that Maxwell 2 is better at serial than at parallel workloads: it benefits more from having a strong CPU under DirectX 11 than it does from having access to more CPU cores (parallelism) under DirectX 12.

nVIDIA's own figures are what enabled me to deduce what was happening with Maxwell 2. The figures you posted are the ones I relied on in coming to that conclusion (prior to joining overclock.net).

Hope that answers your query.
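The serial-versus-parallel tradeoff above can be sketched with a toy throughput model. This is purely illustrative: the per-call costs, clocks, and queue counts below are made-up numbers, not measurements of any real driver or GPU.

```python
def draw_calls_per_second(clock_ghz, cores, cycles_per_call, max_queues):
    """Toy model: each submitting core contributes clock/cost calls per
    second, and useful parallelism is capped by the available queues."""
    effective_cores = min(cores, max_queues)
    return effective_cores * (clock_ghz * 1e9) / cycles_per_call

# DX11-style: a single fast core submits everything, at a high per-call cost.
dx11_fast_cpu = draw_calls_per_second(3.5, cores=1, cycles_per_call=40_000, max_queues=1)

# DX12-style: six cores at a much lower clock, each with a lower per-call
# cost, feeding multiple hardware queues in parallel.
dx12_slow_cpu = draw_calls_per_second(2.0, cores=6, cycles_per_call=20_000, max_queues=8)
```

Even with the CPU downclocked from 3.5 GHz to 2.0 GHz, the parallel path comes out well ahead in this model, which is the shape of the downclocked-CPU result discussed above; shrink `max_queues` far enough and the single fast core wins instead.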


----------



## Themisseble

http://forums.anandtech.com/showthread.php?t=2443723


----------



## Kpjoslee

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Here is my thinking, sort of spurred by Mahigan's comments.....
> 
> With DX 11 the options for the developer in developing a game are;
> 
> 
> The DX 11 way.
> The DX 11 way.
> Shenanigans with the DX 11 way.
> 
> However, with DX 12 it looks more like this;
> 
> 
> Column A .
> Column B.
> Column C.
> Some of All.
> This other way.
> 
> Basically the options in how the game is developed and ultimately runs is really up to the developer. While on the surface this sounds really great, how great is it? If developers can basically choose to _really_ show hardware bias, what stops that from being bad? What stops "Batman Arkham World: Nvidia Edition" from happening? If developers are now in a situation of being able to truly program for one hardware methodology or another?
> 
> _This is not a GamesWorks or AMD's (whatever their version is) discussion!!_
> 
> Let me take it further! Please keep in mind the small little note above.......
> 
> AMD is literally on borrowed time and money (All you AMD people keep your panties on, it is just the truth) and Nvidia is sitting on significant cash. What in God's name is going to stop Nvidia from using their ~$4,500,000,000 in cash to just "sponsor" the Hell out of games?
> 
> Obviously this isn't as much of an issue if Nvidia and AMD both adopt similar hardware methodologies, but what happens if they take two separate paths entirely?
> 
> This whole thing reminds me of 3DFX honestly, at least in some ways.
> Yup, scary stuff actually. If true....
> 
> I see Nvidia muscle being an issue.


Well, the thing is, the API still has to go through the drivers; they are not programming directly to the GPU, bypassing the drivers lol. It is still up to AMD and Nvidia to make sure DirectX 12 games perform well on their hardware through driver optimization.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Kpjoslee*
> 
> Well, the thing is, the API still has to go through the drivers; they are not programming directly to the GPU, bypassing the drivers lol. It is still up to AMD and Nvidia to make sure DirectX 12 games perform well on their hardware through driver optimization.


Yes, we understand it, and that is all well and good....

However; the counter theory is that the driver is just a doorway now, and not the doorway and master of what happens beyond it.

Also.....

Source
Quote:


> An enlightening read, but it doesn't make sense when you look at the benchmarks.
> 
> If AMD's parallel processing superiority was that superior to Maxwell's, then why are the benchmarks so close? Shouldn't AMD's lead be much greater than Maxwell's in DX12, considering Maxwell's supposedly serial processing nature?
> 
> But instead we find this. The benchmarks are very close. The only distinguishing anomalies are that NVidia barely gains anything from using DX12 (and even loses performance), and AMD's performance under DX11 is terrible..
> 
> Personally, I think it's just that the benchmark itself is more tuned for AMD hardware, and that AMD's Mantle venture gave them a good headstart on NVidia when it came to tuning their drivers for explicitly parallel APIs like DX12 and Mantle.
> 
> NVidia on the other hand still have their work cut out for them..


----------



## Mahigan

Quote:


> An enlightening read, but it doesn't make sense when you look at the benchmarks.
> 
> If AMD's parallel processing superiority was that superior to Maxwell's, then why are the benchmarks so close? Shouldn't AMD's lead be much greater than Maxwell's in DX12, considering Maxwell's supposedly serial processing nature?
> 
> But instead we find this. The benchmarks are very close. The only distinguishing anomalies are that NVidia barely gains anything from using DX12 (and even loses performance), and AMD's performance under DX11 is terrible..
> 
> Personally, I think it's just that the benchmark itself is more tuned for AMD hardware, and that AMD's Mantle venture gave them a good headstart on NVidia when it came to tuning their drivers for explicitly parallel APIs like DX12 and Mantle.
> 
> NVidia on the other hand still have their work cut out for them..


Ashes of the Singularity also demands that a gigantic amount of units be drawn onto the screen at once. Each unit is independent of the other. Each unit requires a triangle setup. What we're seeing is AMD GCN 1.1/1.2 hitting its Peak Rasterization Rate (expressed in Gtris/s). AMD are bottlenecked, in this game, by their rather limited array of RBE (Render Back Ends).

Take Star Swarm for example, it does not make use of Asynchronous shading (Post Processing Effects). Therefore although GCN 1.1/1.2 can be fed more Draw Calls than Maxwell, it cannot draw all of the Triangles on the screen as quickly as Maxwell:



GCN 1.1/1.2 thus get near free Post Processing Effects by using Asynchronous Shading, due to the dedicated Asynchronous Compute Engines. nVIDIA takes a hit when using Asynchronous Shading because it must do the work in its Compute Arrays.

On top of that... Under DirectX 12 you're using Maxwell 2's 31 Compute Queues for Parallelism vs GCN 1.1/1.2's 64 queues, placing Maxwell 2 at a disadvantage to GCN 1.1/1.2 in that respect. This is not that surprising given that Maxwell 2 is a patched-up Maxwell in this respect; Maxwell was never intended to be a Parallel architecture. You can derive more efficient use of Compute resources, by using Parallelism, with GCN 1.1/1.2 than you can with Maxwell/Maxwell 2.

Ashes of the Singularity thus taxes both GCN 1.1/1.2 and Maxwell 2 but for different reasons.
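The "near free" post-processing claim can be sketched with a toy frame-time model. The millisecond figures are invented purely for illustration, and real overlap is limited by contention for shared GPU resources:

```python
def frame_time_serial(graphics_ms, postfx_ms):
    # No async shading: post-processing waits for the graphics work,
    # so the two costs simply add up.
    return graphics_ms + postfx_ms

def frame_time_async(graphics_ms, postfx_ms):
    # Dedicated async compute engines let the two run concurrently, so an
    # idealized frame costs roughly the longer of the two workloads.
    return max(graphics_ms, postfx_ms)

# Hypothetical numbers: 12 ms of graphics plus 4 ms of post-processing.
serial = frame_time_serial(12.0, 4.0)      # 16.0 ms per frame
overlapped = frame_time_async(12.0, 4.0)   # 12.0 ms per frame
```

In this idealized model the post-processing disappears into the graphics time entirely, which is the "near free" effect described above; an architecture that must serialize the compute work pays the full sum instead.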


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> Exactly what you would expect from a Serial Architecture. The Faster the CPU (since you're only using 1-2 cores in DX11) the more information it can feed the GPU in a Serialized sequence. The Slower the CPU, the less.
> 
> When you move to DirectX 12 you're able to feed more information to the GPU, at a lower clock, because more cores are assisting in feeding the GPU.
> 
> The limitation comes in the form of Parallel queues. If your CPU is fast enough, under DirectX 11, it will be able to feed the GPU more information, in a serial manner, than the GPU could be fed under DirectX 12 if there aren't enough Parallel queues available.
> 
> As for GCN vs Maxwell. Under DirectX 12 you're using Maxwell 2's 31 Compute Queues for Parallelism vs GCN 1.1/1.2's 64 queues. Placing Maxwell 2 at a disadvantage to GCN 1.1/1.2 in that respect. This is not that surprising given that Maxwell 2 is a patched up Maxwell in this respect. Maxwell was never intended to be a Parallel architecture.
> 
> The conclusion is that Maxwell 2 is better at Serial than Parallel workloads, it benefits more from having a strong CPU under DirectX 11 than it does from having access to more CPU Cores, Parallelism, under DirectX 12.
> 
> nVIDIAs own figures are what enabled me to deduce what was happening with Maxwell 2. The figures you posted are the figures I relied on when coming to that conclusion (prior to joining overclock.net).
> 
> Hope that answers your query.


I didn't ask anything. I agree with everything you've said so far.

It's just pretty strange that on NVIDIA cards there are performance gains with the CPU underclocked or with cores disabled when moving from DX11 to DX12, but negative gains with the CPU @ stock. There's something really, really wrong with this benchmark, as is to be expected, because it's still in alpha. It looks like they didn't optimize it properly for NVIDIA cards, that's all I'm saying.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> Ashes of the Singularity also demands that a gigantic amount of units be drawn onto the screen at once. Each unit is independent of the other. Each unit requires a triangle setup. What we're seeing is AMD GCN 1.1/1.2 hitting its Peak Rasterization Rate (expressed in Gtris/s). AMD are bottlenecked, in this game, by their rather limited array of RBE (Render Back Ends).
> 
> Take Star Swarm for example, it does not make use of Asynchronous shading (Post Processing Effects). Therefore although GCN 1.1/1.2 can be fed more Draw Calls than Maxwell, it cannot draw all of the Triangles on the screen as quickly as Maxwell:
> 
> GCN 1.1/1.2 thus get near free Post Processing Effects by using Asynchronous Shading, due to the dedicated Asynchronous Compute Engines. nVIDIA takes a hit when using Asynchronous Shading because it must do the work in its Compute Arrays.
> 
> On top of that... Under DirectX 12 you're using Maxwell 2's 31 Compute Queues for Parallelism vs GCN 1.1/1.2's 64 queues. Placing Maxwell 2 at a disadvantage to GCN 1.1/1.2 in that respect. This is not that surprising given that Maxwell 2 is a patched up Maxwell in this respect. Maxwell was never intended to be a Parallel architecture. You can derive more efficient use of Compute resources, by using Parallelism, with GCN 1.1/1.2 than you can Maxwell/Maxwell 2.
> 
> Ashes of the Singularity thus taxes both GCN 1.1/1.2 and Maxwell 2 but for different reasons.


Work with me here....

The reason we aren't seeing AMD in an even larger lead, in this one example, is that they are bottlenecking not only at the Render Back Ends, but also at the 64 queues?

So which of the two, in this situation, is hit first, and by how much? Is the RBE the first and tightest? Without changing anything else, if they had double the RBEs, would we see massive further gains until that 64-queue limit is hit?

Based on your own experience, in what ways do you think this could translate across other genres of games or titles using DX12? Is this just a prime example of doing RTS really well, but it wouldn't work for beans in an FPS or MMO?

Quote:


> Originally Posted by *Devnant*
> 
> I didn't ask anything. I agree with everything you've said so far.
> 
> It's just pretty strange that on NVIDIA cards there are performance gains with the CPU underclocked or with cores disabled when moving from DX11 to DX12, but negative gains with the CPU @ stock. There's something really, really wrong with this benchmark, as is to be expected, because it's still in alpha. Looks like they didn't optimize it properly for NVIDIA cards, that's all I'm saying.


Valid concerns.

If anything, the conversation is getting really interesting. I'm surprised @Alatar or the other GPU fiends haven't joined in.


----------



## Bandersnatch

Quote:


> Originally Posted by *Mahigan*
> 
> This was also answered.
> 
> I will quote myself again:
> 
> "As for the folks claiming a conspiracy theory: not in the least. The reason AMD's DX11 performance is so poor under Ashes of the Singularity is that AMD did literally zero optimizations for the path. AMD is clearly looking at selling Asynchronous Shading as a feature to developers, because their architecture is well suited for the task. It doesn't hurt that it also costs less in terms of research and development of drivers. Asynchronous Shading allows GCN to hit near full efficiency without requiring any driver work whatsoever."
> 
> AMD likely didn't optimize their DX11 path in Ashes of the Singularity. They did so to prove a point. Without any driver optimizations whatsoever, this is the performance of DirectX 11 vs DirectX 12. AMD is banking on selling DirectX 12 because DirectX 12 benefits their hardware. Simple as that.


Sorry for the new account, but I want to keep this a personal account rather than something people will tie to a specific site/sites. Anyway, I'm not new here per se, though I don't come here often. But I'm finding there are a lot of "conclusions" being drawn that seem more than a bit premature. As for this specific thread:

The above is flawed logic. If Nvidia can optimize DX11 performance to a certain level, software developers should be able to optimize DX12 at the very least to that same level. In other words, *if your DX12 performance isn't better than the drivers' DX11 performance, you're doing it wrong.* You're right on the "without any driver optimizations" bit with DX12, but that's ignoring the fact that DX12 driver optimizations are going to be inherently less useful than under DX11. Instead, what we will need in order to get ideal performance is _DX12 software optimizations for each and every architecture_.

Let me put it another way: if you write a moderately complex program using C++, and then someone goes in and hand-optimizes the code with ASM, any assembly language programmer worth being called such should be able to beat the compiler, often by a large amount (50% or more). It's why common loops of code are prime targets for ASM optimizations: if you reduce the number of instructions by half, you could potentially double the performance of a critical piece of code. But it takes time, and so it's only done for critical segments. DX12 (and Mantle) is in many ways like giving graphics guys the option to write ASM. But here's where it gets messy. If you write a highly tuned ASM binary for AMD's Bulldozer architecture, looking to extract every ounce of performance possible, that same binary will run on Intel's Core architecture... but _it will not run optimally_!
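
The "halve the instructions, double the performance" arithmetic applies only to the segment you hand-optimize, which is why only critical loops get the treatment. A quick sketch with purely illustrative numbers:

```python
# Amdahl-style arithmetic for hand-optimizing only a critical segment.
# All figures are invented for illustration.

def total_time(critical_ms, other_ms, speedup):
    """Runtime after speeding up only the critical segment."""
    return critical_ms / speedup + other_ms

# A hot loop that is 80 ms of a 100 ms frame, hand-tuned to run 2x faster:
before = total_time(80.0, 20.0, 1.0)  # 100.0 ms
after = total_time(80.0, 20.0, 2.0)   # 60.0 ms
print(before / after)  # overall speedup is ~1.67x, not 2x
```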

Carrying this over to Ashes of the Singularity, here's what I suspect we're seeing. We have a game backed by AMD receiving a lot of support from AMD. AMD does (relatively) nothing with their DX11 drivers to set a low bar for DX12 to clear, and then they help optimize the DX12 code -- not drivers! -- to be essentially as good as it can be for GCN (within reason -- there's always more that can be done, but they're probably 90-95 percent of the way there). Meanwhile, there is basically no Maxwell 2.0 / GM20x optimized DX12 code path in place, with the result being that Nvidia's hardware has to run the AMD code path, which will inherently be non-optimal. In addition, AMD can look at the code and suggest using features that will further benefit AMD's specific architecture.
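
The per-architecture code paths described here could look something like the sketch below. The PCI vendor IDs are real; the path names, and the idea that an engine would pick shader variants exactly this way, are my own hypothetical illustration, not Oxide's code.

```python
# Hypothetical sketch: selecting an architecture-tuned DX12 shader path
# by GPU vendor, falling back to a generic path nobody hand-tuned.

VENDOR_PATHS = {
    0x1002: "gcn_async_heavy",     # AMD: lean on the ACEs / async compute
    0x10DE: "maxwell_serialized",  # Nvidia: avoid concurrent compute work
}

def pick_shader_path(pci_vendor_id, default="generic_dx12"):
    """Return the tuned path for a vendor, or the generic DX12 path."""
    return VENDOR_PATHS.get(pci_vendor_id, default)

print(pick_shader_path(0x1002))  # gcn_async_heavy
print(pick_shader_path(0x8086))  # generic_dx12 (Intel: no tuned path here)
```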

Unless I'm seriously mistaken, this is effectively best-case (again, within reason) AMD GCN performance vs. worst-case (mostly -- it's probably not intentionally slowing down Nvidia hardware) Nvidia GM20x performance. All you need to do is look at the DX11 performance. That is the bar to clear, on both sides. AMD set the bar very low because they didn't optimize their DX11 drivers much at all. Nvidia set the DX11 bar as high as possible to show where the developers need to start, not where they should finish.

I'm also pretty skeptical of some of the claims and language coming from Oxide. (http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/) "All IHVs have had access to our source code for over a year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months." Sure, but it's an AMD game and so AMD is actively working with the devs while Nvidia isn't. "Some optimizations that the drivers are doing in DX11 just aren't working in DX12 yet." Or, Nvidia has better DX11 driver optimizations than we have DX12 code optimizations. "This in no way affects the validity of a DX12 to DX12 test, as the same exact workload gets sent to everyone's GPUs." Um... see above: you're running AMD-tuned code on Nvidia hardware, and then saying this doesn't affect the validity? I call bunk.

We could also look at the examples from the Bitcoin/cryptocurrency world regarding optimizations. For two years, OpenCL and AMD was "the only way to go". Then CUDAminer/ccMiner/nvMiner arrived and dramatically closed the gap between many Nvidia and AMD GPUs. True, for certain algorithms AMD was still faster, but for others Nvidia was faster. Basically we had a bunch of OpenCL hacks writing mining code for GPUs, because that's what they knew, and when someone that knew CUDA decided to put in a _real_ effort, they more than doubled CUDA mining performance. The catch is that we're talking GPGPU code, which is quite different from game code, and most GPUs for obvious reasons are geared towards games.

In other words, I would be extremely hesitant about making blanket statements regarding what DX12 will and won't do for various GPU architectures based on a single game from a developer that is actively receiving help from only one of the GPU vendors. If we were looking at a game with an Nvidia TWIMTBP logo and Nvidia was doing great while AMD was struggling, I'd be saying the exact same thing. Looking at high level descriptions of the hardware and theoretical GFLOPS and using that to back up the current performance is silly, because the current performance is already skewed. Why is AMD performing better on a game with an AMD logo that isn't even in public beta yet? (And remember that the beta stage is when a lot of optimizations take place!) Because if it was anything else, we would be really dismayed.

There's a whole separate subtext to this benchmark, however, and sadly it's all about politics. AMD is backing this title and throwing resources at it, and Nvidia is being sort of a dick and spending a lot of effort to optimize their DX11 drivers, but very little effort (if any) to optimize Oxide's DX12 code. Again, read between the lines; here's what Oxide says:

Being fair to all the graphics vendors

Often we get asked about fairness; that is, are we treating Nvidia and AMD equally? Are we working closer with one vendor than another? The answer is that we have an open access policy. Our goal is to make our game run as fast as possible on everyone's machine, regardless of what hardware our players have.

To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. For example, _when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code._

We only have two requirements for implementing vendor optimizations: We require that it not be a loss for other hardware implementations, and we require that it doesn't move the engine architecture backward (that is, we are not jeopardizing the future for the present).

Why isn't Oxide actively monitoring the performance of their shaders on all GPUs? Why did Nvidia have to do the work? Oxide is the developer, and they should be largely held accountable for their performance. Except, apparently it's the job of the hardware companies, and so Oxide goes, "Hey, here's our code. We can't be bothered to really tune it right now, as we're making a game, but if you see anything you'd like to improve, let us know. But we might not include your changes on the grounds that we're jeopardizing the future for the present." They're basically asking the hardware vendors for help, or at least they're willing to accept help. Nvidia doesn't really want to promote a game their competition is heavily backing, so only in the most egregious cases will they submit improved shaders. As for AMD's optimized shader code, the only requirement is that it not perform worse on Nvidia hardware than the original Oxide shader code. But it seems like the level of optimizations Oxide has made without help from AMD may not be all that great to begin with. And parts of the engine can and will change, up to and beyond the time when the game ships.

It feels like more than anything, this was Oxide yelling "FIRST!!11!!" and posting a "real-world DX12 gaming benchmark". But like any and all gaming benchmarks, the only thing the benchmark truly shows is how fast this particular game -- at this particular point in time -- runs on the current hardware and drivers. Trying to go beyond that is like using 3DMark to predict performance in Unreal Engine games, or whatever other game engine you care to mention. Yes, sometimes the relative performance matches up nicely, but more often than not it's different. It's why all the serious hardware technology sites use a moderately large (eight or more) library of games, from a variety of engines, to test graphics cards.

Funny thing: I've heard both Nvidia and AMD say, more or less, "Wait for DX12 Fable to see how things play out." Given Fable is a Microsoft property and MS is going to be heavily invested in providing the best experience on all platforms and showcasing what DX12 can really do, I would agree that's a good idea. Fable also lacks an Nvidia or AMD logo, making it an even better representation of DX12 potential in my book.

And after Fable, wait for even more games, and perhaps more importantly, see what games are good and not just which games run faster on one set of hardware or another. (Hello, Sniper Elite, I'm looking at you: this is an AMD backed game (with Mantle support!) that, frankly, sort of sucks when it comes to being a game. And the Mantle code path is a joke as well. But to each his own....) If you love Total Annihilation, maybe Ashes is exactly what you've been waiting for, and maybe it stands to really benefit from DX12. I'm more interested in FPS and RPG games personally, so I want to see what non-RTS games do with the API. When we have ten shipping DX12 games, then we can actually start to make real conclusions; for now, it's mostly speculation and fanboi pissing matches, couched in technical speak to try and "prove" the validity of the claims.

Mahigan is probably right that asynchronous shaders work better on AMD GCN, and Ashes makes use of those to good benefit. Other games will do their own thing and will invariably have different results. From my testing, though, Ashes is looking more interesting as a way to see what type of CPU is the recommended minimum than as a way of evaluating the AMD and Nvidia GPUs against each other. Hell, the instructions for the benchmark even recommended testing it on AMD R9 Fury X, 390, 380, and 370... but on the Nvidia side, only the 980 Ti is recommended. They know already that their current code is so badly optimized on Nvidia hardware that they only want the press to look at the fastest Nvidia GPUs. Or at least, that's how I read it.

TL;DR: AMD backed titles perform better on AMD; Nvidia titles perform better on Nvidia. This is the _a priori_ starting point. Using performance from such a benchmark to prove superiority of one set of hardware is merely promoting whichever hardware is backing the game.


----------



## Kand

Quote:


> Originally Posted by *PontiacGTX*
> 
> What about the incoming DirectX 12 games being console ports from the Xbox One?
> 
> You don't think many developers, already too lazy to properly use the cores with their game engines, will be the same under DirectX 12? In other words, won't DirectX 12 give an improvement as high as could be had with an engine with good multithreaded support?
> 
> And this game seems not to scale well with more cores; how many of them are optimal, 4, 6 or 8? With HT?


You mean like how Batman Arkham Knight was ported to PC using the PS4 version?


----------



## SpeedyVT

Quote:


> Originally Posted by *Kand*
> 
> You mean like how Batman Arkham Knight was ported to PC using the PS4 version?


It probably would've played better if it was ported.


----------



## GorillaSceptre

Just about to pull the trigger on a 980 Ti, and then Mahigan comes out of nowhere... This is becoming a damn nightmare.
On the bright side it looks like my 2600k is going to last forever at this rate


----------



## Kand

Quote:


> Originally Posted by *SpeedyVT*
> 
> It probably would've played better if it was ported.


It -was- ported and it was absolutely broken.


----------



## Jim Dotcom

With what is basically a 50% market share, GCN is certain to be a development target for any cross-platform software company. This is why this matters so much with DX12: it's Nvidia who will need to do all the optimising, while AMD will barely need to lift a finger. Even with Nvidia busting their guts, there will still be clear hardware advantages (async shaders, for one) that no amount of optimisation will be able to equal. It's easy to optimise any (DX12) game for async shaders, by the way, so it'll be very clear which devs aren't doing it.

Quote:


> According to AMD's Robert Hallock "A developer doesn't need to change the way they write their shaders to use AS [Asynchronous Shaders], so it's relatively easy to extract gains on AMD hardware. It's part of the core DX12 spec, so it's not even something that needs to be specifically added to an engine. You support DX12, you have it."


What's readily apparent is that AMD's DX11 woes are pretty much over and Nvidia will really have to move up another gear. AMD appears to have played a great long game by bringing everything to bear at the same time (and of course is getting zero credit for doing that). They just need to reach the finishing line now, and it'll all be pretty interesting again.


----------



## Kpjoslee

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Just about to pull the trigger on a 980 Ti and then Mahigan comes out of nowhere.. This is becoming a damn nightmare
> 
> On the bright side it looks like my 2600k is going to last forever at this rate


Then get Fury X
On a serious note, he is right about each architecture's current advantages and limitations, but don't take his words as the words of a prophet, because he is basing his theory on a benchmark that may or may not be indicative of how the final release version will perform. There are so many unknowns at this point as to why Nvidia's DX12 performance is below their DX11 results. It could be blamed on Nvidia, or Oxide, or both. I doubt it is a simple matter of hardware limitation.


----------



## DesertRat

If these tests pan out to be true, 8+ core AMD FX CPUs and 290s and newer may actually end up being potent systems. Glad I have an 8-thread 4 GHz+ CPU at least. I ended up going nVidia, but we'll see how things compare after nVidia optimizes DX12 rendering in their drivers.


----------



## Redwoodz

Quote:


> Originally Posted by *Bandersnatch*
> 
> Sorry for the new account, but I want to keep this a personal account rather than something people will tie to a specific site/sites. Anyway, I'm not new here per se, though I don't come here often. But I'm finding there are a lot of "conclusions" being drawn that seem more than a bit premature. As for this specific thread:
> 
> The above is flawed logic. If Nvidia can optimize DX11 performance to a certain level, software developers should be able to optimized DX12 at the very least to that same level. In other words, *if your DX12 performance isn't better than the drivers' DX11 performance, you're doing it wrong.* You're right on the "without any driver optimizations" bit with DX12, but that's ignoring the fact that DX12 driver optimizations are going to be inherently less useful than under DX11. Instead, what we will need in order to get ideal performance is _DX12 software optimizations for each and every architecture_.
> 
> Let me put it another way: if you write a moderately complex program using C++, and then someone goes in and hand optimizes the code with ASM, any assembly language programmer worth being called such should be able to beat the compilers, often by a large amount (50% or more). It's why common loops of code are prime targets for ASM optimizations, because if you reduce the number of instructions by half, you could potentially double the performance of a critical piece of code. But it takes time and so it's only done for critical segments. DX12 (and Mantle) in many ways are like giving graphics guys the option to write ASM. But here's where it gets messy. If you write a highly tuned ASM binary for AMD's Bulldozer architecture, looking to extract every ounce of performance possible, that same binary will run on Intel's Core architecture... but _it will not run optimally_!
> 
> Carrying this over to Ashes of the Singularity, here's what I suspect we're seeing. We have a game backed by AMD receiving a lot of support from AMD. AMD does (relatively) nothing with their DX11 drivers to set a low bar for DX12 to clear, and then they help optimize the DX12 code -- not drivers! -- to be essentially as good as it can be for GCN (within reason -- there's always more that can be done, but they're probably 90-95 percent of the way there). Meanwhile, there is basically no Maxwell 2.0 / GM20x optimized DX12 code path in place, with the result being that Nvidia's hardware has to run the AMD code path, which will inherently be non-optimal. In addition, AMD can look at the code and suggest using features that will further benefit AMD's specific architecture.
> 
> Unless I'm seriously mistaken, this is effectively best-case (again, within reason) AMD GCN performance vs. worst-case (mostly -- it's probably not intentionally slowing down Nvidia hardware) Nvidia GM20x performance. All you need to do is look at the DX11 performance. That is the bar to clear, on both sides. AMD set the bar very low because they didn't optimize their DX11 drivers much at all. Nvidia set the DX11 bar as high as possible to show where the developers need to start, not where they should finish.
> 
> I'm also pretty skeptical of some of the claims and language coming from Oxide. (http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/) "All IHVs have had access to our source code for over year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months." Sure, but it's an AMD game and so AMD is actively working with the devs while Nvidia isn't. "Some optimizations that the drivers are doing in DX11 just aren't working in DX12 yet." Or, Nvidia has better DX11 driver optimizations than we have DX12 code optimizations. "This in no way affects the validity of a DX12 to DX12 test, as the same exact workload gets sent to everyone's GPUs." Um... see above: you're running AMD-tuned code on Nvidia hardware, and then saying this doesn't affect the validity? I call bunk.
> 
> We could also look at the examples from the Bitcoin/cryptocurrency world regarding optimizations. For two years, OpenCL and AMD was "the only way to go". Then CUDAminer/ccMiner/nvMiner arrived and dramatically closed the gap between many Nvidia and AMD GPUs. True, for certain algorithms AMD was still faster, but for others Nvidia was faster. Basically we had a bunch of OpenCL hacks writing mining code for GPUs, because that's what they knew, and when someone that knew CUDA decided to put in a _real_ effort, they more than doubled CUDA mining performance. The catch is that we're talking GPGPU code, which is quite different from game code, and most GPUs for obvious reasons are geared towards games.
> 
> In other words, I would be extremely hesitant about making blanket statements regarding what DX12 will and won't do for various GPU architectures based on a single game from a developer that is actively receiving help from only one of the GPU vendors. If we were looking at a game with an Nvidia TWIMTBP logo and Nvidia was doing great while AMD was struggling, I'd be saying the exact same thing. Looking at high level descriptions of the hardware and theoretical GFLOPS and using that to back up the current performance is silly, because the current performance is already skewed. Why is AMD performing better on a game with an AMD logo that isn't even in public beta yet ? (And remember that the beta stage is when a lot of optimizations take place!) Because if it was anything else, we would be really dismayed.
> 
> There's a whole separate subtext to this benchmark, however, and sadly it's all about politics. AMD is backing this title and throwing resources at it, and Nvidia is being sort of a dick and spending a lot of effort to optimize their DX11 drivers, but very little effort (if any) to optimize Oxide's DX12 code. Again, read between the lines; here's what Oxide says:
> 
> Being fair to all the graphics vendors
> 
> Often we get asked about fairness, that is, usually if in regards to treating Nvidia and AMD equally? Are we working closer with one vendor then another? The answer is that we have an open access policy. Our goal is to make our game run as fast as possible on everyone's machine, regardless of what hardware our players have.
> 
> To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. For example, _when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code._
> 
> We only have two requirements for implementing vendor optimizations: We require that it not be a loss for other hardware implementations, and we require that it doesn't move the engine architecture backward (that is, we are not jeopardizing the future for the present).
> 
> Why isn't Oxide actively monitoring the performance of their shaders on all GPUs? Why did Nvidia have to do the work? Oxide is the developer, and they should be largely held accountable for their performance. Except, apparently it's the job of the hardware companies, and so Oxide goes, "Hey, here's our code. We can't be bothered to really tune it right now, as we're making a game, but if you see anything you'd like to improve, let us know. But we might not include your changes on the grounds that we're jeopardizing the future for the present." They're basically asking the developers for help, or at least they're willing to accept help. Nvidia doesn't really want to promote a game their competition is heavily backing, so only in the most egregious cases will they submit improved shaders. As for AMD's optimized shader code, the only requirement is that it not perform worse on Nvidia hardware than the original Oxide shader code. But it seems like the level of optimizations Oxide has made without help from AMD may not be all that great to begin with. And parts of the engine can and will change, up to and beyond the time when the game ships.
> 
> It feels like more than anything, this was Oxide yelling "FIRST!!11!!" and posting a "real-world DX12 gaming benchmark". But like any and all gaming benchmarks, the only thing the benchmark truly shows is how fast this particular game -- at this particular point in time -- runs on the current hardware and drivers. Trying to go beyond that is like using 3DMark to predict performance in Unreal Engine games, or whatever other game engine you care to mention. Yes, sometimes the relative performance matches up nicely, but more often than not it's different. It's why all the serious hardware technology sites use a moderately large (eight or more) library of games, from a variety of engines, to test graphics cards.
> 
> Funny thing: I've heard both Nvidia and AMD say, more or less, "Wait for DX12 Fable to see how things play out." Given Fable is a Microsoft property and MS is going to be heavily invested in providing the best experience on all platforms and showcasing what DX12 can really do, I would agree that's a good idea. Fable also lacks an Nvidia or AMD logo, making it an even better representation of DX12 potential in my book.
> 
> And after Fable, wait for even more games, and perhaps more importantly, see what games are good and not just which games run faster on one set of hardware or another. (Hello, Sniper Elite, I'm looking at you: this is an AMD backed game (with Mantle support!) that, frankly, sort of sucks when it comes to being a game. And the Mantle code path is a joke as well. But to each his own....) If you love Total Annihilation, maybe Ashes is exactly what you've been waiting for, and maybe it stands to really benefit from DX12. I'm more interested in FPS and RPG games personally, so I want to see what non-RTS games do with the API. When we have ten shipping DX12 games, then we can actually start to make real conclusions; for now, it's mostly speculation and fanboi pissing matches, couched in technical speak to try and "prove" the validity of the claims.
> 
> Mahigan is probably right that asynchronous shaders work better on AMD GCN, and Ashes makes use of those to good benefit. Other games will do their own thing and will invariably have different results. From my testing, though, Ashes is looking more interesting as a way to see what type of CPU is the recommended minimum than as a way of evaluating the AMD and Nvidia GPUs against each other. Hell, the instructions for the benchmark even recommended testing it on AMD R9 Fury X, 390, 380, and 370... but on the Nvidia side, only the 980 Ti is recommended. They know already that their current code is so badly optimized on Nvidia hardware that they only want the press to look at the fastest Nvidia GPUs. Or at least, that's how I read it.
> 
> TL;DR: AMD backed titles perform better on AMD; Nvidia titles perform better on Nvidia. This is the _a priori_ starting point. Using performance from such a benchmark to prove superiority of one set of hardware is merely promoting whichever hardware is backing the game.


Well, you can think Ashes was influenced by AMD, but then again maybe Oxide is just going with the technology with the best performance for what they are trying to achieve with the game. The bottom line is that GCN's asynchronous shaders provide a better path for post-processing. AMD designed it that way, yes, but that is the prize of innovation. Nvidia will have to follow suit and modify their hardware to keep up.


----------



## gamervivek

ROPs aren't responsible for triangle setup, and the first public GPU miner was based on CUDA. Wall-of-texters, please check your premises before you reach your conclusions.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Redwoodz*
> 
> Well, you can think Ashes was influenced by AMD, but then again maybe Oxide is just going with the technology with the best performance for what they are trying to achieve with the game. The bottom line is that GCN's asynchronous shaders provide a better path for post-processing. AMD designed it that way, yes, but that is the prize of innovation. Nvidia will have to follow suit and modify their hardware to keep up.


Isn't this one hell of a leap in logic, based on one obviously biased (not in a negative sense) example?


----------



## Mahigan

I think I know what is happening.

Ashes of the Singularity makes use of Asynchronous Shading. Now, we know that AMD has been big on advertising this feature. It is a feature used in quite a few PlayStation 4 titles. It allows the developer to make efficient use of the available compute resources. GCN achieves this by making use of 8 Asynchronous Compute Engines (ACEs for short), found in GCN 1.1 290 series cards as well as all GCN 1.2 cards. Each ACE is capable of queuing up to 8 tasks. This means that a total of 64 tasks may be queued on GCN hardware which features 8 ACEs.
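
Spelled out, the queue arithmetic claimed in this post (taking the stated counts at face value, not independently verified):

```python
# Queue counts as claimed in the post:
# GCN 1.1 (290 series) / GCN 1.2: 8 ACEs, each queuing up to 8 tasks.
# Maxwell 2: a single engine with 32 Compute, or 1 Graphics + 31 Compute.

GCN_ACES = 8
QUEUES_PER_ACE = 8
gcn_compute_queues = GCN_ACES * QUEUES_PER_ACE

maxwell2_compute_queues = 31  # alongside the single graphics queue

print(gcn_compute_queues)       # 64
print(maxwell2_compute_queues)  # 31
```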

nVIDIA can also do Asynchronous Shading through its HyperQ feature. The amount of available information on the nVIDIA side regarding this feature is minimal. What we do know is that nVIDIA mentioned Maxwell 2 is capable of queuing 32 Compute tasks, or 1 Graphics and 31 Compute, for Asynchronous Shading.

Anandtech made a BIG mistake in their article on this topic, which seems to have become the de facto standard article and has been copied all over the web. That information is erroneous. Anandtech claimed that GCN 1.1 (290 series) and GCN 1.2 were capable of 1 Graphics and 8 Compute queues. In fact, GCN 1.1 (290 series) and GCN 1.2 are capable of 1 Graphics and 64 Compute queues.

Anandtech also had barely any information on Maxwell's capabilities. Ryan Smith, the graphics editor at Anandtech, assumed that Maxwell's queues were dedicated compute units, and so published that Maxwell 2 had a total of 32 compute units. That is false.

The truth is that Maxwell 2 has only a single "Asynchronous" Compute Engine tied to 32 compute queues (or 1 Graphics and 31 Compute queues). ("Asynchronous" is in quotes because, as you will see, it isn't truly asynchronous.)

I figured this out when I began reading the Kepler/Maxwell/Maxwell 2 CUDA documentation and found what I was looking for. Basically, Maxwell 2 makes use of a single ACE-like unit, which nVIDIA names the Grid Management Unit.

How does it work?

The CPU's various cores send parallel streams to the Stream Queue Management, which passes them to the Grid Management Unit (parallel-to-serial thus far). The Grid Management Unit can then create multiple hardware work queues (1 Graphics and 31 Compute, or 32 Compute), which are sent serially, one after the other based on priority, to the Work Distributor. The Work Distributor then assigns the workloads to the various SMs in parallel. nVIDIA calls this entire process "HyperQ".

Here's the documentation: http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf

GCN 1.1 (290 series)/GCN 1.2, on the other hand, works in a very different manner. The CPU's various cores send parallel streams to the Asynchronous Compute Engines' various queues (up to 64). The Asynchronous Compute Engines prioritize the work and then send it off directly to specific Compute Units based on availability. That's it.

Maxwell 2's HyperQ is thus potentially bottlenecked at the Grid Management Unit and Work Distributor stages of its pipeline, because these stages are in-order. In other words, HyperQ contains only a single pipeline (serial, not parallel).

AMD's Asynchronous Compute Engine implementation is different: it contains 8 parallel pipelines working independently of one another. This is why AMD's implementation can be described as "out of order".
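The difference can be illustrated with a deliberately crude queuing model (purely illustrative Python; the dispatch cost and execution time are made up and carry no claim about real hardware timings). One in-order dispatcher must hand tasks off one at a time, while eight independent queues can hand off eight at once; with plentiful compute units downstream, the serial front end dominates the total time:

```python
def makespan(num_tasks: int, dispatch_cost: float, exec_time: float,
             dispatchers: int) -> float:
    """Time until the last task finishes, assuming unlimited compute units
    downstream so that only the dispatch stage can serialize the work."""
    finish = 0.0
    for i in range(num_tasks):
        start = (i // dispatchers) * dispatch_cost  # wait for a dispatch slot
        finish = max(finish, start + exec_time)
    return finish

# 64 tasks, 1 time-unit to dispatch each, 4 time-units to execute each.
serial = makespan(64, 1.0, 4.0, dispatchers=1)    # single in-order pipeline
parallel = makespan(64, 1.0, 4.0, dispatchers=8)  # 8 independent pipelines
print(serial, parallel)  # 67.0 11.0 -- the serial front end dominates
```

The absolute numbers are meaningless; the point is only that an in-order front end grows linearly with the number of queued tasks, while parallel queues hide most of the dispatch cost.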

Two obvious consequences follow: AMD's implementation incurs less latency, and it makes more efficient use of the available compute resources.

This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity in DirectX 12, even compared to a lowly R9 290X. Asynchronous Shading kills its performance relative to GCN 1.1 (290 series)/GCN 1.2, whose performance is barely impacted.

GCN 1.1 (290 series)/GCN 1.2 are clearly being limited elsewhere, and I believe it is their peak rasterization rate (Gtris/s). Many objects and units permeate the screen in Ashes of the Singularity, each made up of triangles (polygons). Since both the Fury X and the 290X/390X have the same number of hardware rasterization units, I believe this is the culprit. Some people have attributed the limit to the number of ROPs (64) that both the Fury X and the 290X/390X share. I thought the same at first, but then I was reminded of the color compression found in the Fury/Fury X cards: their color compression algorithms have been shown to alleviate the pixel fill rate issues found in the 290/390X. Therefore I do not believe the ROPs (render back-ends) are the issue. Rather, the triangle setup engines (Raster/Hierarchical Z) are the likely culprits.
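To put rough numbers on that (my own back-of-the-envelope arithmetic, not from the benchmark: 4 geometry/raster engines at an assumed 1 triangle per clock, with approximate reference clocks), the Fury X's peak triangle rate is only about 5% above the 290X's despite its much larger shader array, which is consistent with triangle setup being the shared ceiling:

```python
def peak_gtris_per_s(raster_engines: int, clock_mhz: float,
                     tris_per_clock: float = 1.0) -> float:
    """Peak triangle setup rate in billions of triangles per second."""
    return raster_engines * tris_per_clock * clock_mhz / 1000.0

# Both parts have 4 raster/geometry engines; clocks are approximate.
r9_290x = peak_gtris_per_s(4, 1000.0)  # ~4.0 Gtris/s
fury_x = peak_gtris_per_s(4, 1050.0)   # ~4.2 Gtris/s
print(r9_290x, fury_x)
```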

I've been away from this stuff for a few years so I'm quite rusty but Direct X 12 is getting me interested once again.

PS: Don't expect an nVIDIA fix through driver intervention either. DirectX 12 leaves less room for driver intervention because it is closer to the metal than DirectX 11, so nVIDIA's penchant for replacing shaders at the driver level is nullified. DirectX 12 will be far more hardware-limited than DirectX 11.

Oxide confirmed it here:
Quote:


> DirectX 11 vs. DirectX 12 performance
> 
> There may also be some cases where D3D11 is faster than D3D12 (it should be a relatively small amount). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has 5 years of optimizations where D3D12 is brand new. *Second, D3D11 has more opportunities for driver intervention.* The problem with this driver intervention is that it comes at the cost of extra CPU overhead, and can only be done by the hardware vendor's driver teams. On a closed system, this may not be the best choice if you're burning more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game's updates.


The developer can optimize by replacing shaders on their end. This was already done as confirmed here:
Quote:


> To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. For example, when *Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code.*


http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/


----------



## ToTheSun!

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Just about to pull the trigger on a 980 Ti and then Mahigan comes out of nowhere.. This is becoming a damn nightmare
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On the bright side it looks like my 2600k is going to last forever at this rate


I'd just grab the 980ti.

Firstly, it's going to be better in the majority of games using DX11.1 and older.
Secondly, it's naive to think Nvidia, with the market share and influence they exert, didn't see DX12 and its "quirks" coming. Without dismissing what Mahigan said, being caught off guard like that would be unbecoming of a company that has shown nothing but the ability to make a profit and manage its leading position in the market.

They're directly tied to the gaming industry, for obvious reasons, and they're well aware of its needs and challenges.


----------



## PontiacGTX

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Just about to pull the trigger on a 980 Ti and then Mahigan comes out of nowhere.. This is becoming a damn nightmare
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On the bright side it looks like my 2600k is going to last forever at this rate


Get a cheap R9 290 as a placeholder and then upgrade next year.


----------



## Jim Dotcom

Quote:


> Originally Posted by *ToTheSun!*
> 
> Secondly, it's naive to think Nvidia, with the market share and influence they exert, didn't see DX12 and its "quirks" coming. Without dismissing what Mahigan said, being caught off guard like that would be unbecoming of a company that has shown nothing but the ability to make a profit and manage its leading position in the market.
> 
> They're directly tied to the gaming industry, for obvious reasons, and they're well aware of its needs and challenges.


And yet they're also nowhere in VR, giving up a lead they held for years before AMD launched LiquidVR. Still naive or just not equipped for it?

Mantle is the key to all of this. It's what drives LiquidVR, and it's what DX12 is based on. GCN has a massive 50%+ market share *right now*.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> Get a cheap R9 290 as a placeholder and then upgrade next year.


That's probably what I would do. At this point I wouldn't consider the high end.

AMD appear to offer negligible performance gains with their Fury/Fury-X and the DirectX 12 Asynchronous Shading performance of the GTX 980 Ti makes me doubt its abilities come next year.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> That's probably what I would do. At this point I wouldn't consider the high end.
> 
> AMD appear to offer negligible performance gains with their Fury/Fury-X and the DirectX 12 Asynchronous Shading performance of the GTX 980 Ti makes me doubt its abilities come next year.


Well, the 290 is the cheapest card and offers a good enough frame rate in most games that aren't limited by the AMD driver's DX11 performance in CPU-limited scenarios, or by the tessellation that became a problem with Gimpworks.
I guess they have a similar ACE count, and that's why the asynchronous shader performance is similar. Why didn't AMD double that number?


----------



## ToTheSun!

Quote:


> Originally Posted by *Jim Dotcom*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ToTheSun!*
> 
> Secondly, it's naive to think Nvidia, with the market share and influence they exert, didn't see DX12 and its "quirks" coming. Without dismissing what Mahigan said, being caught off guard like that would be unbecoming of a company that has shown nothing but the ability to make a profit and manage its leading position in the market.
> 
> They're directly tied to the gaming industry, for obvious reasons, and they're well aware of its needs and challenges.
> 
> 
> 
> And yet they're also nowhere in VR, giving up a lead they held for years before AMD launched LiquidVR. Still naive or just not equipped for it?
> 
> Mantle is the key to all of this. It's what is driving LiquidVR and it's what DX12 is based on. GCN has massive 50%+ market share *right now*.

I was just saying.

If you think Nvidia are "not equipped for it [DX12]", that's cool too. Lately, all anyone could talk about was how Nvidia was leveraging its dominant market share to cripple AMD performance. All of a sudden, a benchmark from a game in its alpha stages clearly points to Nvidia being unable to deal with DX12.

But, sure, go AMD!


----------



## Jim Dotcom

Quote:


> Originally Posted by *PontiacGTX*
> 
> Well, the 290 is the cheapest card and offers a good enough frame rate in most games that aren't limited by the AMD driver's DX11 performance in CPU-limited scenarios, or by the tessellation that became a problem with Gimpworks.
> I guess they have a similar ACE count, and that's why the asynchronous shader performance is similar. Why didn't AMD double that number?


It's not about the number of ACEs, it's about having them or not.
Quote:


> Originally Posted by *PontiacGTX*
> 
> Well, the 290 is the cheapest card and offers a good enough frame rate in most games that aren't limited by the AMD driver's DX11 performance in CPU-limited scenarios, or by the tessellation that became a problem with Gimpworks.
> I guess they have a similar ACE count, and that's why the asynchronous shader performance is similar. Why didn't AMD double that number?


Like he said earlier, Fury is probably being held back by its render back-ends, which are the same as on the 390X. The point here is that doubling ROPs is likely an easier fix (for AMD's high end) than whatever Pascal needs in order to compete with actual ACEs. AMD will be doubling ROPs as a matter of course at 14nm anyway.


----------



## GorillaSceptre

Quote:


> Originally Posted by *ToTheSun!*
> 
> I'd just grab the 980ti.
> 
> Firstly, it's going to be better in the majority of games using DX11.1 and older.
> Secondly, it's naive to think Nvidia, with the market share and influence they exert, didn't see DX12 and its "quirks" coming. Without dismissing what Mahigan said, being caught off guard like that would be unbecoming of a company that has shown nothing but the ability to make a profit and manage its leading position in the market.
> 
> They're directly tied to the gaming industry, for obvious reasons, and they're well aware of its needs and challenges.


I think DX12 is going to be adopted really quickly this time; some recent DX11 games are even being updated to it. It also doesn't seem like a massive undertaking for devs.

Ashes seems to be stressing different aspects of both cards. But wouldn't the nature of the game usually give Maxwell an advantage, due to Maxwell's better triangle/geometry handling? (And this game has a metric ton of triangles.)

If what Mahigan says turns out to be accurate, then this game could actually be hiding GCN's true advantage under DX12, specifically Async. I'm not knowledgeable at all on the subject, so i could be completely misunderstanding the situation.

Rather than Nvidia messing up, AMD could simply have gotten really lucky with DX12's timing.
Quote:


> Originally Posted by *PontiacGTX*
> 
> Get a cheap R9 290 as a placeholder and then upgrade next year.


If the Fury drops in price a bit, i might consider that as something to hold me off till next year.


----------



## mtcn77

Wouldn't it be wiser to stick to consoles than, say, wait for an API update that will not take root before the new graphics line-up is announced? I mean, think about it:

Asynchronous shaders? Consoles have them.
8 post-processing engines? 64 queues? Consoles have them.
Titles that make use of these virtual CPU core buffer extensions? Already in place in the console-peasantry world.
It might be high time we got off our high horses. But don't take every word I say at face value; I'm just trying to stir up a conversation.


----------



## PontiacGTX

Quote:


> Originally Posted by *GorillaSceptre*
> 
> If the Fury drops in price a bit, i might consider that as something to hold me off till next year.


Maybe if it drops below $450, or $500 if HBM keeps some resale value on the card. But it won't hold a performance lead over GDDR5 unless you need a lot of bandwidth, multi-GPU, or GPU-bound scenarios.
Quote:


> Originally Posted by *mtcn77*
> 
> Wouldn't it be wiser to stick to consoles than, say, wait for an API update that will not take root before the new graphics line-up is announced? I mean, think about it:
> 
> Asynchronous shaders? Consoles have them.
> 8 post-processing engines? 64 queues? Consoles have them.
> Titles that make use of these virtual CPU core buffer extensions? Already in place in the console-peasantry world.
> It might be high time we got off our high horses. But don't take every word I say at face value; I'm just trying to stir up a conversation.


It is a good choice too, but you won't have those purchased games on the PC platform, will you?
Also, buying a console would further support the lazy developers who would keep porting console games to PC in the future, some of them unoptimized.


----------



## MonarchX

Actually, from what I've researched, coding for DirectX 12 is more difficult than for DirectX 11. This is why MS is also releasing DirectX 11.3 for those who prefer a simpler approach. DirectX 12 coding is just like the API itself: low-level. The lower you go, the more complex it gets. How optimized a game ends up also depends on the developers; they can make a DirectX 12 title run as slowly as it would on DirectX 11 if the coding is poor.

As far as Ashes of Singularity goes, why not wait for at least 2-3 more DirectX 12 benchmarks before drawing conclusions about this situation?


----------



## GorillaSceptre

Quote:


> Originally Posted by *MonarchX*
> 
> Actually, from what I've researched, *coding for DirectX 12 is more difficult than for DirectX 11*. This is why MS is also releasing DirectX 11.3 for those who prefer a simpler approach. DirectX 12 coding is just like the API itself: low-level. The lower you go, the more complex it gets. How optimized a game ends up also depends on the developers; they can make a DirectX 12 title run as slowly as it would on DirectX 11 if the coding is poor.
> 
> As far as Ashes of Singularity goes, why not wait for at least 2-3 more DirectX 12 benchmarks before drawing conclusions about this situation?


I understand that; developers have to do what the API usually handles. I was just going off what devs have been saying, specifically the Star Citizen guys. Others have said it only took them a few days to a month to update their games.


----------



## mtcn77

Quote:


> Originally Posted by *PontiacGTX*
> 
> it is a good choice also but you wont have those purchased games on PC platform.will you?
> and also buying a console wouldnt support more the lazy developers that in the would keep porting in the future the console games to PC and some would be unoptimized


A few more voxel & adventure RPG games would be nice, but Minecraft is winking at me.
I remembered what I forgot to say in my last post: couldn't these extra post-processing operators be put to use for the calculations of a 2x2 helper-pixel correction algorithm? The reason I like the Minecraft concept so much is that the less geometry you have, the fewer artifacts corner shading introduces. Yes, conservative rasterization is warranted to be brought up at this point, but couldn't it be emulated on these operators? I mean, the geometry data is there. Looking at it, the resulting "JPEG compression" artifacts should just as easily be reversed, in my view. Games are missing so much of that fidelity that Minecraft reminds us of the Sega era, imo.
PS: here is Beyond3D's perspective on analytical post-processing filters:
Quote:


> Note that, unlike MSAA, these analytical methods do not care whether aliasing artefacts are caused by geometry, transparency or even shader evaluation. All edges are treated equally. Sadly, this also applies to the edges of the on-screen text, though the distortion is smaller for SMAA1x than it is for FXAA.


Just think of separate channels of algorithms making corrections for geometry, transparency, and shader aliasing on the fly (like photomasks). There would be no spill-over among overlapping textures. It would totally change the depth-perception limit we try to overcome by upscaling the resolution, though without full rectification, as upscaling is very inefficient performance-wise, leaving us without any option to compensate for a bias-primed sample. Such use of these post-processing operators would solve the shortcomings of current single-pass filters.


----------



## Bandersnatch

Quote:


> Originally Posted by *Jim Dotcom*
> 
> And yet they're also nowhere in VR, giving up a lead they held for years before AMD launched LiquidVR. Still naive or just not equipped for it?
> 
> Mantle is the key to all of this. It's what is driving LiquidVR and it's what DX12 is based on. GCN has massive 50%+ market share *right now*.


Wait, *** are you saying Nvidia is "nowhere in VR"? I've been to quite a few trade shows and demos over the years, including everything from DK1 to the latest Oculus Rift stuff. The people demonstrating VR are using Nvidia GPUs 90% of the time in my experience. No joke. Ninety Percent. AMD can talk about LiquidVR, but it's about as widespread as Mantle use.


----------



## Jim Dotcom

Quote:


> Originally Posted by *Bandersnatch*
> 
> Wait, *** are you saying Nvidia is "nowhere in VR"? I've been to quite a few trade shows and demos over the years, including everything from DK1 to the latest Oculus Rift stuff. The people demonstrating VR are using Nvidia GPUs 90% of the time in my experience. No joke. Ninety Percent. AMD can talk about LiquidVR, but it's about as widespread as Mantle use.


You stopped going to VR shows in the past 3 months? Strange that you'd do that just as AMD started to dominate VR, but there you go.

I guess that's why you'd compare it to Mantle use too. That explains why you don't even realise that LiquidVR *is* Mantle.

Get off your hidden account and I'll be happy to teach you everything you don't know about VR and more. I'm not gonna waste time with some faceless nobody, so man up and then we'll talk.


----------



## Clocknut

Quote:


> Originally Posted by *delboy67*
> 
> The world is a very different place now, we now have some really good devs baking dx12 into their engine then effectively give their work away for free, we also have the x86/gcn consoles, dx12 adoption rate will be like no other api imo.


Quote:


> Originally Posted by *Mahigan*
> 
> The majority of people buy hardware based on new and upcoming titles. The majority of people don't buy a new Graphics card with the intent of playing Quake 3 at 1,000 Frames per Second.
> 
> Since the new titles will predominantly be DirectX 12 in nature... AMD doesn't need to bank on Multi-core support. AMD just needs to follow the direction in which the market is headed. Hopefully they won't continue making unnecessary risks.


Really? Then explain why we are still using DX9 now. If developers adopted each new DirectX, 90% of games would be DX11 by now.

I am not talking about 10% of the market here; I am talking about the 50% of the market which right now is still on DX9.
Judging by this trend, you won't be getting DX12 support in most games on the market anytime soon. You will probably find a couple of DX12 games among new AAA titles, and that's about it. Those few games do not represent the entire gaming community.

AMD is on the losing end here if they think DX9-11 is going to be completely gone in 1-2 years.


----------



## Jim Dotcom

Quote:


> Originally Posted by *Clocknut*
> 
> Really? Then explain why we are still using DX9 now. If developers adopted each new DirectX, 90% of games would be DX11 by now.
> 
> I am not talking about 10% of the market here; I am talking about the 50% of the market which right now is still on DX9.
> Judging by this trend, you won't be getting DX12 support in most games on the market anytime soon. You will probably find a couple of DX12 games among new AAA titles, and that's about it. Those few games do not represent the entire gaming community.
> 
> AMD is on the losing end here if they think DX9-11 is going to be completely gone in 1-2 years.


This is very easily explained by your lack of comprehension of the gaming market.

50% of the market is on GCN. What's more, the vast majority of GCN products on the market are far more potent than the average PC. No dev worth squat isn't developing on GCN.

So many "enthusiasts", and not one seems to realise that what we're seeing right now is unprecedented in graphics. Never before has any company had such a massive, commanding share of the market. Yes, it's AMD, amazingly enough.


----------



## SpeedyVT

Quote:


> Originally Posted by *Redwoodz*


Asynchronous shaders are not just post-processing, though post-processing is included.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Clocknut*
> 
> Really? Then explain why we are still using DX9 now. If developers adopted each new DirectX, 90% of games would be DX11 by now.
> 
> I am not talking about 10% of the market here; I am talking about the 50% of the market which right now is still on DX9.
> Judging by this trend, you won't be getting DX12 support in most games on the market anytime soon. You will probably find a couple of DX12 games among new AAA titles, and that's about it. Those few games do not represent the entire gaming community.
> 
> AMD is on the losing end here if they think DX9-11 is going to be completely gone in 1-2 years.


DX11 really only took off because of the new consoles; before that, DX9 was king. Now, most games where graphics don't matter will stay DX11, but where there is a difference they will be DX12. There was no major reason for devs to exclude Windows XP and older DX9 cards for some stupid DX11 feature like tessellation. DX12 does give enough reason to consider it: it helps in graphically intense games and in RTS games. Also, just by looking at Mantle, which was AMD-only, and how many games used it, it's fair to say DX12 adoption will go 2-3x as fast.


----------



## Mahigan

Quote:


> Originally Posted by *Jim Dotcom*
> 
> This is very easily explained by your lack of comprehension of the gaming market.
> 
> 50% of the market is on GCN. What's more, the vast majority of GCN products on the market are far more potent than the average PC. No dev worth squat isn't developing on GCN.
> 
> So many "enthusiasts", and not one seems to realise that what we're seeing right now is unprecedented in graphics. Never before has any company had such a massive, commanding share of the market. Yes, it's AMD, amazingly enough.


Curious, where are you getting this 50% of the market is on GCN figure?


----------



## Jim Dotcom

Quote:


> Originally Posted by *Mahigan*
> 
> Curious, where are you getting this 50% of the market is on GCN figure?


Ubisoft - http://hugin.info/143604/R/1936320/698035.pdf



EDIT - Probably between 40-50%, assuming AMD's share of PCs is around 5% GCN and maybe 5% other. It depends on when you take the numbers from.

Let's not forget how closely aligned with Nvidia Ubisoft is. Pretty staggering numbers when you think about it, right?

Anyway, the point is clear: if you're a game developer, you've got a combined market of 50% or so GCN to work with. Ain't nobody with a clue developing on Nvidia hardware. Kinda puts those 20% "discrete GPU" market share numbers in perspective, doesn't it?


----------



## GorillaSceptre

Quote:


> Originally Posted by *Clocknut*
> 
> Really? Then explain why we are still using DX9 now. If developers adopted each new DirectX, 90% of games would be DX11 by now.
> 
> I am not talking about 10% of the market here; I am talking about the 50% of the market which right now is still on DX9.
> Judging by this trend, you won't be getting DX12 support in most games on the market anytime soon. You will probably find a couple of DX12 games among new AAA titles, and that's about it. Those few games do not represent the entire gaming community.
> 
> AMD is on the losing end here if they think DX9-11 is going to be completely gone in 1-2 years.


Apples and oranges.

1. Windows 10 is a free upgrade, unlike in the past, people don't have to pay to get DX12.

2. It's supported by all GPUs as far back as the 400 series.

3. Microsoft is piling a ton of marketing behind it as it also pertains to their console.

4. Every major studio has said they are supporting it. Some, like the Frostbite team, have even said they want Windows 10/DX12 to be mandatory for all 2016 games using Frostbite.

This is a completely different situation compared to how it's been with other releases. People aren't being asked to buy a new GPU to get one or two features devs may never use, instead everyone is being put on an even playing field. DX12 adoption is already happening.


----------



## Mahigan

Ahhh

You're talking about Consoles using GCN. I think that with DirectX 12, the line between coding for a Console and coding for a PC becomes blurred. With DirectX 11, however, the two are very much separate entities due to the serial nature of that API.

I see your point.


----------



## semitope

Quote:


> Originally Posted by *ToTheSun!*
> 
> I'd just grab the 980ti.
> 
> Firstly, it's going to be better in the majority of games using DX11.1 and older.
> Secondly, it's naive to think Nvidia, with the market share and influence they exert, didn't see DX12 and its "quirks" coming. Without dismissing what Mahigan said, being caught off guard like that would be unbecoming of a company that has shown nothing but the ability to make a profit and manage its leading position in the market.
> 
> They're directly tied to the gaming industry, for obvious reasons, and they're well aware of its needs and challenges.


You guys have just a little too much faith in this market share and influence. It is quite plausible that they designed for DX11 as best they could and ended up with bottlenecks under DX12. Market dominance and influence can't fix engineering decisions. While AMD seems to have been designing for the future of GPUs as usual, Nvidia was not. It's not at all impossible if you look at the history of the two companies: Nvidia wins in public opinion, but AMD does the innovating.

Best case, they do something in drivers and DX12 games end up like DX11 games were, with both trading blows.

Worst case, AMD influenced DX12 with Mantle beyond what Nvidia expected or was prepared for, and Pascal ends up another DX11 architecture, or comes too late to fix DX12 performance. It's quite possible DX12 was not meant to take the form it has now, and was to be DX11 with more features.
Quote:


> Originally Posted by *Jim Dotcom*
> 
> Ubisoft - http://hugin.info/143604/R/1936320/698035.pdf
> 
> 
> 
> EDIT - Probably between 40-50% assuming AMD's share of PC's is around 5% GCN and maybe 5% other. Depends on when you take the numbers from.
> 
> Let's not forget how closely aligned to Nvidia Ubisoft is. Pretty staggering numbers when you think about it right?
> 
> Anyway, the point is clear. If you're a game developer you've got a combined market of 50% or so GCN to work with. Ain't nobody with a clue developing on Nvidia hardware. Kinda puts those 20% "discrete gpu" market share numbers in perspective doesn't it?


exactly. AMD might be behind on dGPU but the fact they are in all the consoles means more in the end.


----------



## SpeedyVT

All this fanboyism is sickening me. Why can't we just talk about the technical aspects of GCN and its performance gains in draw-call-heavy environments such as Ashes, then discuss the physical strengths of Maxwell 2 and its application in certain titles? Then we could reach a consensus on which genres of games, such as FPS or RTS, are better suited to GCN and which to Maxwell.


----------



## mtcn77

Quote:


> Originally Posted by *SpeedyVT*
> 
> All this fanboyism is sickening me. Why can't we just talk about the technical aspects of GCN and its performance gains in draw-call-heavy environments such as Ashes, then discuss the physical strengths of Maxwell 2 and its application in certain titles? Then we could reach a consensus on which genres of games, such as FPS or RTS, are better suited to GCN and which to Maxwell.


A new RTS title in the Gaming Evolved program would be well warranted.


----------



## provost

I haven't read more than a couple of posts in this thread, but this is generally the result I expected for Nvidia from DX12. It doesn't take a rocket scientist to figure out that a company focused on pricing products relative to software-based performance would want to manage as much of that performance gain itself to maximize profits, or risk upsetting the software-based multi-tiered SKU strategy that has worked so well for it over the last few years...

Why leave that money on the table as "free performance" when it can be packaged up and sold at a premium as performance gains over the current gen of GPUs? ...lol. Particularly if the company has a near-monopoly in the consumer discrete GPU segment, as reflected by its market share.


----------



## Xuper

OK, I thought that because the GeForce Titan X is the fastest single card in *world X* (if we ignore the 295X2), it should also be the fastest in *world Y*. But Ashes shows us that's not 100% true; it can be slower than in world X.

The game engine tells the 980 Ti:

1) I have a task, and it's all about a *serial* workload. Please do me a favor and bring me your FPS. *GeForce got 46 FPS.*

2) I have a task, and it's all about a *parallel* workload. Please do me a favor and bring me your FPS. *GeForce got 45 FPS.*

Makes you wonder why you didn't see 20% more FPS.
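A toy model of what Xuper describes: if every queue ultimately drains through one serial stage, adding parallel queues buys almost nothing. This is purely illustrative (the lane counts and timings below are made up), not a claim about how either vendor's hardware or driver actually schedules work:

```python
# Toy model: N equal GPU tasks drained through 1 lane (serialized)
# versus several independent lanes (parallel queues).

def frame_time_ms(task_ms, n_tasks, lanes):
    """Frame time when tasks are split evenly across `lanes`;
    the busiest lane determines when the frame finishes."""
    tasks_per_lane = -(-n_tasks // lanes)  # ceiling division
    return tasks_per_lane * task_ms

serial = frame_time_ms(0.7, 32, 1)    # one lane:  32 * 0.7 = 22.4 ms
parallel = frame_time_ms(0.7, 32, 8)  # 8 lanes:    4 * 0.7 =  2.8 ms

print(f"serial:   {serial:.1f} ms ({1000 / serial:.1f} FPS)")
print(f"parallel: {parallel:.1f} ms ({1000 / parallel:.1f} FPS)")
```

Under this hypothetical model, submitting "parallel" work changes nothing unless the lanes actually run concurrently, which is one reading of the 46 vs. 45 FPS result.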


----------



## SpeedyVT

Quote:


> Originally Posted by *Xuper*
> 
> OK, I thought that because the GeForce Titan X is the fastest single card in *world X* (if we ignore the 295X2), it should also be the fastest in *world Y*. But Ashes shows us that's not 100% true; it can be slower than in world X.
> The game engine tells the 980 Ti:
> 1) I have a task, and it's all about a *serial* workload. Please do me a favor and bring me your FPS. *GeForce got 46 FPS.*
> 2) I have a task, and it's all about a *parallel* workload. Please do me a favor and bring me your FPS. *GeForce got 45 FPS.*
> 
> Makes you wonder why you didn't see 20% more FPS.


It has to dedicate a specific amount of its hardware to serializing the parallel workloads. So in a sense it could lose a fraction of its frames aggregating them in the driver into a serialized workload.


----------



## Glottis

Quote:


> Originally Posted by *Xuper*
> 
> Makes you wonder why you didn't see 20% more FPS.


Because it's a niche game engine with questionable quality and questionable optimization choices. Maybe let's wait for Unreal Engine 4, Frostbite, Source 2, CryEngine, Unity and other big engines to ship a DX12 mode and see their performance before crying that the sky is falling.

There are too many people in this thread basing all their DX12 performance claims solely on this one benchmark, which is the wrong thing to do.


----------



## SpeedyVT

Quote:


> Originally Posted by *Glottis*
> 
> Because it's a niche game engine with questionable quality and questionable optimization choices. Maybe let's wait for Unreal Engine 4, Frostbite, Source 2, CryEngine, Unity and other big engines to ship a DX12 mode and see their performance before crying that the sky is falling.


Technically it's niche in the sense of being an RTS, and RTS is a genre that relies on more draw calls. Yeah, in that sense it's niche. I'd say both NVidia and AMD should gain 20% more FPS over Ashes in a DX12 title with fewer draw calls. However, this is good; I want to see games that can utilize that many draw calls.


----------



## error-id10t

Isn't this a win-win-win for everyone or have I lost the plot?

Nvidia can add more whatever it is that they're "lacking" today and causes a bottleneck. AMD can add more whatever it is that they're "lacking" today and causes a bottleneck - both of these scenarios have been discussed at length here already.

Then there's us the consumers who (if both companies fix their bottlenecks) will receive a very nice increase in performance for DX12 games.


----------



## SpeedyVT

Quote:


> Originally Posted by *error-id10t*
> 
> Isn't this a win-win-win for everyone or have I lost the plot?
> 
> Nvidia can add more whatever it is that they're "lacking" today and causes a bottleneck. AMD can add more whatever it is that they're "lacking" today and causes a bottleneck - both of these scenarios have been discussed at length here already.
> 
> Then there's us the consumers who (if both companies fix their bottlenecks) will receive a very nice increase in performance for DX12 games.


It's a win-win because parallel workloads have a future, unlike serial. Serial will eventually hit a theoretical wall, much like per-core performance in x86 today; it can only get so good before every iteration is practically the same with more bells & whistles.

Intel is starting to move toward parallel workloads with multi-threaded SMT processors. It won't be long before six cores is the baseline core count for an i7.

Serial is sooooooo SCSI.
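SpeedyVT's "serial eventually hits a wall" point is essentially Amdahl's law. A quick sketch with a made-up 90% parallel fraction (illustrative only, not tied to any specific CPU or game):

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: overall speedup is capped by the serial fraction,
    no matter how many cores attack the parallel part."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# With 90% of the work parallelizable, extra cores quickly stop paying off:
for cores in (2, 4, 8, 64):
    print(f"{cores:>2} cores -> {amdahl_speedup(0.9, cores):.2f}x")
# The ceiling is 1 / 0.1 = 10x regardless of core count.
```

This is why shrinking the serial portion (which is what DX12's threading model aims at) matters more than simply adding cores.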


----------



## delboy67

Was just thinking: Nvidia have set this up perfectly to sell different cards next year to the very same people who bought high-end 900-series cards this year as an upgrade from high-end 700-series. Say what you want about them, but when it comes to selling cards they are brilliant.


----------



## SpeedyVT

Quote:


> Originally Posted by *delboy67*
> 
> Was just thinking: Nvidia have set this up perfectly to sell different cards next year to the very same people who bought high-end 900-series cards this year as an upgrade from high-end 700-series. Say what you want about them, but when it comes to selling cards they are brilliant.


Their marketing team could sell me my own shirt off my back and I wouldn't be the wiser.


----------



## Exilon

Quote:


> Originally Posted by *SpeedyVT*
> 
> It has to dedicate a specific amount of its hardware to serializing the parallel workloads. So in a sense it could lose a fraction of its frames aggregating them in the driver into a serialized workload.


Yes, it's called a hardware scheduler; GCN has them too, way more hardware for scheduling in fact. Modern GPUs are devices where *a lot* of effort is dedicated to getting data and instructions to the right place and time. Nvidia has relied on driver-level grid optimizations to increase GPU occupancy since Kepler, and it certainly didn't hurt them much in the CPU overhead department.

But wow what a thread. Dev basically says "our analysis indicates driver issues with Nvidia" and then we have >50 pages of "nuh uh, we know better".


----------



## MonarchX

Quote:


> Originally Posted by *SpeedyVT*
> 
> Their marketing team could sell me my own shirt off my back and I wouldn't be the wiser.


Well, he is right, you know. GTX 9xx series cards do benefit from DirectX 12, at least somewhat. Right now there are barely any DirectX 12 games in development, or at least announced. At first, just like with every API, DirectX 12 games will not be all that well coded. They won't provide as much of a boost to AMD cards as this benchmark has, but it will still be a greater boost than NVidia cards get. DirectX 11 games will still be mainstream while big DirectX 12 titles are in development, which, again, will take more time than usual due to DirectX 12's complexity. By the time developers get better at coding for DirectX 12, new GeForce cards optimized for DirectX 12 will be out for sale. I mean, have you seen Pascal specs???


----------



## ToTheSun!

Quote:


> Originally Posted by *SpeedyVT*
> 
> Intel is starting to move toward parallel workloads with multi-threaded SMT processors. It won't be long before six cores is the baseline core count for an i7.


The 5820K is already here. "i7" is just nomenclature and has no bearing on what a consumer should choose.


----------



## Themisseble

http://forums.anandtech.com/showthread.php?t=2443723


This mode assumes you have an infinitely powerful GPU, i.e., CPU-bound. All cores loaded to 100% through the entire benchmark.

DX11 CPU Test Summary

Average Framerate (all batches) 16.7 FPS
Average Framerate (normal batches) 25.9 FPS
Average Framerate (medium batches) 17.8 FPS
Average Framerate (heavy batches) 11.8 FPS

DX12 CPU Test Summary

Average Framerate (all batches) 59.2 FPS
Average Framerate (normal batches) 67.9 FPS
Average Framerate (medium batches) 62.6 FPS
Average Framerate (heavy batches) 50.0 FPS
Average CPU Framerate (all batches) 59.2
Percent GPU Bound (normal batches) 62.5%
Percent GPU Bound (medium batches) 67.6%
Percent GPU Bound (heavy batches) 70.3%

DX12 vs DX11 (all/normal/medium/heavy batches): 354%, 262%, 351%, 423% (i.e., roughly 3.5x, 2.6x, 3.5x, 4.2x)
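Those percentages are straightforward to reproduce from the FPS figures above; note that they are DX12-as-a-fraction-of-DX11 ratios, not additive gains:

```python
# DX11 vs DX12 average framerates from the quoted CPU test
# (all / normal / medium / heavy batches).
dx11 = {"all": 16.7, "normal": 25.9, "medium": 17.8, "heavy": 11.8}
dx12 = {"all": 59.2, "normal": 67.9, "medium": 62.6, "heavy": 50.0}

for batch in dx11:
    ratio = dx12[batch] / dx11[batch]
    print(f"{batch:>6}: DX12 is {ratio * 100:.0f}% of DX11 ({ratio:.1f}x)")
# Reproduces (to rounding) the posted 354% / 262% / 351% / 423% figures.
```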



Anyone making DX12 FX 8350 benchmarks?


----------



## flopper

Quote:


> Originally Posted by *Mahigan*
> 
> One thing is for certain, nVIDIA will not be able to replace shaders, as they would do under DX11 in their drivers, which means that a longstanding nVIDIA advantage will vanish under DirectX12.


expected but good to have it confirmed.


----------



## OneB1t

Yes, I have [email protected] + an R9 290X flashed to a 390X.
CPU usage is 97-99% on all cores for the whole benchmark.

This is [email protected]:

==Sub Mark Heavy Batch ==================================
Total Time: 58.024754
Avg Framerate : 27.051168 ms (36.966980 FPS)
Weighted Framerate : 27.200045 ms (36.764645 FPS)
CPU frame rate (estimated framerate if not GPU bound): 26.965591 ms (37.084297 FPS)
Percent GPU Bound: 2.325356%
Driver throughput (Batches per ms): 3244.936035
Average Batches per frame: 20154.013672

I'm getting 36.97 FPS, which is better than PCPer's result, but I still expected more than this.









pcper i3-4330 - 36
pcper FX-8370 - 32.4
pcper i7-6700K - 49.5
pcper i7-5960X - 46.5
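As a side note, the benchmark's ms and FPS columns are just reciprocals of each other, which makes results like the output above easy to cross-check:

```python
def ms_to_fps(frame_ms):
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms

# Cross-check the "Avg Framerate : 27.051168 ms (36.966980 FPS)" line above:
fps = ms_to_fps(27.051168)
print(f"{fps:.6f} FPS")
```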


----------



## Mahigan

Quote:


> Originally Posted by *flopper*
> 
> expected but good to have it confirmed.


Oxide confirmed it. I had only assumed it. You can see it here:
Quote:


> DirectX 11 vs. DirectX 12 performance
> 
> There may also be some cases where D3D11 is faster than D3D12 (it should be a relatively small amount). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has 5 years of optimizations where D3D12 is brand new. Second, *D3D11 has more opportunities for driver intervention*. The problem with this driver intervention is that it comes at the cost of extra CPU overhead, and can only be done by the hardware vendor's driver teams. On a closed system, this may not be the best choice if you're burning more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game's updates.


http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/


----------



## Mahigan

The developer can optimize by replacing shaders on their end. This was already done as confirmed here:
Quote:


> To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. For example, when *Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code*.


http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> yes i have [email protected] + R9 290X flashed to 390X
> CPU usage is 97-99% all cores whole benchmark
> 
> this is [email protected]
> 
> ==Sub Mark Heavy Batch ==================================
> Total Time: 58.024754
> Avg Framerate : 27.051168 ms (36.966980 FPS)
> Weighted Framerate : 27.200045 ms (36.764645 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 26.965591 ms (37.084297 FPS)
> Percent GPU Bound: 2.325356%
> Driver throughput (Batches per ms): 3244.936035
> Average Batches per frame: 20154.013672
> 
> im getting 36.97 fps which is better than pcper result but still expected more than this
> 
> 
> 
> 
> 
> 
> 
> 
> 
> pcper i3-4330 - 36
> pcper FX-8370 - 32.4
> pcper i7-6700K - 49.5
> pcper i7-5960X - 46.5


Wow... really? It runs that badly?
Something has to be wrong with it.

The FX 8370 matches an i5 in FPU and is better in integer performance. The FX 8370 does as well as an i5 in The Witcher 3... it looks like something is wrong here.

I saw a benchmark with an A10 7850K that scored 30-31 FPS... I think the FX 8350 should be at least 70-90% faster.


----------



## OneB1t

There is definitely something wrong with FX-8xxx performance...
I just tested a few bus-related settings (PCI-E, CPU-NB, FSB, HT, memory clock, memory timings) and it seems like it only scales with CPU clock, so 0% to 5% from messing with bus speeds.

A very bad result, as in Star Swarm + Mantle the FX keeps up with the i5/i7.
But with crazy settings the game is GPU bound, so there is no difference between an i7-5960X and an FX-8350, even without MSAA enabled.

I'll maybe try running it with only 4 cores enabled to see how it changes the results.


----------



## flopper

Quote:


> Originally Posted by *Mahigan*
> 
> Oxide confirmed it. I had only assumed it. You can see it here:
> http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/


Your assumptions work better than mine in this field.

Your posts were a good way of explaining this stuff.

AMD's plan after winning the console bids was to leverage that with DX12, future gaming, and Win 10.
Mantle had a superb impact for me, so I eagerly await games using it.


----------



## semitope

Quote:


> Originally Posted by *SpeedyVT*
> 
> Technically niche being RTS... RTS being a game that relies on more draw calls. Yeah in that sense it's niche. I'd say both NVidia and AMD should gain 20% more fps over Ashes in a DX12 title with fewer draw calls. However this is good, I want to see games that can utilize draw calls.


IIRC the Witcher 3 devs said they had to change their render engine, because all the wonderful things we saw at E3 could only work well in DX12.
Quote:


> The billowing smoke and roaring fire from the trailer? "It's a global system and it will kill PC because transparencies - without DirectX 12 it does't work good in every game."


If this is related to draw calls, it's not limited to RTS games; the extra calls would simply be used differently.


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> There is definitely something wrong with FX-8xxx performance...
> I just tested a few bus-related settings (PCI-E, CPU-NB, FSB, HT, memory clock, memory timings) and it seems like it only scales with CPU clock, so 0% to 5% from messing with bus speeds.
> 
> A very bad result, as in Star Swarm + Mantle the FX keeps up with the i5/i7.
> But with crazy settings the game is GPU bound, so there is no difference between an i7-5960X and an FX-8350, even without MSAA enabled.
> 
> I'll maybe try running it with only 4 cores enabled to see how it changes the results.


yea, please do.

I have also benchmarked the FX 6300 vs the i3 and FX 4300 in BF4 DX11/Mantle, and the FX 6300 just destroys both in DX11 and Mantle.... So it is weird that the i3 is faster than the FX 8370.
I remember a similar problem with the FX 6300 in BF4 at launch... it was underperforming.

http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/3/


----------



## OneB1t

just tested FX-8320 with 4 cores disabled so it works like FX-4xxx

[email protected]
==Sub Mark Heavy Batch ==================================
Total Time: 57.973419
Avg Framerate : 37.767700 ms (26.477652 FPS)
Weighted Framerate : 38.071899 ms (26.266090 FPS)
CPU frame rate (estimated framerate if not GPU bound): 37.571934 ms (26.615612 FPS)
Percent GPU Bound: 6.452394%
Driver throughput (Batches per ms): 3170.016602
Average Batches per frame: 34691.609375

[email protected]
==Sub Mark Heavy Batch ==================================
Total Time: 57.954021
Avg Framerate : 30.121634 ms (33.198730 FPS)
Weighted Framerate : 30.450636 ms (32.840034 FPS)
CPU frame rate (estimated framerate if not GPU bound): 26.878809 ms (37.204029 FPS)
Percent GPU Bound: 83.472687%
Driver throughput (Batches per ms): 4916.684082
Average Batches per frame: 36009.164063

A 6 FPS increase from doubling CPU performance...

The interesting number is driver throughput (batches per ms): 3170.016602 vs 4916.684082.

Even with such an increase in throughput, such a small FPS increase?
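Putting a number on it, using the two CPU frame rates reported above (a back-of-the-envelope check, nothing more):

```python
# CPU-side frame rates from the two runs above (4 cores vs 8 cores enabled),
# i.e., the "estimated framerate if not GPU bound" values.
fps_4c = 26.615612
fps_8c = 37.204029

speedup = fps_8c / fps_4c
efficiency = speedup / 2.0  # ideal doubling of cores would be 2.0x

print(f"speedup:    {speedup:.2f}x")     # ~1.40x
print(f"efficiency: {efficiency:.0%}")   # ~70% of ideal scaling
```

So doubling the core count bought about a 40% CPU-side gain here, i.e., roughly 70% scaling efficiency, well short of linear.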


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> just tested FX-8320 with 4 cores disabled so it works like FX-4xxx
> 
> [email protected]
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.973419
> Avg Framerate : 37.767700 ms (26.477652 FPS)
> Weighted Framerate : 38.071899 ms (26.266090 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 37.571934 ms (26.615612 FPS)
> Percent GPU Bound: 6.452394%
> Driver throughput (Batches per ms): 3170.016602
> Average Batches per frame: 34691.609375
> 
> [email protected]
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.954021
> Avg Framerate : 30.121634 ms (33.198730 FPS)
> Weighted Framerate : 30.450636 ms (32.840034 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 26.878809 ms (37.204029 FPS)
> Percent GPU Bound: 83.472687%
> Driver throughput (Batches per ms): 4916.684082
> Average Batches per frame: 36009.164063
> 
> 6fps increase from doubling CPU performance...


You should tell that to AotS support.
- Can you please do the same benchmark with a single module and with three modules (2 cores and 6 cores)?


----------



## CrazyElf

This all makes me ask another question:

Will more CPU cores finally be of use? Namely, will HEDT platforms be of use?

Let's assume a scenario:

5820K vs 6700K
Both have 32 GB of RAM, with the 6700K having slightly faster RAM but in dual channel versus the 5820K having slightly slower RAM but in quad channel
2 GPUs (say 2x Fury in CF), the same SSDs, and so on
Are we finally seeing a situation with DX12 (and there are some existing games that can take advantage of the extra cores like BF4), where we can expect better frames on the HEDT setup?

Edit: I'm not 100% sure on this one, as DX12 is supposed to reduce CPU demands. Can anyone with knowledge on this matter answer?
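One way to reason about it: a frame can't finish faster than the slower of the CPU side and the GPU side. A crude sketch with entirely hypothetical numbers (and an unrealistic assumption that CPU work splits perfectly across cores):

```python
def frame_time(cpu_work_ms, gpu_work_ms, cores, api_overhead_ms):
    """Crude model: frame time is set by whichever side finishes last.
    CPU work is assumed to split perfectly across cores; GPU work is fixed."""
    cpu_ms = cpu_work_ms / cores + api_overhead_ms
    return max(cpu_ms, gpu_work_ms)

# Hypothetical frame: 40 ms of CPU work, 12 ms of GPU work.
dx11_like = frame_time(40, 12, cores=4, api_overhead_ms=6)  # CPU-bound: 16 ms
dx12_like = frame_time(40, 12, cores=4, api_overhead_ms=1)  # GPU-bound: 12 ms
dx12_hedt = frame_time(40, 12, cores=8, api_overhead_ms=1)  # extra cores idle: 12 ms

print(dx11_like, dx12_like, dx12_hedt)
```

In this toy picture, lower API overhead (DX12) moves the bottleneck to the GPU, at which point extra HEDT cores stop helping; more cores only pay off while the CPU side is still the limiter.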

Quote:


> Originally Posted by *Mahigan*
> 
> That's probably what I would do. At this point I wouldn't consider the high end.
> 
> AMD appear to offer negligible performance gains with their Fury/Fury-X and the DirectX 12 Asynchronous Shading performance of the GTX 980 Ti makes me doubt its abilities come next year.


In theory, a Fury would be the most future proof. I think Fury X is another option, but I don't like the CLC (it will evaporate and they have pump noise issues), plus it's not much faster clock for clock. Some of the Fury can be unlocked and they have some pretty good custom PCBs coming (Asus Fury Strix for example). I hope we'll see some OC oriented ones.

The bigger problem that I see is that the Fury series has only 4GB of VRAM; even if it is well positioned to take advantage of the latest and greatest DX12, it's not going to be a super card due to the VRAM limitation. HBM2 will solve this, though.

Hmm ... yet another option would be to get a 390 or 390X and Crossfire them (a 390 CF is probably similarly priced to a Fury X). The 8GB might be of value after all because if DX12 does indeed leverage GPUs better, then 3 and 4 way CF might scale better than they historically do, which in turn might mean that 4GB will be inadequate to satisfy the core requirements of 3 or 4 GPUs combined. Sigh ... I wish we would see a Lightning r9 390X.

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> 
> 
> 
> I think I know what is happening.
> 
> Ashes of the Singularity makes use of Asynchronous Shading. Now we know that AMD have been big on advertising this feature. It is a feature which is used in quite a few Playstation 4 titles. It allows the Developer to make efficient use of the compute resources available. GCN achieves this by making use of 8 Asynchronous Compute Engines (ACE for short) found in GCN 1.1 290 series cards as well as all GCN 1.2 cards. Each ACE is capable of queuing up to 8 tasks. This means that a total of 64 tasks may be queued on GCN hardware which features 8 ACEs.
> 
> nVIDIA can also do Asynchronous Shading through its HyperQ feature. The amount of available information, on the nVIDIA side regarding this feature, is minimal. What we do know is that nVIDIA mentioned that Maxwell 2 is capable of queuing 32 Compute or 1 Graphics and 31 Compute for Asynchronous Shading. nVIDIA has been
> 
> 
> 
> Anandtech made a BIG mistake in their article on this topic which seems to have become the defacto standard article for this topic. Their information has been copied all over the web. This information is erroneous. Anandtech claimed that GCN 1.1 (290 series) and GCN 1.2 were Capable of 1 Graphics and 8 Compute queues per cycle. This is in fact false. The truth is that GCN 1.1 (290 series) and GCN 1.2 are capable of 1 Graphics and 64 Compute queues per cycle.
> 
> 
> 
> 
> 
> Anandtech also had barely no information on Maxwell's capabilities. Ryan Smith, the Graphics author over at Anandtech, assumed that Maxwell's queues were its dedicated compute units. Therefore Anandtech published that Maxwell 2 had a total of 32 Compute Units. This information is false.
> 
> The truth is that Maxwell 2 has only a single "Asynchronous" Compute Engine tied to 32 Compute Queues (or 1 Graphics and 31 Compute queues). (Asynchronous is in brackets because it isn't Asynchronous as you will see).
> 
> I figured this out when I began to read up on Kepler/Maxwell/2 CUDA documentation and I found what I was looking for. Basically Maxwell 2 makes use of a single ACE-like unit. nVIDIA name this unit the Grid Management Unit.
> 
> How it works?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The CPUs various Cores send Parallel streams to the Stream Queue Management. The Stream Queue Management sends streams to the Grid Management Unit (Parallel to Serial thus far). The Grid Management unit can then create multiple hardware work queues (1 Graphics and 31 Compute or 32 Compute) which are then sent in a Serial fashion to the Work Distributor (one after the other or in Serial based on priority) . The Work Distributor, in a Parallel fashion, assigns the work loads to the various SMXs. nVIDIA call this entire process "HyperQ".
> 
> Here's the documentation: http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf
> 
> GCN 1.1 (290 series)/GCN 1.2, on the other hand, works in a very different manner. The CPUs various Cores send Parallel streams to the Asynchronous Compute Engines various Queues (up to 64). The Asynchronous Compute Engines prioritizes the work and then sends it off, directly, to specific Compute Units based on availability. That's it.
> 
> Maxwell 2 HyperQ is thus potentially bottlenecked at the Grid Management and then Work Distributor segments of its pipeline. This is because these stages of the Pipeline are "in order". In other words HyperQ contains only a single pipeline (Serial not Parallel).
> 
> AMDs Asynchronous Compute Engine implementation is different. It contains 8 Parallel Pipelines working independently from one another. This is why AMDs implementation can be described as being "out of order".
> 
> A few obvious facts come to light. AMDs implementation incurs less latency as well as having the ability of making more efficient use of the available Compute resources.
> 
> This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity under DirectX 12 and when compared to even a lowly R9 290x. Asynchronous Shading kills its performance compared to GCN 1.1 (290 series)/GCN 1.2. The latter's performance is barely impacted.
> 
> 
> GCN 1.1 (290 series)/GCN 1.2 are clearly being limited elsewhere, and I believe it is due to their Peak Rasterization Rate or Gtris/s. Many objects and units permeate the screen under Ashes of the Singularity. Each one is made up of Triangles (Polygons). Since both the Fury-X and the 290x/390x have the same amount of hardware rasterization units, I believe that this is the culprit. Some people have attribute this to the amount of ROps (64) that both Fury-X and 290/390x share. I thought the same at first but then I was reminded of the Color Compression found in the Fury/Fury-X cards. The Fury/X make use of Color Compression algorithms which have shown to alleviate the Pixel Fill Rate issues which were found in the 290/390x cards. Therefore I do not believe that ROps (Render Back Ends) are the issue. Rater the Triangle Setup Engine (Raster/Hierarchical Z) are the likely culprits.
> 
> I've been away from this stuff for a few years so I'm quite rusty but Direct X 12 is getting me interested once again.
> 
> PS. Don't expect an nVIDIA fix through Driver Intervention either. DirectX 12 is limited in driver intervention because it is closer to Metal than DirectX 11. Therefore nVIDIAs penchant for replacing shaders at the driver level is nullified with DirectX 12. DirectX 12 will be far more hardware limited than DirectX 11.
> 
> 
> 
> 
> 
> Oxide confirmed it here:
> The developer can optimize by replacing shaders on their end. This was already done as confirmed here:
> http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/


So where does that leave us for next year for AMD?

They'll release another revision of GCN?
They will definitely do something then to address the peak rasterization rate?
And of course, HBM2? That should address the 4GB limit and add even more bandwidth.
Will they try to improve how they handle the bandwidth too or is that moot with HBM2 since there is so much bandwidth to work with?

You may actually want to discuss this with Ryan Smith.

His email is:
ryan.smith at anandtech.com

Anyways, for those interested (from Techreport):


Rasterization did not change from the 290X to the Fury X, and the Fury X is actually slower than a 780 Ti! This would also explain why adding more shaders does not scale very well (compare the Fury to the Fury X at the same clock speed): it simply was not the bottleneck.

Quote:


> Originally Posted by *error-id10t*
> 
> Isn't this a win-win-win for everyone or have I lost the plot?
> 
> Nvidia can add more whatever it is that they're "lacking" today and causes a bottleneck. AMD can add more whatever it is that they're "lacking" today and causes a bottleneck - both of these scenarios have been discussed at length here already.
> 
> Then there's us the consumers who (if both companies fix their bottlenecks) will receive a very nice increase in performance for DX12 games.


Not entirely. For the people who own Maxwell and earlier Nvidia GPUs, they will probably find their cards not aging as well as they otherwise might have.

This seems to be a pattern right now with Nvidia GPUs:


7970 vs 680: the 680 was praised at launch, but as time goes by, it looks like the 7970 has done much better.
The 290/290X versus the 780 Ti/Titan (Kepler): again, the 290X was hammered for its high power consumption, but right now the 290X is definitely in the stronger position.
I think that given the situation, neither the Fury X nor the 980 Ti will age that well: the Fury X due to its 4GB of VRAM, and the 980 Ti because it is not as parallel.
So perhaps the best purchase would be the 16nm AMD top-end or second-from-top card (e.g., whatever ends up equivalent to the 7950, 290, or Fury)?

Quote:


> Originally Posted by *Jim Dotcom*
> 
> With what is basically 50% market share, GCN is for sure being developed on by any cross-platform software company. This is why this matters so much with DX12. It's Nvidia who will need to do all the optimising while AMD will barely need to lift a finger. Even with Nvidia bursting their guts there will still be clear hardware advantages (async shaders for one) that no amount of optimisation will be able to equal. Async shaders are easy to optimise any (dx12) game for btw, so it'll be very clear which devs aren't doing it.
> What's readily apparent is that AMD's DX11 woes are pretty much over and Nvidia will really have to move up another gear. AMD appears to have played a great long game by bringing everything to bear at the same time (and of course get zero credit for doing that). They just need to reach the finishing line now and it'll all be pretty interesting again.


They basically control the two consoles and by extension, the console ports. That in turn affects how PCs are developed.

Nvidia does have the financial resources, though, to more than catch up. As Mahigan has noted, Pascal will likely more than make up for the poor parallel performance of Maxwell 2.

Off topic, but are you planning on posting more often here or at TechSoda? I ask because you do have a good track record - you were right that 20nm TSMC HP proved to be something both GPU vendors skipped.

I think, though, that if Zen turns out reasonably well (e.g., if they actually hit the 40% IPC gain they promised over Bulldozer/Steamroller - we're talking Conroe-like gains here), and if they can deliver a solid implementation next generation, addressing the rasterization issues (which should be possible with the extra transistor budget on 16nm), we could see a turnaround?


----------



## ZealotKi11er

Quote:


> Originally Posted by *OneB1t*
> 
> just tested FX-8320 with 4 cores disabled so it works like FX-4xxx
> 
> [email protected]
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.973419
> Avg Framerate : 37.767700 ms (26.477652 FPS)
> Weighted Framerate : 38.071899 ms (26.266090 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 37.571934 ms (26.615612 FPS)
> Percent GPU Bound: 6.452394%
> Driver throughput (Batches per ms): 3170.016602
> Average Batches per frame: 34691.609375
> 
> [email protected]
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.954021
> Avg Framerate : 30.121634 ms (33.198730 FPS)
> Weighted Framerate : 30.450636 ms (32.840034 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 26.878809 ms (37.204029 FPS)
> Percent GPU Bound: 83.472687%
> Driver throughput (Batches per ms): 4916.684082
> Average Batches per frame: 36009.164063
> 
> 6fps increase from doubling CPU performance...
> 
> interesting number is Driver throughput (Batches per ms): 3170.016602 vs 4916.684082
> 
> even with such increase such low FPS increase?


Try with 6 cores also. Something really interesting is going on there. I think this benchmark only scales up to 6 cores. Based on your numbers it's 26.6 FPS vs 37.2 FPS; that's a 40% increase, which is about right.
Also, the 5960X with 8 cores @ 3.3GHz is going against the 6700K with 4 cores @ 4.2GHz. In reality, if only 6 of the cores are used, then the 6700K is slightly faster.


----------



## PontiacGTX

Quote:


> Originally Posted by *CrazyElf*
> 
> In theory, a Fury would be the most future proof. I think Fury X is another option, but I don't like the CLC (it will evaporate and they have pump noise issues), plus it's not much faster clock for clock. Some of the Fury can be unlocked and they have some pretty good custom PCBs coming (Asus Fury Strix for example). I hope we'll see some OC oriented ones.
> 
> Hmm ... yet another option would be to get a 390 or 390X and Crossfire them (a 390 CF is probably similarly priced to a Fury X). The 8GB might be of value after all because if DX12 does indeed leverage GPUs better, then 3 and 4 way CF might scale better than they historically do, which in turn might mean that 4GB will be inadequate to satisfy the core requirements of 3 or 4 GPUs combined. Sigh ... I wish we would see a Lightning r9 390X.
> Not entirely. For the people who own Maxwell and earlier Nvidia GPUs, they will probably find their cards not aging as well as they otherwise might have.
> 
> This seems to be a pattern right now with Nvidia GPUs:


It doesn't matter, because it will be replaced by the efficient 2016 GPUs, which will also have HBM and new features.

The R9 290 is the cheapest card and is the same as an R9 390; it is easier to sell one old card than two rebranded ones.


----------



## Xuper

Quote:


> Originally Posted by *Mahigan*
> 
> Oxide confirmed it. I had only assumed it. You can see it here:
> http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/


> DirectX 11 vs. DirectX 12 performance
> 
> There may also be some cases where D3D11 is faster than D3D12 (it should be a relatively small amount). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has 5 years of optimizations where D3D12 is brand new. Second, *D3D11 has more opportunities for driver intervention*. The problem with this driver intervention is that it comes at the cost of extra CPU overhead, and can only be done by the hardware vendor's driver teams. On a closed system, this may not be the best choice if you're burning more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game's updates.


What does it mean? We saw the performance of the FX-83xx, and it's really low; even a Core i3 beats it. Going by your post, should I assume that AMD can fix the low FX-83xx FPS via Catalyst? Am I right?


----------



## PontiacGTX

Quote:


> Originally Posted by *Xuper*
> 
> What does it mean ? We saw performance of Fx-83xx and It's really low even Core i3 beats it.According to your Post , I assume that AMD should Fix low FPS Fx-83xx via Catalyst ? Am I right?


The only way the FX could perform well is if the game engine takes advantage of 8 cores / that architecture. If the engine can only use 4 cores, then having more than 4 cores won't help, except that the DirectX 11 driver overhead isn't present, and even then the gain isn't big.


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> just tested FX-8320 with 4 cores disabled so it works like FX-4xxx
> 
> [email protected]
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.973419
> Avg Framerate : 37.767700 ms (26.477652 FPS)
> Weighted Framerate : 38.071899 ms (26.266090 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 37.571934 ms (26.615612 FPS)
> Percent GPU Bound: 6.452394%
> Driver throughput (Batches per ms): 3170.016602
> Average Batches per frame: 34691.609375
> 
> [email protected]
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.954021
> Avg Framerate : 30.121634 ms (33.198730 FPS)
> Weighted Framerate : 30.450636 ms (32.840034 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 26.878809 ms (37.204029 FPS)
> Percent GPU Bound: 83.472687%
> Driver throughput (Batches per ms): 4916.684082
> Average Batches per frame: 36009.164063
> 
> 6fps increase from doubling CPU performance...
> 
> interesting number is Driver throughput (Batches per ms): 3170.016602 vs 4916.684082
> 
> even with such increase such low FPS increase?


I think this is due to some of the engine's coding optimizations. I can't prove it, but something tells me the engine uses AVX to exploit data parallelism. https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
Quote:


> Applications
> Suitable for floating point-intensive calculations in multimedia, scientific and financial applications (integer operations are expected in later extensions).
> Increases parallelism and throughput in floating point SIMD calculations.
> Reduces register load due to the non-destructive instructions.
> Improves Linux RAID software performance (required AVX2, AVX is not sufficient)


I think this is the case because Intel AVX can operate on up to 8 floats in parallel. That would be quite useful in a game engine like Ashes of the Singularity; it could be used to optimize the physics in the engine. For physics we'd be looking at float AVX performance, which is an Achilles heel of AMD's FX lineup of processors.

I've emailed Oxide and am waiting on a response.

If this is the case, then the performance difference could be attributed to the inclusion of such CPU-level optimizations.



You can simulate many Physics effects over the CPU with AVX:

Intel's AVX Cloth demo. Here are the FPS for the demo's three options:
- Serial: 10 FPS
- 128-bit: 83 FPS
- 256-bit: 130 FPS


You can also do things such as Onloaded Shadows:

Onloaded Shadows is a technique by which shadow maps can be calculated asynchronously on the CPU. Using cascades, the shadow maps for objects near the camera are calculated every frame on the GPU, while the shadow maps for objects in the second cascade and beyond are calculated less often on the CPU. This allows for better work balancing across the CPU and GPU.


----------



## Kuivamaa

If it is indeed using AVX, don't look further. It is well known that AVX performs no better than, say, SSE3 on Vishera.


----------



## mav451

Follow-up question:
Will Zen be better at AVX?

I'm assuming this is a yes


----------



## SpeedyVT

Quote:


> Originally Posted by *ToTheSun!*
> 
> The 5820K is already here. "i7" is just nomenclature and has no bearing on what a consumer should choose.


I mean, in a couple of years there won't be a quad-core i7 but a six-core or greater as the base. They'll move the quad-core specs to the i5s.
Quote:


> Originally Posted by *semitope*
> 
> iirc witcher 3 devs said they had to change their render engine because all the wonderful things we say from e3 could only work well in dx12.
> if this is related to draw calls, it's not limited to RTS games. It would simply be used differently


RTS games are just naturally draw-call heavy, along with MMORPGs. I agree about The Witcher III; developers can add an extreme amount of visual effects with DX12. Any word on whether GTA V is getting a DX12 version?
Quote:


> Originally Posted by *OneB1t*
> 
> there is definitely something wrong with fx-8xxx performance...
> just tested few bus related settings (PCI-E,CPU-NB,FSB,HT,MEM CLOCK, MEM TIMINGS) and it seems like it only scales with CPU clock so 0% to 5% from messing with bus speed
> 
> very bad result as in star swarm + mantle FX keeps with i5/i7
> but with crazy settings game is GPU bound so no difference between i7-5960X vs FX-8350 even without MSAA enabled
> 
> i maybe try to run it with only 4 cores enabled and see how it change results


It's still in alpha; lots of changes are yet to happen.


----------



## OneB1t

It's based on the Star Swarm benchmark, which is much older and performed better with Vishera,
so I'm not counting on any fixes for FX performance; this is probably close to final performance.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> I think that this is due to some of the engines coding optimizations. I can't prove this but something tells me that the engine uses AVX in order to program for thread Parallelism. https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
> I think this is the case because Intel AVX can operate on up to 8 floats in parallel. This would seem to be quite useful if you're working on a game engine like Ashes of the Singularity. It could be used in order to optimize the Physics in the game engine. For physics we'd be looking at Float AVX performance. This is an achilles heel of the AMD FX lineup of Processors.
> 
> I've emailed Oxide and am waiting on a response.
> 
> If this is the case then the performance difference could be attributed by the inclusion of such CPU level optimizations.
> 
> 
> 
> You can simulate many Physics effects over the CPU with AVX:
> 
> Intel's AVX Cloth demo. Here are the FPS for the three options of the demo:
> - Use Serial: 10FPS
> - Use 128bit: 83FPS
> - Use 256bit: 130PS
> 
> 
> You can also do things such as Onloaded Shadows:
> 
> Onloaded Shadows is a technique by which shadow maps can be calculated asynchronously on the CPU. By using cascades, the shadow map for objects near the camera are calculated every frame on the GPU, but the shadow maps for objects in the second cascade and beyond are calculated less often on the CPU. This allows for better work balancing across the CPU and GPU.


Hmm, Sandra is not a good indicator of AMD CPU performance.
The FX 8350 has 8x 128-bit FPUs.
- In DX11, games are bound by single-thread FPU/integer performance; higher game complexity pushes the IPC requirement even further.
- In DX11 you can offload a lot of physics to 4-6 threads, but there is still an IPC bottleneck.

This AotS benchmark shows something very interesting.
- The FX 8350 should beat any i3, even overclocked, if the game is well threaded under DX12.
- I can show you Mantle results in BF4: the i3 is actually very weak against the i5. The i5 gives you around 185% of an i3's performance, while the FX 6300 and FX 8350 sit between them. The FX 4300 just lags behind the i3 (by 1-3 FPS).
- You have to know that the FX 8350 still has 4x 256-bit FPUs!

And look at the latest RTS game - the FX 8350 does pretty well in DX11.
http://gamegpu.ru/rts-/-strategii/total-war-arena-test-gpu.html


----------



## OneB1t

In multithreaded workloads an i3 has half the performance of an FX-8xxx, so it's weird that Ashes uses 100% of all 8 threads yet reports such low performance.

Also, an i7 has double the performance of an i3 in Ashes, and HT helps by adding about 10% performance.


----------



## SpeedyVT

Quote:


> Originally Posted by *OneB1t*
> 
> in multithread workload i3 have half performance of FX-8xxx so its weird that ashes use 100% of all 8 threads reports such low performance
> 
> also i7 have double performance of i3 in ashes and HT help by adding about 10% of performance


I'm curious to see what the FX-8150 does for performance in this game.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> Hmm sandra is not good indicator of AMD CPU performance.
> FX 8350 has 8x 128Bit FPU
> - Games DX11 is bound to single FPU/Integer performance - high game complexity just even further IPC
> - DX11 you can offload many physics to 4-6 threads but there is still IPC bottleneck.
> 
> This benchmarks AotS shows something very interesting.
> - FX 8350 should beat any i3 even OC-ed if game is well threaded on DX12.
> - I can show you mantle results in BF4 - and you will see that i3 is actually very weak against i5. I5 will give you around 185% performance of an i3. While FX 6300 and FX 8350 will sit between them. Fx 4300 will just lagg behind i3 (1-3 FPS)
> - You have to know that FX 8350 has still 4x 256BIT FPU!
> 
> And look at latest RTS game - FX 8350 does pretty well for Dx11.
> http://gamegpu.ru/rts-/-strategii/total-war-arena-test-gpu.html


The Core i3 used is the Intel Core i3-4330, which is based on Haswell. With Haswell you get twice the FPU output, in VP8 tests or with AVX/2, compared with Sandy Bridge. If you're coding for AVX/2, you end up with the same FPU output on a Haswell Core i3 as on a Sandy Bridge Core i5.



And look at the VP8 test here (not indicative of AVX but utilizing the new FPU benefits of Haswell):


I know that AMD FX-8xxx chips have 8x 128-bit FMACs, and that two FMACs can be combined into a single 256-bit FPU. I realize this, but I was talking about AVX performance. The SiSoft test is indicative of the AMD FX's floating-point AVX performance.

Thanks to Hyperthreading, Sandy Bridge and Haswell each give you two 256-bit AVX operations per clock, while Bulldozer facilitates one (no Hyperthreading).


The kicker is that:

AVX

Intel Sandy Bridge/Ivy Bridge per core:
8 DP FLOPs/cycle
16 SP FLOPs/cycle

Intel Haswell/Broadwell per core:
16 DP FLOPs/cycle
32 SP FLOPs/cycle

AMD Bulldozer/Piledriver/Steamroller per module (2 cores):
8 DP FLOPs/cycle
16 SP FLOPs/cycle

What we're looking at is:

A Core i3-4330 can thus perform two AVX operations at:
32 DP FLOPs/cycle
64 SP FLOPs/cycle

An AMD FX-83xx can thus perform one AVX operation at:
32 DP FLOPs/cycle
64 SP FLOPs/cycle

AMD FX-83xx performs at half the rate of a Core i3 4330 where AVX/2 is concerned. This is what led me to believe that AVX was being utilized by Ashes of the Singularity.
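Plugging the per-clock figures above into stock clocks (3.5 GHz for the i3-4330, 4.0 GHz for the FX-8350) gives theoretical chip-level peaks. This is a back-of-envelope sketch using the numbers quoted in this post, not a benchmark:

```python
# Peak GFLOPS = execution units x FLOPs/cycle/unit x clock (GHz)
def peak_gflops(units, flops_per_cycle, ghz):
    return units * flops_per_cycle * ghz

# Core i3-4330: 2 Haswell cores, 16 DP / 32 SP FLOPs per cycle per core
i3_dp = peak_gflops(2, 16, 3.5)   # 112 DP GFLOPS
i3_sp = peak_gflops(2, 32, 3.5)   # 224 SP GFLOPS

# FX-8350: 4 Piledriver modules, 8 DP / 16 SP FLOPs per cycle per module
fx_dp = peak_gflops(4, 8, 4.0)    # 128 DP GFLOPS
fx_sp = peak_gflops(4, 16, 4.0)   # 256 SP GFLOPS

print(i3_dp, i3_sp, fx_dp, fx_sp)
```

Note the whole-chip totals come out comparable; the halved rate only appears when comparing one Haswell core against one Piledriver module.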

EDIT: Here are some older tests I've found which use AVX and compare an i5 2500K to an FX-8150:


----------



## ku4eto

Mahigan, can you explain why the Thuban in the graph is ahead of the 8150?

I kinda don't understand it; it is lower in frequency and has only 6 cores vs the 8150's 4 modules (8 cores).


----------



## OneB1t

Also, what is interesting is that this [email protected] has nearly the same performance as [email protected], which points me in the direction that memory bandwidth may have an impact on this test.

[email protected] HIGH HEAVY
==Sub Mark Heavy Batch ==================================
Total Time: 58.039249
Avg Framerate : 42.960213 ms (23.277353 FPS)
Weighted Framerate : 43.192684 ms (23.152069 FPS)
CPU frame rate (estimated framerate if not GPU bound): 42.894264 ms (23.313141 FPS)
Percent GPU Bound: 0.810465%
Driver throughput (Batches per ms): 2434.806396
Average Batches per frame: 34503.402344
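For anyone comparing several of these runs, the numeric fields can be pulled out of a pasted log automatically. A quick sketch, assuming the exact field layout shown in the output above:

```python
import re

# Matches lines like "Avg Framerate : 42.960213 ms (23.277353 FPS)" and
# captures the field name plus the first number after the colon.
LINE = re.compile(r"^(?P<key>[\w ()]+?)\s*:\s*(?P<val>[\d.]+)", re.MULTILINE)

def parse_benchmark(text):
    """Return a dict of numeric fields, e.g. {'Total Time': 58.039249, ...}."""
    return {m.group("key").strip(): float(m.group("val"))
            for m in LINE.finditer(text)}

sample = """Total Time: 58.039249
Avg Framerate : 42.960213 ms (23.277353 FPS)
Percent GPU Bound: 0.810465%
Driver throughput (Batches per ms): 2434.806396
Average Batches per frame: 34503.402344"""

stats = parse_benchmark(sample)
print(stats["Avg Framerate"])  # average frame time in ms; FPS = 1000 / frame time
```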


----------



## Cyro999

Quote:


> Originally Posted by *ku4eto*
> 
> Mahigan, can you explain why the Thuban in the graph is ahead of the 8150
> 
> 
> 
> 
> 
> 
> 
> I kinda don't understand it, it is lower on frequency + it has only 6 cores vs 4(8).


Thuban has higher per-clock performance than Bulldozer, even at a significant clock-speed deficit, with some advantages/disadvantages depending on the workload because of the vastly different architectural designs.

Bulldozer is 4 modules. I'm not certain about original Bulldozer, but Piledriver, which should be similar, doesn't get 2x scaling from running 2 threads on 1 module. It's about 1.68x in a few typical workloads, so an 8-threaded CPU would have about 6.72x its single-threaded performance, not 8x. It can be way less or more depending on the workload, though.
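The scaling above reduces to simple arithmetic. A tiny sketch using the quoted ~1.68x per-module figure (a workload-dependent assumption, as noted):

```python
# Back-of-envelope CMT scaling, using the ~1.68x figure quoted above for
# running 2 threads on 1 Piledriver module.
modules = 4                 # FX-8xxx: 4 modules, 8 integer cores
per_module_scaling = 1.68   # throughput of 2 threads on 1 module vs 1 thread

effective_cores = modules * per_module_scaling
print(effective_cores)      # 8 threads deliver ~6.72x single-thread throughput
```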


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> The Core i3 used is the Intel Core i3-4330. It is based on Haswell. With Haswell you get twice the FPU output, under VP8 tests or in AVX/2, than with Sandy Bridge. If you're coding for AVX/2 then you end up with the same FPU output on a Haswell Core i3 than you do on a Sandy Bridge Core i5.
> 
> 
> 
> And look at the VP8 test here (not indicative of AVX but utilizing the new FPU benefits of Haswell):
> 
> 
> I know that AMD FX-8xxx have 8x 128 bit FMACs. Two FMACs can be combined into a single 256-bit FPU. I realize this, but I was talking about AVX performance. The Sisoft test is indicative of AMD FXs floating AVX performance.
> 
> Thanks to Hyperthreading... Sandy Bridge gives you two 256-bit AVX operations per clock, Haswell gives you two 256-bit AVX operations per clock, while Bulldozer facilitates one (no Hyperthreading).
> 
> 
> The kicker is that:
> 
> AVX
> 
> Intel Sandy Bridge/Ivy Bridge per core:
> 8 DP FLOPs/cycle
> 16 SP FLOPs/cycle
> 
> Intel Haswell/Broadwell per core:
> 16 DP FLOPs/cycle
> 32 SP FLOPs/cycle
> 
> AMD Bulldozer/Piledriver/Steamroller per module (2 cores):
> 8 DP FLOPs/cycle
> 16 SP FLOPs/cycle
> 
> What we're looking at is
> 
> A Core i3-4330 can thus perform two AVX operations at:
> 32 DP FLOPs/cycle
> 64 DP FLOPs/cycle
> 
> An AMD FX-83xx can thus perform one AVX operation at:
> 32 DP FLOPs/cycle
> 64 DP FLOPs/cycle
> 
> AMD FX-83xx performs at half the rate of a Core i3 4330 where AVX/2 is concerned. This is what led me to believe that AVX was being utilized by Ashes of the Singularity.
> 
> EDIT: Here are some older tests I've found which use AVX and compare an i5 2500K to an FX-8150:


No, it's not true. Mostly you are right, but this does not show up in games.
Why do you believe that? Simple question: why?
That is not the whole thing; an i5 2500K will beat an i3 4330 in these same benchmarks.

Look at CB: the i3 4330 has a 2x better FPU (as you say) than the i5 2500K, yet the i5 is much faster. What's the problem?

And here is a simple benchmark:

FX 6300 vs i3 4330


----------



## ku4eto

Quote:


> Originally Posted by *Cyro999*
> 
> Thuban has higher performance at the same clock speed as bulldozer or even a significant clock speed deficit and some advantages/disadvantages depending on the workload because of vastly different architectural design
> 
> Bulldozer is 4 modules. I'm not certain of original bulldozer, but piledriver which should be similar doesn't get 2x scaling from running 2 threads on 1 module. It's about 1.68x in a few typical workloads, so an 8 threaded CPU would have the performance of ~6.72x its singlethreaded performance, not 8x. It can be way less or more depending on the workload though.


I guess DX12 and coding for multiple cores really is breathing life back into old AMD hardware. My 960T will probably last for another year or two if games keep improving at such a rate.


----------



## ZealotKi11er

Quote:


> Originally Posted by *OneB1t*
> 
> also what is interesting is this [email protected] have nearly same performance as [email protected] which points me to direction that maybe memory bandwidth can have impact on this test
> 
> [email protected] HIGH HEAVY
> ==Sub Mark Heavy Batch ==================================
> Total Time: 58.039249
> Avg Framerate : 42.960213 ms (23.277353 FPS)
> Weighted Framerate : 43.192684 ms (23.152069 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 42.894264 ms (23.313141 FPS)
> Percent GPU Bound: 0.810465%
> Driver throughput (Batches per ms): 2434.806396
> Average Batches per frame: 34503.402344


What GPU do you have?


----------



## Cyro999

Quote:


> Originally Posted by *ku4eto*
> 
> I guess really DX12 and coding for multiple cores is breathing life again into old AMD hardware. My 960T will last probably for another year or 2 if games keep improving at such rate.


It's not making a 2009 CPU perform like a 2015 CPU, it's showing that the 2011 CPU was never really better than the 2009 CPU in the first place
Quote:


> which points me to direction that maybe memory bandwidth can have impact on this test


It's definitely possible; memory performance has a very significant effect on many games that are not limited by the GPU, and particularly on a few. +10% is not unheard of going from alright RAM to fast RAM.

@above - Single-threaded performance is still clearly very important for this game. On the first 4 threads, a 6600K/6700K will have something like a 70% advantage over Piledriver, even without AVX/AVX2 being utilized for any noticeable performance change. You can easily test whether AVX is being utilized anyway, as you can disable it at the OS level.
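For reference, the usual way to do that on Windows is to disable XSAVE state saving at boot, which removes AVX support. A sketch of the commonly documented commands (run from an elevated prompt; verify on your own system before relying on it):

```shell
:: Disable XSAVE, and with it AVX, from the next boot onward
bcdedit /set xsavedisable 1

:: Re-enable it afterwards
bcdedit /set xsavedisable 0
```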


----------



## Themisseble

Quote:


> Originally Posted by *Cyro999*
> 
> It's not making a 2009 CPU perform like a 2015 CPU, it's showing that the 2011 CPU was never really better than the 2009 CPU in the first place
> Is definately possible, memory performance has a very significant effect on many other games that are not limited by the GPU, but particularly on a few. +10% is not unheard of going from alright RAM to fast RAM.
> 
> @above - Singlethreaded performance is still clearly very important for this game. On the first 4 threads, a 6600k/6700k will have like a 70% advantage over Piledriver even without avx/avx2 being utilised for any noticable performance change. You can easily test if AVX is being utilized anyway as you can disable it on the OS level.


Yeah, but Stardock said that the FX 8350 is good (near i7) for this game.

The FX 6300 is somehow better at multitasking than the Phenom X6.
http://www.pcgameshardware.de/CPU-Hardware-154106/Specials/CPU-Multitasking-Test-1075340/


----------



## Kand

Quote:


> Originally Posted by *Themisseble*
> 
> yeah but stardock said that FX 8350 is good (near i7) for this game.




Stardock?

Yeah. No.


----------



## Cyro999

Quote:


> Originally Posted by *Themisseble*
> 
> yeah but stardock said that FX 8350 is good (near i7) for this game.


Got a source? I remember them saying it was at Haswell i5/i7 level at stock speeds, but that was for the Star Swarm demo, without a huge heavyweight RTS engine under the hood. It was also 2 years ago, before a ton of driver changes and optimization.

Also, benchmarks confirm quite clearly that it's not the case for Ashes right now.










30 fps for the 8370 and 72 fps for the 6700K is the max they got out of them.


----------



## Themisseble

Quote:


> Originally Posted by *Cyro999*
> 
> Got a source? I remember them saying it was on haswell i5/i7 level at stock speeds, but that was for the star swarm demo without a huge heavyweight RTS engine under the hood. It was also 2 years ago before a ton of driver changes and optimization
> 
> Also, benchmarks confirm quite clearly that it's not the case for ashes right now
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 30fps for 8370, 72fps for 6700k is the max they got out of them.


Isn't that weird?

The FX scores the same across all the benchmarks... and so does the i3.


----------



## ToTheSun!

Quote:


> Originally Posted by *Themisseble*
> 
> Isnt that weird?
> 
> Fx does all benchmarks same... and also i3.


My best guess is that it's strictly CPU limited at all times, making the graphical settings irrelevant. The same doesn't happen for the 6700K and 5960X.


----------



## ZealotKi11er

Quote:


> Originally Posted by *ToTheSun!*
> 
> My best guess is that it's strictly CPU limited at all times, making the graphical settings irrelevant. The same doesn't happen for the 6700K and 5960X.


The thing is that if we look at DX11 vs DX12, the gain is very small. I would have expected at least 50% in 100% CPU-limited games.


----------



## Themisseble

Quote:


> Originally Posted by *ZealotKi11er*
> 
> The thing is that if we look DX11 vs DX12 the gain is very small. I would have expected at least 50% in 100% CPU limited games.


It's also weird that the i7 6700K is 2x faster than the i3 in DX11 mode. So this game scales perfectly in DX11? I don't believe that.

Looks like PCPer is not a trusted source anymore.
http://s7.postimg.org/8ruppxjiz/Capture.png


----------



## mav451

Yup - Nvidia's test data (in my reddit post that was quoted earlier) showcased the difference between two cores (any clock) and six cores (reduced clock). Both scenarios simulated a lesser-performing CPU.
IPC and core count continue to be strong factors, but ideally you aren't _relying entirely_ on DX12 to address this - because it won't.

I'd be curious just how low you could go in IPC before it becomes limiting, but it's obviously higher than what Piledriver provides.
My guess is that SB IPC is probably plenty, and frankly that's not exactly a high bar of entry in 2015. I'd wager even a 2009 Lynnfield would be able to show proper scaling.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Themisseble*
> 
> Its also weird that
> i7 6700K is 2x faster than i3 Dx11 mode?
> So this game scales perfectly in DX11 .. i dont belive that.
> 
> Loks like pcpaper is not trusted source anymore.


PCPer is fine. We just don't know the details of what this benchmark does and uses.

The i3 is clocked lower and is a generation older than the i7.

Also, even in DX11 this game will use all the cores, or at least 4 cores minimum.

I was just hoping the FX-8320 with its 8 cores would match or beat the Core i5s. I mean, even if the FX is 40% behind when 4 cores are used, a game that fully loads 6 cores should let it match a 4-core Intel, and if all 8 cores are used it should probably beat it.


----------



## Themisseble

Quote:


> Originally Posted by *mav451*
> 
> Yup - nVidia's test data (in my reddit post that was quoted earlier) showcased the difference between two cores (any clock) and 6-cores (reduced clock). Both scenario simulated a lesser performing CPU situation.
> IPC and core-count continue to be strong factors, but ideally you aren't _relying entirely_ for DX12 to address this - b/c it won't.
> 
> I'd be curious just how low you could go in IPC before it's limiting, but it's obviously higher than Piledriver is providing.
> My guess is SB IPC is probably plenty, and frankly that's not exactly a high point of entry in 2015. I'd wager even a 2009 Lynnfield would be able to show proper scaling.


No and no!
That's the point of DX12... to offload work from a single core, making IPC unimportant. That's the point, right?


----------



## ZealotKi11er

Quote:


> Originally Posted by *Themisseble*
> 
> NO! and NO!
> Thats the point of DX12... to offload things from single core. Making IPC unimportant... Thats the point right?


To me, this benchmark basically takes DX12 and current CPUs and shows no difference. They throw so many draw calls that the extra cores of the FX CPU make no difference.


----------



## Themisseble

Okay, a little more research.

Fx 8350/7970
DX11

Total Time: 57.735985
Avg Framerate : 91.936279 ms (10.877098 FPS)
Weighted Framerate : 95.895103 ms (10.428061 FPS)
Average Batches per frame: 29422.806641

DX12





Total Time: 57.871265
Avg Framerate : 51.670776 ms (19.353300 FPS)
Weighted Framerate : 52.491089 ms (19.050854 FPS)
CPU frame rate (estimated framerate if not GPU bound): 29.191771 ms (34.256229 FPS)
Percent GPU Bound: 99.766823%
Driver throughput (Batches per ms): 3174.050293
Average Batches per frame: 33720.179688

Configuration
API: DirectX
Resolution: 1920x1080
Fullscreen: True
Bloom Quality: High
PointLight Quality: High
Glare Quality: High
Shading Samples: 16
Terrain Shading Samples: 8
Shadow Quality: High
Temporal AA Duration: 6
Temporal AA Time Slice: 2
Multisample Anti-Aliasing: 1
Texture Rank : 1



i3 4330/R9 390X

http://www.pcper.com/image/view/60371?return=node%2F63599

There is a huge difference in average batches per frame.
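As a sanity check on logs like the ones above, the FPS figures are just the reciprocal of the average frame time in milliseconds:

```python
def ms_to_fps(frame_time_ms):
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms

# DX11 and DX12 average frame times from the FX 8350 / 7970 run above
print(ms_to_fps(91.936279))  # DX11 run: ~10.88 FPS, matching the log's 10.877098
print(ms_to_fps(51.670776))  # DX12 run: ~19.35 FPS, matching the log's 19.353300
```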


----------



## ZealotKi11er

Quote:


> Originally Posted by *Themisseble*
> 
> Okay
> little more reasearch
> 
> Fx 8350/7970
> DX11
> 
> Total Time: 57.735985
> Avg Framerate : 91.936279 ms (10.877098 FPS)
> Weighted Framerate : 95.895103 ms (10.428061 FPS)
> Average Batches per frame: 29422.806641
> 
> DX12
> 
> 
> 
> 
> 
> Total Time: 57.871265
> Avg Framerate : 51.670776 ms (19.353300 FPS)
> Weighted Framerate : 52.491089 ms (19.050854 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 29.191771 ms (34.256229 FPS)
> Percent GPU Bound: 99.766823%
> Driver throughput (Batches per ms): 3174.050293
> Average Batches per frame: 33720.179688
> 
> Configuration
> API: DirectX
> Resolution: 1920x1080
> Fullscreen: True
> Bloom Quality: High
> PointLight Quality: High
> Glare Quality: High
> Shading Samples: 16
> Terrain Shading Samples: 8
> Shadow Quality: High
> Temporal AA Duration: 6
> Temporal AA Time Slice: 2
> Multisample Anti-Aliasing: 1
> Texture Rank : 1
> 
> 
> 
> i3 4330/R9 390X
> 
> http://www.pcper.com/image/view/60371?return=node%2F63599
> 
> huge difference between avg. batches per frame.


In DX11 you get 10 fps, in DX12 you get 20 fps and 34 fps if you had a faster GPU. You are good.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> No its not true. - mostly you are right but this does not effect in games -
> Why do you believe that? Simple question? Why?
> Simple that is not whole ting.. i5 2500K will beat i3 4330 in this same benchmarks.
> 
> Look at CB
> i3 4330 has 2x better FPU (as you say) than i5 2500K - yet i5 is much faster? Whats the problem?
> 
> And here is simple benchmark
> 
> 
> 
> 
> 
> FX 6300 vs i3 4330


I'm talking about AVX/2. I think that Ashes of the Singularity uses AVX/2 optimizations. I'm not saying a Core i3 4330 is faster than an FX 6300 in all benchmarks; I'm saying that I believe the physics engine in Ashes of the Singularity uses AVX/2. Considering the number of units on the screen, all operating independently from one another, that has got to tax the CPU in terms of physics and AI.


----------



## Themisseble

Ashes of the Singularity is built on the same engine as the Star Swarm benchmark.

Physics calculations depend on IPC.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> Ashes of the singularity is build on same engine as star swarm benchmark.
> 
> physics calc. depends on IPC.


Star Swarm was a time demo: none of the units were being destroyed, so there was no physics being calculated. Ashes of the Singularity is an actual game, with full physics and AI.

I believe the physics were optimized using AVX/2.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> Star Swarm was a Time Demo. None of the units were being destroyed. There was no physics being calculated. Ashes of the Singularity is an actual game. With full Physics and AI.
> 
> I believe that the Physics were optimized using AVX/2.


And you think that even with AVX optimizations a single Haswell FPU is 2x faster than the FPU (2x 128-bit) in a module?
... I am just wasting my time.

The problem is that the Athlon X4 860K is as fast as an i3 or an FX 8350.
- http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/3/

http://forums.ashesofthesingularity.com/462287


----------



## OneB1t

Has someone already contacted Oxide tech support about that? So they can tell us their fairy tale?


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> someone allready contacted oxide techsupport about that? so they can tell us their fairy tale?


I don't know.
If the FX is really that bad in this game, then...


----------



## OneB1t

I really don't understand it, as both Oxide developers and people from AMD said the FX would shine in this benchmark, and then this.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> And you think that even with AVX optimizations... single haswell FPU is 2x faster than FPU(2x128Bit) in module?
> ... I am just waisting time.
> 
> The problem is that athlon x4 860K is as fast as i3 or FX 8350.
> - http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/3/
> 
> http://forums.ashesofthesingularity.com/462287


I think the dual-core (HT) Core i3 4330 is better at processing AVX/2 than an FX-8350. Well, it's not just that I think so; it is better at AVX/2 as a matter of fact.

The Core i3 4330, however, lacks the parallel capability to feed the GPU while handling the AI, physics, etc. Therefore it barely breaks 40 FPS, but it does surpass the A10-7870K: 2 cores with HT surpass 4 A10-7870K cores.

As for the A10-7870K, it is akin to the test done in this thread: disabling 2 modules on an FX-8350 resulted in a small performance hit, nothing large. I do not know what could be causing this issue. The only thing I can think of is that Ashes of the Singularity does not detect all of the FX-8350's cores.

One way of verifying this is by looking at the results page of the Ashes of the Singularity benchmark; it shows you the number of cores detected.


----------



## OneB1t

It detects all cores, as all cores are loaded to 100%...
It is detected as 4 physical cores and 8 logical.


----------



## Themisseble

Also, from Digital Foundry:

An i3 in DX12 mode beats an i7 4770K in DX11 on a GTX 970 in complex scenes.

I don't know, then, how the i7 beats the i3 in DX11 by double in the PCPer benchmark.
http://www.eurogamer.net/articles/digitalfoundry-2015-ashes-of-the-singularity-dx12-benchmark-tested


----------



## OneB1t

That's because when Ashes of the Singularity is set to Low, there is no GPU limitation. When it's set to High or Crazy, nearly every processor is capable of feeding even a powerful graphics card like a 390X/Fury X or 980/980 Ti, as the GPU limitation comes into play.

PCPer tested with the Low setting, where the i7 doubles the i3's performance,
but strangely the FX-83xx is not doubling the FX-4xxx's performance.


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> it detect all cores as all cores are used to 100% ...
> detect it as 4 physical cores and 8 logical


When you disable 2 Modules, does it detect it as 2 Physical and 4 Logical?


----------



## OneB1t

Yep, when you disable 2 modules it's detected as 2 physical and 4 logical. Unfortunately, new motherboards can't disable 1 core per module as older motherboards can,
so I can't test 4 modules with 4 cores (I expect that configuration would be very close to a full FX-83xx in this benchmark).


----------



## Cyro999

Quote:


> Originally Posted by *Themisseble*
> 
> Its also weird that
> i7 6700K is 2x faster than i3 Dx11 mode?
> So this game scales perfectly in DX11 .. i dont belive that.
> 
> Loks like pcper is not trusted source anymore.
> http://s7.postimg.org/8ruppxjiz/Capture.png


Not a trusted source anymore? Even if it used only two cores - showing no advantage for having 3+ real cores instead of 2 cores + HT - the i7 is clocked 21-27% higher in game due to its way higher stock clock and a bit of turbo boost, with an architecture that shows ~8-15% higher IPC in games. It would easily show 40% better performance - and it obviously benefits from having more than 2 cores, maybe even from the extra cache too.

You didn't think at all before writing that comment; you will say anything to defend Bulldozer. If any site goes against what you think, it's automatically a bad source. A 6700K, even with two cores disabled, is in no way comparable to a 4330.


----------



## ZealotKi11er

Some are questioning the FX-4 to FX-8 scaling. Look at the 6700K vs the 5960X. Fair to say there is something going on here.


----------



## Kuivamaa

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Some are questioning FX-4 to FX-8. Look at 6700K to 5960X. Fair to say there is something going on here.


The difference being that the 6700K has much higher clocks plus a bit stronger core. With this in mind, I'd say the 5960X being so close indicates AotS benefits from more threads than the 6700K can provide.


----------



## wiak

So playing Ashes in DX12 on an Intel iGPU is far superior to any other vendor's card?
To me, AMD seems better adapted to DX12, mainly due to its ACE engines; if you look at their compute performance (OpenCL, Bitcoin, etc.), they are fast.

My ancient HD 7970 (released in 2011) went from 10 fps to 20 fps by switching from DX11 to DX12 in the Ashes benchmark.
That was on a system with an FX-8350, 32GB of RAM, and a 512GB Samsung SSD.

I wonder how Vulkan performance will compare.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Kuivamaa*
> 
> Difference being that 6700k has much higher clocks plus a bit stronger core. I say that with this in mind, the 5960X being so close indicates AotS benefits from more threads than 6700k can provide.


6-Core.


----------



## SpeedyVT

Quote:


> Originally Posted by *Cyro999*
> 
> Got a source? I remember them saying it was on haswell i5/i7 level at stock speeds, but that was for the star swarm demo without a huge heavyweight RTS engine under the hood. It was also 2 years ago before a ton of driver changes and optimization
> 
> Also, benchmarks confirm quite clearly that it's not the case for ashes right now
> 
> 30fps for 8370, 72fps for 6700k is the max they got out of them.


You need to disable core parking; I believe it's still an issue in Windows 10. Either way, core parking is a problem on AMD FX processors.


----------



## Cyro999

Quote:


> Originally Posted by *SpeedyVT*
> 
> You need disable core parking, I believe it's still an issue in Windows 10. Anyway core parking is an issue on an AMD FX processor.


Still an issue on CPUs with HT, too.


----------



## OneB1t

Probably not an issue when all cores are 100% loaded, but I'll give it a try.


----------



## ZealotKi11er

Quote:


> Originally Posted by *SpeedyVT*
> 
> You need disable core parking, I believe it's still an issue in Windows 10. Anyway core parking is an issue on an AMD FX processor.


If you compare the FX-6300 and the FX-8370 you see that the extra module (2 cores) does nothing; the small difference in performance is because the FX-8370 is clocked higher. There is no FX-4300 in the chart, but if there were, it would have scored ~20 fps.


----------



## OneB1t

The question is:
why do they do nothing while they are 100% loaded?


----------



## SpeedyVT

Quote:


> Originally Posted by *Cyro999*
> 
> Still an issue on HT CPU's, too.


Disable core parking on the HT cores as well.


----------



## ZealotKi11er

Quote:


> Originally Posted by *OneB1t*
> 
> question is
> why they do nothing while they are 100% loaded?


Could be an architecture bottleneck.


----------



## wiak

Quote:


> Originally Posted by *geoxile*
> 
> http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/
> 
> Another interesting commentary on drivers by a former Nvidia driver developer intern.
> 
> The most relevant part
> In short, the current API infrastructure and the methodology is FUBAR. Because the APIs obfuscate a lot of things the game devs write crappy (game) renderers, and then the driver teams have to try to fix those problems on their (driver) side.
> 
> Clearly, Nvidia's been more hands on with developers and most likely their driver-side fixes are just more comprehensive and game-specific. AMD traditionally hasn't been as hands on, with the exception of GE titles.
> 
> With DX12, the game devs will require greater knowledge and they'll handle more work. But that means the hand-holding by the driver team becomes less necessary and less effective, since DX12 won't hide things away like earlier APIs. The implication is that the driver teams from AMD and Nvidia will only be doing the bare minimum to support DX12 renderers.


Well, have you been watching the Vulkan video? AMD was there first. Watch that part

and you will see an AMD driver developer (Vulkan/Mantle/OpenGL) talk about exactly what you describe. I highly recommend watching the whole video; there is a lot of good material from AMD, Nvidia, and Valve (yes, the Half-Life people).


----------



## wiak

Quote:


> Originally Posted by *Kand*
> 
> Current gen consoles will never be able to support dx12.
> 
> Look at how slow dx11 adoption was and is. I expect the same rate for dx12.
> 
> Nothing to see here, folks. Move along.


Are you sure? Most of the hardware in the Xbox One is AMD GCN, so basically they can upgrade it to Windows 10; that's a known fact:
http://support.xbox.com/en-GB/browse/xbox-on-windows

Sony is sure to support Vulkan for the PS4. Why? What else is there to support when the hardware is AMD GCN? ;=)


----------



## mtcn77

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Could be a architecture bottleneck.


The modules suffer cache contention and core stalls in the ganged 256-bit mode, AFAIK. I wonder if the test could be recompiled with XOP...


----------



## PontiacGTX

Quote:


> Originally Posted by *Kuivamaa*
> 
> Difference being that 6700k has much higher clocks plus a bit stronger core. I say that with this in mind, the 5960X being so close indicates AotS benefits from more threads than 6700k can provide.


I don't think DirectX 11 and Mantle-based games would put a 5960X above a 6700K unless the game engine properly supported those extra cores/threads.
Quote:


> Originally Posted by *ZealotKi11er*
> 
> Some are questioning FX-4 to FX-8. Look at 6700K to 5960X. Fair to say there is something going on here.


Might Skylake have something Intel hasn't revealed? Or is the game designed to scale better with AMD CPUs when doubling the core count? Or is it just down to the engine?


----------



## ZealotKi11er

Quote:


> Originally Posted by *PontiacGTX*
> 
> i dont think.directx 11 and mantle based games would have a 5960x over a 6700k if the game engine is properly supporting those extra cores/threads
> skylake might have something that intel hasnt revealed?or the game is designed to scale better with AMD cpus when doublig the core count?or it is just related to the engine?


4790K gets high fps too.


----------



## flopper

Quote:


> Originally Posted by *wiak*
> 
> well have you been watching the vulkan video? amd was there first, watch this part
> 
> 
> 
> and you will see/listen to a amd driver developer (vulkan/mantle/opengl) talk about exacly what you write about, highy recommended watching whole video, alot of good stuff from amd, nvidia, valve (yes those behind half life..)


AMD has superb engineers; however, they do need to hire better marketing and branding people.
Yeah, I like it when Nvidia guys think they were first at everything.


----------



## Mahigan

The first CPU core always gets the most usage; this is the same in DX12 as in DX11. If you have a quad-core (with HT) clocked at 4GHz versus a six-core (with HT) clocked at 3GHz, you will most likely gain more performance from the quad-core, unless the game is both incredibly demanding and incredibly well threaded.
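As a toy illustration of that trade-off (an Amdahl's-law sketch; the serial fraction is a made-up parameter, not something measured from the game):

```python
def fps(clock_ghz: float, cores: int, serial_frac: float, work: float = 100.0) -> float:
    """Frames per second in arbitrary units: the serial part of a frame runs on
    one core; the parallel part is split evenly across all cores."""
    serial_time = work * serial_frac / clock_ghz
    parallel_time = work * (1.0 - serial_frac) / (clock_ghz * cores)
    return 1000.0 / (serial_time + parallel_time)

# Half the frame serial: the 4GHz quad wins despite having fewer cores.
quad = fps(4.0, 4, serial_frac=0.5)
hexa = fps(3.0, 6, serial_frac=0.5)
print(f"quad@4GHz: {quad:.0f}, hexa@3GHz: {hexa:.0f}")

# Nearly fully parallel: the lower-clocked six-core pulls ahead instead.
print(fps(3.0, 6, serial_frac=0.05) > fps(4.0, 4, serial_frac=0.05))  # True
```

The crossover point depends entirely on how serial the engine is, which is the whole argument here.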


----------



## Mahigan

I also think it is safe to say the developer used the Intel compiler for the CPU code. Ashes of the Singularity is most likely using all of the optimizations available to Intel CPUs, be it the various SSE implementations, AVX/AVX2, etc.

I emailed Oxide and am waiting for a response.
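To illustrate the suspicion, here is a purely hypothetical sketch of vendor-gated versus feature-gated dispatch (it models the general pattern being described, not Oxide's or Intel's actual code; real dispatchers read CPUID rather than taking strings as arguments):

```python
# Hypothetical dispatch logic. The vendor string and AVX flag are passed in
# directly here; a real compiler-generated dispatcher queries CPUID at startup.
def pick_path_vendor_gated(vendor: str, has_avx: bool) -> str:
    # Only CPUs reporting "GenuineIntel" ever reach the optimized path.
    if vendor == "GenuineIntel" and has_avx:
        return "avx"
    return "baseline"

def pick_path_feature_gated(vendor: str, has_avx: bool) -> str:
    # Any CPU that advertises AVX gets the optimized path.
    return "avx" if has_avx else "baseline"

# An FX-8350 does advertise AVX, but a vendor-gated binary still demotes it.
print(pick_path_vendor_gated("AuthenticAMD", has_avx=True))   # baseline
print(pick_path_feature_gated("AuthenticAMD", has_avx=True))  # avx
```

If the binary behaves like the first function, an FX chip runs generic code no matter what instructions it supports.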


----------



## Mahigan

I also think Ashes of the Singularity is doing something funky on FX CPUs. We should be seeing a large boost when moving from an FX-4xxx to an FX-8xxx, but we are not. Clearly Ashes is not able to make use of several optimizations on AMD CPUs. This could be due to the way the Intel compiler treats AMD CPUs, or the way AMD CPUs react to certain optimizations (such as AVX). Since Ashes detects AMD FX CPUs as having more logical than physical units, it may be running code on them in a funky way.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> I also think that it is safe to say the developer used the Intel compiler for his CPU coding. Ashes of the Singularity is most likely using all of the optimizations available to Intel CPUs. Be it the various SSE implementations as well as AVX/2 etc.
> 
> I emailed Oxide and am waiting for a response.


You may be right... however, we should then see a difference between the FX-4300 and the FX-8350.


----------



## Mahigan

The fact that AMD ran their Fury test systems with Intel CPUs at the conference where they announced Fiji is telling. It could be that DirectX 12, unlike Mantle, favors Intel CPUs to a greater extent.

So far I cannot pinpoint a specific pattern in order to explain what we're seeing. There are too many variables at play and too many questions needing answers.


----------



## Themisseble

Yes, but it's simple... CMT is made for MT.

It could be a compiler-based problem.
http://i.imgur.com/Jq3l3cV.jpg
(CPU frames)
The i7-5960X was clearly bottlenecked by the GTX 980.

Only 58 FPS on 8 cores/16 threads?

I found a benchmark where an FX-8350 at 5GHz scores 45 FPS, and even that is far slower than it should be.

https://twitter.com/i/web/status/634367360707727364 - they will fix it.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> The fact that AMD ran their Fury test systems with Intel CPUs during the conference when they announced Fiji is telling. It could be that DirectX 12, unlike Mantle, favors Intel CPUs to a greater extent.
> 
> So far I cannot pinpoint a specific pattern in order to explain what we're seeing. There are too many variables at play and too many questions needing answers.


Do you think this game is a good example of AMD's DX12 advantage, or do you think it might be hiding an even bigger advantage for them?

The reason I'm asking is, wouldn't Maxwell's superior triangle throughput usually benefit Nvidia in a game like Ashes?


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> Yes but is simple... CMT is made for MT.
> 
> It could be compiler based problem.
> http://i.imgur.com/Jq3l3cV.jpg
> CPU frames
> i7 5960X was clearly bottleneck by GTX 980.
> 
> only 58 FPS on 8 core/16threads?


Well, DirectX 12 does not achieve perfect parallelism. It is more parallel than DirectX 11, but not perfectly so. See the graphs below (provided by AMD).

The first core always gets more use. Therefore a higher-clocked CPU with fewer cores might even outperform a lower-clocked CPU with more cores. It depends on the work the developer has done to spread the workload more evenly across cores.


----------



## mtcn77

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Do you think this game is a good example of AMD's DX12 advantage, or do you think it might be hiding an even bigger advantage for them?
> 
> The reason I'm asking is, wouldn't Maxwell's superior triangle throughput usually benefit Nvidia in a game like Ashes?


Nvidia's Maxwell has a single context switch covering 32 queues, while AMD's GCN 1.1/1.2 has 8 ACEs with 8 queues each. Maxwell has 50% more triangle throughput, but context switching between compute and graphics modes stalls it, AFAIK.
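As a crude thought experiment on why the switch cost matters (arbitrary numbers; this is not a model of the real hardware):

```python
# Arbitrary time units. A device that must context-switch between graphics and
# compute pays the switch cost and runs the work back-to-back; a device with
# independent compute queues overlaps the compute work with the graphics work.
def serialized(gfx: float, compute: float, switch_cost: float) -> float:
    return gfx + switch_cost + compute + switch_cost  # switch in, switch back

def overlapped(gfx: float, compute: float) -> float:
    return max(gfx, compute)  # compute hides behind graphics

print(serialized(10.0, 6.0, switch_cost=2.0))  # 20.0
print(overlapped(10.0, 6.0))                   # 10.0
```

Even with zero switch cost the serialized device still pays the full sum, while the overlapping one only pays for the longer of the two workloads.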


----------



## GorillaSceptre

@Mahigan

http://www.dsogaming.com/news/amds-directx-12-advantage-explained-gcn-architecture-more-friendly-to-parallelism-than-maxwell/


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Do you think this game is a good example of AMD's DX12 advantage, or do you think it might be hiding an even bigger advantage for them?
> 
> The reason I'm asking is, wouldn't Maxwell's superior triangle throughput usually benefit Nvidia in a game like Ashes?


That depends entirely on the kind of post-processing being done. A game like Ashes would work best on Maxwell were it not for its heavy use of post-processing effects. The game is filled with light sources emitted by every individual unit on screen as units shoot and blast each other apart. Maxwell is not parallel enough to process all of those post-processing effects efficiently. The post-processing is done using asynchronous shading, a feature AMD's GCN was built to tackle with enormous efficiency compared to Maxwell.

What nVIDIA has done in the past is take advantage of DirectX 11's serial nature. They would use spare CPU cycles to perform driver interventions (replacing less favorable post-processing shaders with more favorable ones). This is why nVIDIA's DX11 performance is so high in Ashes. The problem with DX12 is that there are fewer opportunities for driver intervention (it is closer to the metal), so nVIDIA can't optimize like they used to. The increased parallelism of DX12 also plays against nVIDIA. All in all, you get the results we witnessed.

Ashes of the Singularity thus exposes nVIDIA's architectural problems with parallelism.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> @Mahigan
> 
> http://www.dsogaming.com/news/amds-directx-12-advantage-explained-gcn-architecture-more-friendly-to-parallelism-than-maxwell/


Oh crap... :/

bring on the haters... lol


----------



## GorillaSceptre

Quote:


> Originally Posted by *mtcn77*
> 
> Nvidia Maxwell has a single context switch with 32 queues while AMD GCN 1.1/1.2 has 8 with 8 registers for each. Maxwell has 50% more triangle throughput, but context switching between compute and graphics modes stalls it, afaik.


Quote:


> Originally Posted by *Mahigan*
> 
> That depends entirely on the kind of Post Processing being done. A game like Ashe's would work best on Maxwell if it weren't for the fact that it makes heavy use of Post Processing effects. The game is filled with various light sources being emitted by every single individual units on the screen as they shoot and blast apart other units. Maxwell is not Parallel enough in order to be able to efficiently process all of those Post Processing effects. The Post Processing is done using Asynchronous Shading. This is a feature that AMDs GCN was built to tackle with an enormous amount of efficiency compared to Maxwell.


Thanks.

Still doesn't answer the golden question, though: what's the best GPU to go for in the near future?


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Oh crap... :/
> 
> bring on the haters... lol


When I read the comments I was thinking the same thing.


----------



## OneB1t

Quote:


> Originally Posted by *Mahigan*
> 
> The first core always gets more use. Therefore if the CPU is clocked higher, it might even perform better, with less cores, than a CPU with more cores but a lower clock. It depends on the work the developer has done in order to spread the workload more evenly across several cores.


If that were the case then I couldn't get the same fps on [email protected] as on [email protected], since the first thread would be more than half slower.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Thanks
> 
> Still doesn't answer the golden question.. Whats the best GPU to go for in the near future


That entirely depends on the developers. I am inclined to believe DX12 developers will make use of asynchronous shading, but to what degree? Probably less than Ashes of the Singularity. Asynchronous shading appears to be something an RTS will use heavily, more so than, say, an FPS title.

That being said, Battlefield 4 on the PS4 makes heavy use of asynchronous shading, and future iterations of the Frostbite engine are heading in that direction. Source: http://www.kitguru.net/gaming/matthew-wilson/ea-frostbite-engineer-wants-games-to-require-win10dx12-by-2016/

For AMD, this means their older cards (all GCN cards) will receive a boost. This should breathe new life into older AMD GPUs. Older nVIDIA GPUs, however, will be left in the dust; anything older than Maxwell 2 is going to have a hard time dealing with all that extra parallelism. (They may end up running a DirectX 11.3 path instead, and only if it is available; I doubt developers will code both paths.)

I'm not upgrading my 290X Crossfire combination anytime soon. Something tells me I won't need to (maybe not even for all of 2016).

Sources:

1. http://techreport.com/news/28196/directx-12-multiadapter-shares-work-between-discrete-integrated-gpus
2. http://blogs.msdn.com/b/directx/archive/2015/05/01/directx-12-multiadapter-lighting-up-dormant-silicon-and-making-it-work-for-you.aspx

Unlike Crossfire and SLI under DX11, Multi-Adapter in DX12 does not replicate textures in memory. Two 4GB cards therefore offer 8GB of usable framebuffer under DirectX 12, versus 4GB under DirectX 11.


----------



## mtcn77

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Thanks
> 
> Still doesn't answer the golden question.. Whats the best GPU to go for in the near future


You should check out this Stanford professor's essay. What they describe is generating a merge shader that interjects an extra merge step before virtual texels are turned into visible pixels. The text gives examples of how higher-fidelity rendering produces minute texels and causes overshading (details smaller than two pixels cannot be represented individually and hence cause overshading). The overhead of 4x antialiasing was found to be 2x the shading load compared to the proposed shader.
GPUs like the Fury can help in this situation, since the GPU can compensate for that exorbitant shading overload; even though it isn't faster on the triangle-throughput scale, it will still destroy its former iterations.
[Quad Fragment Merge Shader]


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> if that was the case then i cant get same fps on [email protected] as on [email protected] as first thread will be more than half slower


I don't think we can say this with certainty for AMD's FX series, though we can see it happening with Intel's processors.

With AMD FX, another piece of the puzzle (likely more than one piece acting concurrently) is causing the issues we're seeing. This is why it is nearly impossible to logically deduce what is happening with the AMD FX series.


----------



## Noufel

So far, what I've understood is that it's all on the developers' side to code their game to favor one vendor or the other, AMD or Nvidia. Knowing the market share Nvidia has, I don't think many devs will bother making good use of asynchronous shaders, but it's very impressive when used. I'll go with 2x Arctic Islands GPUs for my next upgrade, since AMD will then have the efficiency of 14 or 16nm FinFET and mature HBM2 memory.


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> So far what i've undersood is its all on the developers side to code their game to advantage one or another aka AMD or Nvidia and knowing what market share nvidia has i don't think many devs will bother making good use of asynchroneous shaders, but its very impressive when used and i'll go with 2x arctic islands gpus for my next upgrade cause AMD will have then the efficiency of 14 or 16 nm finfet and will mature its HBM2 memory


There's another way of looking at it. Console developers are making use of Asynchronous shading on both the Xbox One and the PS4 (I think Nintendo is also heading for GCN). That means that console ports to DirectX 12 will likely already be coded for Asynchronous Shading, an important feature of DX12. In a way, the DirectX 12 market is already heavily tilted towards GCN as it encompasses the console market share of GCN and not just the discrete GPU market share of GCN. Another user pointed this out a few pages back. I believe he is onto something. If you also add Multi-Adapter, another DirectX 12 feature, this means that the iGPU market share also plays into this. Anyone with a GCN APU can throw in a discrete GCN GPU and use the available Compute resources of both.


----------



## Casey Ryback

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Thanks
> 
> Still doesn't answer the golden question.. Whats the best GPU to go for in the near future


Want a current benchmark winner? Buy Nvidia.

Want a card that ages like a fine wine? Buy AMD.

This is more or less the way it's been since 2012, could be even better for all AMD owners with DX12.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> That entirely depends on the developers. I am inclined to believe that DX12 developers will make use of Asynchronous shading, to what degree? Probably less than Ashe's of the Singularity. Asynchronous Shading appears to be something that an RTS will make heavy use of, moreso than say an FPS title.
> 
> That being said Battlefield 4, on the PS4, makes heavy use of Asynchronous shading and any future iteration of the Frosbite engine is heading in that direction. Source: http://www.kitguru.net/gaming/matthew-wilson/ea-frostbite-engineer-wants-games-to-require-win10dx12-by-2016/
> 
> For AMD, this means that older cards of theirs (all GCN cards) will receive a boost. This should breathe new life into older AMD GPUs. Older nVIDIA GPUs, however, will be left in the dust. That much is clear. As anything older than Maxwell 2 is going to have a hard time dealing with all that extra Parallelism. (they may end up running the DirectX 11.3 path instead and only if it is available. I doubt developers will code both paths though).
> 
> I'm not upgrading my 290x Crossfire combination anytime soon. Something tells me I won't need too (maybe even for all of 2016).
> 
> Sources:
> 
> 1. http://techreport.com/news/28196/directx-12-multiadapter-shares-work-between-discrete-integrated-gpus
> 2. http://blogs.msdn.com/b/directx/archive/2015/05/01/directx-12-multiadapter-lighting-up-dormant-silicon-and-making-it-work-for-you.aspx
> 
> Unlike Crossfire and SLI under DX11, Multi-Adapter in DX12 does not replicate Textures in Memory. Therefore two 4GB cards turns into 8GB worth of available framebuffer under DirectX 12 vs 4GB under DirectX 11.


Quote:


> Originally Posted by *mtcn77*
> 
> You should check out this Stanford Professor's essay. What they describe is, generating a merge shader that interjected an extra merge step before the virtual texels are transitioned into visible pixels. The text gives examples how higher fidelity rendering minutes texels and cause overshading(details smaller than two pixels cannot be represented individually and hence cause overshading). The overhead of 4x antialiasing was found to be 2x the shading load compared to such a proposed shader.
> Such gpus as the Fury can help in these situation as, the gpu can indeed compensate for this exhorbitant shading overload; eventhough in the triangle throughput scale it isn't faster - it will still destroy its former iterations.
> [Quad Fragment Merge Shader]


Great info guys, thanks.

I was set on the 980 Ti, but I'm starting to want a Fury, lol. It would be a Fury X if it weren't for the damn CLC...

I thought GPU memory won't stack unless devs specifically code for it? If that's the case, then good luck.

Quote:


> Originally Posted by *Noufel*
> 
> So far what i've undersood is its all on the developers side to code their game to advantage one or another aka AMD or Nvidia and knowing what market share nvidia has i don't think many devs will bother making good use of asynchroneous shaders, but its very impressive when used and i'll go with 2x arctic islands gpus for my next upgrade cause AMD will have then the efficiency of 14 or 16 nm finfet and will mature its HBM2 memory


Market share would play a big factor IF the consoles weren't both using GCN too.

EDIT:

Mahigan beat me to it.


----------



## Mahigan

In fact, I think Batman: Arkham Knight, the game that was pulled because it performed poorly on PC after being ported from the PS4, may have performed that way because it used asynchronous shading over the DirectX 11 API on the PC. Something tells me that a parallel shading technique running over a serial API would spell disaster performance-wise. It is just a hunch, but I believe this was the cause of its poor PC performance.

I wouldn't be surprised to see Batman: Arkham Knight re-released as a DirectX 12 title if that were indeed the case.


----------



## mtcn77

I found the explanation for a context switch on redtechgaming:
Quote:


> Because DirectX 11 is essentially serial in its thinking, PreEmption can cause a lot of idle time as Context Switching (a Context Switch, at its most basic, is the processor saving results of one task, and switching to another to begin processing that task) occurs, and naturally this time where the GPU isn't processing data is essentially wasted performance and can also create stutter in the frame rate (frame rate or frame time variance).


[RedGamingTech should change their name to RedTechGaming, it's better in that format]
Essentially, the 290X is capable of higher sustained throughput. I visualize its 8 ACEs with 8 queues each as 64 virtual cores waiting for commands from the processor, so the CPU could even be 64 cores wide. That feels remarkable in a forward-looking way.


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> Oh crap... :/
> 
> bring on the haters... lol


Why isn't there a single mention in that article that this game is sponsored by AMD and that AMD GPUs have been the primary focus since the start of the game's development? I think that would be an important disclaimer, but curiously there's not a word. If this were a GameWorks title we would see an entirely different article...


----------



## mtcn77

Quote:


> Originally Posted by *Glottis*
> 
> why isnt there a single mention in that article that this game is sponsored by amd and amd gpus were primary focus since the start of this games development. i think this would be important disclaimer to mention but curiously not a word. *if this was gameworks title we would see entirely different article*...


Yeah, we would be complaining about how broken the game was.


----------



## Glottis

Quote:


> Originally Posted by *mtcn77*
> 
> Yeah, we would be complaining how broken the game was.


Ashes is a broken mess if you look beyond the AMD vs Nvidia arguing, but no one has bothered to look. 30 fps with a Fury X in DX11 mode? Any other game with such performance would be buried alive, and their forum is flooded with people having technical issues.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> why isnt there a single mention in that article that this game is sponsored by amd and amd gpus were primary focus since the start of this games development. i think this would be important disclaimer to mention but curiously not a word. if this was gameworks title we would see entirely different article...


Maybe that's because asynchronous shading is a DirectX 12 feature (the most important one, mind you) whereas GameWorks is nVIDIA's technological property. All DirectX 12 graphics cards are supposed to incorporate the capacity for asynchronous shading (an open standard). DirectX 9/10/11/12 graphics cards have no such requirement to incorporate GameWorks; in fact, it is the opposite: nVIDIA keeps the GameWorks source code secret.

GameWorks is a method of boosting nVIDIA's graphics card performance over that of its competitors. Asynchronous shading is just something all DirectX 12 cards are supposed to be able to handle.

That's why.

And I say that as an objective individual. I don't play the partisan game.


----------



## ToTheSun!

Quote:


> Originally Posted by *Glottis*
> 
> Quote:
> 
> 
> 
> Originally Posted by *mtcn77*
> 
> Yeah, we would be complaining how broken the game was.
> 
> ashes is a broken mess if you look beyond this amd vs nvidia arguing but no one bothered to look. 30fps with furyx in dx11 mode, any other game with such performance would be burried alive and their forum is flooded with people having technical issues.

Well, if Batman's newest is, indeed, a good analogy for Ashes, and if coding with the approach used for GCN (consoles) for DX11 really does create a stuttery mess, the issue is beyond AMD vs Nvidia. It's just DX11 being sub-optimal going forward, and serial workloads as well.

IF.


----------



## mtcn77

Quote:


> Originally Posted by *Glottis*
> 
> ashes is a broken mess if you look beyond this amd vs nvidia arguing but no one bothered to look. 30fps with furyx in dx11 mode, any other game with such performance would be burried alive and their forum is flooded with people having technical issues.


I'm not going to involve myself in that discussion, but the stall issues with Nvidia's HyperQ (think of an in-order, single-core GPU with one 32-deep pipeline) compared to AMD's Asynchronous Compute Engines (an out-of-order, octa-core GPU with eight 8-deep pipelines) shouldn't be hard to visualize, IMO.


----------



## Mahigan

Quote:


> Originally Posted by *ToTheSun!*
> 
> Well, if Batman's newest is, indeed, a good analogy for Ashes, and if coding with the approach used for GCN (consoles) for DX11 really does create a stuttery mess, the issue is beyond AMD vs Nvidia. It's just DX11 being sub-optimal going forward, and serial workloads as well.
> 
> IF.


Bingo!


----------



## mtcn77

Quote:


> Originally Posted by *ToTheSun!*
> 
> Well, if Batman's newest is, indeed, a good analogy for Ashes, and if coding with the approach used for GCN (consoles) for DX11 really does create a stuttery mess, the issue is beyond AMD vs Nvidia. It's just DX11 being sub-optimal going forward, and serial workloads as well.
> 
> IF.


I expect to get _some_ flak for this suggestion, since it totally contradicts the norms of the enthusiast community, but I believe everyone should invest in a 1000Hz TV and just sink into the game, console or not. The display is the primary visual element, not the GPU, contrary to widespread belief. I haven't looked into the input-lag repercussions, or whether it even works well, but the value of frame interpolation in a smart TV cannot be overstated, IMO.


----------



## Mahigan

The truth is that AMD cannot, even with driver optimizations, derive full efficiency from their GCN architecture under DirectX 11. They are aware of this; it is what pushed them to promote Mantle in the first place. GCN is too parallel for DirectX 11's serial nature.

nVIDIA, on the other hand, can derive full efficiency from their Kepler and Maxwell architectures under DirectX 11. They are aware of this too, which is why they place so much emphasis on driver development and driver interventions under DirectX 11. Their architectures' serial nature lends itself perfectly to that API.

Under DirectX 12, the tables turn. AMD is able to derive more efficiency out of their GCN architecture. nVIDIA can derive a large degree of efficiency with the latest Maxwell 2, but not to the same degree as GCN, and all older nVIDIA architectures (pre-Maxwell 2) extract far less efficiency under DirectX 12 than under DirectX 11. That is why they perform better under DirectX 11 than DirectX 12.

Simple.

Anything anyone brings up to try and detract from this overall statement can be explained by it. If a single statement can explain everything we are seeing, it is probably the right one.


----------



## resonance spark

@Mahigan, I believe you on are on the right track with the AVX/ other CPU extensions which may be resulting in a speed difference between AMD and Intel CPUs. This behavior has been observed in older AMD CPUs, though I haven't tested with newer AMD gear.

Example sse intel vs amd. It could also be caused by the compiler used and whether there is interleaving in the code and how it is setup as I believe the pipeline between the two processors are different.

Btw, thanks for the great in-depth explanation of the whole DX12 AMD/Nvidia situation. One thing that does bother me is that Nvidia can still implement libraries to carry out Nvidia-optimized tasks, like tessellation, to hamper the performance of GCN; that way the numbers will still look great for the green team.

I really enjoyed the read.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> Well, DirectX 12 does not make use of perfect parallelism. It is more parallel than DirectX 11, but not perfect. See the graphs below (provided by AMD):
> 
> 
> 
> 
> The first core always gets more use. Therefore a higher-clocked CPU with fewer cores might even perform better than a CPU with more cores but a lower clock. It depends on the work the developer has done to spread the workload more evenly across several cores.


LoL...
You are totally wrong if you think that DX12 is still bound by IPC or ST perf.
See, this is just an example of how the same game code behaves in DX12 and DX11... but you can code it differently for DX12. Also look at the DX runtime and the DX driver.


----------



## ToTheSun!

Quote:


> Originally Posted by *Themisseble*
> 
> LoL...
> You are totally wrong if you think that DX12 is still bound to IPC or ST perf.


We'd love it if you could elaborate.


----------



## sugarhell

Quote:


> Originally Posted by *Themisseble*
> 
> LoL...
> You are totally wrong if you think that DX12 is still bound to IPC or ST perf.


DX12 still needs a driver thread, and that thread will be IPC-bound on the first core. Just not as heavily as in DX11.


----------



## black96ws6

Quote:


> LoL...
> You are totally wrong if you think that DX12 is still bound to IPC or ST perf.


----------



## Themisseble

Quote:


> Originally Posted by *black96ws6*


Sure mate.




which one does better: a CPU with higher IPC, or one with more cores?

3.5GHz vs 2GHz makes a huge difference in clock speed: +75%.
_Please look at heavy batches (medium setting to eliminate the GPU bottleneck)_
*DX12*
*37.8 vs 50.5*
- about 33% faster.

*DX11*
*29.2 vs 27.8*
The dual-core is about 5% faster

Is IPC very important in DX12?
*Look at how much of a boost the Pentium G3258 really gets from DX11 to DX12: a remarkable 25-30%!! While the slow six-core gets an 80% boost.*



----------



## agentx007

Quote:


> Originally Posted by *Themisseble*
> 
> 
> 
> 
> which one does better: a CPU with higher IPC, or one with more cores?
> 
> 3.5GHz vs 2GHz makes a huge difference in clock speed: +75%.
> _Please look at heavy batches (medium setting to eliminate the GPU bottleneck)_
> *DX12*
> *37.8 vs 50.5*
> - about 33% faster.
> 
> *DX11*
> *29.2 vs 27.8*
> The dual-core is about 5% faster
> 
> Is IPC very important in DX12?
> *Look at how much of a boost the Pentium G3258 really gets from DX11 to DX12: a remarkable 25-30%!! While the slow six-core gets an 80% boost.*


And look how much the Dual-Core gained going from 2GHz to 3.5GHz (in DX12).
37.8FPS vs. 22.0FPS = +71.8%

THAT is ST bound - period.

You can say that if a CPU has enough threads, frequency is less important (for now), BUT if you don't, IPC x frequency are king and queen.

Basic CPU rule: if something is running on the CPU, it WILL be ST-bound at some point - the question isn't "if", it's "how fast".
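agentx007's percentage checks out; a quick frequency-scaling calculation (using the 22.0 and 37.8 FPS dual-core figures quoted above) shows how close the observed gain is to the available clock gain:

```python
# Quick check of the dual-core DX12 numbers quoted above:
# if FPS scales almost 1:1 with clock speed, the workload is
# bound by per-thread speed rather than core count.
fps_low, fps_high = 22.0, 37.8      # dual-core @ 2.0GHz vs 3.5GHz
clk_low, clk_high = 2.0, 3.5

fps_gain = fps_high / fps_low - 1   # observed speedup
clk_gain = clk_high / clk_low - 1   # available speedup (+75%)
efficiency = fps_gain / clk_gain    # fraction of the clock bump realized

print(f"+{fps_gain:.1%} FPS from +{clk_gain:.0%} clock "
      f"({efficiency:.0%} scaling efficiency)")
```

About 96% of the clock increase shows up as FPS, which is exactly what a single-thread-bound workload looks like.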


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> Sure mate.
> 
> 
> 
> 
> which one does better: a CPU with higher IPC, or one with more cores?
> 
> 3.5GHz vs 2GHz makes a huge difference in clock speed: +75%.
> 
> Please look at heavy batches
> *37.8 vs 50.5*
> - 33% faster.
> 
> IPC very important?


To start with,

There's one problem with showcasing those graphs and attempting to deduce CPU IPC vs Parallel performance. Those graphs are bound to nVIDIA Maxwell 2's architectural performance. You're not simply looking at the CPU performance but also the GPU performance (and Driver interventions under DirectX 11). You would be required to eliminate these interventions in order to derive any conclusions. That is only logical.

Secondly you would have to factor in the following:

DirectX 11: Serial
nVIDIA GPU serial
AMD GPU parallel

DirectX 12: Parallel
nVIDIA GPU serial
AMD GPU parallel

Now the problem is that you're trying to derive CPU performance conclusions based on the use of serial nVIDIA GPU hardware under either a serial API, with driver interventions, or a parallel API, absent driver interventions. Evidently the conclusions drawn will be false, because you're not taking into account the nVIDIA driver interventions being made under DirectX 11. Basically... nVIDIA's GPU is not processing the same shaders under DirectX 11 that it does under DirectX 12. You would need to look at the AMD GPUs to derive any conclusions as to the parallel performance, and at nVIDIA GPUs for conclusions as to the serial performance. Since comparing AMD GPUs and nVIDIA GPUs is like comparing apples to oranges... this wouldn't work.

There's a way around this. The Ashes of the Singularity benchmark comes with a CPU frame rate meter. You could compare the CPU framerate, using the same GPU and CPU, with various cores disabled as well as with Hyperthreading enabled and disabled.


----------



## Themisseble

Quote:


> Originally Posted by *agentx007*
> 
> And look how much the Dual-Core gained going from 2GHz to 3.5GHz (in DX12).
> 47.5FPS vs. 28.3FPS = +67.84%
> 44.4FPS vs. 26.3FPS = +68.82%
> 
> THAT is ST bound - period.
> 
> You can say that if a CPU has enough threads, IPC/frequency is less important (for now), BUT if you don't, IPC x frequency are king and queen.
> 
> Basic CPU rule: if something is running on the CPU, it WILL be ST-bound at some point.
> In other words, the question isn't "if", it's "how fast".


LOL... hahahahahahahahahah

Yet a six-core CPU at 2GHz beats it.

Yeah, I am the stupid one... I proved that IPC is not the most important thing.


----------



## Kuivamaa

Quote:


> Originally Posted by *Mahigan*
> 
> I also think that Ashes of the Singularity is doing something funky on FX CPUs. We should be seeing a large boost when moving from an FX4xxx to an FX8xxx but we are not. Clearly Ashes is not able to make use of several optimizations on AMD CPUs. This could be due to the way the Intel compiler treats AMD CPUs or the way AMD CPUs react to certain optimizations (such as AVX for example). Since Ashes detects AMD FX CPUs as being made up of more logical than physical units, it may be running the code on them in a funky way.


FX octocores are seen as 4c/8t (quad i7-like) units from patched Windows 7 onwards. AMD realized that otherwise the OS would freely load cores of the same module even when other modules were idle. This is unwanted behavior because it triggers the CMT performance penalty for no reason. Dual-module CPUs are seen as i3s as well, since Windows knows relatively well how to properly use logical cores, thanks to Intel.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> To start with,
> 
> There's one problem with showcasing those graphs and attempting to deduce CPU IPC vs Parallel performance. Those graphs are bound to nVIDIA Maxwell 2's architectural performance. You're not simply looking at the CPU performance but also the GPU performance (and Driver interventions under DirectX 11). You would be required to eliminate these interventions in order to derive any conclusions. That is only logical.
> 
> Secondly you would have to factor in the following:
> 
> DirectX 11: Serial
> nVIDIA GPU serial
> AMD GPU parallel
> 
> DirectX 12: Parallel
> nVIDIA GPU serial
> AMD GPU parallel
> 
> Now the problem is that you're trying to derive CPU performance conclusions based on the use of serial nVIDIA GPU hardware under either a serial API, with driver interventions, or a parallel API, absent driver interventions. Evidently the conclusions drawn will be false, because you're not taking into account the nVIDIA driver interventions being made under DirectX 11. Basically... nVIDIA's GPU is not processing the same shaders under DirectX 11 that it does under DirectX 12. You would need to look at the AMD GPUs in order to derive any conclusions.


WHAT?!!

We have a simple question: does IPC matter in DX12 as much as in DX11? Is it better to have more cores with lower IPC, or fewer with higher? Simple.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> LOL... hahahahahahahahahah
> 
> Yet a six-core CPU at 2GHz beats it.
> 
> Yeah, I am the stupid one... I proved that IPC is not the most important thing.


Being hostile won't help your case.

Might I suggest you learn this lesson first...

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Arthur Conan Doyle (Sherlock Holmes)

You see, you did not eliminate the fact that nVIDIA's DX11 driver uses the available CPU cycles to perform driver interventions. With a lower-clocked CPU, such driver interventions result in a performance penalty. Since those driver interventions are not being performed under DirectX 12, you're inevitably comparing apples to oranges.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> WHAT?!!
> 
> We have a simple question: does IPC matter in DX12 as much as in DX11? Is it better to have more cores with lower IPC, or fewer with higher? Simple.


I edited that post...

To start with,

There's one problem with showcasing those graphs and attempting to deduce CPU IPC vs Parallel performance. Those graphs are bound to nVIDIA Maxwell 2's architectural performance. You're not simply looking at the CPU performance but also the GPU performance (and Driver interventions under DirectX 11). You would be required to eliminate these interventions in order to derive any conclusions. That is only logical.

Secondly you would have to factor in the following:

DirectX 11: Serial
nVIDIA GPU serial
AMD GPU parallel

DirectX 12: Parallel
nVIDIA GPU serial
AMD GPU parallel

Now the problem is that you're trying to derive CPU performance conclusions based on the use of serial nVIDIA GPU hardware under either a serial API, with driver interventions, or a parallel API, absent driver interventions. Evidently the conclusions drawn will be false, because you're not taking into account the nVIDIA driver interventions being made under DirectX 11. Basically... nVIDIA's GPU is not processing the same shaders under DirectX 11 that it does under DirectX 12. You would need to look at the AMD GPUs to derive any conclusions as to the parallel performance, and at nVIDIA GPUs for conclusions as to the serial performance. Since comparing AMD GPUs and nVIDIA GPUs is like comparing apples to oranges... this wouldn't work.

There's a way around this. The Ashes of the Singularity benchmark comes with a CPU frame rate meter. You could compare the CPU framerate, using the same GPU and CPU, with various cores disabled as well as with Hyperthreading enabled and disabled at varying clock rates.


----------



## Themisseble

You are wrong about performance.

Intel multicore is working perfectly whether you use AMD or nVIDIA in DX12. The i7 5960X destroys the i5 6600K in non-CPU-bound scenarios.
- It has nothing to do with AMD/nVIDIA GPUs.
- They ran it on an AMD GPU and an nVIDIA GPU with the same results every time (DX12 only): AMD CPUs are scaling poorly.
The problem is in the AMD CPU/MB/game... why does it not scale as it should?

You are changing the subject and talking nonsense.


----------



## ku4eto

Quote:


> Originally Posted by *Themisseble*
> 
> You are wrong about performance.
> 
> Intel multicore is working perfectly whether you use AMD or nVIDIA in DX12. The i7 5960X destroys the i5 6600K in non-CPU-bound scenarios.
> - It has nothing to do with AMD/nVIDIA GPUs.
> - They ran it on an AMD GPU and an nVIDIA GPU with the same results every time (DX12 only): AMD CPUs are scaling poorly.
> The problem is in the AMD CPU/MB/game... why does it not scale as it should?
> 
> You are changing the subject and talking nonsense.


Actually, you are the one causing problems and going all caps here. His posts are the most logical in this thread.


----------



## Mahigan

Quote:


> Originally Posted by *Kuivamaa*
> 
> FX octoores are seen as 4c/8t (quad i7 like) units from post patch 7 windows onwards. AMD realized that otherwise the OS would freely load cores of the same module even when there were other modules idle. This is unwanted behavior because it triggers CMT performance penalty for no reason. Dual module CPUs are seen as i3s as well, since windows know relatively well how to properly use logical cores thanks to Intel.


Fair enough, but that poses one problem. Intel's logical cores (Hyperthreading) are capable of executing AVX/AVX2 code, leading to two AVX operations per clock per core. AMD's logical cores are not. If Ashes of the Singularity were treating both Intel and AMD logical cores in the same manner, and if the engine were attempting to execute AVX/AVX2 code on the AMD logical cores... this would result in poor performance, no?


----------



## OneB1t

Then why do you not lose nearly half the performance when running half the cores?
That AVX theory is nonsense.


----------



## Themisseble

Quote:


> Originally Posted by *ku4eto*
> 
> Actually you are the one talking problems and going all caps here. His posts are the most logic in this thread.


And what am I saying?

That AMD CPUs scale poorly?
- He is saying that IPC still matters a lot - which is not true, and I have proved it.
- He is saying that they are using Intel compilers - they said that it runs on the same engine as the Star Swarm benchmark. (Anyway, he could be right.)
- What else?

@OneB1t, can you run AotS on Mantle?


----------



## agentx007

Quote:


> Originally Posted by *OneB1t*
> 
> Then why do you not lose nearly half the performance when running half the cores?
> That AVX theory is nonsense.


Because having more HP under the hood is meaningless if you can't transfer that power to the actual road.
DX12 can utilise a single thread better than DX11 can. You can see it in the two-core-enabled scores.
Also, the GPU isn't fast enough to give you perfect scaling.


----------



## mtcn77

Quote:


> Originally Posted by *Themisseble*
> 
> And what am I saying?
> 
> That AMD CPUs scale poorly?
> *- He is saying that IPC still matters a lot - which is not true, and I have proved it.*
> - He is saying that they are using Intel compilers - they said that it runs on the same engine as the Star Swarm benchmark. (Anyway, he could be right.)
> - What else?


On an Intel chip, not an AMD one. Get your facts straight, imo.


----------



## Themisseble

Quote:


> Originally Posted by *mtcn77*
> 
> On an Intel chip, not an AMD one. Get your facts straight, imo.


Hmm, I thought you saw OneB1t's benchmarks.

FX 8350 @2.0GHz = FX 4300 @4.6GHz

OneB1t, can you do the same benchmark with the FX 4300 @2.0GHz?


----------



## CrazyElf

I like the fact that they are using AVX2. Often it takes years for the latest instruction sets to proliferate into games and other software. Good move on Oxide's part if this is true.

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Great info guys, thanks.
> 
> I was set on the 980 Ti, but I'm starting to want a Fury lol. Would be a Fury X if it weren't for the damn CLC..
> 
> I thought GPU memory won't stack unless devs specifically code for it? If that's the case, then good luck
> 
> 
> 
> 
> 
> 
> 
> 
> Market share would play a big factor IF the consoles weren't both using GCN too.
> 
> EDIT:
> 
> Mahigan beat me to it.


Actually, I'm not too sure the Fury X is that good an upgrade either. Perhaps Mahigan can weigh in on this one - is the Fury X worth buying over the 290X?

Let's consider the implications of Mahigan's analysis here:

- The Fury X is raster-limited, not ROP-, bandwidth-, or otherwise bottlenecked.
- The Fury X also only has 4GB of VRAM, an HBM1 limitation.
- Neither of these is an upgrade over the 290X.

Remember this chart from TechReport?


The Fury X is not an upgrade at all in terms of rasterization and, of course, has only 4GB of VRAM. You do see better benchmarks at 4K, which I imagine can take advantage of the extra bandwidth and shaders, but otherwise it's not an upgrade at all.

The reason why is due to the way the architectures were made (Hawaii vs Fiji):

 

Big changes were:

- GDDR5 got upgraded to HBM1
- More shaders, from 44 to 64 clusters (2816 to 4096) - and the shaders were updated to GCN 1.2, which probably improved efficiency somewhat
- Larger cache, for improved color compression
What stayed the same:

- There are still 8 ACEs, and the front end remained basically unchanged
- Still 64 ROPs, although the capabilities of the ROPs are believed to have been upgraded
- Still 4GB of VRAM (compared to the 290X, although of course there's a lot more bandwidth now)
- Most importantly, we think the rasterizers remained the same, which is why the Fury X has lower rasterization throughput than a 780 Ti
So if rasterization remains the big bottleneck in the future, then the 290(X) or the 390(X) may actually be the best value. Plus, if DX12 really does improve multi-GPU configurations, CF scaling could improve.

The real question then is how fast DX12 gets into developers' hands.

- If it's rapid, then all cards older than Pascal are going to age very poorly
- If it's slow, then Nvidia's cards will remain relevant for longer
- If the 4GB VRAM limitation becomes an issue in the future, then AMD's Fury X will not age too well either
- Same idea if rasterization is important: the Fury X won't age too well, as it was not upgraded compared to the 290X

In other words, neither side's GPUs may age well. The best advice may be to hold onto what you have and wait for next year's GPUs. Barring that, a 290(X) or 390(X) is your best bet.

Quote:


> Originally Posted by *Mahigan*
> 
> In fact I think that Batman Arkham Knight, the game that was pulled because it performed poorly on the PC once it was ported over from the PS4, may have performed as such because it made use of Asynchronous Shading over the DirectX 11 API on the PC. Something tells me that a Parallel shading technique, over a Serial API, would spell disaster performance wise. It is just a hunch but I believe that this was the cause of its poor performance on the PC.
> 
> I wouldn't be surprised to see Batman Arkham Knight re-released as a DirectX 12 title if that was indeed the case.


Shouldn't it be easier to port games over with the new consoles, since they are all GCN GPUs?

The PS4 GPU is more or less, an HD 7870, and the Xbox One, an HD 7790. More or less.

The CPU is basically an underclocked AMD Jaguar. I suppose this means that future console ports may not be too CPU dependent.

http://www.extremetech.com/extreme/171375-reverse-engineered-ps4-apu-reveals-the-consoles-real-cpu-and-gpu-specs
http://www.extremetech.com/gaming/162612-ps4-vs-xbox-one-performance-compared-using-representative-pc-hardware

I do have a question; perhaps Mahigan can answer this one. The fact that there are 8 Jaguar cores makes me wonder whether, in the future, HEDT (8-core CPU) setups may be worth the money. I suspect that the mainstream 8-thread/4-core CPUs will also age better than the 4-core/4-thread CPUs. Now granted, the Jaguars are very weak cores, and an overclocked Haswell-E at 4.4-4.6 GHz would run circles around them, but the fact that there are so many cores makes me wonder if more games will become more multithreaded.

Quote:


> Originally Posted by *Mahigan*
> 
> Maybe that's because Asynchronous Shading is a DirectX 12 feature (the most important one mind you) whereas GameWorks is nVIDIAs technological property. All DirectX 12 Graphics cards are supposed to incorporate the capacity for Asynchronous Shading (Open Standard). All DirectX 12 Graphics Cards (or DirectX 9,10,11) have no such requirement to incorporate GameWorks... in fact it is the opposite. nVIDIA keeps the source code, for GameWorks, secret.
> 
> GameWorks is a method to boost nVIDIAs Graphics cards performance over that of its competitors. Asynchronous Shading is just something all DirectX 12 cards are supposed to be able to handle.
> 
> That's why.
> 
> And I say that as an objective individual. I don't play the partisan game.


I would agree with your assessment here, and I must commend you for not getting too partisan. There does seem to be a lot of unhappy fanboyism. I mean, AMD fans have done this in the past, but it seems worse right now because it shows Nvidia in a less favorable light. That is not to say Nvidia doesn't have advantages, such as tessellation, rasterization, and efficient management of memory bandwidth, but they are worse in this specific area right now.

These benchmarks are not designed to favor AMD. Nvidia has the money to finance the R&D, and they have had access to the source code for over a year (judging by Oxide's blog post).

See here: http://oxidegames.com/2015/08/16/the-birth-of-a-new-api/
Quote:


> Our code has been reviewed by Nvidia, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft specifically designed to validate against incorrect usages. All IHVs have had access to our source code for over year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months. Fundamentally, the MSAA path is essentially unchanged in DX11 and DX12. Any statement which says there is a bug in the application should be disregarded as inaccurate information.
> 
> ...
> 
> Often we get asked about fairness, that is, usually if in regards to treating Nvidia and AMD equally? Are we working closer with one vendor then another? The answer is that we have an open access policy. Our goal is to make our game run as fast as possible on everyone's machine, regardless of what hardware our players have.
> 
> To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. For example, when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code.
> 
> We only have two requirements for implementing vendor optimizations: We require that it not be a loss for other hardware implementations, and we require that it doesn't move the engine architecture backward (that is, we are not jeopardizing the future for the present).


Even with this, though, I expect there will be a backfire effect.

Given the amount of money Nvidia has for R&D, I am expecting Pascal to be very parallel-oriented, correcting this deficiency. All told, combined with parallelism for DX12, HBM2, the new 16nm process, and perhaps NVLink, I think we will see the biggest leap in performance from Nvidia since the 8800 GTX. But if you own a Maxwell card, I think you will be thrown under the bus here, as driver optimizations may not be possible (we'll see, I guess).

For AMD, I'm interested in seeing what would be the best use of their transistor budget at 16nm:

- Obviously, the rasterizer needs a huge upgrade
- Will we see any gains if, say, more ACEs and ROPs were added (now that it has been brought to light that these are likely not the big bottleneck)?
- What would be the best use of the remaining transistors? A modest bump in shaders?
- How much cache is needed?

Probably we'll see something comparable to the 7970 from AMD then?


----------



## OneB1t

Quote:


> Originally Posted by *Themisseble*
> 
> Hmm, I thought you saw OneB1t's benchmarks.
> 
> FX 8350 @2.0GHz = FX 4300 @4.6GHz
> 
> OneB1t, can you do the same benchmark with the FX 4300 @2.0GHz?


yep







Just let me restart so I can set 4 cores.


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> Then why do you not lose nearly half the performance when running half the cores?
> That AVX theory is nonsense.


I think this depends greatly on the number of cores (and logical cores) the Ashes of the Singularity engine utilizes and on which cores it executes the code (how many cores are used for physics and AI, for example). If only two cores are used to execute the AVX code, then you would be keeping two cores very busy. If you disabled two modules you would incur a small performance hit, nothing enormous. This, of course, would depend on how well your logical cores perform when processing batches (draw calls). Say Ashes of the Singularity also only makes use of 6 cores total (in any efficient way)...

You have 4 Physical Cores and 8 Logical cores.
You're executing the AVX code on 2 cores, you have 2 Physical cores which are free and 8 logical cores free but only need 4 logical cores for the batches.

Say you have 2 Physical Cores and 4 Logical Cores.
You're executing the AVX code on 2 cores, you have 0 Physical cores which are free and 4 logical cores free but only need 4 logical cores for the batches.

Logical cores, as Hyperthreading on the i3 4330 has shown, are excellent at processing batches, AI, and physics on Intel's architecture (not so great at processing AI and physics on AMD's architecture).
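Mahigan's scenarios above can be turned into a toy scheduling model. Every number below is invented for illustration (`avx_cost` and `batch_cost` are arbitrary units, not measurements from the game):

```python
# Toy model of the theory above: frame time is set by the busiest
# hardware thread, so a wide CPU can hide two heavy AVX jobs
# (physics/AI) behind the batch work, while a narrow CPU cannot.
# All job counts and costs are hypothetical.
def frame_time(hw_threads, avx_jobs, batch_jobs, avx_cost=4.0, batch_cost=1.0):
    # Greedy: place each job on the currently least-loaded thread.
    load = [0.0] * hw_threads
    for cost in [avx_cost] * avx_jobs + [batch_cost] * batch_jobs:
        i = load.index(min(load))
        load[i] += cost
    return max(load)

wide = frame_time(hw_threads=8, avx_jobs=2, batch_jobs=12)    # 4c/8t-style CPU
narrow = frame_time(hw_threads=4, avx_jobs=2, batch_jobs=12)  # 2c/4t-style CPU
assert wide < narrow
print(wide, narrow)
```

With these invented costs, the 8-thread CPU finishes the frame in 4.0 units while the 4-thread CPU needs 5.0, because its batch work has to queue behind the AVX jobs. It only illustrates the shape of the argument, not Ashes' real behavior.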

That's just a theory, though. Several pieces of the puzzle are missing, such as: 1. How threaded is the Ashes of the Singularity code? 2. What optimizations is it using? I think that the PCPer test (6700K vs 5960X) shows that more cores does not necessarily mean more performance. This means that Ashes of the Singularity may not be using each core in an egalitarian manner. This is in keeping with the AMD graphs I posted above, but I'll post them here as well:


----------



## Themisseble

http://www.tomshardware.com/reviews/ivy-bridge-wolfdale-yorkfield-comparison,3487-14.html
http://gamegpu.ru/rts-/-strategii/starcraft-ii-heart-of-the-swarm-test-gpu.html

StarCraft is very CPU-intensive and optimized for AVX.

batch submission
http://egmr.net/2015/02/direct-x-12-starstorm-benchmark-increases-amd-apus-relevance/


----------



## OneB1t

[email protected] 9 [email protected] (1050/1325) same as other tests

==Sub Mark Heavy Batch ==================================
Total Time: 57.940823
Avg Framerate : 60.104591 ms (16.637665 FPS)
Weighted Framerate : 60.428894 ms (16.548374 FPS)
CPU frame rate (estimated framerate if not GPU bound): 60.072514 ms (16.646547 FPS)
Percent GPU Bound: 0.000000%
Driver throughput (Batches per ms): 1677.447998
Average Batches per frame: 33859.664063

so
[email protected] - 16.6fps
[email protected] - 23.2fps
[email protected] - 26.4fps
[email protected] - 33.1fps

exactly the same settings, but average batches differ slightly, about 5-10% each run

Conclusion:
[email protected] has 4x more raw performance than [email protected], but only a 2x increase in real-world performance. The question is: why does this happen?
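Taking the quoted "4x raw, 2x real" at face value, Amdahl's law gives one hedged answer to that question: a fixed serial fraction caps the speedup, and the observed numbers pin down how big that fraction would have to be.

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
# parallel fraction of the work and n the increase in raw throughput.
# Solving for p from the observed numbers (4x resources, ~2x speedup):
def parallel_fraction(speedup, n):
    # Rearranged Amdahl's law: p = (1 - 1/speedup) / (1 - 1/n)
    return (1 - 1 / speedup) / (1 - 1 / n)

p = parallel_fraction(speedup=2.0, n=4.0)
print(f"parallel fraction = {p:.0%}, serial fraction = {1 - p:.0%}")
```

Roughly a third of each frame behaving serially (driver thread, submission, memory stalls) would produce exactly this pattern - assuming the raw throughput really did scale 4x, which the obfuscated clock figures make hard to confirm.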


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> [email protected] 9 [email protected] (1050/1325) same as other tests
> 
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.940823
> Avg Framerate : 60.104591 ms (16.637665 FPS)
> Weighted Framerate : 60.428894 ms (16.548374 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 60.072514 ms (16.646547 FPS)
> Percent GPU Bound: 0.000000%
> Driver throughput (Batches per ms): 1677.447998
> Average Batches per frame: 33859.664063
> 
> so
> [email protected] - 16.6fps
> [email protected] - 23.2fps
> [email protected] - 26.4fps
> [email protected] - 33.1fps
> 
> exactly the same settings, but average batches differ slightly, about 5-10% each run
> 
> Conclusion:
> [email protected] has 4x more raw performance than [email protected], but only a 2x increase in real-world performance. The question is: why does this happen?


Send this to AotS support.

Your memory?
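For anyone collecting several of these runs to send off, the benchmark's text output is easy to scrape. A minimal sketch, assuming only the field names visible in OneB1t's paste (everything else here is an assumption):

```python
# Minimal scraper for the AotS "Sub Mark" text output quoted above.
import re

def parse_submark(text):
    out = {}
    m = re.search(r"Avg Framerate\s*:\s*([\d.]+) ms \(([\d.]+) FPS\)", text)
    if m:
        out["avg_ms"], out["avg_fps"] = float(m.group(1)), float(m.group(2))
    m = re.search(r"Percent GPU Bound:\s*([\d.]+)%", text)
    if m:
        out["gpu_bound_pct"] = float(m.group(1))
    return out

sample = """==Sub Mark Heavy Batch ==================================
Avg Framerate : 60.104591 ms (16.637665 FPS)
Percent GPU Bound: 0.000000%"""
print(parse_submark(sample))
```

Collecting the parsed dicts per CPU/clock configuration would make scaling comparisons like the ones in this thread much less error-prone than copying numbers by hand.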


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> http://www.tomshardware.com/reviews/ivy-bridge-wolfdale-yorkfield-comparison,3487-14.html
> http://gamegpu.ru/rts-/-strategii/starcraft-ii-heart-of-the-swarm-test-gpu.html
> 
> starcraft is very CPU intensive optimized for AVX.
> 
> batch submission
> http://egmr.net/2015/02/direct-x-12-starstorm-benchmark-increases-amd-apus-relevance/


Batch submission is one thing; AI and physics are another. The Star Swarm test contained very little AI and physics. It contained a series of spacecraft flying around and shooting. It was intended to be a demo for the Nitrous engine, meant to highlight the engine's capability of rendering many simultaneous objects on screen.

Ashes of the Singularity uses the Nitrous engine, but it is a game. It contains AI and Physics.

But again we cannot draw any conclusions until we have more information about Ashes of the Singularity as it pertains to CPU Optimizations and how many threads it utilizes.


----------



## Kuivamaa

Quote:


> Originally Posted by *Mahigan*
> 
> You have 4 Physical Cores and 8 Logical cores.
> You're executing the AVX code on 2 cores, you have 2 Physical cores which are free and 8 logical cores free but only need 4 logical cores for the batches.
> 
> Say you have 2 Physical Cores and 4 Logical Cores.
> You're executing the AVX code on 2 cores, you have 0 Physical cores which are free and 4 logical cores free but only need 4 logical cores for the batches.


I haven't seen the benchmark itself, so I have no idea what kind of cores it sees. But in OS terms, a quad core that has 8 logical cores, like the i7 920/2600K/3770K/4790K etc., doesn't have 12 threads. It has 4 physical cores that, through Hyperthreading, are seen as 8 logical cores by the OS. For the 8 logical cores to be totally free for use, a quad CPU needs all its physical cores to be free as well.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> The first CPU core always gets the most usage. This is the same in DX11 as it is in DX12. If you have a Quad Core (with HT) clocked at 4GHz vs a Six Core (with HT) clocked at 3GHz you will most likely, unless the game is both incredibly demanding and incredibly well threaded, gain more performance from the Quad Core.


but if this graphics engine scaled up to 6 cores, there should be a gain from the extra 2 cores, unless they see HT threads as cores
Quote:


> Originally Posted by *Themisseble*
> 
> LoL...
> You are totally wrong if you think that DX12 is still bound to IPC or ST perf.
> See, this is just an example of how the same game code behaves in DX12 and DX11... but you can code it differently for DX12. Also look at the DX runtime and the DX driver.


Still, if for example someone decided to port a single-threaded game (due to an engine limitation on multithreading), then the game would scale better with more performance per core.


----------



## OneB1t

Quote:


> Originally Posted by *Themisseble*
> 
> You memory?


1333MHz CL9, 4x2GB.
It's slow, but I don't think that going to 2400MHz would help it.


----------



## Mahigan

Quote:


> Originally Posted by *Kuivamaa*
> 
> I haven't seen the benchmark itself so I have no idea what kind of cores it sees. But in OS terms, a quad core that has 8 logical cores, like i7 920/2600k/3770k/4790k etc doesn't have 12 threads. It has 4 physical cores that through hyperthreading can be seen as 8 logical cores by the OS. For the 8 logical cores to be totally free for use,a quad CPU needs all its physical cores to be free as well.


Bah yes.

I boondoggled that one: 4 physical cores and 4 logical cores, for a total of 8 logical cores. I was relying on the Ashes of the Singularity screenshot when I wrote that.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> but if this graphics engine scaled up to 6 cores, there should be a gain from the extra 2 cores, unless they see HT threads as cores


Exactly.

And under AVX/2, HT threads can execute AVX/2 code.

So a Core i3 4330 (2 physical and 2 logical), under AVX/AVX2, can output 4 operations/cycle. That's what made me think of AVX in the first place when I saw the performance the Core i3 was able to muster in Ashes of the Singularity. I would have thought that it would suffer greatly, more so than, say, an AMD FX 8350.


----------



## OneB1t

I really want to see an i7-5960X in this test with these settings:

8 cores enabled, HT ON
8 cores enabled, HT OFF
4 cores enabled, HT ON
4 cores enabled, HT OFF

Then compare the 8-core vs 4-core results with HT OFF and see if there is as small an increase in performance as on the FX series.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> Batch submission is one thing; AI and Physics is another. The Star Swarm test contained very little AI and Physics. It contained a series of spacecraft flying around and shooting. It was intended to be a demo for the Nitrous engine, meant to highlight the engine's capability of rendering many simultaneous objects on screen.
> 
> Ashes of the Singularity uses the Nitrous engine, but it is a game. It contains AI and Physics.
> 
> But again we cannot draw any conclusions until we have more information about Ashes of the Singularity as it pertains to CPU Optimizations and how many threads it utilizes.




An Athlon X4 860K beating an FX 8350 or an FX 4300 @ 4.6GHz.


----------



## Kuivamaa

Quote:


> Originally Posted by *OneB1t*
> 
> [email protected] 9 [email protected] (1050/1325) same as other tests
> 
> ==Sub Mark Heavy Batch ==================================
> Total Time: 57.940823
> Avg Framerate : 60.104591 ms (16.637665 FPS)
> Weighted Framerate : 60.428894 ms (16.548374 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 60.072514 ms (16.646547 FPS)
> Percent GPU Bound: 0.000000%
> Driver throughput (Batches per ms): 1677.447998
> Average Batches per frame: 33859.664063
> 
> so
> [email protected] - 16.6fps
> [email protected] - 23.2fps
> [email protected] - 26.4fps
> [email protected] - 33.1fps
> 
> Exactly the same settings, but the average batch count differs slightly, about 5-10%, between runs.
> 
> Conclusion:
> [email protected] has 4x more raw performance than [email protected], but only a 2x increase in real-world performance. The question is why this happens.


Could you rerun the bench with core affinity set so it uses only cores 0,2,4,6?


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> 1333mhz cl9 4x2GB
> its slow i dont think that going to 2400mhz will help it


try it out.


----------



## OneB1t

Yes I can, but I don't know if it's 0 2 4 6 or 0 1 2 3.


----------



## Themisseble

Quote:


> Originally Posted by *PontiacGTX*
> 
> but if this graphics engines scaled up to wiith 6 cores there should be a gain with the extra 2 cores unless they see HT threads as cores




16 threads at 100% load.


----------



## Kuivamaa

Quote:


> Originally Posted by *OneB1t*
> 
> yes i can but dunno if its 0 2 4 6 or 0 1 2 3


No, FX cores are arranged in pairs: 0-1 is module 1, 2-3 is module 2, and so on. You want to be using only one core from each module and nothing else for this test so we can see how it scales. The expected result is a number between your simulated 4300 scores and 8300 scores. If it is lower than the 4300 or higher than the 8300, there is something wrong with the scheduler or the bench itself.
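To make the module mapping concrete, "one core per module" can be expressed as an affinity bitmask. A minimal sketch (the helper function is hypothetical, not part of any tool mentioned in the thread):

```python
def affinity_mask(cores):
    """Build a CPU affinity bitmask from a list of logical core indices."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

# On a 4-module FX-8xxx, modules pair cores as 0-1, 2-3, 4-5, 6-7,
# so one core per module means cores 0, 2, 4, 6.
one_per_module = [module * 2 for module in range(4)]
print(hex(affinity_mask(one_per_module)))  # 0x55
```

On Windows the resulting hex mask can be applied with the Win32 `SetProcessAffinityMask` call or `start /affinity 55`, restricting the process to cores 0, 2, 4 and 6.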


----------



## PontiacGTX

Quote:


> Originally Posted by *Themisseble*
> 
> 
> 
> 16 threads at 100% load.


They can be loaded with something, but are they really being used by the basic AI, physics, shaders, etc. in game, or is it just overhead from being CPU bound at low resolution?

@OneB1t, could you test how much is being used on the FX with 4 and 8 cores, with and without OC?


----------



## OneB1t

OK, so the test with affinity goes badly:










==Sub Mark Heavy Batch ==================================
Total Time: 54.470570
Avg Framerate : 52.375549 ms (19.092878 FPS)
Weighted Framerate : 211.130981 ms (4.736396 FPS)
CPU frame rate (estimated framerate if not GPU bound): 49.208702 ms (20.321609 FPS)
Percent GPU Bound: 32.072826%
Driver throughput (Batches per ms): 3863.894531
Average Batches per frame: 33876.628906

The game itself freezes for 1-2 seconds every 2-3 seconds.

I think the game spawns 8 threads (checked in Process Hacker) which then run on only 4 cores, and that creates massive lag.


----------



## mtcn77

Quote:


> Originally Posted by *Themisseble*
> 
> 
> 
> athlon x4 860K beating FX 8350 or FX 4300 4.6Ghz


Steamroller vs. Piledriver architecture. :/


----------



## GorillaSceptre

Quote:


> Originally Posted by *CrazyElf*
> 
> Actually, I'm not too sure the Fury X is that good either as an upgrade. Perhaps Mahigan can weigh in on this one - is the Fury X worth buying over the 290X?
> 
> Fury X is not an upgrade at all in terms of Rasterization and of course, has only 4Gb of VRAM. You do see better benchmarks at 4k, which I imagine can take advantage of the extra bandwidth and shaders, but otherwise it's not an upgrade at all.


There are still DX11 games that I will play, and on that front Fury trounces the 290/X.

If I had a 290X I wouldn't upgrade, but I'm stuck with a 570 so I have no choice. Buying a 290X near the end of its life doesn't seem like a great choice, whereas if I got a Fury/X it should last me until the big-die GPUs arrive, which will probably drop in late 2016 or early 2017.


----------



## OneB1t

Also, the Phenom X6 is better than my FX-8xxx.


----------



## PontiacGTX

Quote:


> Originally Posted by *GorillaSceptre*
> 
> There's still DX11 games that i will play, and on that front, Fury trounces the 290/X.
> 
> If i had a 290X i wouldn't upgrade, but i'm stuck with a 570 so i have no choice. Buying a 290X near the end of it's life doesn't seem like a great choice. Whereas if i got a Fury/X it should last me until the big die GPU's arrive, which will probably drop late 2016 or early 2017.


A used 290 can be found for $200 or less, while a Fury is $550...


----------



## Kuivamaa

Quote:


> Originally Posted by *OneB1t*
> 
> ok so test with affinity goes badly
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ==Sub Mark Heavy Batch ==================================
> Total Time: 54.470570
> Avg Framerate : 52.375549 ms (19.092878 FPS)
> Weighted Framerate : 211.130981 ms (4.736396 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 49.208702 ms (20.321609 FPS)
> Percent GPU Bound: 32.072826%
> Driver throughput (Batches per ms): 3863.894531
> Average Batches per frame: 33876.628906
> 
> game itself freezes every 2-3sec for 1-2sec
> 
> i think that game spawns 8 threads (checked in process hacker) which then run on only 4 cores and it creates massive lags


This is peculiar. It sounds as if the bench has certain CPU profiles hardcoded: it detects the FX-8 and tries to use 8 cores, the OS gives it four, and they conflict trying to override each other or something. What happens when you set affinity to 7 cores?








Quote:


> Originally Posted by *OneB1t*
> 
> also phenom x6 better than my FX-8xxx


Ok, then there is no AVX optimization or what have you. Something is off.

Edit: Why is your resolution changing between tests? Kinda invalidates the results.


----------



## Themisseble

WOW
Quote:


> Originally Posted by *OneB1t*
> 
> also phenom x6 better than my FX-8xxx


A Phenom on the same page as an i5 4670K! lol


----------



## OneB1t

That result is with a 5GHz overclock.

I can achieve the same thing around 4.6GHz when I have a lucky run.

I don't change the resolution; it's the same for all runs.

The only difference for the last run is that I ran it in windowed mode so I could set affinity (the benchmark crashes with alt+tab).


----------



## Themisseble

OneB1T
http://s608.photobucket.com/user/TheFlame1/media/test11_zpsv1qetzq3.png.html


----------



## GorillaSceptre

Quote:


> Originally Posted by *PontiacGTX*
> 
> an used 290 can be found at 200usd or less. meanwhile a fury is 550usd ...


So? I'm not going for value. I want something that will last.


----------



## colorfuel

Is there a major difference in results between versions?

In v0.49.11978 I get an average of 33.2 FPS on the same settings as PCGH uses and on default GPU clocks (1020/1350, 290X, [email protected]).

But if I look at their results:

http://www.pcgameshardware.de/Ashes-of-the-Singularity-Spiel-55338/Specials/Benchmark-DirectX-12-DirectX-11-1167997/

They use the 0.49.11820 version though, so that could be it?

I wouldn't dare dream that my 290X runs as fast as their Fury X; maybe I just overlooked something.

Output_15_08_23_1804.txt 317k .txt file


----------



## Kuivamaa

Quote:


> Originally Posted by *colorfuel*
> 
> Is there a major difference in Results between versions?
> 
> In v 0.49.11978 I get an average of 33,2 FPS on the same settings as PCGH uses and on default gpu clocks (1020/1350, 290X, [email protected]).
> 
> But If I look at their results:
> 
> http://www.pcgameshardware.de/Ashes-of-the-Singularity-Spiel-55338/Specials/Benchmark-DirectX-12-DirectX-11-1167997/
> 
> They use the 0.49.11820 version though, so that could be it?
> 
> I wouldnt dare dream that my 290X runs as fast as their Fury X, maybe I just oversought something.
> 
> Output_15_08_23_1804.txt 317k .txt file


There may or may not be a difference, but you should only compare same-version results.


----------



## PontiacGTX

Quote:


> Originally Posted by *GorillaSceptre*
> 
> So? I'm not going for value. I want something that will last.


The Fury (X) won't last as long as you'd think once you see the performance of the 2016 GPUs. The 290 is a placeholder.


----------



## OneB1t

Quote:


> Originally Posted by *Themisseble*
> 
> OneB1T
> http://s608.photobucket.com/user/TheFlame1/media/test11_zpsv1qetzq3.png.html


That's MSAA 4x with a 5GHz FX-95xx, so not really comparable.

Only 1 FPS better in CPU performance:

http://postimg.org/image/wewlb335z/full/

If you increase MSAA in this benchmark then all CPUs will score the same average FPS, as it becomes GPU bound.


----------



## Kuivamaa

Quote:


> Originally Posted by *OneB1t*
> 
> thats MSAA 4x with 5ghz FX-95xx so not really comparable
> 
> only 1 fps better in cpu performance
> 
> 
> 
> 
> 
> 
> 
> 
> http://postimg.org/image/wewlb335z/full/
> 
> if you increase MSAA in this benchmark than all CPU will score same average FPS as it became GPU bounded


9590 is a 4.7GHz CPU, unless this particular one is overclocked and I missed it?


----------



## OneB1t

It's overclocked; this image is from overclockers.co.uk.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> 
> 
> 16 threads at 100% load.


Then, evidently, it is a matter of IPC.

Intel cores do more work per clock. If the engine maximizes 16 threads, then what we have left is IPC. This explains Intel's performance, but it does nothing to explain the AMD performance deficits when scaling the FX series.

More cores, each doing more work per clock, would yield more performance under DirectX 12. We therefore cannot compare against DirectX 11, as it does not maximize the available CPU resources.


----------



## OneB1t

If it were IPC bound then you would see much better scaling from 2.0GHz to 4.6GHz... (the results would have to double).
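That doubling argument can be checked with a quick ratio. A sketch with hypothetical FPS numbers (not results from this thread), comparing the observed speed-up against the clock-speed increase:

```python
def scaling_efficiency(fps_lo, fps_hi, ghz_lo, ghz_hi):
    """Fraction of a pure clock-speed increase that shows up as FPS
    (1.0 = perfect frequency scaling)."""
    return (fps_hi / fps_lo) / (ghz_hi / ghz_lo)

# Hypothetical example: 20 FPS at 2.0GHz rising to 40 FPS at 4.6GHz
# doubles the result, but that is only ~87% of the 2.3x clock increase.
eff = scaling_efficiency(20.0, 40.0, 2.0, 4.6)
print(round(eff, 2))  # 0.87
```

Anything well below 1.0 suggests the workload is limited by something other than core clock alone.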


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> if it was IPC bound then you will see much better scaling from 2.0ghz to 4.6ghz... (results must double)


It explains Intel's performance scaling. As it pertains to AMD, we do not have enough information to deduce the cause of the performance deficits.


----------



## OneB1t

IPC goes up linearly with frequency even for AMD CPUs...
Also, it's not caused by memory bandwidth, as the Phenom X6 has a worse memory-subsystem score than the FX-xxxx but still pulls ahead in overall score.


----------



## agentx007

Quote:


> Originally Posted by *OneB1t*
> 
> if it was IPC bound then you will see much better scaling from 2.0ghz to 4.6ghz... (results must double)


Well, results will double if it isn't bottlenecked somewhere else (i.e. cache level/IMC/RAM), the classic Pentium 4/Pentium D problem.

@up Since when must IPC go up when frequency goes up?
Performance = IPC x Frequency.
They are independent of one another.
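The identity is easy to sanity-check numerically. A small sketch with made-up IPC and clock figures (the function and values are illustrative, not measurements):

```python
def throughput(ipc, freq_ghz):
    """Performance = IPC x frequency, in instructions per second."""
    return ipc * freq_ghz * 1e9

# Doubling the clock doubles throughput only if IPC holds steady;
# if a cache/IMC/RAM bottleneck halves IPC at the higher clock
# (the classic Pentium 4 problem), the net gain vanishes.
base = throughput(ipc=2.0, freq_ghz=2.0)
clock_doubled = throughput(ipc=2.0, freq_ghz=4.0)
bottlenecked = throughput(ipc=1.0, freq_ghz=4.0)
print(clock_doubled / base)  # 2.0
print(bottlenecked / base)   # 1.0
```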


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> IPC is going up linear with frequency even for AMD cpus...
> also its not caused by memory bandwidth as phenom x6 have worse memory subsystem score than FX-xxxx but still pulls ahead in overall score


This is what is occurring on Intel CPUs. This is not what is occurring on AMD CPUs. Therefore something is amiss as it pertains to AMD CPUs. What is amiss requires more information. Absent such information we cannot deduce a cause for the AMD FX performance deficits. It is thus quite futile to argue over it.

We must wait for a response from Oxide.


----------



## OneB1t

Yes, there can be some cache bottleneck; that's plausible.


----------



## black96ws6

Ars Technica showed some very interesting results:

*4 cores (no HT):*
http://cdn.arstechnica.net/wp-content/uploads/sites/3/2015/08/Review-chart-template-final-full-width-3.0021.png

*6 cores:*
http://cdn.arstechnica.net/wp-content/uploads/sites/3/2015/08/Review-chart-template-final-full-width-3.0011.png

The results between the 2 are barely different? Basically the same? For both AMD and Nvidia?


----------



## OneB1t

Because an i5-4xxx is enough to feed even a 980 Ti or Fury X with only 4 threads.
That's why there is not much difference between 4 and 6 cores: the cards are fully saturated even with a 4-core system.


----------



## mtcn77

Quote:


> Originally Posted by *black96ws6*
> 
> Ars Technica showed some very interesting results:
> 
> *4 cores (no HT):*
> http://cdn.arstechnica.net/wp-content/uploads/sites/3/2015/08/Review-chart-template-final-full-width-3.0021.png
> 
> *6 cores:*
> http://cdn.arstechnica.net/wp-content/uploads/sites/3/2015/08/Review-chart-template-final-full-width-3.0011.png
> 
> The results between the 2 are barely different? Basically the same? For both AMD and Nvidia?


The 2160p result is GPU bound in the 980 Ti's case.


----------



## Mahigan

Quote:


> Originally Posted by *black96ws6*
> 
> Ars Technica showed some very interesting results:
> 
> *4 cores (no HT):*
> http://cdn.arstechnica.net/wp-content/uploads/sites/3/2015/08/Review-chart-template-final-full-width-3.0021.png
> 
> *6 cores:*
> http://cdn.arstechnica.net/wp-content/uploads/sites/3/2015/08/Review-chart-template-final-full-width-3.0011.png
> 
> The results between the 2 are barely different? Basically the same? For both AMD and Nvidia?


Yes, they mention that the CPU frame rate was much higher but the GPUs were taxed to their limits. Their results show a GPU bottleneck.

mtcn77 beat me to it lol

Once they add Multi-Adapter functionality... things should get more interesting.


----------



## OneB1t

It's all GPU bound under DX12; nothing interesting about CPU performance in these results.


----------



## PontiacGTX

Quote:


> Originally Posted by *OneB1t*
> 
> because i5-4xxx is enought to feed even 980TI or fury X even with only 4 threads
> thats why there is not much difference between 4 and 6 core as cards are fully saturated even with 4 core system


Only that the 5930K has way more cache than an i5...


----------



## OneB1t

It doesn't really matter, as both the i5 and the crippled i7-5960X have enough performance not to limit the GPU.


----------



## GorillaSceptre

Quote:


> Originally Posted by *PontiacGTX*
> 
> the fury( x) wont last as long as you would think once you see the performance on gpu for 2016. the 290 is a placeholder


I think a lot of people are massively overestimating what next year's GPUs will be. Even if AMD/NV overcome all the problems with the new process (which is obviously going to cost a lot of money), it would be terrible business to release products that are twice as fast as a 980 Ti. Not to mention they will be the small-die parts.

I bet they will be a bit better than what the 980 was to the 780 Ti.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> I think a lot of people are massively over estimating what next years GPU's will be. Even if AMD/NV overcome all the problems with the new process (Which is obviously going to cost a lot of money) It would be terrible business to release products that are twice as fast as a 980 Ti. Not to mention they will be the small dye parts.
> 
> I bet they will be a bit better than what the 980 was to the 780 Ti.


I think Pascal will show some nice DirectX 11 performance improvements, nothing too impressive, but I am banking on some very impressive DirectX 12 performance.


----------



## garwynn

It's also extremely important to note that AMD probably had a leg up on DX12 for three or more years.
Most people forget that DX11 was custom modified for the Xbox One - often referred to as DX11.x
Those changes never made it back into mainline, and that was probably intentional by Microsoft.
After all, if they make it in there, what advantage does their console offer versus PC gaming?

Start speculative:

AMD also very likely had knowledge of the changes MS made to tweak and get every ounce of performance out of the hardware.
They can't get MS to do that but they know they have something here that would bring them closer to NV.
Solution? Take what they can and make Mantle.
Had DX12 not been announced I think they would have had no choice but to heavily invest in proprietary code to maximize Mantle.
But neither MS nor AMD wins in that scenario, and they _are_ partners in the Xbox One after all.

That would explain why the DX12 announcement seemed so sudden and why Mantle development nearly halted immediately after the announcement.
AMD would have gotten the word out and probably given a guide on coding adjustments for DX12.
This would have made DX12 the best Christmas present AMD got last year - and the one that keeps on giving.


----------



## OneB1t

I also found another interesting thing about FX vs i5 performance.

This is [email protected]:

Code:


==Sub Mark Normal Batch ================================== 
CPU frame rate (estimated framerate if not GPU bound): 13.358088 ms (74.861015 FPS) 100%

==Sub Mark Medium Batch ================================== 
CPU frame rate (estimated framerate if not GPU bound): 13.826407 ms (72.325371 FPS) 96.6%

==Sub Mark Heavy Batch ================================== 
CPU frame rate (estimated framerate if not GPU bound): 21.193769 ms (47.183681 FPS) 63%

My [email protected]:

Code:


==Sub Mark Normal Batch ================================== 
CPU frame rate (estimated framerate if not GPU bound): 20.296707 ms (49.269073 FPS) 100%

==Sub Mark Medium Batch ================================== 
CPU frame rate (estimated framerate if not GPU bound): 21.904108 ms (45.653538 FPS) 92.6%

==Sub Mark Heavy Batch ================================== 
CPU frame rate (estimated framerate if not GPU bound): 26.364561 ms (37.929703 FPS) 76.9%

As you can see, when draw calls increase the i5 becomes relatively much slower: it drops to 63% of its light-load performance, while the FX only drops to 77%.

I think that if the benchmark became even more demanding, the FX would match the i5.

This can be seen in Star Swarm, where the batch count is around 100,000 for the RTS test versus only 35,000 for Ashes of the Singularity's heavy batches.
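The percentages quoted above (hand-rounded) come from dividing each sub-mark's CPU frame rate by the Normal-batch result. A small sketch reproducing them from the posted FPS figures:

```python
def relative_scaling(fps_values):
    """Express each sub-mark's FPS as a percentage of the first (Normal batch) entry."""
    base = fps_values[0]
    return [round(100.0 * f / base, 1) for f in fps_values]

# Normal, Medium, Heavy batch CPU frame rates from the post:
i5 = relative_scaling([74.861015, 72.325371, 47.183681])
fx = relative_scaling([49.269073, 45.653538, 37.929703])
print(i5)  # [100.0, 96.6, 63.0]
print(fx)  # [100.0, 92.7, 77.0]
```

The i5 loses a much larger fraction of its light-load performance under heavy batches than the FX does, which is the asymmetry the post is pointing at.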


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> I think Pascal will show some nice DirectX 11 performance improvements, nothing too impressive, but I am banking on some very impressive DirectX 12 performance.


Isn't Pascal essentially 16nm Maxwell?


----------



## Mahigan

Quote:


> Originally Posted by *garwynn*
> 
> It's also extremely important to note that AMD probably had a leg up on DX12 for nearly 3 or more years.
> Most people forget that DX11 was custom modified for the Xbox One - often referred to as DX11.x
> Those changes never made it back into mainline, and that was probably intentional by Microsoft.
> After all, if they make it in there, what advantage does their console offer versus PC gaming?
> 
> Start speculative:
> 
> AMD also very likely had knowledge of the changes MS made to tweak and get every ounce of performance out of the hardware.
> They can't get MS to do that but they know they have something here that would bring them closer to NV.
> Solution? Take what they can and make Mantle.
> Had DX12 not been announced I think they would have had no choice but to heavily invest in proprietary code to maximize Mantle.
> But neither MS or AMD win in that scenario and they _are_ partners in the Xbox One after all.
> 
> Would explain why DX12 announcement seemed so sudden and why Mantle development nearly halted immediately after the announcement.
> AMD would have gotten the word out and probably given a guide on coding adjustments for DX12.
> This would have made DX12 the best Christmas present AMD got last year - and the one that keeps on giving.


Well...


----------



## sugarhell

Quote:


> Originally Posted by *Mahigan*
> 
> Well...


Well, we have a better fact:

http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf

Page 92.

Anyone who knows about APIs will understand ;_;


----------



## ZealotKi11er

Quote:


> Originally Posted by *OneB1t*
> 
> i also found another interesting thing about FX vs i5 performance
> 
> this is [email protected]
> 
> Code:
> 
> 
> 
> Code:
> 
> 
> ==Sub Mark Normal Batch ==================================
> CPU frame rate (estimated framerate if not GPU bound): 13.358088 ms (74.861015 FPS) 100%
> 
> ==Sub Mark Medium Batch ==================================
> CPU frame rate (estimated framerate if not GPU bound): 13.826407 ms (72.325371 FPS) 96,6%
> 
> ==Sub Mark Heavy Batch ==================================
> CPU frame rate (estimated framerate if not GPU bound): 21.193769 ms (47.183681 FPS) 63%
> 
> my [email protected]
> 
> Code:
> 
> 
> 
> Code:
> 
> 
> ==Sub Mark Normal Batch ==================================
> CPU frame rate (estimated framerate if not GPU bound): 20.296707 ms (49.269073 FPS) 100%
> 
> ==Sub Mark Medium Batch ==================================
> CPU frame rate (estimated framerate if not GPU bound): 21.904108 ms (45.653538 FPS) 92,6%
> 
> ==Sub Mark Heavy Batch ==================================
> CPU frame rate (estimated framerate if not GPU bound): 26.364561 ms (37.929703 FPS) 76,9%
> 
> as you can see when draw calls increase i5 became relatively much slower (63% of performance with low load vs 77% of performance with low load)
> 
> i think that if benchmark became even more complicated than FX is going to match i5
> 
> this can be seen in star swarm as batch count in star swarm is around 100 000 for RTS test and only 35 000 for ashes of singularity heavy batches


Yours is relative to 50 FPS while the i5 is relative to 75 FPS. The Core i5 will always be faster. If you had a faster GPU you would have seen a higher number on Normal, the same on Medium and the same on Heavy, but the percentages would be much lower.


----------



## OneB1t

In fact, I think that under a really heavy workload (100,000 batches) the FX will pull ahead of the i5.
Later I will try to modify the test sequence to show just a single map with a huge fight zoomed out.

See this:
[email protected]

==Shot long shot 3 ==================================
Total Time: 5.008700
Avg Framerate : 32.314194 ms (30.946156 FPS)
Weighted Framerate : 32.396713 ms (30.867331 FPS)
CPU frame rate (estimated framerate if not GPU bound): 24.420563 ms (40.949097 FPS)
Percent GPU Bound: 99.271950%
Driver throughput (Batches per ms): 3960.520752
Average Batches per frame: 47504.597656

==Shot high vista ==================================
Total Time: 4.970572
Avg Framerate : 35.759510 ms (27.964590 FPS)
Weighted Framerate : 35.872990 ms (27.876127 FPS)
CPU frame rate (estimated framerate if not GPU bound): 26.463301 ms (37.788181 FPS)
Percent GPU Bound: 99.365089%
Driver throughput (Batches per ms): 3962.454102
Average Batches per frame: 50500.582031

[email protected]

==Shot long shot 3 ==================================
Total Time: 4.985974
Avg Framerate : 33.462917 ms (29.883827 FPS)
Weighted Framerate : 33.534695 ms (29.819862 FPS)
CPU frame rate (estimated framerate if not GPU bound): 29.043478 ms (34.431137 FPS)
Percent GPU Bound: 93.482292%
Driver throughput (Batches per ms): 5927.758789
Average Batches per frame: 47757.062500

==Shot high vista ==================================
Total Time: 4.998882
Avg Framerate : 33.776230 ms (29.606619 FPS)
Weighted Framerate : 33.864346 ms (29.529583 FPS)
CPU frame rate (estimated framerate if not GPU bound): 28.254892 ms (35.392101 FPS)
Percent GPU Bound: 98.633514%
Driver throughput (Batches per ms): 6525.858398
Average Batches per frame: 49477.691406

See the batches-per-ms figure: ~4k for the i5 vs ~6k for the FX.
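If the "Driver throughput" figure is the submission rate while inside the driver (an assumption; the benchmark log does not document this), the per-frame time spent submitting batches can be estimated from the two posted numbers:

```python
def driver_time_ms(batches_per_frame, batches_per_ms):
    """Estimate per-frame milliseconds spent submitting batches, assuming
    'Driver throughput' measures the in-driver submission rate."""
    return batches_per_frame / batches_per_ms

# "Shot long shot 3" figures from the post:
i5_ms = driver_time_ms(47504.597656, 3960.520752)  # ~12.0 ms of a 24.4 ms CPU frame
fx_ms = driver_time_ms(47757.062500, 5927.758789)  # ~8.1 ms of a 29.0 ms CPU frame
print(round(i5_ms, 1), round(fx_ms, 1))
```

By this reading the FX actually spends less of each frame in batch submission, so its higher overall CPU frame time would have to come from somewhere other than the driver.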


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Isn't Pascal essentially 16nm Maxwell?


We don't know enough information yet but something tells me that the inclusion of fp16 mixed precision hints at some big changes in terms of the compute capabilities of Pascal. Sure, fp16 is useless for gaming but the fact that Pascal comes with new compute capabilities hints at some rather large architectural changes.
Quote:


> Originally Posted by *sugarhell*
> 
> Well we have a better fact.
> 
> http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
> 
> Page 92
> 
> Anyone that knows from APIs he will understand ;_;


LOL
Quote:


> All DirectX 12 draw calls use the same Mantle topology


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> We don't know enough information yet, but something tells me that the inclusion of fp16 mixed precision hints at some big changes in terms of the compute capabilities of Pascal. Sure, fp16 is useless for gaming, but the fact that Pascal comes with new compute capabilities hints at some rather large architectural changes.


What do you do, if you don't mind me asking?


----------



## Kuivamaa

Quote:


> Originally Posted by *sugarhell*
> 
> Well we have a better fact.
> 
> http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
> 
> Page 92
> 
> Anyone that knows from APIs he will understand ;_;


Rep worthy. Comparison between DX11 on page 90, DX12 on 91 and Mantle on 92 is revealing.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> What do you do, if you don't mind me asking?


I teach. Mostly in the areas of Fiber Optics Engineering and Maintenance, xDSL network infrastructure and anything having to do with the installation and maintenance of large scale networks.

I came to Morocco, working for Bell Canada, for that very reason. Morocco still uses a rather outdated xDSL network in order to provide Internet to its citizens.


----------



## KSIMP88

Can't wait to run these on my desktop and laptop! Should be interesting. It's a shame the GPU in my laptop is nVidia...

I wish I could have found one with a good AMD card.


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> We don't know enough information yet, but something tells me that the inclusion of fp16 mixed precision hints at some big changes in terms of the compute capabilities of Pascal. Sure, fp16 is useless for gaming, but the fact that Pascal comes with new compute capabilities hints at some rather large architectural changes.
> LOL


See, I'm still confused on this.
My understanding was that Pascal's primary market is HPC, which is overdue for an update.
They may scale down for 2H16/1H17 consumer releases, but that also depends on how quickly Volta moves along in development.
And HPC goals versus consumer goals are further diverging, not converging - unless they've found a way to do both in a similar fashion.

Edit: What I mean by diverging is there seems to be a shift towards heterogeneous computing for HPC where that has gains for SoCs but not as much (yet) for the discrete GPU/CPU combination that most gamers use.


----------



## KSIMP88

Would my 7950s see similar gains?


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> I teach. Mostly in the areas of Fiber Optics Engineering and Maintenance, xDSL network infrastructure and anything having to do with the installation and maintenance of large scale networks.
> 
> I came to Morocco, working for Bell Canada, for that very reason. Morocco still uses a rather outdated xDSL network in order to provide Internet to its citizens.


Nice, it seems I find out about a new type of engineering every day.

Come to South Africa when you're done there; their internet probably looks first-world compared to ours.


----------



## ToTheSun!

Quote:


> Originally Posted by *KSIMP88*
> 
> Would my 7950 see similar gains? Or did AMD only start Parallel with the 200 series?


The entire GCN family is engineered with this particular structure in mind. My best bet is you'll see neat performance in DX12.


----------



## garwynn

Quote:


> Originally Posted by *KSIMP88*
> 
> Would my 7950s see similar gains?


In theory you should see some gains given that you're on GCN. How much? YMMV.


----------



## sugarhell

Quote:


> Originally Posted by *ToTheSun!*
> 
> The entire GCN family is engineered with this particular structure in mind. My best bet is you'll see neat performance in DX12.


Well, GCN 1.0 has only 2 ACEs.

I bet AMD understood from their work on the PS4 how development would go. That's why we got 8 ACEs on Hawaii too, like the PS4 GPU.


----------



## Mahigan

Quote:


> Originally Posted by *garwynn*
> 
> See, I'm still confused on this.
> My understanding was Pascal's primary market was HPC, which is overdue an update.
> They may scale down for 2H16/1H17 consumer releases, but that also depends on how quickly Volta moves along in development.
> And HPC goals versus consumer goals are further diverging, not converging - unless they've found a way to do both in a similar fashion.


Well, nVIDIA does claim that Volta will be the most powerful parallel processor ever assembled, which hints at the company's move towards parallelism. I think that Pascal will likely integrate a portion of these improvements, at least enough of them to compete with AMD's upcoming Radeon Rx 400 series. 2016 is the year that DirectX 12 games will ramp up in production. Failing to field their own variant of parallel hardware would let the market flood with GCN hardware. nVIDIA only has the discrete GPU market to bank on when it comes to guiding the future of game development on the PC platform. AMD, on the other hand, has the console market all to itself and a large footing in both the discrete and iGPU markets (with their APUs). Any loss of market share by nVIDIA would likely usher in a flood of AMD hardware and game titles.

I don't view AMD as being the underdog. With DirectX 12 having gone their way (it is basically Mantle, after all), and with console games making use of Asynchronous Compute capabilities, making them easier to port over to AMD hardware on the PC, it would seem key that nVIDIA's Pascal tackle their apparent problems with parallelism soon.

I mean, we saw AMD release the HD 5xxx series with hardware tessellation. nVIDIA scoffed at the idea, only to release a part that buried AMD's cards in that respect. I am thinking that nVIDIA is set to do the same thing with Pascal. I could be wrong. It could be that nVIDIA did not anticipate DirectX 12 being a near copy of Mantle and began production of Pascal before said knowledge. If that is the case, then nVIDIA is in for a pretty significant reduction in market share come 2016.


----------



## SpeedyVT

Quote:


> Originally Posted by *sugarhell*
> 
> Well GCN 1.0 has only 2 ACEs.
> 
> I bet AMD from their work on PS4 understand how the development will go. Thats why we got 8 ACEs on Hawaii too like the ps4 gpu.


The PS4 has 8 ACEs and the XBone has 2 ACEs.

An additional comparative note:

If we include console sales, AMD's GCN architecture is outselling any Maxwell part.


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> I mean we saw AMD release the HD 5xxx series with hardware Tessellation. nVIDIA scoffed at the idea only to release a part that buried AMDs cards in that respect. I am thinking that nVIDIA is set to do the same thing with Pascal. I could be wrong. It could be that nVIDIA did not anticipate DirectX 12 being a near copy of Mantle and began production of Pascal before said knowledge. If that is the case then nVIDIA are in for a pretty significant reduction in market share come 2016.


This is critical. Development normally starts 2 years in advance or more for new designs, and this would have put them right square in the design phase. If they didn't expect this, but Pascal can help address the issues Maxwell will face, I see an acceleration of the scaled-down consumer versions and a quick end to Maxwell at the high-end consumer level. If not, we ought to see Volta fast-tracked to market as quickly as possible, as it should definitively address it. Either way, I don't think we'll know until later this fall, when it will become clear whether or not Nvidia can solve/minimize this at the driver level.


----------



## pengs

Quote:


> Originally Posted by *OneB1t*
> 
> in fact i think under really heavy workload (100 000 batches) FX will pull ahead of i5
> later i will try to modify test sequence to show just single map with huge fight zoomed out
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Percent GPU Bound: 99.271950%
> Percent GPU Bound: 99.365089%
> Percent GPU Bound: 93.482292%
> Percent GPU Bound: 98.633514%
> 
> see the batches per ms figure 4k for i5 vs 6k for FX


You're GPU limited in this scenario, though.
--
A possible explanation for the "flat" FX benchmark: because the benchmark cycles through areas or scenes, some of them are more draw-call intensive and some more integer intensive, depending on what's in the camera's field of view. If AVX is being used, the Intel part would show a lead in the integer-intensive scenes, evening out when the draw-call factor rises, and vice versa. Becoming GPU bound in a few instances could complicate and obscure the end result further. Not enough telemetry to really draw a definite conclusion, imo.

If DX12 can also build a temporary draw-call database, as Vulkan can, that may play into it, unless that's specific to Vulkan. There's a lot going on in DX12 that differs from how communication worked under previous DX iterations. Just random speculation here.


----------



## sugarhell

Also, a bit more info about the game that I found interesting:


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> in fact i think under really heavy workload (100 000 batches) FX will pull ahead of i5
> later i will try to modify test sequence to show just single map with huge fight zoomed out
> 
> see this
> [email protected]
> 
> ==Shot long shot 3 ==================================
> Total Time: 5.008700
> Avg Framerate : 32.314194 ms (30.946156 FPS)
> Weighted Framerate : 32.396713 ms (30.867331 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 24.420563 ms (40.949097 FPS)
> Percent GPU Bound: 99.271950%
> Driver throughput (Batches per ms): 3960.520752
> Average Batches per frame: 47504.597656
> 
> ==Shot high vista ==================================
> Total Time: 4.970572
> Avg Framerate : 35.759510 ms (27.964590 FPS)
> Weighted Framerate : 35.872990 ms (27.876127 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 26.463301 ms (37.788181 FPS)
> Percent GPU Bound: 99.365089%
> Driver throughput (Batches per ms): 3962.454102
> Average Batches per frame: 50500.582031
> 
> [email protected]
> 
> ==Shot long shot 3 ==================================
> Total Time: 4.985974
> Avg Framerate : 33.462917 ms (29.883827 FPS)
> Weighted Framerate : 33.534695 ms (29.819862 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 29.043478 ms (34.431137 FPS)
> Percent GPU Bound: 93.482292%
> Driver throughput (Batches per ms): 5927.758789
> Average Batches per frame: 47757.062500
> 
> ==Shot high vista ==================================
> Total Time: 4.998882
> Avg Framerate : 33.776230 ms (29.606619 FPS)
> Weighted Framerate : 33.864346 ms (29.529583 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 28.254892 ms (35.392101 FPS)
> Percent GPU Bound: 98.633514%
> Driver throughput (Batches per ms): 6525.858398
> Average Batches per frame: 49477.691406
> 
> see the batches per ms figure 4k for i5 vs 6k for FX


That's kinda weird.
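The per-shot numbers being compared here (batches per ms, percent GPU bound) can be pulled out of the benchmark's text output programmatically. A minimal sketch, assuming the log format shown in the quote above; the function and field names are my own, not part of any official tool:

```python
import re

# Labels are copied from the Ashes benchmark log excerpts quoted above;
# the regexes and parse_shots function are my own sketch.
SHOT_FIELDS = {
    "gpu_bound_pct": r"Percent GPU Bound:\s*([\d.]+)",
    "batches_per_ms": r"Driver throughput \(Batches per ms\):\s*([\d.]+)",
    "batches_per_frame": r"Average Batches per frame:\s*([\d.]+)",
}

def parse_shots(log_text):
    """Return {shot_name: {field: value}} for each '==Shot ...' block."""
    shots = {}
    for block in re.split(r"==Shot\s+", log_text)[1:]:
        name = block.split("=", 1)[0].strip()  # text before the '====' ruler
        metrics = {}
        for key, pattern in SHOT_FIELDS.items():
            m = re.search(pattern, block)
            if m:
                metrics[key] = float(m.group(1))
        shots[name] = metrics
    return shots
```

Applied to the i5's "long shot 3" block above, this yields a `batches_per_ms` of 3960.520752; comparing that field across the i5 and FX logs is exactly the 4k-vs-6k contrast OneB1t points out.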


----------



## PontiacGTX

Quote:


> Originally Posted by *GorillaSceptre*
> 
> I think a lot of people are massively overestimating what next year's GPUs will be. Even if AMD/NV overcome all the problems with the new process (which is obviously going to cost a lot of money), it would be terrible business to release products that are twice as fast as a 980 Ti. Not to mention they will be the small-die parts.
> 
> I bet they will be a bit better than what the 980 was to the 780 Ti.


It will be around 20-30%+ faster, maybe 40%+ if a game really needs HBM. At the least, improving on GCN 1.2 should deliver more than 20%.


----------



## Mahigan

Quote:


> Originally Posted by *garwynn*
> 
> This is critical. Development normally starts 2 years in advance or more for new designs and this would have put them right square in the design phase. If they didn't expect this but Pascal can help address the issues that Maxwell will face, I see an acceleration of the scaled-down consumer versions and a quick end to Maxwell at the high end consumer level. If not, we ought to see Volta fast-tracked to market as quickly as possible as it should definitively address it. And neither I think we'll see until later this fall, when it will either be clear or not that Nvidia is not able to solve/minimize this at a driver level.


And with less room for driver intervention in DirectX 12, improvements will be far trickier than they were in the past.


----------



## Evil Penguin

Quote:


> Originally Posted by *Mahigan*
> 
> And with less room for driver intervention in DirectX 12, improvements will be far trickier than they were in the past.


A much simpler driver is a good thing for IHVs unless you count on your software (a bunch of hacks) to beat the competition.
It's disgusting what has to be done at the driver level in order to make DX11/OGL as fast as it is today.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Evil Penguin*
> 
> A much simpler driver is a good thing for IHVs unless you count on your software (a bunch of hacks) to beat the competition.
> It's disgusting what has to be done at the driver level in order to make DX11/OGL as fast as it is today.


For sure, especially for AMD, since they are behind Nvidia in drivers. Now Nvidia has to go all out with GameWorks to take the advantage back.


----------



## Mahigan

If anyone wants to use this information for whatever reason... I've since revised my thoughts, and I believe this to be my final, accurate assessment.

Ashes of the Singularity makes use of Asynchronous Shading. Now, we know that AMD has been big on advertising this feature. It is a feature used in quite a few PlayStation 4 titles, and it allows the developer to make efficient use of the available compute resources. GCN achieves this through the 8 Asynchronous Compute Engines (ACEs) found in GCN 1.1 290-series cards as well as all GCN 1.2 cards. Each ACE is capable of queuing up to 8 tasks, which means a total of 64 tasks may be queued on GCN hardware featuring 8 ACEs.

nVIDIA can also do Asynchronous Shading, through its HyperQ feature. The amount of available information on the nVIDIA side regarding this feature is minimal. What we do know is that nVIDIA mentioned that Maxwell 2 is capable of queuing 32 Compute, or 1 Graphics and 31 Compute, for Asynchronous Shading. nVIDIA has been rather quiet about this feature for the most part.

Anandtech made a BIG mistake in their article on this topic, which seems to have become the de facto standard article on it. Their information has been copied all over the web, and it is erroneous. Anandtech claimed that GCN 1.1 (290 series) and GCN 1.2 were capable of 1 Graphics and 8 Compute queues per cycle. This is in fact false. The truth is that GCN 1.1 (290 series) and GCN 1.2 are capable of 1 Graphics and 64 Compute queues per cycle.

Anandtech also had barely any information on Maxwell's capabilities. Ryan Smith, the graphics author over at Anandtech, assumed that Maxwell's queues were its dedicated compute units, and therefore published that Maxwell 2 had a total of 32 Compute Units. This information is false.

The truth is that Maxwell 2 has only a single Asynchronous Compute Engine tied to 32 Compute queues (or 1 Graphics and 31 Compute queues).

I figured this out when I began to read up on the Kepler/Maxwell/Maxwell 2 CUDA documentation, and I found what I was looking for. Basically, Maxwell 2 makes use of a single ACE-like unit. nVIDIA names this unit the Grid Management Unit.

How does it work?

The CPU's various cores send parallel streams to the Stream Queue Management. The Stream Queue Management sends streams to the Grid Management Unit (parallel to serial thus far). The Grid Management Unit can then create multiple hardware work queues (1 Graphics and 31 Compute, or 32 Compute), which are sent in a serial fashion, for Maxwell, and a parallel fashion, for Maxwell 2, to the Work Distributor. The Work Distributor, in a parallel fashion, assigns the workloads to the various SMMs. The SMMs then assign the work to a specific array of CUDA cores. nVIDIA calls this entire process "HyperQ".
Here's the documentation for Kepler/Maxwell: http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

GCN 1.1 (290 series)/GCN 1.2, on the other hand, works in a very different manner. The CPU's various cores send parallel streams to the Asynchronous Compute Engines' various queues (up to 64). The Asynchronous Compute Engines prioritize the work and then send it off, directly, to specific Compute Units based on availability. That's it.

Maxwell/Maxwell 2's HyperQ is thus potentially bottlenecked at the Grid Management Unit, with Maxwell also suffering a bottleneck at the Work Distributor segment of the pipeline. This is because both of these pipeline stages are "in order" for Maxwell, and one stage is "in order" for Maxwell 2. In other words, HyperQ contains, for the most part, a single pipeline (thus Maxwell/Maxwell 2 is more serial than parallel).

AMD's Asynchronous Compute Engine implementation is different. It contains 8 parallel pipelines working independently from one another. This is why AMD's implementation can be described as being "out of order".

A few obvious facts come to light: AMD's implementation incurs less latency, and it makes more efficient use of the available compute resources.
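To illustrate the serial-vs-parallel point, here is a deliberately crude toy model (my own construction, not real GPU scheduling): draining the same 64 queued tasks through a single in-order pipeline versus 8 independent ones.

```python
import math

# Toy model only: every task costs one time unit, queues never stall,
# and work splits evenly. Real hardware behaves far less ideally.
def drain_time(num_tasks, num_pipelines, cost_per_task=1.0):
    """Time to finish num_tasks when each pipeline works through its share in order."""
    return math.ceil(num_tasks / num_pipelines) * cost_per_task

serial_funnel = drain_time(64, 1)   # one Grid Management Unit-style funnel
parallel_aces = drain_time(64, 8)   # eight ACE-style independent pipelines
print(serial_funnel, parallel_aces)  # 64.0 8.0
```

Under these cartoon assumptions the eight-pipeline case drains eight times faster; the real-world gap is of course much smaller, but the direction matches the latency argument above.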

This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity in DirectX 12 when compared to a lowly R9 290x. Asynchronous Shading kills its performance, compared to GCN 1.1 (290 series)/GCN 1.2, whose performance is barely impacted.

GCN 1.1 (290 series)/GCN 1.2 are clearly being limited elsewhere, and I believe it is their peak rasterization rate, or Gtris/s. Many objects and units permeate the screen in Ashes of the Singularity, and each one is made up of triangles (polygons). Since both the Fury X and the 290x/390x have the same number of hardware rasterization units, I believe this is the culprit. Some people have attributed this to the number of ROPs (64) that both the Fury X and 290/390x share. I thought the same at first, but then I remembered the color compression found in the Fury/Fury X cards. The Fury/X make use of color-compression algorithms which have been shown to alleviate the pixel fill-rate issues found in the 290/390x cards. Therefore I do not believe the ROPs (render back-ends) are the issue; rather, the Triangle Setup Engine (Raster/Hierarchical Z) is the likely culprit.


----------



## CrazyElf

Given how similar Mantle is to both DX12 and Vulkan, I wonder how big the differences are between Vulkan and DX12.

Would it be easier in the future to port to Linux? Ashes, I will note, has already indicated that a Vulkan port (and by extension a Linux port) is likely in the future.

Quote:


> Originally Posted by *garwynn*
> 
> This is critical. Development normally starts 2 years in advance or more for new designs and this would have put them right square in the design phase. If they didn't expect this but Pascal can help address the issues that Maxwell will face, I see an acceleration of the scaled-down consumer versions and a quick end to Maxwell at the high end consumer level. If not, we ought to see Volta fast-tracked to market as quickly as possible as it should definitively address it. And neither I think we'll see until later this fall, when it will either be clear or not that Nvidia is not able to solve/minimize this at a driver level.


This is an important point. +Rep for bringing it up.

I may take back my statement that, in that case, Pascal would be the biggest gain since the 8800 GTX (which also heralded DX10). It could be that only with Volta do we see real gains.

For one, Pascal may be an HPC card first, so good FP64 performance but perhaps not as good gaming performance (there is a tradeoff between the two), and gamers would have to wait until Volta. Plus there will probably be compute-specific features, like ECC HBM2. Another possibility is that they release the compute card first without releasing the gaming card.

In that case, perhaps we may not see the gains?

Quote:


> Originally Posted by *Mahigan*
> 
> Well nVIDIA does claim that Volta will be the most powerful Parallel Processor ever assembled. This hints at the companies move towards Parallelism. I think that Pascal will likely integrate a portion of these improvements. At least enough of these improvements in order to compete with AMDs upcoming AMD Radeon Rx 400 Series. 2016 is the year that DirectX 12 games will ramp up in production. Failing to place their own variant of a Parallel hardware would flood the market with GCN hardware. nVIDIA only have the discrete GPU market to bank on when it comes to guiding the future of game development on the PC platform. AMD, on the other hand, has the Console market all to itself and a large footing in both the discrete and iGPU markets (with their APUs). Any loss of market by nVIDIA would likely usher in a flood of AMD hardware and game titles into the market.
> 
> I don't view AMD as being the underdog. With DirectX 12 having gone their way (it is basically Mantle after-all) and with Console games making use of Asynchronous Compute capabilities, making it easier to port them over to AMD hardware on the PC, it would seem key that nVIDIAs Pascal tackle their apparent problems with Parallelism soon.
> 
> I mean we saw AMD release the HD 5xxx series with hardware Tessellation. nVIDIA scoffed at the idea only to release a part that buried AMDs cards in that respect. I am thinking that nVIDIA is set to do the same thing with Pascal. I could be wrong. It could be that nVIDIA did not anticipate DirectX 12 being a near copy of Mantle and began production of Pascal before said knowledge. If that is the case then nVIDIA are in for a pretty significant reduction in market share come 2016.


Yeah, I think Nvidia may improve parallelism somewhat in Pascal, but the real improvements will have to wait until Volta. We don't have enough information to go on right now.

I do disagree on the underdog, though. I still would argue AMD is the underdog at this point:

- They've lost a lot of market share and, perhaps more importantly, mind share in the discrete market.
- AMD does not have a lot of money and is cutting back on GPU R&D, at a time when Nvidia has tons of cash and AMD's CPU division is struggling.
- Nvidia still has the money and channels to sponsor more games.
- From what I have heard from AMD's Markham office, morale is down.

Even with control of the console market and a good portion of future games development, they are still facing some serious challenges.

That being said, I'd love to see AMD bounce back and go to 50% market share - or more.

Quote:


> Originally Posted by *KSIMP88*
> 
> Would my 7950s see similar gains?


The gains will be more modest, because there are only 2 ACEs:

Die shot of the 7970 (the 7950 will be very similar): http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review

It will be better than if you had gotten, say, a 670 or 680 (by a considerable amount), but because there are only 2 ACEs, the parallel abilities will be more modest.

Quote:


> Originally Posted by *Mahigan*
> 
> If anyone wants to use this information for whatever reason... I've since revised my thoughts and I believe these to be final accurate assessment..
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Ashes of the Singularity makes use of Asynchronous Shading. Now we know that AMD have been big on advertising this feature. It is a feature which is used in quite a few Playstation 4 titles. It allows the Developer to make efficient use of the compute resources available. GCN achieves this by making use of 8 Asynchronous Compute Engines (ACE for short) found in GCN 1.1 290 series cards as well as all GCN 1.2 cards. Each ACE is capable of queuing up to 8 tasks. This means that a total of 64 tasks may be queued on GCN hardware which features 8 ACEs.
> 
> nVIDIA can also do Asynchronous Shading through its HyperQ feature. The amount of available information, on the nVIDIA side regarding this feature, is minimal. What we do know is that nVIDIA mentioned that Maxwell 2 is capable of queuing 32 Compute or 1 Graphics and 31 Compute for Asynchronous Shading. nVIDIA has been rather quiet about this feature for the most part.
> 
> 
> Anandtech made a BIG mistake in their article on this topic which seems to have become the defacto standard article for this topic. Their information has been copied all over the web. This information is erroneous. Anandtech claimed that GCN 1.1 (290 series) and GCN 1.2 were Capable of 1 Graphics and 8 Compute queues per cycle. This is in fact false. The truth is that GCN 1.1 (290 series) and GCN 1.2 are capable of 1 Graphics and 64 Compute queues per cycle.
> 
> Anandtech also had barely no information on Maxwell's capabilities. Ryan Smith, the Graphics author over at Anandtech, assumed that Maxwell's queues were its dedicated compute units. Therefore Anandtech published that Maxwell 2 had a total of 32 Compute Units. This information is false.
> 
> The truth is that Maxwell 2 has only a single Asynchronous Compute Engine tied to 32 Compute Queues (or 1 Graphics and 31 Compute queues).
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> I figured this out when I began to read up on Kepler/Maxwell/2 CUDA documentation and I found what I was looking for. Basically Maxwell 2 makes use of a single ACE-like unit. nVIDIA name this unit the Grid Management Unit.
> 
> How it works?
> 
> The CPUs various Cores send Parallel streams to the Stream Queue Management. The Stream Queue Management sends streams to the Grid Management Unit (Parallel to Serial thus far). The Grid Management unit can then create multiple hardware work queues (1 Graphics and 31 Compute or 32 Compute) which are then sent in a Serial fashion, for Maxwell, and a Parallel fashion, for Maxwell 2, to the Work Distributor. The Work Distributor, in a Parallel fashion, assigns the work loads to the various SMMs. The SMMs then assigns the work to a specific array of CUDA cores. nVIDIA call this entire process "HyperQ".
> Here's the documentation for Kepler/Maxwell: http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf
> 
> GCN 1.1 (290 series)/GCN 1.2, on the other hand, works in a very different manner. The CPUs various Cores send Parallel streams to the Asynchronous Compute Engines various Queues (up to 64). The Asynchronous Compute Engines prioritizes the work and then sends it off, directly, to specific Compute Units based on availability. That's it.
> 
> Maxwell/2 HyperQ is thus potentially bottlenecked at the Grid Management with Maxwell also suffering a bottleneck at the Work Distributor segments of their pipeline. This is because both these stages of the Pipeline are "in order" for Maxwell and one stage is "in order" for Maxwell 2. In other words HyperQ contains, for the most part, a single pipeline (thus Maxwell/2 is more Serial than Parallel).
> 
> AMDs Asynchronous Compute Engine implementation is different. It contains 8 Parallel Pipelines working independently from one another. This is why AMDs implementation can be described as being "out of order".
> 
> A few obvious facts come to light. AMDs implementation incurs less latency as well as having the ability of making more efficient use of the available Compute resources.
> 
> This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity under DirectX 12 when compared to a lowly R9 290x. Asynchronous Shading kills its performance compared to GCN 1.1 (290 series)/GCN 1.2. The latter's performance is barely impacted.
> 
> GCN 1.1 (290 series)/GCN 1.2 are clearly being limited elsewhere, and I believe it is due to their Peak Rasterization Rate or Gtris/s. Many objects and units permeate the screen under Ashes of the Singularity. Each one is made up of Triangles (Polygons). Since both the Fury-X and the 290x/390x have the same amount of hardware rasterization units, I believe that this is the culprit. Some people have attribute this to the amount of ROps (64) that both Fury-X and 290/390x share. I thought the same at first but then I remembered about the Color Compression found in the Fury/Fury-X cards. The Fury/X make use of Color Compression algorithms which have shown to alleviate the Pixel Fill Rate issues which were found in the 290/390x cards. Therefore I do not believe that ROps (Render Back Ends) are the issue. Rather the Triangle Setup Engine (Raster/Hierarchical Z) are the likely culprits.


Do you think it's worth contacting Ryan Smith about this? Has he even read this thread?

So next generation, AMD desperately needs to upgrade its rasterizer? More than 4 rasterizers, or is something else needed? Better color compression again (perhaps an even larger cache, then)?


----------



## Mahigan

Quote:


> Originally Posted by *CrazyElf*
> 
> Do you think it's worth contacting Ryan Smith about this? Has he even read this thread?


I don't want to bring too much attention my way. I have enough stress as is, LOL.

I figure that once more DirectX 12 titles come out, he'll revise his article.

Quote:


> Originally Posted by *CrazyElf*
> 
> So next generation, AMD desperately needs to upgrade it's rasterizer? More than 4 rasterizers or something else is needed? Better color compression again (perhaps an even larger cache then)?


I think that AMD should focus on optimizing their rasterizers. They should also work on better color-compression algorithms. These are two areas in which nVIDIA has them beat, and that will hamper AMD once nVIDIA gets hold of HBM themselves.

Their Geometry Engines, responsible for tessellation, could also use a revamp. If more characters can be placed on the screen at once, it would be wise to ensure they can handle all that extra tessellation.


----------



## agentx007

I don't quite get why AMD didn't go heavier on rasterizers and tessellators with GCN.
The strange thing is: they must have known (in 2010) what nV did with Fermi,
BUT in GCN (2013) they still opted for fewer triangles/s...

I think it should have been obvious to them that this would limit their architecture in DX11 - so why didn't they do anything about it?

PS. @Mahigan I sent you a PM - if you have time, check it out.


----------



## Kuivamaa

Quote:


> Originally Posted by *agentx007*
> 
> I don't quite get why AMD didn't go more heavy on Rasterisers and Tesselators with GCN.
> Strange thing is : They must have knew (in 2010), what nV did with Fermi.
> BUT in GCN (2013), they still opted for less Triangles/s...
> 
> I think it should be obvious to them that this will limit their architecture in DX11 - so why they didn't do anything about it ?
> 
> PS. @Mahigan I sended a PM message - if U will have time, check it out.


Transistor budget. AMD went compute-heavy with GCN, especially from Hawaii onwards. Priorities, I guess.


----------



## semitope

Quote:


> Originally Posted by *agentx007*
> 
> I don't quite get why AMD didn't go more heavy on Rasterisers and Tesselators with GCN.
> Strange thing is : They must have knew (in 2010), what nV did with Fermi.
> BUT in GCN (2013), they still opted for less Triangles/s...
> 
> I think it should be obvious to them that this will limit their architecture in DX11 - so why they didn't do anything about it ?
> 
> PS. @Mahigan I sended a PM message - if U will have time, check it out.


AMD doesn't consider their hardware weak in tessellation. Nvidia overdosing on tessellation is why people think it is weak. The main purpose of improving it previously would have been to stop Nvidia exploiting that advantage, to no benefit to the gamer.

Not sure on the rasterizer. Higher polygon counts don't make graphics on their own; post-processing, particles, etc. add the flair.


----------



## Kpjoslee

Quote:


> Originally Posted by *garwynn*
> 
> This is critical. Development normally starts 2 years in advance or more for new designs and this would have put them right square in the design phase. If they didn't expect this but Pascal can help address the issues that Maxwell will face, I see an acceleration of the scaled-down consumer versions and a quick end to Maxwell at the high end consumer level. If not, we ought to see Volta fast-tracked to market as quickly as possible as it should definitively address it. And neither I think we'll see until later this fall, when it will either be clear or not that Nvidia is not able to solve/minimize this at a driver level.


I think Nvidia has been aware of the changes in DX12 and might be well prepared with Pascal. I remember when AMD introduced unified shaders on the Xbox 360 GPU: they touted the strength of their brand-new tech while Nvidia went all defensive, saying independent pixel/vertex shaders still had a ways to go. Then Nvidia struck first with the 8800 GTX, equipped with their own unified shaders, lol.


----------



## CrazyHeaven

My 980 Ti arrives on Tuesday. It is currently still in its package, though already registered to me. After reading this thread, I'm afraid to open it.

I stepped up to it, and before this thread started I figured I was getting a DX12-compatible card with a lot of VRAM. Now I'm not so sure, and I feel I should have stuck with my 970 until the next release. My only real saving grace is that things are not clear yet, and I'm interested in seeing Vulkan benchmarks, as I may never run Win 10.


----------



## Forceman

Quote:


> Originally Posted by *CrazyHeaven*
> 
> My 980 ti arrives on Tuesday. It is currently still in it's package though already registered to me. After reading this thread I'm afraid to open it.
> 
> I stepped up to it and before this thread started I was getting a dx12 compatible card with a lot of vram. Now I'm not so sure and feel I should have stuck with my 970 until the release. My only real saving grace is things are not clear yet and I'm interested in seeing Vulcan benchmarks as I may never run win 10.


You probably should not be making buying decisions off of one pre-release benchmark. The 980 Ti is a very fast card, in both DX11 and in this one DX12 benchmark, where it still basically matches a Fury (while annihilating it in DX11). Plus, the relative performance of the 980 Ti to the 970 isn't going to change at all in DX12.


----------



## ZealotKi11er

Quote:


> Originally Posted by *CrazyHeaven*
> 
> My 980 ti arrives on Tuesday. It is currently still in it's package though already registered to me. After reading this thread I'm afraid to open it.
> 
> I stepped up to it and before this thread started I was getting a dx12 compatible card with a lot of vram. Now I'm not so sure and feel I should have stuck with my 970 until the release. My only real saving grace is things are not clear yet and I'm interested in seeing Vulkan benchmarks as I may never run win 10.


You are fine. By the time DX12 means anything, your card will be old news.


----------



## p4inkill3r

Quote:


> Originally Posted by *CrazyHeaven*
> 
> My 980 ti arrives on Tuesday. It is currently still in it's package though already registered to me. After reading this thread I'm afraid to open it.
> 
> I stepped up to it and before this thread started I was getting a dx12 compatible card with a lot of vram. Now I'm not so sure and feel I should have stuck with my 970 until the release. My only real saving grace is things are not clear yet and I'm interested in seeing Vulkan benchmarks as I may never run win 10.


The 980Ti is not a card to feel hesitant about purchasing under any circumstance.


----------



## semitope

Quote:


> Originally Posted by *ZealotKi11er*
> 
> You are fine. By the time DX12 mean anything your card will be old news.


Old news is not irrelevant. An asteroid wiping out New York would be old news in a year, but boy, would it still matter. Also, DX12 games are coming much sooner than a lot of people think. Hitman is confirmed to be DX12, and that comes in December:

http://www.computerbase.de/2015-08/directx-12-neue-api-bringt-hoeheren-detailgrad-in-hitman/

It's much easier for them to take advantage of certain things in DX12 because of the consoles. Asynchronous shader support was already in popular engines because of the consoles.
Quote:


> Originally Posted by *p4inkill3r*
> 
> The 980Ti is not a card to feel hesitant about purchasing under any circumstance.


It's a $649 card that may end up under-delivering at the cutting edge while only marginally beating the competition in the old API.

That particular card is very questionable now, because a 290x has similar performance numbers. Without the DX11 issues, it may look much worse later on than the $649 780 or $699 780 Ti look now.

_The saving grace of course is that if you buy an expensive card and it turns out to be a bad purchase, you can always sell it for a ton (as long as the price hasn't dropped), and buy whatever else you want._


----------



## Serandur

Quote:


> Originally Posted by *semitope*
> 
> its a $649 card that may end up under-delivering at the cutting edge while only marginally beating the competition in the old API.
> 
> that particular card is very questionable now because a 290x has similar specifications. Without the dx11 issues it may look much worse later on than the $649 780 or $699 780ti look now.
> 
> _The saving grace of course is that if you buy an expensive card and it turns out to be a bad purchase, you can always sell it for a ton (as long as the price hasn't dropped), and buy whatever else you want._


Absolutely not. Aftermarket/OC 980Tis significantly beat the Fury X across a wide variety of DX11 games (particularly at 1920x1080 and 2560x1440). It's not even a contest, for the most part. The 980 Ti's overclocking headroom/aftermarket model OC should still keep it ahead of the Fury X in this DX12 benchmark (based on the ones pitting the Fury X against the 980 Ti), as well. A 290X does not have similar specifications. Assuming you're referring to shader counts, they're two completely different microarchitectures.


----------



## Mahigan

Since a user PMd me about this, I figure I'll share with everyone here.

Simple ways to calculate some theoretical GPU performance figures

To calculate peak rasterization rate: number of rasterizers (1 triangle/clock each) * clock frequency = triangles/s
To calculate FLOPS: number of shaders * 2 (FMA ops/clock) * clock frequency = FLOPS
To calculate fill rate: number of raster operations (ROPs) * clock frequency = pixels/s

For an AMD R9 290x Hawaii:

1000 MHz Clock
4 Rasterizers
64 ROPs
2816 Shaders

To calculate Peak Rasterization Rate: 4 triangles/clock * 1000 MHz = 4000 Mtris/s, or 4.0 Gtris/s
To calculate Flops: 2816 * 2 * 1000 MHz = 5632 GFlops, or ~5.6 TFlops
To calculate Fill Rate: 64 * 1000 MHz = 64000 Mpixel/s, or 64 Gpixel/s

For an nVIDIA GTX 980 Ti Maxwell 2:

1000 MHz Clock (boost clock of 1075 MHz for the shaders)
6 Rasterizers
96 ROPs
2816 Shaders

To calculate Peak Rasterization Rate: 6 triangles/clock * 1000 MHz = 6000 Mtris/s, or 6.0 Gtris/s
To calculate Flops: 2816 * 2 * 1075 MHz = 6054 GFlops, or ~6.1 TFlops
To calculate Fill Rate: 96 * 1000 MHz = 96000 Mpixel/s, or 96 Gpixel/s

That's an easy way of arriving at the theoretical numbers.
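For convenience, the three formulas above can be wrapped in a small script. A minimal sketch (the function and field names are mine; the specs are the ones quoted in this post):

```python
def theoretical_peaks(rasterizers, shaders, rops, core_mhz, shader_mhz=None):
    """Back-of-the-envelope theoretical peaks for a GPU.

    rasterizers: triangle setup units (assumed 1 triangle/clock each)
    shaders:     ALU count; * 2 because an FMA counts as two FLOPs per clock
    rops:        raster output units (1 pixel/clock each)
    """
    shader_mhz = shader_mhz or core_mhz
    return {
        "gtris_per_s": rasterizers * core_mhz / 1000,     # Gtris/s
        "tflops": shaders * 2 * shader_mhz / 1_000_000,   # TFLOPS
        "gpixels_per_s": rops * core_mhz / 1000,          # Gpixel/s
    }

# R9 290x (Hawaii) and GTX 980 Ti (Maxwell 2) at the stock clocks above
r9_290x = theoretical_peaks(rasterizers=4, shaders=2816, rops=64, core_mhz=1000)
gtx_980ti = theoretical_peaks(rasterizers=6, shaders=2816, rops=96,
                              core_mhz=1000, shader_mhz=1075)

print(r9_290x)    # {'gtris_per_s': 4.0, 'tflops': 5.632, 'gpixels_per_s': 64.0}
print(gtx_980ti)  # {'gtris_per_s': 6.0, 'tflops': 6.0544, 'gpixels_per_s': 96.0}
```

Plug in your own card's unit counts and clocks to get its theoreticals.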


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> Since a user PMd me about this, I figure I'll share with everyone here.
> 
> Simple ways to calculate some theoretical GPU performance figures
> 
> To calculate Peak Rasterization Rate: number of rasterizers * clock frequency = Gtris/s
> To calculate Flops: shaders * 2 * clock frequency = GFlops
> To calculate Fill Rate: number of raster operations (ROPs) * clock frequency = Gpixel/s
> 
> For an AMD R9 290x Hawaii:
> 
> 1000 MHz Clock
> 4 Rasterizers
> 64 ROPs
> 2816 Shaders
> 
> To calculate Peak Rasterization Rate: 4 * 1000MHz = 4000 Triangles per clock or 4.0 Gtris/s
> To calculate Flops: 2816 * 2 * 1000MHz = 5.6 GFlops
> To calculate Fill Rate: 64 * 1000MHz = 64000 Mpixel/s or 64 Gpixel/s
> 
> For an nVIDIA GTX 980 Ti Maxwell 2:
> 
> 1000 Mhz Clock (boost clock of 1075MHz for the shaders)
> 6 Rasterizers
> 96 ROPs
> 2816 Shaders
> 
> To calculate Peak Rasterization Rate: 6 * 1000MHz = 6000 Triangles per clock or 6.0 Gtris/s
> To calculate Flops: 2816 * 2 * 1075MHz = 6.1 GFlops
> To calculate Fill Rate: 96 * 1000MHz = 96000 Mpixel/s or 96 Gpixel/s
> 
> That's an easy way of achieving the theoretical numbers.


About the GFLOPs (the figures you've listed are really TFLOPs): actual boost clocks for reference 980 Tis realistically tend to be in the 1200 MHz range; for aftermarket models, 1300-1400 MHz isn't uncommon. You're looking at more like 6.76 to 7.88 TFLOPS out of the box depending on the model, and 115.2 to 134.4 Gpixel/s as well; theoretical peaks, of course. Gtris/s should also scale up accordingly.
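Since all three peaks scale linearly with clock, re-running the formulas at real-world boost clocks is trivial. A quick sketch (shader/ROP counts are the GM200 figures from the post above; the clock values are illustrative, not measurements):

```python
SHADERS, ROPS = 2816, 96  # GTX 980 Ti (GM200)

def tflops(mhz):
    # shaders * 2 FLOPs per clock (FMA), MHz -> TFLOPS
    return SHADERS * 2 * mhz / 1e6

def gpixels_per_s(mhz):
    # ROPs * clock, MHz -> Gpixel/s
    return ROPS * mhz / 1e3

for mhz in (1200, 1400):  # typical reference boost vs. strong aftermarket OC
    print(f"{mhz} MHz: {tflops(mhz):.2f} TFLOPS, {gpixels_per_s(mhz):.1f} Gpixel/s")
# 1200 MHz: 6.76 TFLOPS, 115.2 Gpixel/s
# 1400 MHz: 7.88 TFLOPS, 134.4 Gpixel/s
```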


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> Absolutely not. Aftermarket/OC 980Tis significantly beat the Fury X across a wide variety of DX11 games (particularly at 1920x1080 and 2560x1440). It's not even a contest, for the most part. The 980 Ti's overclocking headroom/aftermarket model OC should still keep it ahead of the Fury X in this DX12 benchmark, as well. A 290X does not have similar specifications. Assuming you're referring to shader counts, they're two completely different microarchitectures.


I would argue that if you are currently sitting on a Radeon R9 290x 4GB card, you probably don't need to upgrade for quite some time unless you want to play at 4K. That's the only reason to upgrade (other than wanting to brag to your friends about benchmark numbers). I would not upgrade to either a Fury-X or a GTX 980 Ti; it is pointless at this point. Running a game at 100 FPS versus 65 FPS isn't a meaningful difference. Throwing away $650 for no real tangible benefit is wasting money. Might as well light that $650 on fire.

As for the difference in microarchitectures or shader counts (2816 ALUs each in the case of the GTX 980 Ti and 290x), this *difference* (in utilization) is gone with DirectX 12. The main problem with AMD GCN cards was keeping them fed in order to hit their theoretical compute performance. The GCN cards were made to be fed in parallel, and DirectX 11 is a serial API. The two, together, did not allow the GCN architecture to get anywhere near its theoretical figures. This is why AMD GCN users are getting a *free* upgrade with DirectX 12. Why upgrade when you can play all of today's DirectX 11 games and get a boost in performance come DirectX 12? People who bought these cards lucked out.

Now if you're looking to game at 4K, on titles available today in DirectX 11, then the GTX 980 Ti is the card to get. There is no doubt about that. If not, then don't bother.

We really need to do away with all of this partisanship; it is turning the PC gaming community into a federal election. I think we can all agree that politicians on either side don't care about us; they care about their own interests. It is the same with the two gigantic corporations involved in this *debate*.

Save some cash if you're not going to be running 4K on current titles.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> About the GFLOPs (the figures you've listed are really TFLOPs): actual boost clocks for reference 980 Tis realistically tend to be in the 1200 MHz range; for aftermarket models, 1300-1400 MHz isn't uncommon. You're looking at more like 6.76 to 7.88 TFLOPS out of the box depending on the model, and 115.2 to 134.4 Gpixel/s as well; theoretical peaks, of course.


I picked the stock clocks as per the two companies' respective websites. I laid out the calculations there so people could work out the theoreticals for their particular cards.

I've fixed the TFlops. When I was writing this it was for a forum member who wanted info on older hardware performance; I copy-pasted it over here without making the change to TFlops.

Worth mentioning that my own 290x cards are both clocked at 1,300 MHz. True, I have them watercooled, but hitting 1,200 MHz on aftermarket coolers isn't that rare either. For compute workloads, the 290x cards are very capable; in fact, the only nVIDIA card to surpass the 290x is the GTX 980 Ti (the GTX 780 Ti couldn't achieve a high rate of utilization under DirectX 11, much less DirectX 12). This of course doesn't matter under DirectX 11. It does, however, matter under DirectX 12.

I'm not saying the 290x is a match for the GTX 980 Ti; it will likely end up slower even under DirectX 12, but the difference won't be as pronounced as it is currently under DirectX 11.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> I would argue that if you are currently sitting on a Radeon R9 290x 4GB card, you probably don't need to upgrade for quite some time unless you want to play at 4K. That's the only reason one should upgrade (other than wanting to brag to his/her friends about benchmark numbers). I would not upgrade to either Fury-X or a GTX 980 Ti. It is pointless at this point. Running a game at 100FPS or running it at 65FPS isn't different. Throwing away $650, for no real tangible benefits, is wasting money. Might as well light that $650 on fire.


The reasons why someone is upgrading, or even what they're upgrading from and to, are an individual case-by-case thing, but that's not my point of contention, which is only that a 290X is not in the same league as a 980 Ti, theoretically or otherwise. Though for whatever it's worth, I disagree that there's no point in getting a 980 Ti. No high-end card is the most cost-effective buy, but for those interested, there are a lot of games out there with different settings and demands, a lot more potential screen resolutions than just a single 1080p or 4K screen (like 2560x1440 or multi-monitor setups), and screens with higher refresh rates than 60 Hz.

I've personally owned a 290X and a 980 Ti both and the latter just scratched a 1440p itch the former couldn't.
Quote:


> As for the difference in Microarchitectures in shader counts, this difference is gone with DirectX 12. The main problem with AMD GCN cards was keeping them fed, in order to hit the theoretical compute performance. The GCN cards were made to be fed in Parallel. DirectX 11 is a serial API. The two, together, did not allow the GCN architecture to hit anywhere near its theoretical figures. This is why AMD GCN users are getting a *free* upgrade with DirectX 12.


All GPUs were made to be fed in parallel; that's why they have several hundred to several thousand shaders each (and scale well with that number, too). It's just semantics, though, since I know what you're getting at with ACEs and queues, but that's not the same thing as saying Maxwell isn't a parallel processor.

DX12 doesn't eliminate the difference in microarchitecture, since the difference is at the hardware level: Maxwell has significantly higher core clock speeds, stronger geometry capabilities, significantly more cache, a greater number of higher-clocked ROPs, and a myriad of other obscure and highly important differences, including IPC and the design of each architecture's schedulers. Even theoretical peak shader throughput on GM200 is far higher than Hawaii's, by virtue of Maxwell's high clock speeds.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> The reasons why someone is upgrading or even what they're upgrading from/to are an individual case-by-case thing, but not my point of contention which is only that a 290X is not in the same league as a 980 Ti theoretically or otherwise. Though for whatever it's worth, I disagree there's no point to getting a 980 Ti. It's not the most cost-effective thing to buy any high-end card, but for those interested, there are a lot of games out there with different settings and demands, a lot more potential screen resolutions than just a single 1080p or 4K screen (like 2560x1440 or multi-monitor setups), and screens with higher refresh rates than 60 Hz.
> 
> I've personally owned a 290X and a 980 Ti both and the latter just scratched a 1440p itch the former couldn't.


I have two 290x's and I run Eyefinity over three 1080p displays. I'm not your usual user, but I must say I have no issues running Battlefield 4 with all the settings cranked up. Paying $650 at this point would grant me no benefit whatsoever. This is why I'm waiting to see what Pascal brings to the table. I'd like to get three 4K monitors; a pipe dream, but working on my PowerPoint slides across three 4K screens would be a dream (gaming as well, of course).
Quote:


> Originally Posted by *Serandur*
> 
> All GPUs were made to be fed in parallel. That's why they have several hundred to several thousand shaders each (and scale well with the number, too). It's just semantics though since I know what you're getting at with ACEs and queues, but that's not the same thing as, say, saying Maxwell isn't a parallel processor.
> 
> DX12 doesn't eliminate the difference in microarchitecture since the difference is hardware-level with Maxwell having significantly higher core clockspeeds, stronger geometry capabilities, significantly greater cache, a greater number of higher-clocked ROPs, and a myriad of other obscure and highly-important differences including IPC and the designs of each architecture's schedulers. Even theoretical peak shader throughput on GM200 is far higher than Hawaii alone by virtue of Maxwell's high clock speeds.


Maxwell 2 is not as parallel. This is why it drops in performance under Ashes of the Singularity. It should beat a 290x or 390x, and quite handily, but it doesn't. In fact, in many cases it derives better performance under DirectX 11 than it does under DirectX 12. Under parallel conditions its performance drops (by "drops" I mean comparing Star Swarm to Ashes of the Singularity, where one title has post-processing effects in the form of asynchronous shading and the other does not; both run on the Nitrous engine).

Since we will likely see asynchronous shading across a series of DirectX 12 titles (it's the main feature for handling multiple light sources and other post-processing effects not possible under DirectX 11, after all), we may find ourselves in a position where those who kept their two-year-old GCN GPUs get to enjoy the same titles at the same settings as those who upgraded to the latest and greatest from the Green team come DirectX 12. I understand that Ashes of the Singularity is only one benchmark, but after reading what Johan Andersson had to say about Frostbite in 2016, it's not entirely impossible that many other titles will showcase this sort of behavior. I was skeptical at first, but the more I read what developers are saying, the less skeptical I become.

As for the geometry capabilities, e.g. tessellation: unless a game goes overboard with unneeded tessellation, it's not likely to be that much of a factor. We'll have to see on that one. On the ROPs, I completely concede that point. ROPs are likely to play a big role as the resolution is scaled up (e.g. 4K) in MMORPGs. These are games I tend to enjoy myself, and the lack of color compression on the Hawaii GPU hurts it greatly on that front. The two main limitations I see are ROPs and rasterizers, both of which could affect future MMORPGs (Fable Legends comes to mind). Compute performance is not a factor which concerns me one bit with DirectX 12 and GCN, though. It's not just asynchronous shading; it's the ability of a parallel API to keep the Hawaii cores fed. If anything, a 290x/390x will deliver more than playable frame rates at 1080p with all the settings turned up. Two of them should still allow me to run Eyefinity (multi-adapter will help greatly here by granting me an 8GB frame buffer under SFR, rather than 4GB under DirectX 11 due to the texture redundancy of AFR).
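The AFR-versus-SFR memory point can be sketched in a couple of lines: under AFR each GPU renders whole alternate frames, so textures are duplicated and the usable pool stays at one card's VRAM; under DX12 explicit multi-adapter SFR, resources can in principle be partitioned. A toy illustration only (real titles won't split resources this cleanly):

```python
def effective_vram_gb(per_gpu_gb, num_gpus, mode):
    """Idealized usable VRAM pool for a multi-GPU setup."""
    if mode == "SFR":    # split-frame: resources can be partitioned across GPUs
        return per_gpu_gb * num_gpus
    elif mode == "AFR":  # alternate-frame: every GPU holds a full resource copy
        return per_gpu_gb
    raise ValueError(mode)

print(effective_vram_gb(4, 2, "AFR"))  # 4  (two 4GB 290x cards under DX11 AFR)
print(effective_vram_gb(4, 2, "SFR"))  # 8  (DX12 explicit multi-adapter, best case)
```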

You have to admit it: AMD GCN 1.1 (290 series) owners have lucked out so far. I'll wait until the next DirectX 12 titles come out before forming a solid opinion. The next title should tell us more about what to expect.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Mahigan*
> 
> Now if you're looking to game at 4K, on titles available today in DirectX 11, then the GTX 980 Ti is the card to get. There is no doubt about that. If not, then don't bother.


I was planning on picking up a 4K monitor around Black Friday. My original idea was to add another 980 Ti towards the release of Pascal if GPU prices don't drop for Black Friday; now I'm thinking I might rethink that plan. It's just that my last GPU was the 560 Ti and I was hoping for something a little more from my 980 Ti. I miss the days of the 8800 GTX, which lasted forever for me.

Now I'm thinking I'll use a single 980 Ti to get the most enjoyment from The Witcher 3 and just wait and see what happens with Pascal. I'll only buy a 4K monitor if there is a significant price drop on a high-end brand with FreeSync or equivalent.

I've learned a lot from this thread. Gratz to everyone for putting all this info together.


----------



## Mahigan

Quote:


> Originally Posted by *CrazyHeaven*
> 
> I was planning on picking up a 4K monitor around Black Friday. My original idea was to add another 980 Ti towards the release of Pascal if GPU prices don't drop for Black Friday; now I'm thinking I might rethink that plan. It's just that my last GPU was the 560 Ti and I was hoping for something a little more from my 980 Ti. I miss the days of the 8800 GTX, which lasted forever for me.
> 
> Now I'm thinking I'll use a single 980 Ti to get the most enjoyment from The Witcher 3 and just wait and see what happens with Pascal. I'll only buy a 4K monitor if there is a significant price drop on a high-end brand with FreeSync or equivalent.
> 
> I've learned a lot from this thread. Gratz to everyone for putting all this info together.


If you're grabbing a 4K monitor for The Witcher 3, then you have no other option than the GTX 980 Ti. It's the only card that can drive that game. You may have to drop the settings down a little at 4K, but it should run great.


----------



## Noufel

What's the next DX12 game, and when is it coming?


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> What's the next DX12 game, and when is it coming?


Ark: Survival Evolved, though it is only getting DirectX 12 through a patch. That patch was supposed to be out by mid-August, so it is running a little late.

Followed by

Deus Ex: Mankind Divided (Q1 2016)
Fable Legends (Q4 2015)
Gears of War: Ultimate Edition (Q4 2015)
Sea of Thieves (Q1 2016)

To name a few.

Fable Legends will be an interesting title. It will allow for Cross-platform gameplay with XBox One players.

Fable Legends was built for the Xbox One. Therefore, in that title we're likely to see less of an emphasis on asynchronous shading, by virtue of the Xbox One having only two ACEs (16 queues) available. Maxwell 2 shouldn't suffer the sort of performance drops there that it does under Ashes of the Singularity. Asynchronous shading is still being used in Fable Legends, though; each spell shoots out multiple light sources across the screen. A GTX 980 sees a healthy boost of over 10 FPS when running DirectX 12 in this title (over DirectX 11), going from the 30s to around the 40s in FPS.

Fable Legends developer comments:
Quote:


> A dev from Lionhead says they're just getting started with DX12. One of the things they're most excited about is asynchronous shaders, which allow other compute tasks to run in parallel with graphics. AMD's GCN has been "amazing" for that. Means they can consume extra GPU power without compromising in-game performance. Showing a demo of DX12 in-game effects in Fable Legends. Looks nice and runs smoothly. Win10 beta is coming very soon.


Deus Ex: Mankind Divided will be supporting Asynchronous Shaders as well as TressFX 3.0.

Frostbite 3: asynchronous shaders throughout, come 2016.

Batman: Arkham Knight will be re-released using DirectX 12

The Witcher 3 is getting DirectX 12 via a patch (increased view distance and a more detailed world).

Heck the more I read up on this the more I'm starting to see a pattern. DirectX 12 will be big news for GCN users.


----------



## JunkoXan

I suppose with my 280x/7970 I should feel quite content and comfortable for a few more years.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> snip


It's not that I disagree with Hawaii being a very good chip and, at various points of its life, a very good deal as well. But I'm not seeing where people's conclusions about GCN, or rather Fiji/Hawaii in particular, versus the 980 Ti in DX12 are coming from. Not only is this only one benchmark, but skimming through this thread, I only see two sites that even compare Hawaii and GM200 running it in the first place. The first is computerbase.de:



Which shows both the 980 Ti and the Fury X capping out at a nearly identical average FPS, about 15% above the 390. That looks an awful lot like a potential CPU limitation, and the review didn't even test the 390 in the higher-resolution section to mitigate or test for it. Which leaves only the Ars Technica review:



Which shows very questionable results, as they would suggest, in tandem with other reviews showing the stock 980 Ti and Fury X roughly equal, that Hawaii would be about as fast as stock Fiji as well. This result also conflicts with the PCPer review, in which they found a 390X (clocked higher than a 290X, of course) beating a stock 980 in DX12 by only ~5-10%, so a stock 290X shouldn't even get close to the 980 Ti. The results aren't coherent. Not that I'm implying you said they were conclusive (you actually stated the direct opposite); I'm directing this more towards the general opinions I'm seeing expressed.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> It's not that I disagree with Hawaii being a very good chip and at various points of its life, a very good deal as well. But I'm not understanding where the conclusions people are arriving to regarding GCN or rather Fiji/Hawaii in particular versus the 980 Ti in DX12 are coming from. Not only is this only one benchmark, but skimming through this thread, I only see two sites that even compare Hawaii and GM200 running it in the first place. The first is computerbase.de:
> 
> 
> 
> Which shows both the 980 Ti and the Fury X capping off at a nearly identical average FPS about 15% above the 390. That seems an awful lot like a potential CPU limitation and the review didn't even test the 390 in the higher-resolution section to mitigate or test for the limitation.


The CPU frame rate, also reported by Ashes of the Singularity, shows that both Fiji and Maxwell 2 are the bottleneck, not the CPU.

If you start reading this thread at around page 41, you'll see I wrote up a detailed analysis of this issue. Using logical deduction, I was able to discern that the Fury-X is most likely limited by its ability to draw triangles (many, many units on the screen), whereas the GTX 980 Ti is limited by its serial nature (for processing asynchronous shaders).

It took a lot of work in order for me to deduce these conclusions. I think it is worth the read.
Quote:


> Originally Posted by *Serandur*
> 
> Which leaves only the Arstechnica review:
> 
> 
> 
> Which shows very questionable results as they would suggest, in tandem with other reviews showing the stock 980 Ti and Fury X being roughly equal, that Hawaii would be about as fast as stock Fiji as well. This result also conflicts with the pcper review in which they found a 390X (clocked higher than a 290X, of course) beating a stock 980 in DX12 by only ~5-10%, so a stock 290X shouldn't even get close to the 980 Ti. The results aren't coherent. Not that I'm implying you were saying these are conclusive or anything as you actually stated the direct opposite, but I'm directing this more towards the general opinions I'm seeing expressed.


Fiji and Hawaii are near equal because they both have the same Gtris/s rate: both have 4 rasterizers clocked at the same frequency, and both are limited by their ability to draw all those triangles (polygons) on the screen in Ashes of the Singularity. The GTX 980 Ti (and GTX 980) are both limited by their serial nature in processing asynchronous shaders. While the GCN 1.1 (290 series) and GCN 1.2 architectures incorporate 8 Asynchronous Compute Engines (each working independently from one another, and thus best able to work in tandem with a multi-core CPU), the Maxwell 2 parts use a single unit (the Grid Management Unit) for this task. What's more, Maxwell 2 can only prioritize 1 graphics and 31 compute queues, while also exhibiting a higher degree of latency (around 20ms more). The GCN parts have 8 ACEs able to queue up 8 threads each, for a total of 64 compute queues, plus 1 graphics queue via the Graphics Command Processor. The ACEs in GCN are also able to talk directly to the Compute Units; no middle man (which Maxwell 2 has in the form of a work distributor, the source of the approximately 20ms of extra latency).

In essence, the GCN 1.1 (290 series) and GCN 1.2 architectures are superior at handling parallel compute tasks without requiring any sort of driver optimizations or interventions, which is what DirectX 12 demands. Maxwell 2 would benefit most from driver interventions, which are not really possible under DirectX 12 due to the API being closer to the metal.

In the end you have Maxwell 2 being less efficient at compute than GCN 1.1 (290 series) and GCN 1.2: the exact reversal of what we are used to seeing under DirectX 11.

In DirectX 12 games which do not tax the rasterizers, but make heavy use of compute, we should see Fiji flying past Maxwell 2 and Hawaii.
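The serial-versus-parallel argument in this post can be reduced to a toy timing model. All numbers below are invented for illustration (real GPU scheduling is far more complicated), but they show why overlapping independent compute passes with graphics work shortens the frame:

```python
# Per-frame work in arbitrary milliseconds (illustrative numbers only).
graphics_ms = 10.0
compute_passes_ms = [2.0, 2.0, 2.0, 2.0]  # e.g. lighting / post-processing passes

# Serial submission: compute waits for the graphics queue to drain.
serial_frame_ms = graphics_ms + sum(compute_passes_ms)

# Idealized async compute: independent compute fills otherwise-idle shader time,
# so the frame takes no longer than the larger of the two workloads.
async_frame_ms = max(graphics_ms, sum(compute_passes_ms))

print(serial_frame_ms, async_frame_ms)  # 18.0 10.0
```

In practice the overlap is never this perfect, but the direction of the effect is what the benchmark discussion hinges on.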


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> The CPU frame rate, also available in Ashes of the Singularity, shows that both Fiji, and Maxwell 2, are bottlenecking the CPU and not the other way around.
> 
> If you start reading this thread at around page 41, you'll see I wrote up a great detailed analysis about this issue. Using logical deduction I was able to discern that the Fury-X is limited, most likely, by its ability to draw triangles (many many units on the screen) whereas the GTX 980 Ti is limited by its Serial nature (for processing Asynchronous shaders).
> 
> It took a lot of work in order for me to deduce these conclusions. I think it is worth the read.
> Fiji and Hawaii are equal because they both have the same Gtris/s rate. Both have 4 Rasterizers clocked at the same frequency. Both are limited by their ability to draw all those triangles (polygons) on the screen in Ashes of the Singularity. the GTX 980 Ti (and GTX 980) are both limited by their Serial nature in processing Asynchronous Shaders. While the GCN 1.1 (290 series) and GCN 1.2 architectures incorporate 8 Asynchronous Compute Engines (each working independently from one another and thus best able to work in tandem with a Multi-Core CPU) the Maxwell 2 parts use a single unit for the task. What's more is that the Maxwell 2 can only prioritize 1 Graphics and 31 Compute threads per clock while also exhibiting a higher degree of latency (around 20ms more per clock). The GCN parts have 8 Asynchronous shaders able to queue up 8 threads each for a total of 64. The ACEs in GCN are also able to talk directly to the Compute Units. No middle man (which Maxwell 2 has in the form of a Work distributor).
> 
> In essence the GCN 1.1 (290 series) and GCN 1.2 architectures are superior at handling Parallel Compute tasks without requiring any sort of driver optimizations or interventions. This is what DirectX 12 demands. Maxwell 2 would benefit best from driver interventions which are not really possible under DirectX 12 due to the API being closer to metal.
> 
> In the end you have a Maxwell 2 being less efficient in compute than a GCN 1.1 (290 series) and GCN 1.2. The exact reversal of what we are used to seeing under DirectX 11.


It's very hard for people to believe that. I don't want to get into speculation, but Nvidia is completely fine with this: it just gives their users a reason to upgrade. Considering they control most of the market share, they are now fighting for their own users. AMD usually projects their architectures well into the future because they have to, due to limited R&D.


----------



## Mahigan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Its very hard for people to believe that. I dont want to get into any speculations but Nvidia is completely fine with this. It just gives their user reason to upgrade considering they control most of the market share so now they are fighting for their own users. AMD usually project their architecture well into the future because they have too due to limiting R&D.


You should see the hate mail I've received.


----------



## JunkoXan

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ZealotKi11er*
> 
> Its very hard for people to believe that. I dont want to get into any speculations but Nvidia is completely fine with this. It just gives their user reason to upgrade considering they control most of the market share so now they are fighting for their own users. AMD usually project their architecture well into the future because they have too due to limiting R&D.
> 
> 
> 
> You should see the hate mail I've received

Need an elephant to carry all that hate mail?


----------



## Mahigan

Quote:


> Originally Posted by *JunkoXan*
> 
> need a elephant to carry all that hate mail?


On top of the hate mail, I've received some rather negative comments in a variety of forums and comment threads where my findings have been shared by other users. Some people question whether I work for AMD or have been bought, while others simply brush off what I've demonstrated and call me a fanboi.

The usual partisan drivel.

For once, it would appear, that AMD weren't exaggerating the performance of their GPU. Fiji on DirectX 11 is terrible. Fiji on DirectX 12 will be quite amazing.


----------



## Casey Ryback

Quote:


> Originally Posted by *Mahigan*
> 
> On top of the hate mail... I've received some rather negative comments, in a variety of forums and comment threads where my findings have been published by other users. Some people questioning if I work for AMD or if I'm bought while others are simply brushing off what I've demonstrated and calling me a fanboi.
> 
> The usual partisan drivel.


Welcome to the internet, it's been controlled by Nvidia for some time now.

And they call you a fanboi, lol. What about the herd mentality on the web...


----------



## semitope

Some of those people think Nvidia can do no wrong. They have so much money and so many resources that everything they do must be right; if something is wrong, it's with the game, the API, or anyone who claims something is wrong.


----------



## ZealotKi11er

Quote:


> Originally Posted by *semitope*
> 
> some of those people think nvidia can do no wrong. they have so much money and so much resources that everything they do must be right. If something is wrong, its with the game, the api, or anyone who claims something is wrong.


Nvidia does more right than wrong. Their market dominance shows it.


----------



## p4inkill3r

Quote:


> Originally Posted by *Mahigan*
> 
> Some people questioning if I work for AMD or if I'm bought while others are simply brushing off what I've demonstrated and calling me a fanboi.


I think you're going to fit in well around here.


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> On top of the hate mail... I've received some rather negative comments, in a variety of forums and comment threads where my findings have been published by other users. Some people questioning if I work for AMD or if I'm bought while others are simply brushing off what I've demonstrated and calling me a fanboi.
> 
> The usual partisan drivel.
> 
> For once, it would appear, that AMD weren't exaggerating the performance of their GPU. Fiji on DirectX 11 is terrible. Fiji on DirectX 12 will be quite amazing.


Did you expect anything less? Lol.
I have only come across half-intelligent (if that) responses from people who refute your analysis. Mind you, I don't follow the forums much, and have a very limited understanding (read: none... lol) of the engineering and software side of the GPU universe, but I do get the business angle (or at least I have convinced myself that I do).

I knew it would be just a matter of time before the machinery on the internet started turning again, after seeing some of your posts.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> The CPU frame rate, also available in Ashes of the Singularity, shows that both Fiji, and Maxwell 2, are bottlenecking the CPU and not the other way around.


Which chart are you talking about?
Quote:


> If you start reading this thread at around page 41, you'll see I wrote up a great detailed analysis about this issue. Using logical deduction I was able to discern that the Fury-X is limited, most likely, by its ability to draw triangles (many many units on the screen) whereas the GTX 980 Ti is limited by its Serial nature (for processing Asynchronous shaders). It took a lot of work in order for me to deduce these conclusions. I think it is worth the read.
> Fiji and Hawaii are near equal because they both have the same Gtris/s rate. Both have 4 Rasterizers clocked at the same frequency. Both are limited by their ability to draw all those triangles (polygons) on the screen in Ashes of the Singularity. the GTX 980 Ti (and GTX 980) are both limited by their Serial nature in processing Asynchronous Shaders. While the GCN 1.1 (290 series) and GCN 1.2 architectures incorporate 8 Asynchronous Compute Engines (each working independently from one another and thus best able to work in tandem with a Multi-Core CPU) the Maxwell 2 parts use a single unit (Grid Management Unit) for this task. What's more is that the Maxwell 2 can only prioritize 1 Graphics and 31 Compute threads per clock while also exhibiting a higher degree of latency (around 20ms more per clock). The GCN parts have 8 Asynchronous shaders able to queue up 8 threads each for a total of 64 Compute as well as 1 Graphics by the Graphics Command Processor. The ACEs in GCN are also able to talk directly to the Compute Units. No middle man (which Maxwell 2 has in the form of a Work distributor which is what causes the approx 20ms more in latency).


Assuming it were true, how would you explain the 980 Ti's relatively small lead over the 970 in the benchmark? In purely GPU-limited scenarios, the stock 980 Ti demonstrates a ~50% improvement over the 970 from the same site's (computerbase.de) own testing as opposed to just ~30% in the DX12 benchmark. The 980 Ti has no such strange limitations versus the 970 as a Fury X might have relative to a 390 since GM200 is basically GM204 with more of everything. That right there suggests a CPU limitation, as well as the Fury X's result being a near-perfect match to the 980 Ti. Additionally, if it were true, then it's a conclusion rendering Fiji in a very tough position as well as every GCN chip other than Hawaii and Tonga (since every other one is limited to 2 ACEs).

Quote:


> In DirectX 12 games which do not tax the rasterizers, but make heavy use of compute, we should see Fiji flying past Maxwell 2 and Hawaii.


Based on what? Certainly none of these benchmarks nor actual compute tasks (where Fiji and GM200 trade many blows with each having areas they excel over the other).

Quote:


> For once, it would appear, that AMD weren't exaggerating the performance of their GPU. Fiji on DirectX 11 is terrible. Fiji on DirectX 12 will be quite amazing.


Again, based on what? Not even these DX12 benchmarks show anything of the sort.


----------



## Mahigan

Quote:


> Originally Posted by *p4inkill3r*
> 
> I think you're going to fit in well around here.


It seems to be a far less "insane" forum than, say, Anandtech or the worst one so far, Kitguru. That's for sure.









Quote:


> Originally Posted by *ZealotKi11er*
> 
> Nvidia does more right then wrong. Their market dominance shows.


I think people don't care about morality as much as they do their own personal interests. Ayn Rand was probably correct about at least one thing.
Quote:


> Originally Posted by *semitope*
> 
> some of those people think nvidia can do no wrong. they have so much money and so much resources that everything they do must be right. If something is wrong, its with the game, the api, or anyone who claims something is wrong.


That's the usual sentiment I've received. Everything is blamed except the Corporation whose hardware resides in the individual's own PC.
Quote:


> Originally Posted by *Casey Ryback*
> 
> Welcome to the internet, it's been controlled by Nvidia for some time now.
> 
> And they call you a fanboi lol, what about the herd mentality on the web..............


It has gotten far worse than it was back in the 3Dfx Voodoo days, that's for sure. Before I went on a hiatus, back in late 2013 or so, I had begun to notice the uptick in this sort of behavior. I must admit that in my younger years I was quite the 3Dfx fan... but I was never as bad as the "F u mother f'er, (insert preferred Corporation) rules!" crowd. We at least argued over the details, like fill rate or compute capabilities. Now it's like I'm playing Xbox and having someone claim he "f'd my mother and tea bagged her". Maybe I'm too old for this LOL


----------



## Xuper

Quote:



> Originally Posted by *Mahigan*
> 
> On top of the hate mail... I've received some rather negative comments, in a variety of forums and comment threads where my findings have been published by other users. Some people questioning if I work for AMD or if I'm bought while others are simply brushing off what I've demonstrated and calling me a fanboi.
> 
> The usual partisan drivel.
> 
> For once, it would appear, that AMD weren't exaggerating the performance of their GPU. Fiji on DirectX 11 is terrible. Fiji on DirectX 12 will be quite amazing.


Like this guy: Themisseble. He has no idea what he's talking about.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> Which chart are you talking about?
> Assuming it were true, how would you explain the 980 Ti's relatively small lead over the 970 in the benchmark? In purely GPU-limited scenarios, the stock 980 Ti demonstrates a ~50% improvement over the 970 from the same site's (computerbase.de) own testing as opposed to just ~30% in the DX12 benchmark. The 980 Ti has no such strange limitations versus the 970 as a Fury X might have relative to a 390 since GM200 is basically GM204 with more of everything. That right there suggests a CPU limitation, as well as the Fury X's result being a near-perfect match to the 980 Ti. Additionally, if it were true, then it's a conclusion rendering Fiji in a very tough position as well as every GCN chip other than Hawaii and Tonga (since every other one is limited to 2 ACEs).
> Based on what? Certainly none of these benchmarks nor actual compute tasks (where Fiji and GM200 trade many blows with each having areas they excel over the other).
> Again, based on what? Not even these DX12 benchmarks show anything of the sort.


ArsTechnica
CPU Framerate (it's in the graph):


Extreme Tech:
Quote:


> Ashes of the Singularity also includes a CPU benchmark that can be used to simulate an infinitely fast GPU - useful for measuring how GPU-bound any given segment of the game actually is.


The author, Joel Hruska, in the comment section of his Extreme Tech article, on the CPU bottleneck:
Quote:


> Joel Hruska:
> No. Let me explain.
> 
> You simulate an unlimited GPU to detect what frame rate the CPU could drive if the GPU was infinitely fast.
> 
> Let's say that using DX12, the game returns a frame rate of 48 FPS, but the infinite GPU test shows that the game would run at 98 FPS on a Haswell-E. I've seen figures similar to this in my own testing.
> 
> That means the GPU is the bottleneck. The CPU could be pushing almost 2x faster if the GPU was faster.


Quote:


> Joel Hruska:
> Ashes in Dx12 is GPU-bound. By simulating an infinitely fast GPU we can measure CPU perf.


Quote:


> Joel Hruska:
> The game is 99.9% GPU-bound at these levels when using Haswell-E. CPU bottlenecks are not the problem in DX12.


Quote:


> Joel Hruska:
> I just realized why you may not be getting this. Let me explain how the CPU performance figures in the DX12 test work.
> 
> The game reports an average frame rate of, say, 48 FPS. It breaks that down into Normal, Medium, and Heavy batches. It also shows the frame rate if the CPU in question was equipped with an infinitely fast GPU.
> 
> Here's what that data looks like:
> 
> Average Frame Rate: 48 FPS
> Percent GPU Bound: 99.9%
> CPU frame rate: 98.5 FPS
> 
> When the reported CPU frame rate is 2x faster than the measured GPU frame rate, the game is GPU bound.


Quote:


> Joel Hruska:
> No, there isn't.
> 
> I took your questions and comments directly to Dan Baker at Oxide. The game is GPU-bound in DX12, not CPU-bound. Period.
> 
> You don't have to believe me, but I don't think the programmer and co-founder who built the engine is lying.


Quote:


> Joel Hruska:
> GamerK,
> 
> http://oxidegames.com/2015/08/...
> 
> "The first new number is the percent GPU bound. Under D3D12 it is possible with a high degree of accuracy to calculate whether we are GPU or CPU bound. For the technical savvy, what we are doing is tracking the status of the GPU fence to see if the GPU has completed the work before we are about to submit the next frame. If it hasn't, then the CPU must wait for the GPU. There will sometimes be a few frames for a run where the CPU isn't waiting on the GPU but the GPU is still mostly full. Therefore, generally if you see this number above 99%, it's within the margin of error."
> 
> Under DX12, both AMD and NV's highest-end cards were GPU-bound at 99%, not CPU-bound. I ran these figures past Dan Baker himself and had a conversation about what they meant. They're backed up by his blog.
> 
> If I pair the same cards with a Core i3, the game will become CPU-bound. with a Core i7-5960X, it isn't. Now, assume that there's a 10% margin for driver perf improvements and some further tuning on Oxide's end, and the end result is that AMD and NV actually end up in about the same place once you strip out the in-driver improvements NV typically relies on to boost performance and replace them with a closer-to-metal API.


I think that about answers it.

http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head/3
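The reasoning in the quotes above can be condensed into a small sketch. This is an illustrative reconstruction, not Oxide's actual code; the 99% margin-of-error threshold and the 2x rule come from the quoted blog post and comments, and the function name is my own.

```python
def classify_bottleneck(avg_fps, cpu_fps, percent_gpu_bound):
    """Condenses the reasoning from the quoted comments.

    avg_fps           - measured frame rate with the real GPU
    cpu_fps           - frame rate the CPU could drive with an
                        infinitely fast GPU (Ashes' "CPU frame rate")
    percent_gpu_bound - percent of frames where the CPU had to wait
                        on the GPU fence before submitting the next frame
    """
    # Per Oxide's blog: above 99% is within the margin of error,
    # i.e. effectively fully GPU-bound.
    if percent_gpu_bound > 99.0:
        return "GPU-bound"
    # Per Hruska: a CPU frame rate ~2x the measured rate also
    # indicates the GPU is the limiter.
    if cpu_fps >= 2 * avg_fps:
        return "GPU-bound"
    return "CPU-bound (or mixed)"

# The example figures from the comments: 48 FPS measured,
# 98.5 FPS simulated CPU frame rate, 99.9% GPU-bound.
print(classify_bottleneck(48.0, 98.5, 99.9))  # GPU-bound
```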

Now you ask me why Fiji will surpass the GTX 980 Ti in compute performance (for games). That's simple: DirectX 12 allows your multi-core CPU to feed the Compute Units in parallel (something compute applications do not do). Since we see a 290X keeping up with a GTX 980 Ti under DirectX 12 (the GTX 980 Ti has a higher theoretical compute limit than the 290X, but it is not used as efficiently due to the aforementioned latency and Grid Management Unit bottleneck), we can deduce that Fiji will be able to surpass the GTX 980 Ti in Asynchronous Shading. How? Because complex compute tasks can be broken down into smaller, easier-to-execute tasks thanks to the Asynchronous Compute Engines; that is their purpose. When you break complex shaders down into smaller parts, you're pretty much left with ALU throughput as your performance-determining factor (when AMD talks about increasing compute efficiency through Asynchronous Shading, that's what they mean). If Fiji and Hawaii weren't bottlenecked at the Rasterizer (peak rasterization rate), you would see Fiji power beyond the GTX 980 Ti. The game is also not CPU bottlenecked. I have explained all of this in greater detail in previous posts; you really ought to take the time to read it all if you want to understand what is happening.



I don't think that lower-end and older GCN chips will lack in efficiency. Yes, they're limited to 2 ACEs, or 16 Compute Queues and 1 Graphics Queue per cycle. Given that they have fewer Compute Units to feed, that shouldn't be a problem (and it isn't a problem on the Xbox One, which also uses 2 ACEs). That being said, they will perform much better than their intended competitors over on the Green team, because those competitors are Kepler and Maxwell parts which don't handle their Graphics and Compute tasks in parallel at all.


As for the R9 390: it has 10% fewer ALUs than the R9 390X. If we assume 100% compute efficiency, we can add 4.85 FPS to the 48.5 FPS figure for a total of roughly 53.4 FPS, which is near the 55.4 FPS of the GTX 980 Ti. This is what the ArsTechnica test shows: a 290X/390X nearly matching and sometimes surpassing the GTX 980 Ti (and if we add the 50 MHz clock-rate deficit from the R9 390 to the R9 390X, we make up any remaining difference). We can deduce that AMD GCN derives near-perfect compute scaling from Asynchronous Shading. We can also deduce that Maxwell 2 does not.
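The scaling arithmetic in that paragraph can be checked directly. A trivial sketch, using the FPS figures quoted from the computerbase.de chart (the 10% ALU deficit and perfect-scaling assumption are the post's own, not measured facts):

```python
# R9 390 measured result in the computerbase.de DX12 chart.
r9_390_fps = 48.5

# The 390X carries ~10% more ALUs; assume perfect compute scaling.
projected_390x_fps = r9_390_fps * 1.10
print(round(projected_390x_fps, 2))  # 53.35, close to the 980 Ti's 55.4

# Remaining gap before accounting for the 50 MHz clock difference:
print(round(55.4 - projected_390x_fps, 2))  # 2.05
```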

Kepler (700 and 800 series) suffers greatly, while Kepler 780/Titan and Maxwell 1 take two separate cycles in order to prioritize 1 Graphics or 32 Compute Queues. That means an incredible amount of latency.



Basically.. all GCN parts will get a *free* upgrade with DirectX 12. They'll live to game another day.


----------



## Mahigan

I think I've made a rather compelling case. I have yet to have someone demonstrate the contrary. Believe you me... many have tried so far. I've been called every name in the book. Rest assured I'm not trying to take a dump on your hardware. I just saw a problem and I wanted to explain it. That's it.


----------



## STEvil

I miss 3DFX.. lol.


----------



## mtcn77

Quote:


> Originally Posted by *Mahigan*
> 
> Ark: Survival Evolved, though it is only getting DirectX 12 through a patch. This patch was supposed to be out by mid-August. It is a little late.
> 
> Followed by
> 
> Deus Ex: Mankind Divided (Q1 2016)
> Fable Legends (Q4 2015)
> Gears of War: Ultimate Edition (Q4 2015)
> Sea of Thieves (Q1 2016)
> 
> To name a few.
> 
> Fable Legends will be an interesting title. It will allow for Cross-platform gameplay with XBox One players.
> 
> Fable Legends was built for the XBox One. Therefore in that title we're likely to see less of an emphasis on Asynchronous Shading by virtue of the Xbox One having only two ACEs (16 Queues) available. Maxwell 2 shouldn't suffer the sort of performance drops it does under Ashes of the Singularity under that title. Asynchronous Shading is being used under Fable Legends though. Each spell shoots out multiple light sources across the screen. A GTX 980 see's a healthy boost of over 10FPS when running DirectX 12 in this title (over DirectX 11). Going from 30 some FPS to around 40 some FPS.
> 
> Fable Legends developer comments:
> Deus Ex: Mankind Divided will be supporting Asynchronous Shaders as well as TressFX 3.0.
> 
> Frostbite 3: Asynchronous Shaders, throughout come 2016.
> 
> Batman: Arkham Knight will be re-released using DirectX 12
> 
> The Witcher 3 is getting DirectX 12 by a patch (increased view distance and with more detailed world)
> 
> Heck the more I read up on this the more I'm starting to see a pattern. DirectX 12 will be big news for GCN users.


The Xbox One has 48 queues, not 16, according to its SDK manual.
[RedTechGaming]


----------



## mav451

I clicked through the SDK breakdown on RedGamingTech's link:
Quote:


> Another slight problem with the Xbox One's GPU (compared to say the Playstation 4's, or indeed a more modern PC GPU such as for example the R9 290) is that the total number of ACE's is lower. *While the two ACE's on the Xbox One can handle eight queues each*, the Playstation 4 (and modern desktop GPU's) support 8 ACEs. In the case of the PS4, this means that the GPU can handle a total number of 64 compute queues, which combined with the Level 2 Volatile Bit certainly gives the PS4 a bit of a helping hand in certain situations.


Now I did see the number 48 come up - when they are discussing _contexts_.
Quote:


> Although the system _allows a maximum of 48 deferred contexts to exist at any one time_, in general you shouldn't create more than six deferred contexts at once, because that's how many cores you have in the game OS. Of course, it is up to you to precisely tailor your thread usage for maximum efficiency. For example, if one deferred-context thread is waiting for a direct memory access (DMA) operation, you can swap in another deferred context thread to use the otherwise-wasted CPU time on the same core. In this case, having more than one deferred context and deferred context threads per core prevents a CPU bubble.


I'm going to assume that contexts are not the same thing as queues.
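The deferred-context advice in the quote is the classic oversubscription pattern: keep more ready work than cores so that a thread blocked on a DMA transfer doesn't idle its core. A rough CPU-side analogy (nothing here is Xbox-specific; the names and numbers are illustrative, not from the SDK):

```python
import concurrent.futures
import time

CORES = 6        # game-OS cores per the quoted SDK text
CONTEXTS = 12    # a couple of "contexts" per core, well under the 48 cap

def record_command_list(i):
    # Stand-in for deferred-context work that stalls on a DMA transfer:
    # while this thread sleeps, another context thread can run on the core.
    time.sleep(0.01)
    return i

# More workers than cores keeps the cores busy while some workers block.
with concurrent.futures.ThreadPoolExecutor(max_workers=CONTEXTS) as pool:
    results = list(pool.map(record_command_list, range(24)))

print(len(results))  # 24 command lists recorded, in submission order
```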


----------



## mtcn77

Quote:


> Originally Posted by *mav451*
> 
> I clicked through the SDK breakdown on RedGamingTech's link:
> Now I did see the number 48 come up - when they are discussing _contexts_.
> I'm going to assume that contexts are not the same thing as queues


You are welcome to do so, my friend.








Now, if we could have some embellishments for the Xbox One too, that would be nice. I have already planned out which shelf I am contemplating putting it on.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> I think I've made a rather compelling case. I have yet to have someone demonstrate the contrary. Believe you me... many have tried so far. I've been called every name in the book. Rest assured I'm not trying to take a dump on your hardware. I just saw a problem and I wanted to explain it. That's it.


Perhaps because there is only one DX12 benchmark out there, and it is based on a game that is still a work in progress. It is too early to judge whether limitations of the Maxwell hardware are holding back its DX12 performance, since there are not enough samples to judge from. When more DX12 games show up, if we see results different from what we are seeing with Ashes of the Singularity, then your theory becomes questionable. Yes, you did make a compelling case, but it will only remain true if other DX12 game benchmarks resemble the AotS results.

Ashes of the Singularity's DX11 and DX12 versions look identical. Your theory would only hold if the DX12 version of the game looked dramatically better or had far more complex scenes than its DX11 counterpart, and/or if both Nvidia's and AMD's DX12 performance were dramatically better than their DX11 performance with AMD still pushing ahead. AMD's improvement is expected, since their DX11 driver already suffered from high CPU overhead, which DX12 eliminates. But I think something might be wrong with the Maxwell results, since the DX11 version still outperforms DX12 while pushing identical scenes at the same level of complexity. Maxwell also benefits from dramatically increased draw-call throughput, so I don't think its DX12 performance should be lower than DX11 if there is nothing wrong with the code and the driver.
I think the current state of AotS DX12 performance on Nvidia GPUs is not hardware-limitation related.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Mahigan*
> 
> On top of the hate mail... I've received some rather negative comments, in a variety of forums and comment threads where my findings have been published by other users. Some people questioning if I work for AMD or if I'm bought while others are simply brushing off what I've demonstrated and calling me a fanboi.
> 
> The usual partisan drivel.
> 
> For once, it would appear, that AMD weren't exaggerating the performance of their GPU. Fiji on DirectX 11 is terrible. Fiji on DirectX 12 will be quite amazing.


I don't understand the reason behind all the hate. I'm going to send some rep your way. Your posts were well thought out, and you took the time to answer the concerns of each person.


----------



## Dudewitbow

Quote:


> Originally Posted by *CrazyHeaven*
> 
> I don't understand the reason behind all the hate. I'm going to send some rep your way. Your post were well thought out and you took the time to answer the concerns of each person.


It's a relatively common human mentality to defend one's life choices; only a handful of people actually accept facts and take them in as knowledge. Others will attack regardless of whether they are right or wrong (this is the bad side of the mentality). "Fanboyism" of any sort can be categorized this way. There are those who prefer a product but know and accept its shortcomings (e.g. accepting that AMD has weak tessellation performance, or that Nvidia has lower asynchronous compute performance in this case), and there are others who validate their choice by attacking the alternative, right or wrong, just to feel better about a decision they made previously.


----------



## Themisseble

Fact:
DX12 shows that IPC/single-threaded performance is not more important than multicore performance.

- Does AotS really support async shaders?


----------



## Lantian

How funny will it be to watch all of you so-called AMD experts and GPU engineers (Maxwell a serial uarch, lol), and how many times will you get burned all on your own? This is a single example of what is probably a broken alpha benchmark; have you all forgotten their previous one? Same exact story. Besides, some of you seem to think you have suddenly become GPU engineers; perhaps one of you can explain why Thief didn't see any huge gains on Mantle, or why AMD themselves stated that anything above 8 ACEs is overkill and not needed. Until there is any real proof, or the benchmark comes out of alpha, or better yet a real benchmark comes out from a known company like 3DMark, everything in this thread can be considered mild speculation at best. People, please start treating it as such.


----------



## Casey Ryback

Quote:


> Originally Posted by *Lantian*
> 
> everything in this thread can be considered mild speculation at best, people please start treating it as such


I don't think anyone has jumped to any conclusions as you are implying, I could've missed those posts though.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> I think I've made a rather compelling case. I have yet to have someone demonstrate the contrary. Believe you me... many have tried so far. I've been called every name in the book. Rest assured I'm not trying to take a dump on your hardware. I just saw a problem and I wanted to explain it. That's it.


It's disgusting that you've been called names over it, and you have my sympathies. However, please note I have not and will not insult you over this, nor do I care about what hardware I'm currently using (I go through many, constantly). I'm simply discussing and want nothing but a friendly discussion.









You have made a compelling case; I'm just saying there are a couple holes in it, not the least of which is a lack of consistency across different reviews.
Quote:


> Originally Posted by *Mahigan*
> 
> ArsTechnica
> CPU Framerate (it's in the graph):
> 
> 
> Extreme Tech:
> The author, Joel Hruska, in the comment section of his article, on Extreme Tech, on CPU Bottle neck:
> 
> I think that about answers it.
> 
> http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head/3


Thank you. However, I was not referring to the Arstechnica nor the Extreme Tech reviews regarding the CPU, but rather the computerbase.de review. Despite what the Extreme Tech review stated, the 980 Ti's relatively small lead versus the 970 in the computerbase.de review is suspect:



The DX12 result for the 980 Ti is 55.4 FPS, whereas the 970's DX12 result is 42.2 FPS. 55.4/42.2 ≈ 1.31, a ~31% lead. In contrast, the same site's own 980 Ti review showed presumably the same 980 Ti beating presumably the same 970 in GPU-limited scenarios by ~50% across many games.
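The relative-lead arithmetic works out as follows (a trivial check; the FPS values are the ones quoted from the computerbase.de chart):

```python
# DX12 results from the computerbase.de Ashes chart.
gtx_980ti = 55.4
gtx_970 = 42.2

# Relative lead of the 980 Ti over the 970 in this benchmark.
dx12_lead = gtx_980ti / gtx_970 - 1
print(f"{dx12_lead:.0%}")  # 31% - versus ~50% in GPU-limited DX11 testing
```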

That disparity is why I question the possibility of a CPU limitation; it's completely uncharacteristic of the 980 Ti (which, architecturally, has no relative weak spots versus a 970; it improves everything from geometry calculations to cache, shaders, bandwidth, and ROP throughput proportionally, all with the exception of a slight reference clock speed deficit). Something's fishy there. Computerbase.de's 980 Ti review:

http://www.computerbase.de/2015-06/geforce-gtx-980-ti-test-nvidia-titan/

Quote:


> snip (compute and ACE stuff)


It's not that I doubt the ACEs are there and relevant on some level, however the only chip exhibiting any exceptional gains from it seems to be Hawaii and even then only from two short reviews for this one benchmark (one of which I've made my reservations about very clear). The 280X in the computerbase.de chart barely exceeds the 770 by even 11% in DX12, which is an obvious win but not even all that much considering Kepler's recent fall from grace (in post-Maxwell DX11 games) and that GK104 and Tahiti were once close competitors.

Especially not since Kepler is supposed to be woefully inadequate in this area versus even Maxwell 2. Similarly, the Fury X exhibits no such advantage over the 980 Ti (regardless of hypothetical reasons, it simply doesn't) and the 370 barely shows much improvement at all in that DX12 benchmark. Also, there is one other game (as per AMD) that has utilized asynchronous shaders with a low-level API:



Which would be Thief 4, and the 290X doesn't even demonstrate more than a 2% advantage at its most GPU-intensive against even a 780 Ti, let alone the 980 Ti:



Additionally, the pcper review shows:





...a clear influence of the CPU on these results with just a 390X and a 980 (even between powerful CPUs such as Haswell-E and Skylake) as well as a slight advantage for the 390X over the 980, but nothing major enough to get close to a 980 Ti, especially considering the 390X has both 10% more shaders and 5% higher "stock" clock speeds (unclear due to lack of reference models) than the 390 in that computerbase.de review.

I agree that what you're saying about GCN seems logical, however I'm simply making it clear why I don't believe it should be considered conclusive as a lot of it is still conjecture (about microarchitectural facts mind you, but without much actual context of how it translates to the real world) and there is a lot of non-quantifiable and conflicting data about it.


----------



## SpeedyVT

Quote:


> Originally Posted by *Serandur*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> That's disgusting to be called names over it and you have my sympathies. However, please note I have not and will not insult you for this nor do I care about what hardware I'm currently using (I go through many, constantly). I'm simply discussing and want nothing but a friendly discussion.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You have made a compelling case; I'm just saying there are a couple holes in it, not the least of which is a lack of consistency across different reviews.
> Thank you. However, I was not referring to the Arstechnica nor the Extreme Tech reviews regarding the CPU, but rather the computerbase.de review. Despite what the Extreme Tech review stated, the 980 Ti's relatively small lead versus the 970 in the computerbase.de review is suspect:
> 
> 
> 
> The DX12 results for the 980 Ti are 55.4 FPS whereas the 970's DX12 results are at 42.2. 55.4/42.2 = 31%. In contrast, the same site's own 980 Ti review showed presumably the same 980 Ti beating presumably the same 970 in GPU-limited scenarios by ~50% across many games. That disparity is why I question the possibility of a CPU limitation; it's completely uncharacteristic of the 980 Ti (which, architecturally, has no relative weakspots versus a 970; but improves everything from geometry calculations, to cache, to shaders, to bandwidth, to ROP throughput, etc. proportionally; all with the exception of a slight reference clock speed deficit). Something's fishy there. Computerbase.de's 980 Ti review:
> 
> http://www.computerbase.de/2015-06/geforce-gtx-980-ti-test-nvidia-titan/
> It's not that I doubt the ACEs are there and relevant on some level, however the only chip exhibiting any exceptional gains from it seems to be Hawaii and even then only from two short reviews for this one benchmark (one of which I've made my reservations about very clear). The 280X in the computerbase.de chart barely exceeds the 770 by even 11% in DX12, which is an obvious win but not even all that much considering Kepler's recent fall from grace (in post-Maxwell DX11 games) and that GK104 and Tahiti were once close competitors. Especially not since Kepler is supposed to be woefully inadequate in this area versus even Maxwell 2. Similarly, the Fury X exhibits no such advantage over the 980 Ti (regardless of hypothetical reasons, it simply doesn't) and the 370 barely shows much improvement at all in that DX12 benchmark. Also, there is one other game (as per AMD) that has utilized asynchronous shaders with a low-level API:
> 
> 
> 
> Which would be Thief 4, and it doesn't even demonstrate more than a 2% advantage at its most GPU-intensive against even a 780 Ti, let alone the 980 Ti:
> 
> 
> 
> I agree that what you're saying about GCN seems logical, however I'm simply making it clear why I don't believe it should be considered conclusive as a lot of it is still conjecture (about microarchitectural facts mind you, but without much actual context of how it translates to the real world) and there is a lot of unquantifiable and conflicting data about it.


It's a stress issue. Nvidia tops out on frames but bottoms out on stress tests that utilize large numbers of shader queues; a 290X can handle up to 64, whereas a 980 Ti can only handle 31. This is why the 280X underperforms the 970 in DX12 benchmark performance. You'd think that because an AMDerp wants to tell you GCN is better, it'll perform better, or an NVidiot will tell you that Maxwell is the best. Practically all lies. They are all engines built differently. You don't stick a truck engine in a Ferrari. Nvidia is a speed demon, but horrible with wide loads, and that's where the truck engine (AMD) comes in.


----------



## mtcn77

Quote:


> Originally Posted by *Serandur*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> That's disgusting to be called names over it and you have my sympathies. However, please note I have not and will not insult you for this nor do I care about what hardware I'm currently using (I go through many, constantly). I'm simply discussing and want nothing but a friendly discussion.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You have made a compelling case; I'm just saying there are a couple holes in it, not the least of which is a lack of consistency across different reviews.
> Thank you. However, I was not referring to the Arstechnica nor the Extreme Tech reviews regarding the CPU, but rather the computerbase.de review. Despite what the Extreme Tech review stated, the 980 Ti's relatively small lead versus the 970 in the computerbase.de review is suspect:
> 
> 
> 
> The DX12 results for the 980 Ti are 55.4 FPS whereas the 970's DX12 results are at 42.2. 55.4/42.2 = 31%. In contrast, the same site's own 980 Ti review showed presumably the same 980 Ti beating presumably the same 970 in GPU-limited scenarios by ~50% across many games.
> 
> That disparity is why I question the possibility of a CPU limitation; it's completely uncharacteristic of the 980 Ti (which, architecturally, has no relative weakspots versus a 970; but improves everything from geometry calculations, to cache, to shaders, to bandwidth, to ROP throughput, etc. proportionally; all with the exception of a slight reference clock speed deficit). Something's fishy there. Computerbase.de's 980 Ti review:
> 
> http://www.computerbase.de/2015-06/geforce-gtx-980-ti-test-nvidia-titan/
> It's not that I doubt the ACEs are there and relevant on some level, however the only chip exhibiting any exceptional gains from it seems to be Hawaii and even then only from two short reviews for this one benchmark (one of which I've made my reservations about very clear). The 280X in the computerbase.de chart barely exceeds the 770 by even 11% in DX12, which is an obvious win but not even all that much considering Kepler's recent fall from grace (in post-Maxwell DX11 games) and that GK104 and Tahiti were once close competitors.
> 
> Especially not since Kepler is supposed to be woefully inadequate in this area versus even Maxwell 2. Similarly, the Fury X exhibits no such advantage over the 980 Ti (regardless of hypothetical reasons, it simply doesn't) and the 370 barely shows much improvement at all in that DX12 benchmark. Also, there is one other game (as per AMD) that has utilized asynchronous shaders with a low-level API:
> 
> 
> 
> Which would be Thief 4, and it doesn't even demonstrate more than a 2% advantage at its most GPU-intensive against even a 780 Ti, let alone the 980 Ti:
> 
> 
> 
> Additionally, the pcper review shows:
> 
> 
> 
> 
> 
> ...a clear influence of the CPU on these results with just a 390X and a 980 (even between powerful CPUs such as Haswell-E and Skylake) as well as a slight advantage for the 390X over the 980, but nothing major enough to get close to a 980 Ti, especially considering the 390X has both 10% more shaders and 5% higher "stock" clock speeds (unclear due to lack of reference models) than the 390 in that computerbase.de review.
> 
> I agree that what you're saying about GCN seems logical, however I'm simply making it clear why I don't believe it should be considered conclusive as a lot of it is still conjecture (about microarchitectural facts mind you, but without much actual context of how it translates to the real world) and there is a lot of non-quantifiable and conflicting data about it.


The implied reason for these discrepancies - assuming the benchmark is working correctly - is that one architecture scales better than the other. I've looked over your post, I understand your frustration, and you are right: these results are indeed _flawed_ in the sense that you wouldn't _try this_ in a DirectX 11 game. Note that no previous Mantle title attempted what this benchmark is doing. Until now, Mantle showcases always competed against DirectX 11, deliberately designed around DirectX 11's capabilities and inherent limits. Mantle used the "same" shaders, and there is no clear win for Mantle if you don't exploit it: it won't help you serialize work faster; it only increases overall throughput when work is parallelised. So one architecture might still process its work quickly, albeit serially, and thus fail to complete comparably fast.
Next to DirectX 12, what DirectX 11 can do is linear. Work is done "as fast as possible", but if you harness DirectX 12, throughput scales with the available parallelism - and it punishes a less parallelised execution path just as much.
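The serial-vs-parallel point above is essentially Amdahl's law: a parallel API only speeds up the fraction of frame work that can actually run in parallel. A minimal sketch (the 90%/20% workload splits are illustrative, not taken from the benchmark):

```python
# Amdahl's law: a parallel API raises throughput only for the
# parallelisable fraction p of the work; the serial remainder
# (1 - p) still bounds the overall speedup.
def amdahl_speedup(p, n):
    """Overall speedup with fraction p parallelised across n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# A workload that is 90% parallelisable gains a lot from 8 queues...
print(round(amdahl_speedup(0.90, 8), 2))  # ~4.71x

# ...while a mostly serial workload barely moves, no matter the API.
print(round(amdahl_speedup(0.20, 8), 2))  # ~1.21x
```

This is why an engine built to exploit many queues can look dramatically faster under DX12 while a serially-structured renderer sees little change.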


----------



## flopper

Quote:


> Originally Posted by *Mahigan*
> 
> Now the problem is you're trying to derive CPU performance conclusions based on the use of nVIDIA GPU serial hardware under either a Serial API, with driver interventions, vs its performance under a Parallel API, absent driver interventions. Evidently the conclusions drawn will be false. The reason being is that you're not taking into account the nVIDIA driver interventions being made under DirectX 11. Basically... nVIDIAs GPU is not processing the same shaders under DirectX 11 that it does under DirectX 12. You would need to look at the AMD GPUs in order to derive any conclusions as to the parallel performance and look at nVIDIA GPUs in order to derive conclusions as to the serial performance. Since both AMD GPUs and nVIDIA GPUs are like comparing apples to oranges... this wouldn't work.
> 
> There's a way around this. The Ashes to the Singularity benchmark comes with a CPU Frame rate meter. You could compare the CPU Framerate, using the same GPU and CPU, with various Cores disabled as well as with Hyperthreading enabled and disabled.


I said a long time ago that Nvidia sold you old tech with the Titan X and 980 Ti.
However, it's not too late for those guys to get a 390 or a Fury X and end up on the right side of DX12.
Quote:


> Originally Posted by *OneB1t*
> 
> its all GPU bound under DX12 nothing interesting about CPU performance with these results


That's why we received DX12: it removes the CPU limits.
Quote:


> Originally Posted by *Mahigan*
> 
> And with less room for driver intervention in DirectX 12, improvements will be far trickier than they were in the past.


I foresee Nvidia-sponsored games putting limits on the other guys' hardware by shady means; they have done so before and will do so again.
No morals and no ethics.
Quote:


> Originally Posted by *Mahigan*
> 
> If anyone wants to use this information for whatever reason... I've since revised my thoughts and I believe these to be final accurate assessment..
> 
> Anandtech also had barely no information on Maxwell's capabilities. Ryan Smith, the Graphics author over at Anandtech, assumed that Maxwell's queues were its dedicated compute units. Therefore Anandtech published that Maxwell 2 had a total of 32 Compute Units. This information is false.
> 
> A few obvious facts come to light. AMDs implementation incurs less latency as well as having the ability of making more efficient use of the available Compute resources.
> 
> This explains why Maxwell 2 (GTX 980 Ti) performs so poorly under Ashes of the Singularity under DirectX 12 when compared to a lowly R9 290x. Asynchronous Shading kills its performance compared to GCN 1.1 (290 series)/GCN 1.2. The latter's performance is barely impacted.
> 
> .


That's the issue with hardware sites: they don't know enough.
The danger is that they set trends people believe without having the facts right.

Quote:


> Originally Posted by *Mahigan*
> 
> Now if you're looking to game at 4K, on titles available today in DirectX 11, then the GTX 980 Ti is the card to get. There is no doubt about that. If not, then don't bother.
> 
> .


The Fury X offers the same 4K gaming performance as a 980 Ti;
as a user you won't notice the difference.
Quote:


> Originally Posted by *Mahigan*
> 
> You should see the hate mail I've received


Truth hurts people who live in a cult.
Nvidia sold them old technology with the 980 Ti and Maxwell, no denying that.
It's not too late for them to buy the more future-proof AMD card.


----------



## Cyro999

Quote:


> Nvidia sold them old technology with the 980ti and Maxwell no denying that.


Enjoying the 2012 GCN?







Or is GCN 1.0/1.1 and Bulldozer still better than everything else?


----------



## loveuguys

But guys, there is something wrong with these PR scores:


Because this:


Sub-20M with a simple 980...

Any explanation?


----------



## Cyro999

Shouldn't trust any numbers from AMD or Nvidia, but that test in particular could scale with CPU performance


----------



## loveuguys

I think the first pic used a 5820K or 5960X,
but the second one is mine, running at stock since Christmas 2013; I only upgraded from a 290 to a 980.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> That's disgusting to be called names over it and you have my sympathies. However, please note I have not and will not insult you for this nor do I care about what hardware I'm currently using (I go through many, constantly). I'm simply discussing and want nothing but a friendly discussion.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You have made a compelling case; I'm just saying there are a couple holes in it, not the least of which is a lack of consistency across different reviews.
> Thank you. However, I was not referring to the Arstechnica nor the Extreme Tech reviews regarding the CPU, but rather the computerbase.de review. Despite what the Extreme Tech review stated, the 980 Ti's relatively small lead versus the 970 in the computerbase.de review is suspect:
> 
> 
> 
> The DX12 results for the 980 Ti are 55.4 FPS whereas the 970's DX12 results are at 42.2. 55.4/42.2 = 31%. In contrast, the same site's own 980 Ti review showed presumably the same 980 Ti beating presumably the same 970 in GPU-limited scenarios by ~50% across many games.
> 
> That disparity is why I question the possibility of a CPU limitation; it's completely uncharacteristic of the 980 Ti (which, architecturally, has no relative weakspots versus a 970; but improves everything from geometry calculations, to cache, to shaders, to bandwidth, to ROP throughput, etc. proportionally; all with the exception of a slight reference clock speed deficit). Something's fishy there. Computerbase.de's 980 Ti review:
> 
> http://www.computerbase.de/2015-06/geforce-gtx-980-ti-test-nvidia-titan/
> It's not that I doubt the ACEs are there and relevant on some level, however the only chip exhibiting any exceptional gains from it seems to be Hawaii and even then only from two short reviews for this one benchmark (one of which I've made my reservations about very clear). The 280X in the computerbase.de chart barely exceeds the 770 by even 11% in DX12, which is an obvious win but not even all that much considering Kepler's recent fall from grace (in post-Maxwell DX11 games) and that GK104 and Tahiti were once close competitors.
> 
> Especially not since Kepler is supposed to be woefully inadequate in this area versus even Maxwell 2. Similarly, the Fury X exhibits no such advantage over the 980 Ti (regardless of hypothetical reasons, it simply doesn't) and the 370 barely shows much improvement at all in that DX12 benchmark. Also, there is one other game (as per AMD) that has utilized asynchronous shaders with a low-level API:
> 
> 
> 
> Which would be Thief 4, and the 290X doesn't even demonstrate more than a 2% advantage at its most GPU-intensive against even a 780 Ti, let alone the 980 Ti:
> 
> 
> 
> Additionally, the pcper review shows:
> 
> 
> 
> 
> 
> ...a clear influence of the CPU on these results with just a 390X and a 980 (even between powerful CPUs such as Haswell-E and Skylake) as well as a slight advantage for the 390X over the 980, but nothing major enough to get close to a 980 Ti, especially considering the 390X has both 10% more shaders and 5% higher "stock" clock speeds (unclear due to lack of reference models) than the 390 in that computerbase.de review.
> 
> I agree that what you're saying about GCN seems logical, however I'm simply making it clear why I don't believe it should be considered conclusive as a lot of it is still conjecture (about microarchitectural facts mind you, but without much actual context of how it translates to the real world) and there is a lot of non-quantifiable and conflicting data about it.


What we're looking at isn't the CPU. I've demonstrated that. It doesn't matter which reviews we're looking at. Unless the reviewer is using an i3 4330/FX 6300/FX 8350 etc you're not going to see the CPU become the limitation. The GPU is hitting its Compute limits. This is what Ashes of the Singularity taxes the most (with the exception of the Fiji cards).

You also can't use DirectX 11 benchmarks in order to try and figure out what is happening under DirectX 12. The two cannot be compared. This is because DirectX 11 is open to driver interventions (shaders replacements for example) and is a serial API. Therefore looking at a 50% performance difference between a GTX 970 and GTX 980 Ti under DirectX 11 and wondering why this difference doesn't carry over to DirectX 12 is most illogical.

If you assume a compute bottleneck, all these figures make sense.

Take the GTX 970: its theoretical TFLOPS varies by model, from 3.9 to 4.2. Since Computerbase.de used a Strix model (the model they reviewed not long ago), I will use the 4.2 TFLOPS figure; for the GTX 980 Ti I will use 6.2 TFLOPS (any overclock on either card changes these figures and would affect the benchmark results). 4.2 TFLOPS is around 32% below 6.2 TFLOPS, and that is roughly the difference you're seeing in the benchmark. Asynchronous Shading taxes the compute capabilities of a GPU but allows an incredible number of post-processing effects to be displayed on screen.
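The arithmetic above can be reproduced directly. The TFLOPS figures are the post's assumed values, and the 55.4/42.2 FPS pair is from the computerbase.de numbers quoted earlier; note the 32% reads as the 970's deficit relative to the 980 Ti:

```python
# Reproducing the post's arithmetic (figures are the post's
# assumptions, not independently measured values).
gtx_970_tflops   = 4.2  # Strix model, per the post
gtx_980ti_tflops = 6.2

# The 970's compute deficit relative to the 980 Ti, in percent:
deficit = (1 - gtx_970_tflops / gtx_980ti_tflops) * 100
print(round(deficit))  # 32

# The computerbase.de DX12 gap quoted earlier: 55.4 vs 42.2 FPS.
fps_gap = (55.4 / 42.2 - 1) * 100
print(round(fps_gap))  # 31
```

The two percentages line up only when the compute gap is measured as a deficit against the larger card, which is how the post frames it.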

Either way, we have our 32% difference figure, and we can thus deduce that the performance difference you mention is explained by my theory. Once again the theory holds up; if a theory survives everything you throw at it, it moves from hypothesis toward established fact. I've already had other people question the CPU, and I have demonstrated to them that the CPU is not the bottleneck. Thus far, no opposing theory holds weight.

The same logic applies to why an R9 290x can keep up with a GTX 980 Ti: it's all about their theoretical compute capabilities - their ALU throughput, in other words. At the high end of the GPU market, Ashes of the Singularity is, for the most part, tied to the compute capabilities of an architecture (a GTX 960, for example, is crippled by its 128-bit memory interface). Ashes is also tied to peak rasterization rate (many individual units, each made up of many triangles, permeate the screen). This is what limits Fiji: Fiji has the same peak rasterization rate as an R9 290x, and in an RTS of this magnitude the architecture is bottlenecked on that front.

A good example of the Nitrous engine, absent Asynchronous Shading, is Star Swarm. If we look at Star Swarm we see that a GTX 980 should be much faster than an R9 290x:


Therefore we can most likely assume that if the Ashes of the Singularity benchmark made no use of post-processing effects (via Asynchronous Shading), then the GTX 980 Ti, like the GTX 980, would handily defeat the R9 290x. This is due to the GTX 980 Ti's far better rasterization rate: it can handle both the draw calls and the polygon count demanded. Where the GTX 980 Ti lacks a comparable edge over the 290x is in compute; there, the two are capable of nearly the same degree of performance. (I've already explained in my previous post why Asynchronous Shading is near 100% efficient on GCN but less efficient on Maxwell 2.)
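As a rough check on the near-equal-compute claim, theoretical FP32 throughput can be computed as shader count x 2 FLOPs x clock. The shader counts and reference clocks below are the published specs; retail cards boost or overclock differently:

```python
# Rough check on the compute comparison above: theoretical FP32
# throughput = shaders x 2 FLOPs x clock. Reference clocks are
# assumed here; retail cards boost/overclock differently.
def tflops(shaders, clock_mhz):
    return shaders * 2 * clock_mhz / 1e6

r9_290x         = tflops(2816, 1000)  # AMD's quoted 5.6 TFLOPS figure
gtx_980ti_base  = tflops(2816, 1000)  # same shader count, same base clock
gtx_980ti_boost = tflops(2816, 1075)

print(round(r9_290x, 2), round(gtx_980ti_boost, 2))  # 5.63 6.05
```

At reference clocks the two chips really do sit within a few percent of each other in raw ALU throughput, which is the premise of the argument above.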

Based on those two facts, we can deduce several conclusions. Conclusions which are demonstrable through the Ashes of the Singularity benchmark results.

This does not mean that DirectX 12 will show these results in every single title. Of course not: every title will make use of Asynchronous Shading to a different degree, and Ashes of the Singularity makes heavy use of it (many units, each emitting its own light sources).

This is why titles which use Asynchronous Shading to a lesser degree, such as Fable Legends, will most likely perform better on a GTX 980 Ti than on an R9 290x. That said, titles like Fable Legends don't require the triangle throughput that Ashes of the Singularity requires, so we can assume that Fiji will surpass the GTX 980 Ti in that title. Only time will tell, but I feel quite confident in this statement.


----------



## OneB1t

I don't think the 290x is limited by triangles; see this:







1.1 million triangles (around 3-4 times what you see in the benchmark)
http://postimg.org/image/fg2uglmiv/full/

This is the entire map full of units.


----------



## Themisseble

Quote:


> Originally Posted by *loveuguys*
> 
> But guys, here is something wrong with this PR scores:
> 
> 
> Because this:
> 
> 
> Sub 20M with simple 980....
> 
> Any explanation?


Actually, it's the overclock.

A stock i5 scores around 11M; when you OC, it's around 16M. I think that with an OC'd 8-core on Mantle you should get near 28M.


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> What we're looking at isn't the CPU. I've demonstrated that. It doesn't matter which reviews we're looking at. Unless the reviewer is using an i3 4330/FX 6300/FX 8350 etc you're not going to see the CPU become the limitation. The GPU is hitting its Compute limits. This is what Ashes of the Singularity taxes the most (with the exception of the Fiji cards).
> 
> You also can't use DirectX 11 benchmarks in order to try and figure out what is happening under DirectX 12. The two cannot be compared. This is because DirectX 11 is open to driver interventions (shaders replacements for example) and is a serial API. Therefore looking at a 50% performance difference between a GTX 970 and GTX 980 Ti under DirectX 11 and wondering why this difference doesn't carry over to DirectX 12 is most illogical.
> 
> If you make the assumption for a Compute bottleneck, all these figures make sense.
> 
> Take a GTX 970, its theoretical Tflops (vary by model from 3.9 Tflops to 4.2 Tflops) since Computerbase.de used a Strix model (the model they reviewed not long ago) I will use the 4.2 Tflops figure. For the GTX 980 Ti I will use the 6.2 TFlops figure (of course any overclocks on either changes this figure and would affect the benchmark results). The difference between 4.2 Tflops and 6.2Tflops is around 32%. That's the difference you're seeing in the benchmark. Asynchronous Shading taxes the Compute capabilities of a GPU but allows for an incredible amount of Post Processing effects to be displayed onto the screen.
> 
> Either way we have our 32% difference figure and are thus able to deduce that the performance figure difference you mention can be explained by my theory. Once again my theory holds up. If a theory holds up, no matter what you throw at it, it is the truth. It goes from a hypothesis into the realm of a scientific fact. I've already had other people question the CPU. I have already demonstrated to them that the CPU is not the bottleneck. Thus far no opposing theory holds weight.
> 
> The same logic can be applied to why an R9 290x can keep up with a GTX 980 Ti. It's all about their theoretical compute capabilities. Their ALU throughput in other words. Ashes of the Singularity is tied to the compute capabilities of an architecture, for the most part, on the high end of the GPU market (a GTX 960 is crippled by its 128-bit memory interface for example). Ashes of the Singularity is also tied to the Peak Rasterization rate (many individual units made up of many triangles permeate the screen). This is what limits Fiji. Fiji has the same Peak Rasterization rate as an R9 290x. In an RTS game of this magnitude, the Fiji architecture is bottlenecked on this front.
> 
> A good example of the Nitrous engine, absent Asynchronous Shading, is Star Swarm. If we look at Star Swarm we see that a GTX 980 should be much faster than an R9 290x:
> 
> 
> Therefore we can most likely assume that if the Ashes of the Singularity benchmark made no use of Post Processing Effects (using Asynchronous Shading) then the GTX 980 Ti, like the GTX 980, would handily defeat the R9 290x. This is due to the GTX 980 Ti's far better Rasterization rate. It can both handle the Draw Calls as well as Polygon count demanded. Where the GTX 980 Ti lacks power, over the 290x, is in compute. The two are capable of nearly the same degree of performance. (I've already explained why Asynchronous Shading is near 100% efficient on GCN but less efficient on Maxwell 2 in my previous post).
> 
> Based on those two facts, we can deduce several conclusions. Conclusions which are demonstrable through the Ashes of the Singularity benchmark results.
> 
> This does not mean that DirectX 12 will show these results throughout every single title. Of course not. Because every title will make use of Asynchronous Shading to varying degrees. Ashes of the Singularity makes heavy use of Asynchronous Shading (many units each emitting their own light sources).
> 
> This is why titles which use Asynchronous Shading, but to a lesser degree such as Fable Legends, will most likely perform better on a GTX 980 Ti than they do an R9 290x. That being said, titles like Fable Legends don't require the amount of triangle throughput that Ashes of the Singularity requires. Thus we can assume that Fiji will surpass the GTX 980 Ti in that title. Only time will tell but I feel quite confident in this statement.


I am interested in your theory of "driver intervention". Can you elaborate on how it might impact a multi-tiered SKU strategy with otherwise little differentiation between the SKUs? Also, Nvidia cards prior to Maxwell 2 (GK110 specifically) were not compute-crippled, so wouldn't we see those cards show a meaningful leap under DX12, assuming Nvidia even bothered to optimize pre-Maxwell-2 cards to leverage their compute capability for a DX12 API? I can't see them disrupting their price-to-performance SKU strategy by optimizing Kepler over Maxwell, even if GK110's compute is superior to Maxwell 2's. Maybe I'm confused about how Nvidia differentiates/manages performance through driver intervention, if indeed it does?


----------



## ToTheSun!

Quote:


> Originally Posted by *provost*
> 
> Also, the Nvidia cards prior to Maxwell 2 (Gk110 specifically) were not compute crippled, so wouldn't we see those cards show a meaningful leap under Dx12?


Well, according to his theory, they're still serial in nature, which is sub-optimal for AotS DX12.


----------



## Mahigan

Quote:


> Originally Posted by *OneB1t*
> 
> dont think that 290x is limited with triangles see this
> 
> 
> 
> 
> 
> 
> 
> 1.1mil triangles (like 3-4 times what you can see in benchmark)
> http://postimg.org/image/fg2uglmiv/full/
> 
> this is entire map full of units


It's not just the number of units but also the complexity of each unit. There are over 16,000 (nearly 17,000 in this shot) units on the screen in Ashes of the Singularity, and units that are not on the viewable screen are still rendered:


Each unit is rather complex. When you zoom in, you see this:


----------



## anubis44

Thank you very, VERY much, Mahigan, for this detailed and intelligently presented explanation of compute (asynchronous shaders) vs. triangle setup (rendering) and their implementations within AMD Radeon and nVidia hardware!

The hate mail you're getting is from nVidia supporters whose minds you have blown. They don't want to believe that it's possible that they actually bet on the losing horse, after all that smug self-assurance and vitriol they've been heaping on Radeon card owners, like me. They're used to being able to argue any critic of nVidia to a standstill and out-shout everybody else, but with you, they cannot do this. You are actually a GPU engineer, telling them straight out with facts and proof, that their beloved GeForce cards have weaknesses, and that just kills them, because you are incontrovertibly right.


----------



## OneB1t

Test this in single-player mode and you will see that such a scene has under 300,000 triangles...
so your assumption about the 290x and a triangle-count limit is wrong.

The 290x can handle 3x the triangle count at the same FPS in single-player.


----------



## wiak

well AMD has had modern tessellation since the HD 2000 series












it was just unused in DX10

even the good old 9800 Pro had tessellation
http://www.beyond3d.com/content/reviews/37/2

For me it seems AMD just went full circle:
DX9 = amd faster
DX10 = nvidia faster
DX11 = nvidia faster
DX12 = amd faster

when AMD released Mantle, a lot of big developers (DICE, Valve) jumped at the chance to help build a better API (Vulkan) together with them

People have to remember that most developers develop cross-platform.
That means they have to adhere to GCN hardware features unless they want to totally mess up their game(s).
The Xbox One runs on Windows 8.x, which means it will be upgraded to DX12 when Windows 10 for the Xbox One is out. A lot of the optimizations done for the Xbox One hardware carry over to the PC. I have a 7970, and it went from 10 FPS (DX11) to 19 FPS (DX12) in the Ashes benchmark on my system (FX-8350, 32GB, 512GB SSD).

So on the Xbox One, performance should roughly double once it's updated to Windows 10 and running a DX12-compatible game (Fable Legends comes to mind).
Just my thought on why AMD's performance is pretty high in DX12 - perhaps they got extra engineering and programming support from Microsoft because of the Xbox One?


----------



## provost

This whole DX12 vs DX11 hardware focus going forward has me intrigued, if nothing else for the sake of intellectual curiosity. From a consumer's point of view, I've always believed that GPU performance in our PCs should be driven more by the nuts-and-bolts hardware than left up to the whims of the video card companies through software intervention.

I don't read reviews much, but I happened to check out this review from PCPer, and what it says in its conclusion is rather interesting if we read it carefully:

http://www.pcper.com/reviews/Graphics-Cards/DX12-GPU-and-CPU-Performance-Tested-Ashes-Singularity-Benchmark/Closing-Thoug
Quote:


> From what I can tell going forward, *it is going to be a new kind of battle for optimization in the world of DX12*. If a game engine developer wants to take optimal advantage of the GPUs from AMD, NVIDIA or even Intel, those devs need to be very hardware aware or each IHV is going to have to spend more time and engineering to teach and implement those developers about each new Radeon or GeForce feature that comes around. *If you can't count on the API to automatically give you access to each and every hardware tweak and shader improvement capability that NVIDIA/AMD devise, then direct developer to GPU vendor communication is going to need to be more heavily prioritized*.


To me, what PCPer seems to be saying is that developers either will have easy access to the hardware from both GPU vendors or they won't. If developers do have that access, as per this article, I believe it will be the only way for PC gamers to gain the most benefit from DX12's lower-CPU-overhead API. If not, as implied by the article's euphemistic phrase about "GPU vendor communication", PC gamers will end up in the same place as the status quo under DX11: GPU vendor sponsorship (meaning "communication", skeptically speaking) would ensure an inherent disadvantage for the non-sponsoring GPU vendor. Real-world economics always win over the noblest of innovative ideas... lol

Edit: This also means that, if we go by some of the theories postulated in this thread, AMD would have to spend less to match Nvidia's performance. Conversely, Nvidia would have to spend more to compete in the DX12 PC gaming environment, either by delivering more of the performance through its hardware (not sure what this means for the thousands of software personnel at NV) or through soft dollars to developers.


----------



## Mahigan

Quote:


> Originally Posted by *provost*
> 
> [/spoiler]
> 
> I am interested in your theory of "driver intervention". Can you please elaborate how this may impact a multi tiered SKU strategy with otherwise little differentiation between the skus? Also, the Nvidia cards prior to Maxwell 2 (Gk110 specifically) were not compute crippled, so wouldn't we see those cards show a meaningful leap under Dx12, that is assuming Nvidia even bothered to optimize pre Maxwell 2 cards to leverage their compute capability for a DX12 API? I can't see them disrupting their priced to performance sku strategy by optimizing Kepler over Maxwell, even if Gk110's compute is superior to Maxwell 2. May be I am confused about how Nvidia differentiates/manages performance through driver intervention, if indeed it does?


Interestingly enough, driver interventionism under DirectX 11 plays into the ability to tap into the compute potential of nVIDIA's various architectures (Kepler, Maxwell 1 and 2). Therefore, I can answer both your questions by explaining how driver interventionism works and why it benefits nVIDIA's GPU architectures more than AMD's.
Quote:


> There may also be some cases where D3D11 is faster than D3D12 (it should be a relatively small amount). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has 5 years of optimizations where D3D12 is brand new. Second, D3D11 has more opportunities for driver intervention. The problem with this driver intervention is that it comes at the cost of extra CPU overhead, and can only be done by the hardware vendor's driver teams. On a closed system, this may not be the best choice if you're burning more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game's updates.


Source: http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/

Let's discuss a scenario:

DirectX 11
A game is released; we'll call it "X". X is a DirectX 11 title, and nVIDIA did not work closely with its developers. X ships with a request to process a particular shader that does not run efficiently on nVIDIA's architectures; this shader ends up taking a lot of GPU time to compute. The net result of this inefficiency is a drop in performance, seen as a drop in frames per second. We'll say that when this shader runs, nVIDIA's architectures drop to 32 FPS. Under DirectX 11, nVIDIA can release a driver update to tackle this particular shader: since DirectX 11 does not make use of all available CPU cores and is not a close-to-metal API, nVIDIA can swap the shader the game requests for an alternative that runs more efficiently but produces the same visual result. The spare CPU cores (or cycles) can be used to perform this swap. The end result is that, through a driver update, nVIDIA brings performance up to 56 FPS whenever that shader is requested. This is possible on nVIDIA's hardware because of the serial nature of their graphics pipeline and of DirectX 11 (which leaves several CPU resources untapped).
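A hypothetical sketch of this kind of driver-level shader substitution. All names and FPS numbers are illustrative, following the fictional game "X"; this is not a real driver API:

```python
# Hypothetical sketch of a DX11-era driver intercepting shader
# submissions and substituting a vendor-optimized equivalent.
# All names and numbers are illustrative, not a real driver API.

# Known-slow shaders mapped to hand-optimized replacements that
# produce the same visual result.
OPTIMIZED_SHADERS = {
    "game_x_lighting_pass": "vendor_lighting_pass_fast",
}

# Illustrative cost model: frames per second while each shader runs.
SHADER_FPS = {
    "game_x_lighting_pass": 32,       # inefficient on this hardware
    "vendor_lighting_pass_fast": 56,  # same output, better fit
}

def submit_shader(name, driver_intervention=True):
    """Return (shader actually run, resulting FPS)."""
    if driver_intervention and name in OPTIMIZED_SHADERS:
        name = OPTIMIZED_SHADERS[name]  # the swap happens in the driver
    return name, SHADER_FPS[name]

print(submit_shader("game_x_lighting_pass", driver_intervention=False))
# ('game_x_lighting_pass', 32)
print(submit_shader("game_x_lighting_pass"))
# ('vendor_lighting_pass_fast', 56)
```

The key point is that the game never knows the swap happened; under DX12 this interception point goes away and the substitution has to happen in the game's own code.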

DirectX 12
Now a DirectX 12 patch is released for the game "X". Again, suppose X requests a particular shader that runs inefficiently on nVIDIA's architectures, dropping performance to 32 FPS. With DirectX 12, nVIDIA cannot perform a shader swap, because DirectX 12 is closer to the metal and does not allow that kind of driver intervention. nVIDIA must instead work with the developer, so that the developer makes the shader swap in their own code when the engine detects nVIDIA hardware. This is actually what was done in Ashes of the Singularity (see Oxide's notes on this issue here: http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/).

Here is the particular case mentioned:
Quote:


> *To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year.* We have received a huge amount of feedback. For example, when *Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code.*


Therefore what we can conclude is that nVIDIA is working closely with Oxide and has been for over a year (also mentioned in the same blog post). This would alleviate any claims of an AMD bias on Oxide's part.

Under DirectX 11
Now tying into this is nVIDIA's compute efficiency under DirectX 11 vs. DirectX 12. DirectX 11 makes serial compute requests: it will ask for a graphics command to be executed in one cycle, a compute command in the next, and so on. For nVIDIA Kepler GK10x parts this works efficiently, as those architectures can only process a graphics command or a compute command. For Kepler GK110 and Maxwell 1 it works equally efficiently (though there are untapped resources), and the same goes for Maxwell 2. None of these architectures except Maxwell 2 can process a graphics and a compute command in parallel (unlike GCN, which has been able to do this since its inception). If you want more details, see the graph below (you can ignore the GCN information, as it is erroneous):


We can thus conclude that nVIDIA is able to hit a high degree of Compute efficiency under DirectX 11, regardless of the architecture of theirs being used.

For AMD GCN, this is not the case. GCN has 8 ACEs (Asynchronous Compute Engines) designed to work concurrently and independently - that is, in parallel with the various cores of a multi-core CPU. Each CPU core can work with an ACE; if one ACE is busy, having queued up 8 workloads in a cycle, the CPU cores can talk to another ACE, and so on. This is because each ACE has a queue depth of 8 (64 queues total). Due to DirectX 11's serial nature, however, the ACEs just sit there doing nothing. DirectX 11 instead talks to AMD's Graphics Command Processor (a single unit) to process either a graphics command or a compute command per cycle; it cannot do both in the same cycle. This leaves AMD's Compute Units idling, hurting efficiency and thus limiting performance. You can get a better idea of how GCN works from the block diagram of Hawaii below:


Under DirectX 12
A new feature is introduced, called Asynchronous Shading. Both AMD's and nVIDIA's DirectX 12-compatible hardware can perform it: AMD through their ACEs (Asynchronous Compute Engines), nVIDIA through a feature they call HyperQ. There is a difference in how these two implementations work.

*nVIDIA*:
For nVIDIA, instead of ACEs, Kepler and Maxwell 1/2 have a unit called the Grid Management Unit, which can queue up to a depth of 32. This is what I began to mention above in the DirectX 11 section. To reiterate:
nVIDIA Kepler GK10x: 1 Graphics command or a Compute command (Serial)
nVIDIA Kepler GK110 and Maxwell 1: 1 Graphics command or 32 Compute commands (Serial)
nVIDIA Maxwell 2: 1 Graphics command and 31 Compute commands (Parallel)

The Grid Management Unit cannot work asynchronously; it works in a static way, in order, on a per-cycle basis. It therefore cannot take as much advantage of the various CPU cores feeding it under DirectX 12. Having a single unit, with a queue depth of 31 under parallel conditions, also means the Grid Management Unit cannot break a complex shader into smaller, easier-to-compute fragments the way GCN can; if a sub-optimal shader makes it into the pipeline, nVIDIA's architectures will suffer. All of this costs efficiency. That's the first issue. The second issue: once the Grid Management Unit receives these commands, it passes them to a second unit called the Work Distributor. For Kepler GK10x, Kepler GK110 and Maxwell 1, the Grid Management Unit can only send work to the Work Distributor serially; for Maxwell 2 that is not the case, as mentioned above (1 graphics command and 31 compute commands in parallel). Once the Work Distributor receives the commands, it sends them to an available nVIDIA compute unit (SMX).
What is important to note here is that nVIDIA's HyperQ solution introduces a degree of latency by having two units (the Grid Management Unit and the Work Distributor) working in hierarchical order: it takes two cycles for HyperQ to both queue the commands and send them off to an SMX. Added latency means a loss of efficiency.

*AMD*
For AMD, their architectures contain what are called ACEs, as I've mentioned before. These ACEs (2 for GCN 1.0/1.1 parts except the 290 series, and 8 for the GCN 1.1 290 series and GCN 1.2 parts) receive work from the various CPU cores independently. They can break complex shaders down into smaller fragments which are easier to compute. Once they receive the work, they prioritize it and speak DIRECTLY to the compute units. That's it. There is no middleman.

The end result is that AMD GCN does not need driver optimizations for DirectX 12 (in the form of driver interventions). GCN also doesn't need shader swaps; any shader can be broken apart and executed with a high degree of efficiency. Its architecture was built to be as parallel as possible. AMD GCN-based graphics cards were made to work in tandem with multi-core CPUs over a parallel graphics API; they were designed with Mantle/DirectX 12/Vulkan in mind. The architecture also plays well with VR (LiquidVR comes to mind), as low latency is exactly what VR demands.
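The serial-vs-parallel distinction above is easy to see with a toy timing model. The sketch below is purely illustrative (the durations are made-up numbers, not measurements of any real GPU): in the serial case every command waits its turn, while in the parallel case independent compute work hides behind the graphics work.

```python
# Toy model of serial vs. asynchronous (overlapped) dispatch.
# Durations are arbitrary illustrative numbers in milliseconds,
# NOT measurements of any real hardware.

graphics_ms = 10.0        # one graphics command
compute_ms = [1.0] * 8    # eight independent compute commands

# Serial dispatch: graphics and compute commands execute one after another.
serial_total = graphics_ms + sum(compute_ms)

# Parallel dispatch: compute overlaps the graphics work, so the frame
# is bound by whichever side takes longer.
parallel_total = max(graphics_ms, sum(compute_ms))

print(f"serial:   {serial_total:.1f} ms")    # 18.0 ms
print(f"parallel: {parallel_total:.1f} ms")  # 10.0 ms
```

In this simplified picture, the latency attributed above to the two-stage Grid Management Unit / Work Distributor hand-off would show up as an extra fixed cost per submission on the serial path.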

I hope this answers your questions.


----------



## loveuguys

Quote:


> for me it seems amd just went full circle
> DX9 = amd faster
> DX10 = nvidia faster
> DX11 = nvidia faster
> DX12 = amd faster


I agree, but the 9xx series was made for today's DX11 games, to deliver amazing performance and quietness,
while there are no DX12 games at the moment.

By the time we have at least 20 AAA DX12 titles, we will already be using Pascal GPUs. People buy GPUs to play current games, not ones two years in the future, and they shouldn't forget about current and past games like they are nothing.

Also, you can't say that the comfort of using Maxwell is the same as a Fury Tri-X. I have compared systems, and Fury Tri-X builds are only good where the noise won't disturb anyone at night.

What I'm most scared of is that AMD's bad DX11 driver history can easily repeat itself after those 20 AAA titles arrive, since AMD now has even less money to optimize their drivers than in the past.

I just hope that this little DX12 advantage will stay and isn't just a temporary boost, because nvidia did an excellent job on DX11 titles throughout 2014 and on some amazing 2015 titles.

I don't like it when people talk about buying hardware for something that doesn't exist in any meaningful quantity, wanting to play those non-existent titles, while forgetting about the big, amazing variety of current DX11 games.
Quote:


> DX12 = amd faster


I think we should discuss this again in the future, with at least 5-10 popular DX12 games on the scene, but by then people will not be buying R9 3xx and 9xx series cards, but their successors.

Go play games with what you have now; talking two years ahead and drawing conclusions is nonsense. Most people will upgrade to 16/14nm GPUs by then.


----------



## th3illusiveman

Quote:


> Originally Posted by *Serandur*
> 
> CPU scaling from PC Perspective:


I wonder if this really shows how thoroughly Nvidia tuned DX11. It looks like they squeezed out all the efficiency they could while AMD's implementation was an unoptimized mess.


----------



## provost

Quote:


> Originally Posted by *loveuguys*
> 
> I agree, but the 9xx series was made for today's DX11 games, to deliver amazing performance and quietness,
> while there are no DX12 games at the moment.
> 
> By the time we have at least 20 AAA DX12 titles, we will already be using Pascal GPUs. People buy GPUs to play current games, not ones two years in the future, and they shouldn't forget about current and past games like they are nothing.
> 
> Also, you can't say that the comfort of using Maxwell is the same as a Fury Tri-X. I have compared systems, and Fury Tri-X builds are only good where the noise won't disturb anyone at night.
> 
> What I'm most scared of is that AMD's bad DX11 driver history can easily repeat itself after those 20 AAA titles arrive, since AMD now has even less money to optimize their drivers than in the past.
> 
> I just hope that this little DX12 advantage will stay and isn't just a temporary boost, because nvidia did an excellent job on DX11 titles throughout 2014 and on some amazing 2015 titles.
> 
> I don't like it when people talk about buying hardware for something that doesn't exist in any meaningful quantity, wanting to play those non-existent titles, while forgetting about the big, amazing variety of current DX11 games.
> I think we should discuss this again in the future, with at least 5-10 popular DX12 games on the scene, but by then people will not be buying R9 3xx and 9xx series cards, but their successors.
> 
> Go play games with what you have now; talking two years ahead and drawing conclusions is nonsense. Most people will upgrade to 16/14nm GPUs by then.


Sorry to insert myself into this conversation, but isn't this the type of future-proofing that people spending $500-$1000 should be getting? Any GPU that can handle the upcoming DX12 API better than another comparable GPU wins, in my opinion. Otherwise, we are just renting our high-end GPUs for a few months, with accelerated depreciation of their useful life. Then the question is how much one is willing to spend to get the best gaming experience for a year or less, and then amortize that cost, less a very low resale value, to figure out the monthly rental cost of one's PC gaming entertainment budget.


----------



## semitope

Quote:


> Originally Posted by *Mahigan*
> 
> This does not mean that DirectX 12 will show these results throughout every single title. Of course not. Because every title will make use of Asynchronous Shading to varying degrees. Ashes of the Singularity makes heavy use of Asynchronous Shading (many units each emitting their own light sources).
> 
> This is why titles which use Asynchronous Shading, but to a lesser degree such as Fable Legends, will most likely perform better on a GTX 980 Ti than they do an R9 290x. That being said, titles like Fable Legends don't require the amount of triangle throughput that Ashes of the Singularity requires. Thus we can assume that Fiji will surpass the GTX 980 Ti in that title. Only time will tell but I feel quite confident in this statement.


Fable Legends is actually one of the games associated with asynchronous shading. It seems the devs have made good use of it on Xbox and PC. If compute becomes a limitation for the Maxwell 2 cards, the situation might be worse: what is done in compute has to be done somehow even if you don't use ACEs. Actually... that game is DX12 ONLY, so they can't avoid the situation if it turns out that way.


----------



## Mahigan

Quote:


> Originally Posted by *th3illusiveman*
> 
> I wonder if this really shows how thoroughly Nvidia tuned DX11. It looks like they squeezed out all the efficiency they could while AMD's implementation was an unoptimized mess.


Probably done on purpose on both nVIDIA's and AMD's parts.

nVIDIA would prefer that Developers continue to use DirectX 11 as it cripples their competition.

AMD would prefer that Developers move to DirectX 12 as it makes efficient use of their architecture.


----------



## CrazyElf

As it stands right now:

AMD Advantages:

- More parallelism
- Their cards do seem to age better
- Crossfire also scales better
- They are ahead on HBM, and I suspect that next generation they will probably have a better memory controller than Nvidia

Nvidia Advantages:

- Tessellation and complex geometry
- Better use of memory bandwidth via color compression
- Rasterization

Net, it works out to more power efficiency under DX11 and more OC headroom, although I will note that Nvidia cards don't scale linearly with overclocks (AMD ones generally do).
Like it or not, it's in everyone's best interests for the two companies to be roughly equal - perhaps even with an advantage for AMD right now, given how bad their financial situation is.

Monopoly = loss for the consumer. If Nvidia ever gets its wish, we will see stagnant GPUs and high prices.

Quote:


> Originally Posted by *Mahigan*
> 
> You should see the hate mail I've received


If it happens here, I'd recommend you just report it. Don't bother with a PM flame-war; it's just a hair-ripping exercise in futility.

Notice that earlier, I opposed the idea that Nvidia was the underdog. The reason is that they have mindshare. In the eyes of many buyers, Nvidia will always be "better," and what AMD has to offer is irrelevant to their purchasing choices. There are also a few AMD fans who think similarly. Most important of all, Nvidia is rich with $$$; AMD is in financial trouble.

On the CPU front there are some AMD loyalists too, but it's much harder to conceal the gap when the per-thread performance difference between, say, an FX-8350 and an i7 6700 is huge.

Either way, monopoly is bad for us. Intel right now is arguably an example of us enthusiasts losing - they haven't done much for us.

Quote:


> Originally Posted by *Dudewitbow*
> 
> It's relatively common human mentality to defend one's life choices; of those, only a handful actually accept some facts and take them in as knowledge. Some of the others will attack regardless of whether they are right or wrong (this is the bad side of the mentality). "Fanboyism" of any sort can be categorized this way. There are those who prefer a product but know and accept its shortcomings at times (e.g. accepting that AMD has weak tessellation performance, or that Nvidia has lower asynchronous performance in this case), and there are others who confirm their life choice by attacking the other choice, right or wrong, just to make themselves feel better about a choice they made previously.


If you are interested in reading more about this:
http://arstechnica.com/science/2011/08/users-treat-criticism-of-favorite-brands-as-threat-to-self-image/

In general I find that the people on this website are of better technical knowledge than most other places, but especially in the GPU section, it does tend to get bad.

Quote:


> Originally Posted by *provost*
> 
> Sorry to insert myself into this conversation, but isn't this the type of future-proofing that people spending $500-$1000 should be getting? Any GPU that can handle the upcoming DX12 API better than another comparable GPU wins, in my opinion. Otherwise, we are just renting our high-end GPUs for a few months, with accelerated depreciation of their useful life. Then the question is how much one is willing to spend to get the best gaming experience for a year or less, and then amortize that cost, less a very low resale value, to figure out the monthly rental cost of one's PC gaming entertainment budget.


If what Mahigan is saying is true, then you are "renting" with accelerated depreciation.

- When DX12 arrives, the performance of Nvidia GPUs, Maxwell and earlier, will suffer; maybe even Pascal won't do too well, simply because it was compute oriented (higher DP performance), so it may only be with Volta that we see real change
- The Fury and Fury X are bottlenecked by their rasterization and are not really upgrades over the 290X (they kept the same rasterization and control structure), and they only have 4GB of VRAM
- 16nm GPUs will be much more future proof

Neither side has a particularly great "future-proof" offering this generation. That hasn't always been true - the 290 and 290X, for example, have held up their value quite well, even as more power-efficient chips have come out.


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> Fable Legends is actually one of the games associated with asynchronous shading. It seems the devs have made good use of it on Xbox and PC. If compute becomes a limitation for the Maxwell 2 cards, the situation might be worse: what is done in compute has to be done somehow even if you don't use ACEs. Actually... that game is DX12 ONLY, so they can't avoid the situation if it turns out that way.


I suppose it will depend on how much Asynchronous Shading is used relative to other aspects of the rendering pipeline. It is probably the game I am most looking forward to in 2015/Q1 2016.


----------



## loveuguys

Enthusiast cards were never meant to be future proof. They're just for showing off how much money one has, since performance does not scale with price - it never has and never will.

I personally never buy cards that cost more than $500. Instead of buying a $500-1000 GPU for 3-4 years, I buy an average $400 GPU every 2 years (depending on API changes, actual gains, newer process nodes, improvements, additional features).
I sell the old one for approximately half the price, let's say $200, and buy the next gen for $200 more.
So I have actually spent $400 over 3-4 years instead of, let's say, $700, and still have an equivalent or even better GPU at any given point in time.
I have been doing this since 1997 and it's just great.
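The arithmetic behind this strategy can be made explicit. A quick sketch using the post's own example prices (the `residual` value of the card still in hand is the poster's ~50% resale estimate; the framing is mine):

```python
# Net out-of-pocket cost of a GPU-buying strategy:
# money spent on purchases, minus resale income, minus the
# estimated resale value of the card still owned at the end.

def net_cost(purchases, sales, residual=0):
    return sum(purchases) - sum(sales) - residual

# Strategy A: a $400 card every ~2 years, selling the old one for ~$200.
# After ~4 years you still hold a card worth roughly $200.
strategy_a = net_cost(purchases=[400, 400], sales=[200], residual=200)

# Strategy B: one $700 flagship held for the same ~4 years
# (ignoring its end-of-life resale value for simplicity).
strategy_b = net_cost(purchases=[700], sales=[])

print(strategy_a, strategy_b)  # 400 700
```

The comparison obviously shifts if the flagship's own resale value, or the hassle of reselling every two years, is priced in.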

What if Microsoft decides next year to introduce a 12_2 feature level? I would rather swap my card for a new 14nm one that supports it, and maybe has some new features like hardware HEVC decoding, than stick with a $700 overpriced card without them.

Regardless of whether it's AMD or Nvidia, I just upgrade to the next $400 GPU made for the current games of its time.
Although $300 per 2 years is nothing for me - it's just 50 cents per day, or 1/5th of my monthly salary, which I gladly give away every year or two, making someone who buys old cards happy. People here spend $5/day on coffee and drinks, for example.

If you have money to buy $600-1000 cards, or even more than one, I think money and future-proofing are not the issue; it's more of a show-off.
I know this because I know a person who buys every single enthusiast GPU, on a regular 1.5-year cycle, and he never cares about the future...

The most recent thing I did was buy a 290X in January 2014 for $450, sell it in December for $250, and buy a 980 for $490; now I'm waiting for the next one.
And remember, there is always a rich guy who will sell his expensive card right after a new card is released.

Make use of that, since most 1-year-old cards still have 1 more year of warranty (I know, it depends on the country).


----------



## Mahigan

Quote:


> Originally Posted by *loveuguys*
> 
> Enthusiast cards were never meant to be future proof. They're just for showing off how much money one has, since performance does not scale with price - it never has and never will.
> 
> I personally never buy cards that cost more than $500. Instead of buying a $500-1000 GPU for 3-4 years, I buy an average $400 GPU every 2 years (depending on API changes, actual gains, newer process nodes, improvements, additional features).
> I sell the old one for approximately half the price, let's say $200, and buy the next gen for $200 more.
> So I have actually spent $400 over 3-4 years instead of, let's say, $700, and still have an equivalent or even better GPU at any given point in time.
> 
> What if Microsoft decides next year to introduce a 12_2 feature level? I would rather swap my card for a new 14nm one that supports it, and maybe has some new features like hardware HEVC decoding, than stick with a $700 overpriced card without them.
> 
> Regardless of whether it's AMD or Nvidia, I just upgrade to the next $400 GPU made for the current games of its time.
> Although $300 per 2 years is nothing for me - it's just 50 cents per day, or 1/5th of my monthly salary, which I gladly give away every year or two, making someone who buys old cards happy.
> 
> If you have money to buy $600-1000 cards, or even more than one, I think money and future-proofing are not the issue; it's more of a show-off.
> I know this because I know a person who buys every single enthusiast GPU, on a regular 1.5-year cycle, and he never cares about the future...


Say you bought an R9 290 for $349 back in 2013...

You'd be quite content in seeing the potential of DirectX 12 no?

If you bought two for $700 back in 2013...

You'd be pretty happy to hear about DirectX 12 with Multi-Adapter and Split Frame Rendering technology, no? Because that 4GB frame buffer limit under DirectX 11 now turns into an 8GB frame buffer. You can play DirectX 12 games throughout 2016 at 4K resolution, on that new 4K monitor you bought instead of upgrading your GPU, while having played every game in 2014 and 2015 without so much as a hitch.

Now that's quite the investment wouldn't you agree?

Now compare that to the $650 GTX 780 Ti you bought in Q1 2014...


----------



## p4inkill3r

Quote:


> Originally Posted by *loveuguys*
> 
> Enthusiast cards were never meant to be future proof. They're just for showing off how much money one has, since performance does not scale with price - it never has and never will.
> 
> I personally never buy cards that cost more than $500. Instead of buying a $500-1000 GPU for 3-4 years, I buy an average $400 GPU every 2 years (depending on API changes, actual gains, newer process nodes, improvements, additional features).
> I sell the old one for approximately half the price, let's say $200, and buy the next gen for $200 more.
> So I have actually spent $400 over 3-4 years instead of, let's say, $700, and still have an equivalent or even better GPU at any given point in time.
> I have been doing this since 1997 and it's just great.
> 
> What if Microsoft decides next year to introduce a 12_2 feature level? I would rather swap my card for a new 14nm one that supports it, and maybe has some new features like hardware HEVC decoding, than stick with a $700 overpriced card without them.
> 
> Regardless of whether it's AMD or Nvidia, I just upgrade to the next $400 GPU made for the current games of its time.
> Although $300 per 2 years is nothing for me - it's just 50 cents per day, or 1/5th of my monthly salary, which I gladly give away every year or two, making someone who buys old cards happy.
> 
> If you have money to buy $600-1000 cards, or even more than one, I think money and future-proofing are not the issue; it's more of a show-off.
> I know this because I know a person who buys every single enthusiast GPU, on a regular 1.5-year cycle, and he never cares about the future...


Nobody is showing off by buying top-of-the-line hardware. You can justify your own stance however you like, but don't impugn those who want the highest-performing hardware; they do so because they're *enthusiasts*, first and foremost.


----------



## Mahigan

Looks like we hit the page limit on this thread LOL


----------



## azanimefan

Quote:


> Originally Posted by *Mahigan*
> 
> Say you bought an R9 290 for $349 back in 2013...
> 
> You'd be quite content in seeing the potential of DirectX 12 no?
> 
> If you bought two for $700 back in 2013...
> 
> You'd be pretty happy to hear about DirectX 12 with Multi-Adapter and Split Frame Rendering technology no? Because that 4GB frame buffer limit under DirectX11 now turns into an 8GB frame buffer. You can play DirectX 12 games throughout 2016 at a 4K resolution, on that new 4K monitor you bought instead of upgrading your GPU, while having played every game in 2014 and 2015 without so much as a hitch.
> 
> Now that's quite the investment wouldn't you agree?
> 
> Now compare that to the $650 GTX 780 Ti you bought in Q1 2014...


shhhhhhhhhh

let's not even open the can of worms that is the HD 7970 / R9 280X


----------



## loveuguys

No, look... you've got it all wrong.
I don't mind switching cards on a regular basis for a week's salary every year or 1.5 years.
I pay more in taxes on one month of grocery shopping than I do in a year for a GPU, and I always sell the old card to someone who can't afford expensive new cards. So I feel good.


----------



## pengs

Quote:


> Originally Posted by *wiak*
> 
> well
> for me it seems amd just went full circle
> DX9 = amd faster
> DX10 = nvidia faster
> DX11 = nvidia faster
> DX12 = amd faster


Yeah, that's an interesting observation. It may be a coincidence, but I've also noticed DX9 played a bit better on a Radeon, which makes sense when you look at the past generation of consoles: most console-to-PC ports originated from the 360 and its ATI Xenos GPU, rather than the PS3 with NV's RSX and, most importantly, Sony's Cell CPU, which was complicated and did not fare well in porting. This may explain why frame times seem better on an AMD card and low frame rates don't seem as prevalent (this is my own observation). AMD having any slight advantage in DX9 was very important given how long that era lasted.

It seems that a good portion of AMD's foothold has come from planting themselves in the doorway connecting both platforms - console and PC - and now again with DX12, but on a much larger scale, since all of the consoles are running on their architecture this time, not just the GPU; that, and the inherent advantage they have with their hardware-level asynchronous shading. This coherency of hardware (throwing in x86) should ease porting from one console to another, console to PC, PC to console, and beyond as developers start to utilize DX12 on the next generation.


----------



## p4inkill3r

Quote:


> Originally Posted by *azanimefan*
> 
> shhhhhhhhhh
> 
> lets not even open the can of worms that is the hd 7970 / r9-280x


You mean its remarkable staying power?


----------



## GorillaSceptre

What about Conservative Rasterization and the other 12.1 features, won't that level the playing field for nVidia? Especially if they use their dGPU market share to sponsor even more games in the future.


----------



## agentx007

Quote:


> Originally Posted by *Mahigan*
> 
> Now compare that to the $650 GTX 780 Ti you bought in Q1 2014...


I'm still a happy user who bought a GTX 780 Ti in Q1 2014.

Reason?
It's still really good in every DX11 title now, and I bet it keeps giving me great graphics until "Big" Pascal (or GCN 2.0/GCN 2.1?) comes along, with all the goodies like HBM2, improved DX12 architectures, etc.

Also, I can't see a GTX 780 Ti struggling, for example, to run ALL of 2016's DX12 titles at "High" settings.

But let's be clear here: EVERY GPU gets old.
And if you are buying one, you must know that in the future you *will* be forced to change it again.
That's how technology works.

I really don't think a G80-type GPU (8800 GTX) will ever be made again, since it needs A LOT of things to come together (manufacturing, pricing, efficiency, software, etc.).

If we take a step back, this situation (NV vs. AMD GPUs) has A LOT in common with the difference between the X800 XT and the 6800 Ultra WAAAYYYY back in 2004.
The former was faster in DX9 at the time; the latter was "future proof" with newer DirectX 9.0c support.
Guess what?
Crysis can't run properly on the X800/X850 (shader errors), but CAN run on the 6800 Ultra (although the 6800 Ultra isn't fast enough in some places to hit 30 FPS even on low settings...).


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> What we're looking at isn't the CPU. I've demonstrated that. It doesn't matter which reviews we're looking at. Unless the reviewer is using an i3 4330/FX 6300/FX 8350 etc you're not going to see the CPU become the limitation. The GPU is hitting its Compute limits. This is what Ashes of the Singularity taxes the most (with the exception of the Fiji cards).
> 
> You also can't use DirectX 11 benchmarks in order to try and figure out what is happening under DirectX 12. The two cannot be compared. This is because DirectX 11 is open to driver interventions (shaders replacements for example) and is a serial API. Therefore looking at a 50% performance difference between a GTX 970 and GTX 980 Ti under DirectX 11 and wondering why this difference doesn't carry over to DirectX 12 is most illogical.
> 
> If you make the assumption for a Compute bottleneck, all these figures make sense.
> 
> Take a GTX 970, its theoretical Tflops (vary by model from 3.9 Tflops to 4.2 Tflops) since Computerbase.de used a Strix model (the model they reviewed not long ago) I will use the 4.2 Tflops figure. For the GTX 980 Ti I will use the 6.2 TFlops figure (of course any overclocks on either changes this figure and would affect the benchmark results). The difference between 4.2 Tflops and 6.2Tflops is around 32%. That's the difference you're seeing in the benchmark. Asynchronous Shading taxes the Compute capabilities of a GPU but allows for an incredible amount of Post Processing effects to be displayed onto the screen.
> 
> Either way we have our 32% difference figure and are thus able to deduce that the performance figure difference you mention can be explained by my theory. Once again my theory holds up. If a theory holds up, no matter what you throw at it, it is the truth. It goes from a hypothesis into the realm of a scientific fact. I've already had other people question the CPU. I have already demonstrated to them that the CPU is not the bottleneck. Thus far no opposing theory holds weight.
> 
> The same logic can be applied to why an R9 290x can keep up with a GTX 980 Ti. It's all about their theoretical compute capabilities. Their ALU throughput in other words. Ashes of the Singularity is tied to the compute capabilities of an architecture, for the most part, on the high end of the GPU market (a GTX 960 is crippled by its 128-bit memory interface for example). Ashes of the Singularity is also tied to the Peak Rasterization rate (many individual units made up of many triangles permeate the screen). This is what limits Fiji. Fiji has the same Peak Rasterization rate as an R9 290x. In an RTS game of this magnitude, the Fiji architecture is bottlenecked on this front.
> 
> A good example of the Nitrous engine, absent Asynchronous Shading, is Star Swarm. If we look at Star Swarm we see that a GTX 980 should be much faster than an R9 290x:
> 
> 
> Therefore we can most likely assume that if the Ashes of the Singularity benchmark made no use of Post Processing Effects (using Asynchronous Shading) then the GTX 980 Ti, like the GTX 980, would handily defeat the R9 290x. This is due to the GTX 980 Ti's far better Rasterization rate. It can both handle the Draw Calls as well as Polygon count demanded. Where the GTX 980 Ti lacks power, over the 290x, is in compute. The two are capable of nearly the same degree of performance. (I've already explained why Asynchronous Shading is near 100% efficient on GCN but less efficient on Maxwell 2 in my previous post).
> 
> Based on those two facts, we can deduce several conclusions. Conclusions which are demonstrable through the Ashes of the Singularity benchmark results.
> 
> This does not mean that DirectX 12 will show these results throughout every single title. Of course not. Because every title will make use of Asynchronous Shading to varying degrees. Ashes of the Singularity makes heavy use of Asynchronous Shading (many units each emitting their own light sources).
> 
> This is why titles which use Asynchronous Shading, but to a lesser degree such as Fable Legends, will most likely perform better on a GTX 980 Ti than they do an R9 290x. That being said, titles like Fable Legends don't require the amount of triangle throughput that Ashes of the Singularity requires. Thus we can assume that Fiji will surpass the GTX 980 Ti in that title. Only time will tell but I feel quite confident in this statement.


You did the math wrong: 6.2/4.2 gives a 47.6% increase (right about the performance difference in the GPU benchmarks I linked), not 32%. And as I linked, there is another review (PC Perspective) of the same benchmark showing a clear CPU difference between Haswell-E and a Skylake i7. So no, your theory doesn't hold up here; pretending otherwise would be cherry-picking.

Edit: I meant to say that 6.2/4.2 yields 47.6%, and it seems as though you calculated 6.2/4.7 = 32% (emphasis on the 4.7 as a typo).


----------



## Mahigan

Quote:


> Originally Posted by *loveuguys*
> 
> No, look.... you got it all wrong
> I don't mind to switch cards on a regular basis for a weekly salary per year or 1.5yrs
> I pay more taxes for one months of grocery shopping, than in one year for a gpu and I always sell it to someone, who can not afford expensive new cards. So i feel good.


Don't take this the wrong way. I am not attacking you.

Everyone is entitled to their opinion. The fact that your opinion is illogical and borders on being irrational does raise an eyebrow. While I don't question that irrational behaviors permeate societies across the globe, I do find it peculiar that your specific behavior seems to be a perfect match for one particular GPU maker's strategy.

This leads me to formulate two conclusions which are most probable.

1. You are a victim of Consumerism and, like the vast majority of individuals in a given society infested with advertising and marketing campaigns, feel the need to purchase new goods in order to obtain a degree of satisfaction and self worth. When taken to an extreme, this can lead to hoarding and other unhealthy habits. Don't take this as a judgement because I speak from personal experience.

2. You own an nVIDIA Graphics card and feel threatened, personally, from the topics covered in this thread and are attempting to justify your life choices at all costs. Even if that means making irrational statements in a public forum.

In either case, that is your prerogative and you're entitled to your own life choices. What I might suggest is that if you don't intend to participate in this discussion, by either arguing in support of or in opposition to the conclusions derived from the Ashes of the Singularity benchmark, it would be in our collective interest not to waste our time with flame baiting.

If nVIDIA had released a card, back in 2013, which was set to play the latest titles in 2016 with great-looking graphics, this thread would have me arguing the architectural merits of their products. I think that to detract from an intellectual topic because your feelings are hurt does a great injustice to the readers of Overclock.net, who must scroll past countless posts which contain no intellectual merit whatsoever.

Just as I have respected every single user who has questioned me, and will continue to do so, I think it would be best if you do the same. It's better for the community and, in the long run, it will be better for you. It may help you better understand the various aspects of modern Graphics Processing Units.

Take Care and I apologize if this comes off as mean spirited as this is not my intention.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> You did the math wrong: 6.2/4.2 gives a 47.6% increase (right about the performance difference in the GPU benchmarks I linked), not 32%. And as I linked, there is another review (PC Perspective) of the same benchmark showing a clear CPU difference between Haswell-E and a Skylake i7. So no, your theory doesn't hold up here; pretending otherwise would be cherry-picking.
> 
> Edit: I meant to say that 6.2/4.2 yields 47.6%, and it seems as though you calculated 6.2/4.7 = 32% (emphasis on the 4.7 as a typo).


The difference between 6.2 and 4.2 is 2.

(2 × 100) / 6.2 = 32.25%. The difference in compute capabilities is 32.25%.

You can also calculate it as:

4.2 / 6.2 = x / 100
4.2 × 100 = 420
420 / 6.2 = 67.74
x = 67.74
4.2 is thus 67.74% of 6.2
100 - 67.74 = 32.26%, so the difference is 32.3%


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> The difference between 6.2 and 4.2 is 2.
> 
> (2 x 100)/6.2 = 32.25% The difference in compute capabilities is 32.25%.


I'm talking about how much faster the 980 Ti is than the 970, with the 970 as the baseline. The computerbase.de review showed the 980 Ti at 55.4 average FPS versus the 970's 42.2; 55.4/42.2 works out to 31.3% faster.

Similarly, the theoretical increase of the 980 Ti over the 970 as a baseline (represented as 6.2/4.2) is a 47.6% increase in the 980 Ti's favor.

What you calculated shows what fraction of a 980 Ti the increase is. Calculate for the 970 and you get:

(2x100)/4.2 = 47.6


----------



## ToTheSun!

Quote:


> Originally Posted by *Serandur*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Mahigan*
> 
> The difference between 6.2 and 4.2 is 2.
> 
> (2 x 100)/6.2 = 32.25% The difference in compute capabilities is 32.25%.
> 
> 
> 
> I'm talking about how much faster the 980 Ti is than the 970 with the 970 as a baseline. The computerbase.de review showed the 980 ti at 55.4 average FPS versus the 970's 42.2. I calculated 55.4/42.2 = 31.3% faster.
> 
> Similarly, the theoretical increase of the 980 Ti over the 970 as a baseline (represented as 6.2/4.2) is a 47.6% increase in the 980 Ti's favor.
> 
> What you calculated shows what fraction of a 980 Ti the increase is. Calculate for the 970 and you get:
> 
> (2x100)/4.2 = 47.6

Wouldn't that logic only apply if performance was purely and exclusively based on compute, though?


----------



## gerpogi

Dang, I just got a 980 two weeks ago, at a very nice price too... I had the option of a 390X or a 980 for $400 and ended up going for the 980. Now I'm kind of regretting that decision...


----------



## loveuguys

No offence taken







My unibrows are thick









I swapped, I think, somewhere between 10 and 12 GPUs, all kinds (loved the unlockable 9500, the MX legends, the Voodoos, all of them).
And no, I did not spend more than $3,000, maybe $3,500, on them altogether in the last 20 years, and I had nice experiences with both manufacturers.
I've met people who always bought 2-3x more expensive cards, but swapped them only a couple of months after me...

But OK, I don't force this thinking on anyone.
Everybody can go and buy the $700 GPUs (Fury, 980 Ti) now for the unknown future; your choice.

Maybe you will be happy in 2017 or maybe you will not. It's quite a gamble.

One tip, bro: don't speculate too much ahead. Life is short and only as complicated as you think it is.
I like it simple









Carpe diem to you








Quote:


> does a great injustice to the readers of Overclock.net who must scroll over countless posts which contain no intellectual merit whatsoever.


Sorry, can't let you go with that







Look at your theoretical posts about people and their choices, for example your reply to me... some scrolling needed there too


----------



## ZealotKi11er

Quote:


> Originally Posted by *loveuguys*
> 
> No offence taken
> 
> 
> 
> 
> 
> 
> 
> My unibrows are thick
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I swapped, I think somewhere 10-12 gpus mixed, all kinds (loved the unlockable 9500, the mx legends, the voodoos all of them)
> And no I did not spend more than 3000 maybe 3500$ together on them in the last 20 years and had nice experiences from both manufacturers.
> I've met people who always bought 2-3x more expensive cards, but swapped only a couple of months after me...
> 
> But ok, i don't force this thinking.
> Everybody can go and buy the 700$ (fury, 980ti) gpus now for the unknown future, your choice.
> 
> Maybe you will be happy in 2017 or maybe you will not, its quite a gamble.
> 
> One tip bro, don't speculate to much ahead. Life i short and only as complicated as you think it is.
> I like it simple
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Carpe diem to you


I used to upgrade a lot more than I do now because there's really no need. 2-3 years per GPU is normal now. The thing is, now you can spend $700 on a GPU and keep it for 3 years, which a few years back would have been the same as spending $500 and keeping it for 2 years.


----------



## Casey Ryback

Quote:


> Originally Posted by *gerpogi*
> 
> Dang I just got a 980 2 weeks ago at a very nice price too... I had the option to get a 390x or a 980 for 400$ and ended up going for the 980.. now im kinda regretting that decision..


You have a fast card with plenty of vram, don't take too much from an early benchmark and speculation.

The 980 is a great card and I would've picked it over the 390X too, even though I'd prefer to support AMD at this stage.


----------



## loveuguys

True story, 28nm tech is stuck.
to ZealotKi11er


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> I'm talking about how much faster the 980 Ti is than the 970 with the 970 as a baseline. The computerbase.de review showed the 980 ti at 55.4 average FPS versus the 970's 42.2. I calculated 55.4/42.2 = 31.3% faster.
> 
> Similarly, the theoretical increase of the 980 Ti over the 970 as a baseline (represented as 6.2/4.2) is a 47.6% increase in the 980 Ti's favor.
> 
> What you calculated shows what fraction of a 980 Ti the increase is. Calculate for the 970 and you get:
> 
> (2x100)/4.2 = 47.6


2/6.2 = x/100

(2 x 100)/6.2 = 32.25%

2 TFLOPS represents a 32.25% loss of compute capability relative to 6.2 TFLOPS.

32.25% of 55.4 = 17.87
55.4 - 17.87 = 37.5 FPS
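The projection above, as a quick sketch with the same figures:

```python
# Scale the 980 Ti's measured average FPS by the 970's share of peak compute.
ti_fps = 55.4                      # computerbase.de 980 Ti result
ti_tflops, g970_tflops = 6.2, 4.2

expected_970_fps = ti_fps * (g970_tflops / ti_tflops)
print(round(expected_970_fps, 1))  # 37.5
```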

This is assuming the same overall boost clocks (which vary while the game is running on both the GTX 980 Ti and the 970). The GTX 970 also has a higher maximum boost clock; this is not a stock GTX 970, it's the ASUS GTX 970 STRIX DC II OC.

Also, my compute calculations between a 290X and a 290 were based on the rather similar specifications of the two (only ALU count and TMU count differ). Since Ashes of the Singularity is not some texturing masterpiece, I don't believe it to be limited by TMU output.

Texturing performance, on Hawaii, is not a problem.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> 2Tflops represents a 32.25% loss of compute capabilities from 6.2Tflops
> 
> 55.4 - 32.25% = 17.86
> 55.4 - 17.86 = 37.5FPS
> 
> This is assuming the same overall boost clocks (which vary while the game is running between both the GTX 980 Ti and the 970).
> 
> Also my compute calculations between a 290x and 290 were based on rather similar specifications between the two (Only ALU count and TMU count differ). Since Ashes of the Singularity is not some texturing masterpiece... I don't believe it to be limited by TMU output.


You're calculating percentage decreases from the 980 Ti to the 970, which is a different figure than mine (2 TFLOPs being a 47.6% increase from 4.2, the 970; a 13.2 FPS increase being 31.3% from 42.2, also the 970). Doing it your way:

55.4 - 42.2 = 13.2

(13.2 x 100)/55.4 = 23.8%

Which is not the theoretical 32% you calculated. Neither is the real 42.2 FPS versus the theoretical 37.5 FPS you calculated. The point remains; this benchmark shows a 980 Ti only 31% faster than a 970 whereas other GPU-bound ones from the same site as well as the theoretical figures show ~48%.

In my experience, Maxwell sticks to its boost clock pretty consistently unless the workload is relatively light or the GPU is getting hot.


----------



## agentx007

You guys are seriously arguing over +47.6% vs. -32.25%?

Here's a bombshell: it's the same gap.
The exact value depends on your point of view.

From the 970's perspective:
the 980 Ti is ~47.6% *faster* (i.e. equal to ~147.6% of the GTX 970's performance).

From the 980 Ti's perspective:
the 970 is ~32.25% *slower* (i.e. equal to ~67.75% of the 980 Ti's performance).

A different spin on the numbers, that's all.
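The two figures really are the same gap, which a couple of lines make plain:

```python
a, b = 6.2, 4.2  # 980 Ti and 970 peak TFLOPS from the thread

faster = (a / b - 1) * 100   # with the 970 as the baseline
slower = (1 - b / a) * 100   # with the 980 Ti as the baseline

print(round(faster, 1), round(slower, 2))  # 47.6 32.26
```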

PM waiting...


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> You're calculating percentage decreases from the 980 Ti to the 970, which is a different figure than I did (2 TFLOPs being a 47.6% increase from 4.2, the 970; 12.2 FPS increase being 31.3% from 42.2, also the 970). Doing it your way:
> 
> 55.4 - 42.2 = 13.2
> 
> (13.2 x 100)/55.4 = 23.8%
> 
> Which is not the theoretical 32% you calculated. Neither is the real 42.2 FPS versus the theoretical 37.5 FPS you calculated. The point remains; this benchmark shows a 980 Ti only 31% faster than a 970 whereas other GPU-bound ones from the same site as well as the theoretical figures show ~48%.
> 
> Maxwell sticks to its boost clock pretty consistently, from experience; unless the workload is relatively light or the GPU's getting hot.


With all due respect, I believe you're confusing yourself.

What we're doing isn't determining the exact percentage difference in frames per second. We're looking at the difference in compute performance between the 970 and the 980 Ti, that's it, and then applying that percentage difference to the frames per second. We're not expecting an EXACT result; it's impossible to get an exact result. We're looking to see if it makes sense.

Now:

2/6.2 = x/100

(2 x 100)/6.2 = 32.25%

2 TFLOPS represents a 32.25% loss of compute capability relative to 6.2 TFLOPS.

32.25% of 55.4 = 17.87
55.4 - 17.87 = 37.5 FPS

So we would expect a result of about 38 FPS if the game were GPU-bottlenecked on the compute side (assuming no dynamic boost clocks, which change the working frequency of the shaders based on heat and load). What we see in Ashes of the Singularity is 42.2 FPS. This is close enough to 38 FPS that we can, also looking at the behavior of a 390 vs. a 390X in the same game, infer that the game is compute-limited (GPU bottleneck). This makes absolute sense given the use of asynchronous shading. This is what Oxide says, what ExtremeTech says, and what Ars Technica says. You're the only person here, a novice amongst professionals, who says otherwise.

We have data and evidence to back up our claims. You have your opinion.

Do you understand now?


----------



## gerpogi

Quote:


> Originally Posted by *Casey Ryback*
> 
> You have a fast card with plenty of vram, don't take too much from an early benchmark and speculation.
> 
> The 980 is a great card I would've picked it over the 390X, and I'd prefer to support AMD at this stage.


I agree it is a very good card for what I use it for currently, which is 1440p gaming. My only concern is that, like every other buyer, I'm looking to keep it for at least 2 years. Now, Nvidia might (or might not) come up with a driver-side solution to even out the competition, but the 390X looks to be in a safer spot right now.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> With all due respect, I believe you're confusing yourself.
> 
> What we're looking at isn't to determine the exact percentage between the frames per second. What we're doing is looking at the difference in compute performance between the 970 and the 980 Ti. That's it. We're then applying this percentage in difference to the Frames Per Second. We're not expecting an EXACT result. It's impossible to get an exact result. We're looking to see if it makes sense.
> 
> Now
> 
> 2/6.2 = x/100
> 
> (2x100)/6.2 = 32.25%
> 
> 2Tflops represents a 32.25% loss of compute capabilities from 6.2Tflops
> 
> 55.4 - 32.25% = 17.86
> 55.4 - 17.86 = 37.5FPS
> 
> So we would be expecting a result of 38 FPS (assuming there were no dynamic boost clocks involved which change the working frequency of the Shaders based on heat and load) What we see in Ashes of the Singularity is a result of 42.2 FPS. This is close enough to 38 FPS that we can, by looking at the behavior between a 390 and 390x in the same game, that the game is compute limited.
> 
> Do you understand now?


It was you who originally tried to equate the performance increase calculation from the benchmark with a theoretical performance decrease.

And saying they're just close enough by eyeballing them doesn't explain away a very sizeable disparity in the percentage differences, even if the benchmark were purely compute-bound.


----------



## provost

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I used to upgrade a lot more then i do now because really there is no need. 2-3 year per GPU is normal now. The thing is now you can spend $700 for a GPU and keep it for 3 year which would have been the same some year back spending $500 and keeping for 2 years.


Not sure this 2-3 year upgrade cycle would jibe with the quarterly earnings expectations of the GPU makers, particularly Nvidia... Lol
With its dominant market share in the discrete consumer GPU market and a captive install base, it behooves Nvidia to churn that install base as quickly as commercially possible. Much like the subscription model of the mobile phone companies, except in this case the service is the "user experience" and the handset manufacturer is also the hardware provider. Imagine Apple having its own telco with 90% market share... controlling both the service part (software) and the hardware (GPU) is extremely powerful, and it essentially means complete control over the upgrade cycle for Nvidia.
The downer for NV is AMD, which acts as a party crasher to an otherwise lucrative strategy, plus the fact that the overall PC gaming market has other emerging platforms contending for a share of the customer's wallet.
So NV has to continually get more from fewer customers year over year, until it can replace some of the sunsetting revenue streams from alternative channels, i.e. Tegra, Auto, etc.

AMD can't really play this upgrade-cycle game with its install base of discrete GPU customers; thus, it has to think long term before committing resources to a new GPU release, with fewer refresh cycles between major architecture and node changes. Case in point: look at how well AMD's older GPUs now compete with Nvidia's cards, both new and slightly old.

So, from a consumer's point of view, sans any brand loyalty or e-peen reasons, AMD's cards appear to be a much better buy over the long term, especially if one is not a chronic upgrader or trader.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> It was you who originally tried to equate the performance increase calculation from the benchmark with a theoretical performance decrease.
> 
> And saying they're just close enough by eyeballing them with your opinion doesn't explain away a very sizeable percentage difference disparity even if the benchmark were purely compute-bound.


I never said it had to be the exact same figure. You derived that conclusion yourself.

I did the math to show the correlation. I stated that this was exactly what you would expect if the theory were true. The correlation matches what Oxide says and what every reviewer who has looked at this says. I posted a ton of comments made by the reviewers themselves, including a screenshot of the game's results window, which shows the CPU framerate vs. the GPU framerate. You can see that if an infinitely powerful GPU were used, it would achieve the CPU framerate; at that point the CPU would be the bottleneck.

The bottleneck is the GPU Compute performance for most highend GPUs (except Fiji). Fiji ought to perform much better but it doesn't. This is likely because Fiji is bottlenecked elsewhere. That's why I proposed the Peak Rasterization Rate of Fiji as being the likely culprit.


----------



## Mahigan

Now why do I think the Peak Rasterization Rate of Fiji is holding her back?

1. Memory bandwidth: Fiji has more memory bandwidth in theory and in practice than any other GPU.

2. Peak shader arithmetic rate: Fiji has a higher peak shader arithmetic rate in theory and in practice than any other GPU.

3. Pixel fill rate: Fiji has the same pixel fill rate as the 290X. That said, we're looking at a bottleneck which shows up at 1080p; if pixel fill rate were the issue, it should cripple Fiji even harder at 1440p and 4K. It doesn't. Therefore fill rate is not the issue.

4. Texture fill rate: Fiji has the highest texture fill rates of any GPU, with the exception of INT16. If INT16 texture fill rate were the bottleneck, a 290X would be crippled in AotS and would not come near Fiji's performance.

5. Geometry performance: If Tessellation were the issue then a 290x would not come near the performance of Fiji.

6. Polygon throughput (Peak Rasterization Rate): the same as a 290X. Likely culprit because:
a. There are many triangles on screen, with nearly 17,000 individual units being drawn. Each unit is made up of many polygons, and units out of view of the screen are still drawn.

Conclusion: Fiji is likely being bottlenecked by its Peak Rasterization Rate under AotS.

Could it be a driver issue? Not likely, for several reasons:
1. Fiji is based on GCN and GCN drivers are very mature (all previous Mantle driver optimizations would, for the most part, carry over to DirectX 12 drivers).
2. DirectX 12 is closer to the metal and thus not likely to suffer from as many driver-related issues.
3. If a driver issue were the culprit, it would affect all GCN products.

Conclusion: Fiji is likely being bottlenecked by its Peak Rasterization Rate under AotS.

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."
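The elimination above can be sketched as a ratio comparison: flag the stages where Fiji has essentially no headroom over a 290X. The dictionaries and figures below are rough, spec-sheet-style values assumed purely for the sketch (not measurements from the thread); point 3's resolution argument then strikes pixel fill rate from the flagged list, leaving rasterization.

```python
# Illustrative Fiji vs. 290X per-stage throughput figures (assumed
# spec-sheet-style values for this sketch, not measurements).
fiji  = {"mem_bw_gbs": 512, "tflops": 8.6, "gpix_s": 67.2, "gtri_s": 4.2}
r290x = {"mem_bw_gbs": 320, "tflops": 5.6, "gpix_s": 64.0, "gtri_s": 4.0}

# Stages where Fiji has less than 10% headroom over the 290X are the
# bottleneck candidates in a game where the 290X nearly matches Fiji.
candidates = [stage for stage in fiji if fiji[stage] / r290x[stage] < 1.10]
print(candidates)  # ['gpix_s', 'gtri_s']
```

Pixel fill (`gpix_s`) is then eliminated by the resolution argument in point 3, which is how the post arrives at polygon throughput.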


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> I never said it had to be the exact same figure. You derived that conclusion yourself.
> 
> I did the math to show the correlation. I stated that this was exactly what you would expect if the theory were true. The correlation matches what Oxide says and what every reviewer who has looked at this says. I posted a ton of comments made by the reviewers themselves including a screenshot of the game results window which shows the CPU Framerate vs the GPU Framerate. You can see that if an infinitely powerful GPU were used, it would achieve the CPU framerate. At that point the CPU would be the bottleneck.
> 
> The bottleneck is the GPU Compute performance for most highend GPUs (except Fiji). Fiji ought to perform much better but it doesn't. This is likely because Fiji is bottlenecked elsewhere. That's why I proposed the Peak Rasterization Rate of Fiji as being the likely culprit.


You stated, and I quote:
Quote:


> Therefore looking at a 50% performance difference between a GTX 970 and GTX 980 Ti under DirectX 11 and wondering why this difference doesn't carry over to DirectX 12 is most illogical.


Quote:


> Either way we have our 32% difference figure and are thus able to deduce that the performance figure difference you mention can be explained by my theory.


The first quote implies that expecting a 50% difference is illogical, when that is the actual theoretical difference according to the numbers you calculated with compute as the focus (keep in mind, DX11 itself showed extremely good scaling with Maxwell shaders, particularly because Nvidia beefed up everything other than the shaders in proportion). The second quote, following the first, very clearly demonstrates why accurate numbers (not exact, but not the gap between a 31/32% and a 48% improvement either) are important. You calculated a different figure and erroneously compared it to the figure you dismissed as incorrect/illogical. The numbers don't have to be exact, but you clearly dismissed one of the numbers (the one actually relevant to your theory) and used the wrong one under the pretense that it was more accurate. It's only afterwards that you changed your statement to a dismissal of a subjectively minuscule difference. I am not confused; you made your original stance quite clear.

However, you also assume that the roughly 10% scaling between the 390 and the 390X, matching their theoretical compute difference, is evidence for your theory. I'd say the 12.5% difference with the 980 Ti that remains unaccounted for is then significant, for those reasons and the one below:

You state you've accounted for the difference between the 390 and the 390X, but the Ars Technica review only shows a 290X (presumably reference), while the PCPer review actually shows a 390X edging out a 980 by about 10%, give or take:



Take into account that there are no reference 390Xs/390s and that, realistically, they're probably both at the same ~1050 MHz clock speeds (meaning a reference 290X at 1000 MHz should be within 5% of each, right in the middle). The 390X is theoretically 10% ahead of the 390 in compute performance, which would put the 390 equal to the 980 in this scenario. As far as compute goes, the 980 Ti is roughly 29% higher than the 980 (6.7 TFLOPS vs. 5.2), which, when translated over to the Computerbase.de review, should put it at ~62 FPS as opposed to the actual 55.4 FPS (calculated by multiplying the 390-class result by 1.29). That's also right in line with where the 980 Ti should be going by compute performance versus the 970 (42.2 FPS x 1.476 = 62).
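Both projections can be reproduced directly; note the ~48 FPS 980-class figure is the value assumed by the reasoning above, not a published result:

```python
# Route 1: scale the 970's 42.2 FPS by the 6.2/4.2 TFLOPS ratio.
via_970 = 42.2 * (6.2 / 4.2)

# Route 2: scale an assumed ~48 FPS 980-class result by 6.7/5.2.
via_980 = 48.0 * (6.7 / 5.2)

print(round(via_970, 1), round(via_980, 1))  # 62.3 61.8, both land near 62
```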

You're coming at it from the other angle with the Ars Technica review, but the PCPer review suggests something different. If, for whatever reason, the 980 Ti's boost speed isn't being maintained in Computerbase.de's benchmark despite holding up just fine in their 980 Ti review, that could explain it; but that would be abnormal and not representative of other 980 Ti models out there, which never drop their boost clocks and maintain ~20-30% higher clocks than the reference 1075 MHz. And, again, the PCPer review did suggest a CPU influence with the 5960X versus the 6700K (up to 10%). When there are very few of these reviews to go by, that's arguably a significant consideration. *If you're going to respond, please do not ignore the PCPer results again. They're one of only three reviews even involving Hawaii, and they show a CPU influence as well as evidence that a 390 should be a rough match for the 980, not close to the 980 Ti.*

With that said, I have nothing more I wish to say on the subject. I am not even directly disagreeing with what you're saying about theoretical compute scaling, but I am simply saying there is some conflicting info regarding the consistency or theoretical outcomes of the results.


----------



## Mahigan

Alright folks,

I have news for you from Tim Kipp over at Oxide on the CPU optimizations found in Ashes of the Singularity. Tim also went into detail about the memory bandwidth issues which can arise (which may explain the AMD CPU issues in the benchmark), as well as other tidbits of information we can all use to better understand just what is happening behind the scenes that leads to the results we are seeing.
Quote:


> = = = =
> 
> Hi xxxxxxxxx,
> 
> Thanks for your interest in the Ashes of the Singularity benchmark.
> 
> In order to get an accurate picture of how well a given CPU will perform, it's important to look at the CPU Frame rate with Infinite GPU on and off ( a check box exists on the benchmark settings panel ). Note, while on, you may see some graphical corruption due to use of async shaders, however the results will be valid.
> 
> With Infinite GPU, you should see 90%+ workload on your CPU. In this mode, we do not "wait" in the case where the GPU is still busy. You should see excellent scaling between 4-16 thread machines. This can only be tracked on DX12.
> 
> Without Infinite GPU, the CPU will "wait" on a signal from the GPU that it is ready to process another frame. During this wait, the CPU tends to power down when there isn't any additional work to do, which effectively serializes a portion of the frame. This serialization is what causes the CPU frame rate discrepancy between Infinite GPU on and off.
> 
> In addition, due to this "wait", one interesting stat to track is your power draw. On DX11 the power draw tends to be much higher than on DX12, as the additional serial threads that the driver needs to process the GPU commands effectively force the CPU to stay active even if it is only using a fraction of its cores. This tends to be an overlooked benefit of DX12, since the API is designed so that engines can evenly distribute work.
> 
> Regarding specific CPU workloads and the differences between AMD and Intel it will be important to note a few things.
> 
> 1. We have heavily invested in SSE ( mostly 2 for compatibility reasons ) and a significant portion of the engine is executing that code during the benchmark. It could very well be 40% of the frame. Possibly more.
> 
> 2. While we do have large contiguous blocks of SSE code ( mainly in our simulations ) it is also rather heavily woven into the entire game via our math libraries. Our AI and gameplay code tend to be very math heavy.
> 
> 3. The Nitrous engine is designed to be data oriented ( basically we know what memory we need and when ). Because of this, we can effectively utilize the SSE streaming memory instructions in conjunction with prefetch ( both temporal and non temporal ). In addition, because our memory accesses are more predictable the hardware prefetcher tends to be better utilized.
> 
> 4. Memory bandwidth is definitely something to consider. The larger the scope of the application, paired with going highly parallel, the more pressure is put on the memory system. On my i7-3770S I'm hitting close to peak bandwidth on 40% of the frame.
> 
> I hope this information helps point you in the right direction for your investigation into the performance differences between AMD and Intel. We haven't done exhaustive comparative tests, but generally speaking we have found AMD chips to compare more favorably to Intel than what is displayed via synthetic benchmarks. I'm looking forward to your results.
> 
> # # #


*Notes* (added as time permits):

- The good news is there are no AVX optimizations. Oxide used SSE2 instead, for compatibility reasons as mentioned. This should give Intel processors only a slight edge in FPU tasks, while giving the AMD FX-8350 an edge in integer performance, though nothing incredible. What is an eye-opener is the way both the AMD FX-8350 and FX-4350 perform under SSE2-optimized code. Pay special attention to the FX-4350 running at 5.1 GHz and the FX-8350 running at 5 GHz: going from two modules to four, you don't see performance doubling (it's only up 17.5%), which is consistent with what we've been witnessing, as seen below:


- The better utilization of the hardware prefetcher would point to far better performance on Vishera than on Bulldozer; one of Vishera's selling points over Bulldozer was the improvement to its hardware prefetcher. Steamroller had further prefetching improvements, so some of the A10-7870K's better performance can be attributed to this factor. The integer and floating-point register files were increased in size in Steamroller, while load operations (two operands) were compressed to fit a single entry in the physical register file, which helps increase the effective size of each register file over both Bulldozer and Vishera. This would give Steamroller an edge in integer execution, which could account for some of the performance variance between Steamroller and Vishera/Bulldozer. The scheduling windows were also made bigger in Steamroller, allowing greater utilization of execution resources (better for draw-call execution, for example). Together, these improvements could account for some of the performance increase we see with Steamroller. Steamroller also benefits from around 30% more ops per cycle than Vishera as a result of its improved FPU. The following slide, provided by AMD, gives us a glimpse of some of the improvements that arrived with Steamroller:


If we look at some benchmarks, we see that branch prediction, as mentioned in AMD's slide, is indeed improved going from Vishera (Piledriver) to Steamroller: single-threaded performance improves by 7% and multi-threaded performance by 14.4%:


Now, if we look at FPU performance in order to verify the claims made in the AMD slide, we again see improvements with Steamroller, ranging anywhere from 2% to 13%:

http://www.extremetech.com/computing/177099-secrets-of-steamroller-digging-deep-into-amds-next-gen-core/1

- Memory bandwidth is also an important part of the equation. The Core i7-3770K is an Ivy Bridge part, most likely paired with a socket 1155 motherboard using the Z77 chipset. The usual memory configuration for an Ivy Bridge part is dual-channel 1600 MHz DDR3, which usually allows around 20 GB/s of read and write bandwidth. If 40% of the frame is hitting peak bandwidth, then the same should be considered for an AMD FX-8350 paired with an AMD 990FX chipset, whose peak bandwidth is around 19 GB/s. It is no secret that the AMD FX-8350 benefits more than Intel parts do from running faster memory (usually 1866 MHz is recommended), the architecture being memory-bandwidth starved. Therefore we can conclude that memory bandwidth could be at least partially to blame for the performance difference between AMD's FX series and Intel's Core iX series in AotS.

- Caching is another area of interest which could shed light on why Steamroller appears to do better than Vishera (Piledriver), while both are still not nearly as good performers as Intel's Core iX series in this regard:




If we take into account all of these improvements from Vishera (Piledriver) to Steamroller, I suppose it is really not that surprising to see Steamroller, even with only 2 modules vs. 4, output the performance it does under AotS.

What remains puzzling to me are the Intel vs. AMD discrepancies. I suppose that for now I'll have to simply attribute those to the higher IPC, caching, and better memory subsystem of the Intel processors.
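As a sanity check on the memory-bandwidth note above, theoretical dual-channel DDR3 peak bandwidth can be computed directly (the ~20 GB/s and ~19 GB/s figures in the note are practical, measured numbers, which sit below these theoretical peaks):

```python
# Theoretical DDR3 peak: transfers/s x 8 bytes per 64-bit transfer x channels.
def ddr3_peak_gbs(mt_per_s, channels=2, bytes_per_transfer=8):
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

print(ddr3_peak_gbs(1600))            # 25.6 GB/s theoretical for DDR3-1600
print(round(ddr3_peak_gbs(1866), 1))  # 29.9 GB/s theoretical for DDR3-1866
```

Real-world copy bandwidth typically lands around 80% of these peaks, which is consistent with the ~20 GB/s cited above.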


----------



## agentx007

Seriously, SSE2?
That's Pentium 4 and early Athlon 64 era compatibility coding, lol.
They do know those are... single-core CPUs, i.e. not exactly "parallel friendly" types, right?

Well... Intel WILL be surprised to hear that, since Prescott NEEDS SSE3 to be faster than Northwood








Even on laptops, I don't think Banias qualifies as Windows 10 compatible...

Hell, even Prescott isn't compatible with the x64 edition of Windows 8.1/10.

But back to the real question:
Why would they use SSE2 compatibility in the year 2015?


----------



## Vesku

Quote:


> Originally Posted by *provost*
> 
> 
> I am interested in your theory of "driver intervention". Can you please elaborate how this may impact a multi tiered SKU strategy with otherwise little differentiation between the skus? Also, the Nvidia cards prior to Maxwell 2 (Gk110 specifically) were not compute crippled, so wouldn't we see those cards show a meaningful leap under Dx12, that is assuming Nvidia even bothered to optimize pre Maxwell 2 cards to leverage their compute capability for a DX12 API? I can't see them disrupting their priced to performance sku strategy by optimizing Kepler over Maxwell, even if Gk110's compute is superior to Maxwell 2. May be I am confused about how Nvidia differentiates/manages performance through driver intervention, if indeed it does?


The key is "asynchronous" compute: the GPU is asked to complete lots of different compute tasks at once. Maxwell should be able to process such requests better than Kepler, since it has a more robust method of executing multiple compute requests, just as GCN 1.1+ is in turn better equipped than Maxwell in this particular area.


----------



## Digidi

Quote:


> Originally Posted by *Mahigan*
> 
> Now why do I think the Peak Rasterization Rate of Fiji is holding her back?
> 
> 1. Memory bandwidth: Fiji has more memory bandwidth in theory and in practice than any other GPU.
> 
> 2. Peak shader arithmetic rate: Fiji has a higher peak shader arithmetic rate in theory and in practice than any other GPU.
> 
> 3. Pixel fill rate: Fiji has the same pixel fill rate as the 290x. That being said we're looking at a bottleneck which shows up at 1080p, the issue should cripple Fiji under 1440p and 4K if that is the case. It doesn't. Therefore Fill Rate is not the issue.
> 
> 4. Texture fill rate: Fiji has the highest texture fill rates, with the exception of int16, of any other GPU. If Texture fillrate int16 were the bottleneck, a 290x would be crippled in AotS and would not come near Fiji's performance.
> 
> 5. Geometry performance: If Tessellation were the issue then a 290x would not come near the performance of Fiji.
> 
> 6. Polygon throughput (Peak Rasterization Rate): the same performance as a 290x. Likely culprit because:
> a. There are many triangles on the screen, with nearly 17,000 individual units being drawn. Each unit is made up of many polygons. Units out of the view of the screen are still drawn.
> 
> Conclusion: Fiji is likely being bottlenecked by its Peak Rasterization Rate under AotS.
> 
> Could it be a driver issue? Not likely for several reasons:
> 1. Fiji is based on GCN and GCN drivers are very mature (all previous Mantle driver Optimizations would, for the most part, carry over to DirectX 12 drivers).
> 2. DirectX 12 is closer to Metal thus not likely to suffer from as many driver related issues.
> 3. If a driver issue were the culprit it would affect all GCN products.
> 
> Conclusion: Fiji is likely being bottlenecked by its Peak Rasterization Rate under AotS.
> 
> "Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."


The rasterizer is not the bottleneck. If it were the bottleneck, AMD wouldn't lead the 3DMark overhead test.

The Fury X can do 18,000,000 draw calls in the 3DMark test. Each draw call has 112 polygons, so AMD's polygon output is about 2,000,000,000 polygons per second. Nvidia can do only 13,000,000 draw calls, so its polygon output is about 1,500,000,000 polygons per second. So AMD's rasterizer is close to its maximum; Nvidia is far away because of a limitation of the command processor!

In the worst case (when one polygon is as big as a pixel), GCN can handle 4 pixels per clock. That means at 1050 MHz you get about 2,300,000,000 polygons. That's really close to the draw-call test.

If a polygon is as big as a pixel under UHD, AMD can output about 290 fps:

2,400,000,000 / (3840 × 2160) ≈ 290 fps
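The arithmetic above can be sanity-checked in a few lines (a sketch using the figures quoted in this post; the draw-call rates, the 112-polygons-per-call figure, and the 2.4 billion polygons/s front-end estimate are the poster's numbers, not independently measured values):

```python
# Rough polygon-throughput arithmetic from the 3DMark API Overhead
# figures quoted above (poster's numbers, not measured values).

POLYGONS_PER_DRAWCALL = 112

def polygon_throughput(drawcalls_per_sec):
    """Polygons per second implied by a given draw-call rate."""
    return drawcalls_per_sec * POLYGONS_PER_DRAWCALL

fury_x = polygon_throughput(18_000_000)   # ~2.0 billion polygons/s
nvidia = polygon_throughput(13_000_000)   # ~1.5 billion polygons/s

# Worst-case claim for GCN: pixel-sized triangles. If the front end
# sustains ~2.4 billion polygons/s, one polygon per pixel at UHD gives:
uhd_pixels = 3840 * 2160
fps = 2_400_000_000 / uhd_pixels

print(f"Fury X: {fury_x / 1e9:.2f} Gpoly/s, NV: {nvidia / 1e9:.2f} Gpoly/s")
print(f"UHD, one polygon per pixel: ~{fps:.0f} fps")
```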


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> You stated, and I quote:
> 
> The first quote implies expecting a 50% difference is illogical when that is the actual theoretical difference according to the numbers you calculated with compute as its focus (keep in mind, DX11 itself showed extremely good scaling with Maxwell shaders, particularly because Nvidia beefed up everything other than the shaders in proportion). The second quote, following the first, very clearly demonstrates why accurate numbers (not exact, but not the gap between a 31/32 and 48% improvement either) are important. You calculated a different figure and erroneously compared it to the figure you dismissed as incorrect/illogical. The numbers don't have to be exact, but you clearly dismissed one of the numbers (the one actually relevant to your theory) and used the wrong one under the pretense that it was more accurate. It's only afterwards that you changed your statement to a dismissal of a subjectively minuscule difference. I am not confused; you made your original stance quite clear.
> 
> However, you also assume a perfect 10% shader scaling with the 390 versus the 390X showing a close estimate of theoretical compute differences as being relevant to your theory. I'd say the 12.5% difference with the 980 Ti that remains unaccounted for is significant then for those reasons and the one below:
> 
> You state you've accounted for the difference between the 390 and the 390X, but the Arstechnica review only shows a 290X (presumably reference) while the pcper review actually shows a 390X edging out a 980 by about 10%, give or take:
> 
> 
> 
> Take into account that there are no reference 390Xs/390s and that realistically, they're probably both at the same ~1050 MHz clock speeds (meaning a reference 290X at 1000 MHz should be within 5% of each; right in the middle). The 390X is theoretically 10% ahead of the 390 in compute performance which would put the 390 equal to the 980 in this scenario. As far as compute goes, the 980 Ti is roughly 29% higher than the 980 (6.7 TFLOPs vs 5.2) which, when translated over to the Computerbase.de review, should put it at ~62 FPS as opposed to the actual 55.4 FPS (calculated by multiplying the 390 result by 1.29). That's also right in line with where the 980 Ti should be going by compute performance versus the 970 (42.2 FPS x 1.476 = 62).
> 
> You're coming at it from the other angle with the Arstechnica review, but the Pcper review suggests something different. If, for whatever reason, the 980 Ti's boost speed isn't being maintained in Computerbase.de's benchmark despite holding up just fine in their 980 Ti review; that could explain it, but that would be abnormal and not representative of other 980 Ti models out there that don't drop their boost clocks ever and maintain ~20-30% higher clocks than the 980 Ti's reference 1075 MHz. And, again, the Pcper review did suggest a CPU influence with the 5960X versus the 6700K (up to 10%). When there are very few of these reviews to even go by, it's arguably a significant consideration. *If you're going to respond, please do not ignore the Pcper results again. They're one of only three even involving Hawaii and they show a CPU influence as well as evidence that a 390 should be a rough match for the 980, not close to the 980 Ti.*
> 
> With that said, I have nothing more I wish to say on the subject. I am not even directly disagreeing with what you're saying about theoretical compute scaling, but I am simply saying there is some conflicting info regarding the consistency or theoretical outcomes of the results.


And that can be explained by what Oxide just mentioned to me in an email:
Quote:


> Without Infinite GPU, the CPU will "wait" on a signal from the GPU that it is ready to process another frame. During this wait, the CPU tends to power down when there isn't any additional work to do, and effectively serializes a portion of the frame.


This wait time is likely more pronounced on a CPU with more cores (more cores, more wait; more wait, more latency). The latency shows up as a small frame-rate dip. It is nothing to be concerned about. I would assume that this wait time is more pronounced on Maxwell 2 because of its 31 compute queues per cycle vs the 64 queues of the GCN 1.1 290 series and GCN 1.2. The CPU would be waiting on Maxwell 2 to a larger degree than on GCN as a result (as it takes two cycles to fill up the same amount of work on Maxwell 2 as on the GCN 1.1 290 series and GCN 1.2).

We should see this wait time diminish when Oxide adds Multi-Adapter support to their engine. With multiple adapters the CPU will be waiting less on the GPU. This should show up as Multi-Adapter scaling that is a fraction better on the GCN 1.1 290 series and GCN 1.2 than on Maxwell 2, because of the 128 compute queues then available on GCN vs 62 on Maxwell 2.

As for the 6700K showing up as faster, this should change once more graphics adapters are added and more CPU threads can be fully utilized. Right now, a fast quad core with HT seems to be ample to power a single graphics adapter. What's more, the 6700K is clocked higher, so it can more quickly prepare its batches and send them to the GPU for computing. This would show up as increased performance over, say, a 5960X.

Again, just more proof of GCN handling Asynchronous Shading in a better fashion with its Asynchronous Compute Engine design over Maxwell 2 and its HyperQ design.
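The queue argument above can be sketched as a toy model (purely illustrative; the 31-vs-64 queue counts are the ones claimed in this thread, and real command-processor scheduling is far more involved than this):

```python
import math

def fill_rounds(num_tasks, queues):
    """Toy model: rounds needed to hand num_tasks independent compute
    tasks to a front end that accepts at most `queues` of them per round."""
    return math.ceil(num_tasks / queues)

TASKS = 64  # one batch of independent compute tasks

maxwell2 = fill_rounds(TASKS, 31)  # 31 compute queues claimed for Maxwell 2
gcn = fill_rounds(TASKS, 64)       # 64 queues claimed for GCN 1.1/1.2

# More fill rounds = more time the CPU spends waiting on the GPU.
print(maxwell2, gcn)
```

With these numbers the toy model needs three fill rounds on Maxwell 2 for every one on GCN, which is the direction of the effect described above, even if the exact counts are debatable.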

Hope that answers your questions.


----------



## Mahigan

Quote:


> Originally Posted by *Digidi*
> 
> The rasterizer is not the bottleneck. If it were the bottleneck, AMD wouldn't lead the 3DMark overhead test.
> 
> The Fury X can do 18,000,000 draw calls in the 3DMark test. Each draw call has 112 polygons, so AMD's polygon output is about 2,000,000,000 polygons per second. Nvidia can do only 13,000,000 draw calls, so its polygon output is about 1,500,000,000 polygons per second. So AMD's rasterizer is close to its maximum; Nvidia is far away because of a limitation of the command processor!
> 
> In the worst case (when one polygon is as big as a pixel), GCN can handle 4 pixels per clock. That means at 1050 MHz you get about 2,300,000,000 polygons. That's really close to the draw-call test.
> 
> If a polygon is as big as a pixel under UHD, AMD can output about 290 fps:
> 
> 2,400,000,000 / (3840 × 2160) ≈ 290 fps


You make a valid point.

The problem now is that there is no way to explain what is bottlenecking the Fury X?!


----------



## toekutr

Quote:


> Originally Posted by *Vesku*
> 
> The key is "asynchronous" compute: the GPU is asked to complete lots of different compute tasks at once. Maxwell should be able to process such requests better than Kepler, since it has a more robust method of executing multiple compute requests, just as GCN 1.1+ is better equipped than Maxwell in this particular area.


I think GCN's real advantage is the ability to handle any async compute task without any context switches. Having tons of ACE pipes is probably useful in some scenarios, but devs have said most of the benefit comes from 1 additional queue.


----------



## Mahigan

Quote:


> Originally Posted by *toekutr*
> 
> I think GCN's real advantage is the ability to handle any async compute task without any context switches. Having tons of ACE pipes is probably useful in some scenarios, but devs have said most of the benefit comes from 1 additional queue.


That depends entirely on how many post-processing effects need to be calculated in parallel. I mean, if you have a game where there's a single light source... sure... having one async shader queued along with one graphics task will grant you the most benefit (there's no need for 8 ACEs and 64 queues).

In the case of Ashes of the Singularity, you have about 17,000 units on the screen. Each unit creates a light source when it fires a shot, blows up, etc. When you have that much action taking place... the more compute queues the better (varying with how many independent post-processing effects are occurring in tandem).

This is why Ashes of the Singularity is a good example of an Asynchronous Shading benchmark.

In a game like Fable Legends, it is arguable (unless you're in a multi-player battle) just how much GCN will benefit from having 8 ACEs. But then again, that depends entirely on the environment in which the players find themselves.

Another benefit of the ACEs is the ability to break down a complex shader into smaller, easier-to-compute segments (which can be calculated in sequence). You can also prioritize your shader executions, queuing up a list of items to compute across 64 compute queues while still being able to execute a graphics task in parallel (while the ALUs are busy crunching away).

At the end of the day, it depends entirely on the complexity of the rendered sequence. Since Fable Legends is built for cross-platform gaming (between the PC and Xbox One), I wouldn't think that more than two ACEs are of any utility. You can't have the PC rendering a more complex scene (post-processing wise) than the Xbox One; it just wouldn't work, for cross-platform gaming reasons.


----------



## toekutr

Quote:


> Originally Posted by *Mahigan*
> 
> In the case of Ashes of the Singularity, you have about 17,000 units on the screen. Each unit creates a light source when it fires a shot, blows up, etc. When you have that much action taking place... the more compute queues the better (varying with how many independent post-processing effects are occurring in tandem).
> 
> This is why Ashes of the Singularity is a good example of an Asynchronous Shading benchmark.


No, I think it has to do with resource contention. Async compute doesn't magically make more compute cores, it just reduces the amount of time they spend idle.
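toekutr's point, that async compute adds no ALUs and only reclaims idle ALU time, can be illustrated with a toy frame model (all timings and utilization figures below are invented for illustration, not taken from any real GPU):

```python
# Toy illustration: async compute adds no ALUs; it only overlaps
# independent compute with graphics phases where the ALUs sit idle.

graphics_phases = [(2.0, 0.3),   # geometry/shadow passes: ALUs mostly idle
                   (4.0, 0.9),   # main shading: ALUs nearly saturated
                   (1.0, 0.4)]   # (duration_ms, alu_utilization)
compute_work_ms = 2.0            # independent compute (e.g. post-processing)

def serial_frame_time():
    # Without async: compute runs only after all graphics work finishes.
    return sum(t for t, _ in graphics_phases) + compute_work_ms

def async_frame_time():
    # With async: compute first soaks up idle ALU time inside the
    # graphics phases; only the spill-over extends the frame.
    idle_ms = sum(t * (1.0 - u) for t, u in graphics_phases)
    spill = max(0.0, compute_work_ms - idle_ms)
    return sum(t for t, _ in graphics_phases) + spill

print(serial_frame_time(), async_frame_time())
```

With these made-up numbers the frame shortens from 9.0 ms to 7.0 ms, purely by filling idle time; no extra compute capacity was added.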


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> That depends entirely on how many post-processing effects need to be calculated in parallel. I mean, if you have a game where there's a single light source... sure... having one async shader queued along with one graphics task will grant you the most benefit (there's no need for 8 ACEs and 64 queues).
> 
> In the case of Ashes of the Singularity, you have about 17,000 units on the screen. Each unit creates a light source when it fires a shot, blows up, etc. When you have that much action taking place... the more compute queues the better (varying with how many independent post-processing effects are occurring in tandem).
> 
> This is why Ashes of the Singularity is a good example of an Asynchronous Shading benchmark.
> 
> In a game like Fable Legends, it is arguable (unless you're in a multi-player battle) just how much GCN will benefit from having 8 ACEs. But then again, that depends entirely on the environment in which the players find themselves.
> 
> Another benefit of the ACEs is the ability to break down a complex shader into smaller, easier-to-compute segments (which can be calculated in sequence). You can also prioritize your shader executions, queuing up a list of items to compute across 64 compute queues while still being able to execute a graphics task in parallel (while the ALUs are busy crunching away).
> 
> At the end of the day, it depends entirely on the complexity of the rendered sequence. Since Fable Legends is built for cross-platform gaming (between the PC and Xbox One), I wouldn't think that more than two ACEs are of any utility. You can't have the PC rendering a more complex scene (post-processing wise) than the Xbox One; it just wouldn't work, for cross-platform gaming reasons.


Thank Xbox one for only having 2 ACEs.


----------



## mtcn77

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Thank Xbox one for only having 2 ACEs.


You need a beefed-up CPU to fill all those ACE registers. Ironically, the Xbox One has an even faster CPU clock than the PS4 (6%, I believe), so the ACE count won't present a problem. I actually think the Xbox One has the upper hand in compute since it comes with an extra memory controller via ESRAM.


----------



## ZealotKi11er

Quote:


> Originally Posted by *mtcn77*
> 
> You need a beefed-up CPU to fill all those ACE registers. Ironically, the Xbox One has an even faster CPU clock than the PS4 (6%, I believe), so the ACE count won't present a problem. I actually think the Xbox One has the upper hand in compute since it comes with an extra memory controller via ESRAM.


I've always been an Xbox buyer and want to get an Xbox One, but considering how much faster the PS4 is, I have not bought either.


----------



## mtcn77

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I've always been an Xbox buyer and want to get an Xbox One, but considering how much faster the PS4 is, I have not bought either.


I kind of like the Xbox's color balance better. As stupid as that may sound, it is my genuine sentiment.


----------



## velocityx

Getting up in the morning to check the forums before work, and now my head is about to explode from reading the last two pages...


----------



## STEvil

Quote:


> Originally Posted by *Mahigan*
> 
> You make a valid point.
> 
> The problem now is that there is no way to explain what is bottlenecking the Fury X?!


What are the single-texture and multi-texture fill rates of Fury? Maybe the culprit is a bit more old-school in nature...


----------



## infranoia

So when DICE shopped around the idea for a low-level API and Nvidia punted, did Nvidia understand their DX11 software engineering advantage and recognize that a low-level API revolution might actually remove that advantage?

Bit of a stretch, but it might really clarify the whole DICE-Nvidia-AMD-Mantle story if AMD's so-called sucky drivers were simply an architectural limit of a "serial" API playing out against a parallel GCN architecture, a limit AMD couldn't resolve through driver tricks.


----------



## Kpjoslee

Quote:


> Originally Posted by *infranoia*
> 
> So when DICE shopped around the idea for a low-level API and Nvidia punted, did Nvidia understand their DX11 software engineering advantage and recognize that a low-level API revolution might actually remove that advantage?
> 
> Bit of a stretch, but it might really clarify the whole DICE-Nvidia-AMD-Mantle story if AMD's so-called sucky drivers were simply an architectural limit of a "serial" API playing out against a parallel GCN architecture, a limit AMD couldn't resolve through driver tricks.


Nah, I don't believe there is any kind of conspiracy involved. They probably felt it is not yet time to jump onto DX12 only, when it will only be one year after Windows 10 launched.


----------



## Themisseble

Quote:


> Originally Posted by *Kpjoslee*
> 
> Nah, I don't believe there is any kind of conspiracy involved. They probably felt it is not yet time to jump onto DX12 only, when it will only be one year after Windows 10 launched.


Or maybe... you never know what is true.


----------



## Kuivamaa

Quote:


> Originally Posted by *agentx007*
> 
> Seriously, SSE2?
> That's Pentium 4 and early Athlon 64 compatibility coding - lol.
> They know that those are... single-core CPUs, i.e. not exactly "parallel friendly" types, right?
> 
> Well... Intel WILL be surprised to know that, since Prescott NEEDs SSE3 to be faster than Northwood
> 
> Even on laptops, I don't think Banias qualifies as Windows 10 compatible...
> 
> Hell - even Prescott isn't compatible with the x64 edition of Windows 8.1/10.
> 
> But back to the real question:
> Why would they use SSE2 compatibility in the year 2015?


AVX-supporting games can probably still be counted on one hand. You may want to forward your complaints to Intel; as of today, in 2015, they still sell gaming CPUs (the G3258) that do not support AVX. Btw, this game may well support SSE4.1; at no point does Oxide's engineer clarify.


----------



## Mahigan

Quote:


> Originally Posted by *toekutr*
> 
> No, I think it has to do with resource contention. Async compute doesn't magically make more compute cores, it just reduces the amount of time they spend idle.


It does both.

I mean, it depends entirely on the complexity of the shader being executed. You don't need 44 CUs to compute a single light-source shader. You can split your CU resources: a few CUs used to process this, another few to process that, etc., keeping them all active and fed, all of the time.

This is especially true with the type of shaders being used in AotS. Each light source is a rather simple shader, therefore you can have several of them being processed in parallel across a series of CUs.

DirectX 12 is... "about freaking time". Finally... more cinematic effects are possible in games. Not just a little PhysX effect here and there (Batman's cape flying around as you move Batman from left to right).

What the PC gaming community needs, for its own interests, is less partisanship and more unity. We need to bridge this "left-right" (AMD-nVIDIA) paradigm. We ought to look out for our shared interests. We should come first, not the corporations. We are the consumers. They're supposed to create things for us. We're not meant to swear allegiance to them.


----------



## Klocek001

so to sum up (and keeping it simple), the 290x/390x should come close to the 980 Ti when the game engine uses asynchronous compute, but if it doesn't, the DX11 status quo will probably remain, with the 980 Ti pulling ahead?


----------



## Mahigan

Quote:


> Originally Posted by *Klocek001*
> 
> so to sum up (and keeping it simple), the 290x/390x should come close to the 980 Ti when the game engine uses asynchronous compute, but if it doesn't, the DX11 status quo will probably remain, with the 980 Ti pulling ahead?


I don't think we can keep it simple. I think that's sort of the problem nowadays: people want things kept simple or they lose interest (e.g. tl;dr).

It will all depend on the amount of post processing, demanded by the game engine, requiring Asynchronous Shading. If the degree of post processing demanded by an engine is large enough to tax the available compute resources, while the demand on other segments of the graphics pipeline is not as large (fill rate, texturing, memory bandwidth, etc.), then we will see a 290x/390x keeping up with a GTX 980 Ti.

The GTX 980 Ti isn't compute-heavy. It has just a tad more theoretical compute throughput than a 290x/390x, but cannot utilize it as efficiently under DirectX 12. This is what is causing the results we're seeing.
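The "tad more" claim checks out on paper (a sketch; shader counts and reference clocks are from public spec sheets, and real boost clocks, especially on partner 980 Tis, run higher than these figures):

```python
# Peak FP32 rate = shader count x 2 ops per clock (FMA) x clock.
# Reference figures: 290X 2816 shaders @ ~1.0 GHz, 980 Ti 2816 CUDA
# cores @ ~1.075 GHz reference boost, Fury X 4096 shaders @ 1.05 GHz.

def peak_tflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz / 1000.0

r9_290x   = peak_tflops(2816, 1.000)  # ~5.6 TFLOPs
gtx_980ti = peak_tflops(2816, 1.075)  # ~6.1 TFLOPs
fury_x    = peak_tflops(4096, 1.050)  # ~8.6 TFLOPs

# 980 Ti over 290X on paper: single-digit percent, i.e. "a tad more".
print(f"980 Ti / 290X = {gtx_980ti / r9_290x:.3f}")
```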

Given that developers are making use of Asynchronous Shading in upcoming titles (some available as soon as Fall/Winter 2015), I think it is safe to say that the shelf life of a GTX 980 Ti, going forward, may in fact be relatively short. I think we may see GTX 980 Ti owners shelve their cards around the same time R9 290x/390x owners do.

For DirectX 11 titles, the GTX 980 Ti will always trounce both Fiji and Hawaii (except that Fiji might pull ahead in some titles under 4K).


----------



## Dani011093

I used to be an AMD fanboy when I was younger... atm I'm an AMD user (love my 5870 CF system, no doubt)... but this level of blindness or fanboyism is extreme.

First, I have read every single benchmark with different CPUs, and most (around 90%) of them show the 980 Ti on par with the Fury X (the Fury X having an advantage of 1 or 2 fps tops); they also obviously show that in the cases where there's no CPU bottleneck (especially when taking it to 1440p)...

I know all the logical conclusions you guys are making may sound cool; I'm not saying AMD isn't better at parallel processing, nor that Nvidia is better in some other areas...

But all I can see is the same freaking performance on both cards in the benchmarks, which in the end just shows poor optimization of AMD's DX11 driver, and that's mainly it...

Seriously, there are benchmarks showing the GTX 980 performing the same as an R9 390X... seriously, this post made me review all the benchmarks posted on this site, and now I ask: what's the point of all this discussion if both cards are performing the same?

If we go by theoretical numbers, the Fury X should put out around 10 fps more given its raw power, which isn't happening... therefore, what's the point of all this if they're performing the same? Seriously, if you like this topic that much, go work for AMD or Nvidia, or do something more productive than trying to get to a point without even having a strong base to make those conclusions.


----------



## Cherryblue

Quote:


> Originally Posted by *Dani011093*
> 
> I used to be an AMD fanboy when I was younger... atm I'm an AMD user (love my 5870 CF system, no doubt)... but this level of blindness or fanboyism is extreme.
> 
> First, I have read every single benchmark with different CPUs, and most (around 90%) of them show the 980 Ti on par with the Fury X (the Fury X having an advantage of 1 or 2 fps tops); they also obviously show that in the cases where there's no CPU bottleneck (especially when taking it to 1440p)...
> 
> I know all the logical conclusions you guys are making may sound cool; I'm not saying AMD isn't better at parallel processing, nor that Nvidia is better in some other areas...
> 
> But all I can see is the same freaking performance on both cards in the benchmarks, which in the end just shows poor optimization of AMD's DX11 driver, and that's mainly it...
> 
> Seriously, there are benchmarks showing the GTX 980 performing the same as an R9 390X... seriously, this post made me review all the benchmarks posted on this site, and now I ask: what's the point of all this discussion if both cards are performing the same?
> 
> If we go by theoretical numbers, the Fury X should put out around 10 fps more given its raw power, which isn't happening... therefore, what's the point of all this if they're performing the same? Seriously, if you like this topic that much, go work for AMD or Nvidia, or do something more productive than trying to get to a point without even having a strong base to make those conclusions.


You clearly are overreacting here. We're just commenting on the gains with a new API and observing the facts.

Yes, we also try to imagine what this will bring us in a few months, because we are hardware and software fans and we try to squeeze out every bit of performance we can.

Imagining the future is the goal of a lot of discussion, generally speaking.

Your reaction was irritating. Clearly. If you feel like you lost your time reading here, get over it; no one forced you to read this topic. Go be "productive" somewhere else, in your own terms, then.


----------



## Dani011093

Dear Cherry, sorry if it sounded too edgy, but the last two or three pages clearly show how a few members are backing up their "investigations" with nonexistent numbers... now I'd like to know: where are those benchmarks showing a 390X performing like a 980 Ti? Because from the 6 benches I read, that's not the case.

Also, about the productiveness: yes, I see very good material here on these forums, and you could take that potential and use it on something else, lol. Not trying to be harmful <3

EDIT: Also, what happened with Star Swarm? Does that count as a DX12 benchmark? I'm asking because I'm not really sure why no one is talking about it.


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> I used to be an AMD fanboy when I was younger... atm I'm an AMD user (love my 5870 CF system, no doubt)... but this level of blindness or fanboyism is extreme.
> 
> First, I have read every single benchmark with different CPUs, and most (around 90%) of them show the 980 Ti on par with the Fury X (the Fury X having an advantage of 1 or 2 fps tops); they also obviously show that in the cases where there's no CPU bottleneck (especially when taking it to 1440p)...
> 
> I know all the logical conclusions you guys are making may sound cool; I'm not saying AMD isn't better at parallel processing, nor that Nvidia is better in some other areas...
> 
> But all I can see is the same freaking performance on both cards in the benchmarks, which in the end just shows poor optimization of AMD's DX11 driver, and that's mainly it...
> 
> Seriously, there are benchmarks showing the GTX 980 performing the same as an R9 390X... seriously, this post made me review all the benchmarks posted on this site, and now I ask: what's the point of all this discussion if both cards are performing the same?
> 
> If we go by theoretical numbers, the Fury X should put out around 10 fps more given its raw power, which isn't happening... therefore, what's the point of all this if they're performing the same? Seriously, if you like this topic that much, go work for AMD or Nvidia, or do something more productive than trying to get to a point without even having a strong base to make those conclusions.


That's fair of you to say, but I would argue that the perspective you've shared is one of the primary reasons why we find ourselves in the state we are today. This state is one where PC Gaming has stagnated for far too long.

I think that what you haven't considered is this...

A Radeon R9 280x/Radeon HD 7970 user has been playing games since late 2012/early 2013. At the time, the GTX 680 was the better performer, and that present-day performance figure spurred its sales. It outsold the 7970 because of the lack of explanations, by tech review sites, of the implications of both the Kepler and GCN architectures going forward. The emphasis, at every review site, has been on which cards perform best at the present time. This may sound logical, but only if you look at things from a corporation's perspective and not from a consumer's perspective.

From a corporation's perspective, you produce the fastest card in the present, and you refresh and replace this card in the near future. This helps ensure continuous sales of your products. The goal of ensuring continuous sales is to promote year-over-year growth. Better year-over-year growth translates into better returns to investors. Better returns attract new investors. New investors mean more money, which translates into more investment in research and development. More research and development translates into newer products, and so on and so forth (it also translates into higher pay for company executives). In political terms, this is the Republican or conservative economic position. In theory it is supposed to translate into trickle-down economics (more jobs), which is supposed to help lift the economy. What the corporations seek is thus short-term gains rather than a long-term outlook.

From a consumer's perspective, you want to spend as little money as possible in order to gain as much performance as possible (bang for the buck). This helps ensure that you spend less on gadgets and thus have more money left over for your retirement and other expenses. Therefore, from a consumer's perspective, it is better to buy a product with a longer shelf life than a product with a shorter shelf life. The performance in the present ought to matter less than the performance going forward. What consumer logic seeks is long-term stability rather than short-term gains.

Therefore it is only logical that, for a consumer, spending money on GCN back in 2012/2013 was the better investment.

The problem is that if you don't buy new products more often, you starve a company of its capacity for research and development. This is what we're seeing with AMD. The market therefore demands a balance between the two, and sadly our current market has no such balance. AMD had been selling cards at a lower price point than nVIDIA, which made consumers happy but hurt AMD in the long run. AMD is thus shifting focus. We saw this shift recently with the introduction of the R9 390/390x, as well as Fiji, at price points which match nVIDIA's competing parts.

The reason for this is simple: with DirectX 12 arriving, the market has a new player who will bring balance to it (cue the Star Wars puns). Having more money on hand to invest in drivers and proprietary perks won't matter as much as the hardware architecture itself and how forward-looking it is. Therefore we find ourselves in a period of market transition. This transition will likely end up hurting nVIDIA and boosting AMD until a sort of parity is achieved. This will allow both companies to invest in research and development, while giving us, the consumers, longer product refresh cycles and thus more bang for our buck. Market balance is achieved: a sort of synergy between producers and consumers.

Therefore it absolutely matters that a Radeon R9 290/x purchased back in late 2013/early 2014 is able to compete with a GTX 980 Ti purchased in mid-2015 on titles being released in Q4 2015. Radeon R9 users have been able to play every single title since late 2013/early 2014. Maybe not at the highest frame rates, but in the end that is not what really matters. These same users will continue to be able to play up-and-coming titles which will produce even more eye candy. This is not the case for the users who purchased a GTX 680/770/780, etc. DirectX 12 will thus provide a sort of "free" upgrade to GCN users. That is, less money spent for more bang.

Going forward, however, nVIDIA and AMD will likely reach near parity, which will mean a more competitive market. We will likely see the pricing on new hardware drop from current levels (where nVIDIA dictates the pricing trends due to a near monopoly). If you don't think this matters, well, you're entitled to your opinion, but something tells me that your opinion will change as 2016 rolls around and you start seeing just what will now be possible in games. PC gaming is on the cusp of a revolution. We can't see it now. It feels stale. But changes of this sort always spur innovation.

I'm excited. And I haven't been this excited in years (close to a decade).

My 2 cents.


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> Dear Cherry, sorry if it sounded too edgy, but the last two or three pages clearly show how a few members are backing up their "investigations" with nonexistent numbers... now I'd like to know: where are those benchmarks showing a 390X performing like a 980 Ti? Because from the 6 benches I read, that's not the case.
> 
> Also, about the productiveness: yes, I see very good material here on these forums, and you could take that potential and use it on something else, lol. Not trying to be harmful <3
> 
> EDIT: Also, what happened with Star Swarm? Does that count as a DX12 benchmark? I'm asking because I'm not really sure why no one is talking about it.


Star Swarm has been mentioned around 300,000 times (ok, I'm exaggerating, but it has been mentioned often). Star Swarm placed an emphasis on draw calls; it did not include Asynchronous Shading, the primary new feature of the DirectX 12 and Vulkan APIs. This feature is also how PS4 and Xbox One titles are taking advantage of the available hardware resources. When you factor Asynchronous Shading into the equation, you factor in Compute performance more so than Texturing, Pixel Fill Rate, Memory Bandwidth and other factors. Asynchronous Shading is all about Parallelism. Going forward, Parallelism and compute performance will be the most important aspects of a GPU architecture. Times are about to become far more exciting.


----------



## CrazyElf

Quote:


> Originally Posted by *Mahigan*
> 
> You make a valid point.
> 
> The problem now is that there is no way to explain what is bottle necking the Fury-X?!


I still think the throughput of the Rasterizers is playing a role in the bottleneck, but there's something else that we are missing here. We didn't get the kind of scaling in the Fury X that we would expect given the sheer number of shaders and other improvements.

What I have also found interesting is that they got some pretty good gains overclocking the HBM.

See here:
https://www.techpowerup.com/reviews/AMD/R9_Fury_X_Overvoltage/

This is what I found interesting:


Why is overclocking the HBM VRAM giving performance gains? Could there somehow be some sort of bottleneck there, despite the large bandwidth? Latency perhaps?

Although not representative of what a typical air or water user can expect, these are LN2 results:
http://forum.hwbot.org/showthread.php?t=142320
Quote:


> Originally Posted by *Mahigan*
> 
> A Radeon R9 280x/Radeon HD 7970 user has been playing games since late 2012 and early 2013. At the time the GTX 680 was a better performer. This present day performance figure spurred its sales. It outsold the 7970 because of the lack of explanations, by Tech Review sites, as to the implications of both the Kepler and GCN architectures going forward. The emphasis has been, by every review site, to look at which cards perform best at present time. This may sound logical but only if you look at things from a Corporations perspective and not from a Consumers perspective.


The same logic arguably applies in the case of the 290X vs the 780Ti/Titan Black debate.
Quote:


> Originally Posted by *Mahigan*
> 
> I don't think we can keep it simple. I think that's sort of the problem nowadays. People want things kept simple or they lose interest (ex tl;dr).
> 
> It will all depend on the amount of Post Processing, demanded by the game engine, requiring Asynchronous Shading. If the degree of post processing being demanded by an engine is large enough to tax the available compute resources while the demand for other segments of the graphics pipeline is not as large (Fill Rate, Texturing, Memory Bandwidth etc) then we will see a 290x/390x keeping up with a GTX 980 Ti.
> 
> The GTX 980 Ti isn't compute heavy. The GTX 980 Ti has just a tad more theoretical compute throughput than a 290x/390x but cannot utilize it as efficiently under DirectX 12. This is what is causing the results we're seeing.
> 
> Given that developers are making use of Asynchronous Shading in upcoming titles (some available as soon as Fall/Winter 2015) then I think it is safe to say that the shelf life of a GTX 980 Ti, going forward, may in fact be relatively short. I think we may see GTX 980 Ti owners shelve their cards around the same time R9 290x/390x owners do.
> 
> For DirectX 11 titles, the GTX 980 Ti will always trounce both Fiji and Hawaii (except Fiji might pull ahead in some titles under 4K).


I think that Fury X will be retired at about the same time the 290 and 290X will be as well. As I've noted, the extra shaders and TMUs simply don't scale. In theory, AMD could advocate for higher resolution displays to push the extra bandwidth (and yes, at 4k, the Fury X is more or less tied with the 980Ti or at least within a couple of percentage points), but the problem is that it still only has 4GB of VRAM. Worsening that, Nvidia judging by the TR review seems more efficient at using what VRAM is available.

I think if there is a bottleneck, perhaps we should look at what they kept the same.

The front end was basically unchanged (8 ACEs, for example, were kept).
So you think the ROPs are not the problem.
So you think the pixel fill and rasterization are not the problem? Another possibility is that the 290X never got anywhere close to the theoretical peak.
Either it's one of the above, or the HBM has issues?


----------



## Dani011093

Quote:


> Originally Posted by *Mahigan*
> 
> Therefore it absolutely matters that a Radeon R9 290/x purchased back in late 2013/early 2014 is able to compete with a GTX 980 Ti purchased in mid-2015 on titles being released in Q4 2015. Radeon R9 users have been able to play every single title since late 2013/early 2014. Maybe not at the highest frame rates, but in the end that is not what really matters. These same users will continue to be able to play upcoming titles which will produce even more eye candy. DirectX 12 will thus provide a sort of "free" upgrade to GCN users. That is, less money spent for more bang.
> .


I definitely get your point on the investment; what I do not get is where you are backing up the claim that a 290X competes with a GTX 980 Ti?

Indeed, right now a 290X can EASILY compete with a GTX 980, which was released more than a year after it; that is a fact. But a GTX 980 Ti competing with a 290X is not what any of those benches show... remember that PEAK compute power is far away from what you'll see in gaming. Example: the Nintendo GameCube had way better compute power than the Xbox, yet the Xbox ended up with better overall graphics and such... we do not know exactly how NV's and AMD's architectures work... the fact that NV's 600, 700 and 900 series have changed architectures, compared to GCN staying the same, is a valid point. BUT stating that a 290X will compete against a 980 Ti, without benchmarks to back it up and without real knowledge of the whole architecture, is plain irresponsible. (Don't take it wrong, please.)

You have stated many times that you do not know why the Fury X is being bottlenecked... of course, and that's normal, because all we know is what they MARKET about their architectures, not really what they are made of or how they actually work... all we can see here is both architectures performing the same... (and seriously, I know what parallelism is and I know it'll bring boosts... but rasterization output in general is important too, as are MANY MANY other factors). So, all in all: 1) there's no proof that a 290X competes with a 980 Ti; 2) you can't make a conclusion based on speculation; 3) the benches are showing architectures performing the same; and 4) basing a conclusion on PEAK (or avg, idc) compute power is not correct (just take the old console examples and you'll see: compute power, double-precision compute and those things are not directly proportional to gaming performance, EVEN at low levels)...

Oh, also, there's STILL going to be investment in drivers. DX12 is CLOSER to the metal, but remember it's not assembly-level code xD


----------



## semitope

The points people care about would likely be:

1. DirectX 12 seems to bring the Fury X equal to or ahead of the 980 Ti even at lower resolutions.
2. The 290X/390X approaching and even beating the 980 Ti.

The benchmarks show this, e.g.:

Fury X beating the 980 Ti at high AND low resolutions



http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head/2

290x competing with 980ti




http://arstechnica.com/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/

Quote:


> Originally Posted by *Dani011093*
> 
> I used to be an AMD fanboy when I was younger... atm I'm an AMD user (love my 5870 CF system, no doubt)... but this level of blindness or fanboyism is extreme.
> 
> First, I have read every single benchmark with different CPUs, and most (around 90%) of them show the 980 Ti on par with the Fury X (the Fury X having an advantage of 1 or 2 fps tops); they also obviously show that in the case where there's no CPU bottleneck (especially when taking it to 1440p)...
> 
> I know all the logical conclusions you guys are making may sound cool; I'm not saying AMD isn't better at parallel processing, nor that nvidia is better in some other areas...
> 
> But all I can see is the same freaking performance from both cards in the benchmarks, which in the end just shows the poor optimization of AMD's DX11 driver, and that's mainly it...
> 
> Seriously, there are benchmarks showing the GTX 980 performing the same as an R9 390X... seriously, this post made me just review all the benchmarks posted on this site, and now I ask: what's the point of all this discussion if both cards are performing the same?
> 
> If we go to theoretical numbers, the Fury X should put out around 10 fps more given its raw power, which isn't happening... therefore, what's the point of all of this if they're performing the same? Seriously, if you like this topic that much, go work for AMD or nvidia, or do something more productive than trying to get to a point without even having a strong base for those conclusions.


----------



## Dani011093

Quote:


> Originally Posted by *semitope*
> 
> The points people care about would likely be:
> 
> 1. DirectX 12 seems to bring the Fury X equal to or ahead of the 980 Ti even at lower resolutions.
> 2. The 290X/390X approaching and even beating the 980 Ti.
> 
> The benchmarks show this, e.g.:
> 
> Fury X beating the 980 Ti at high AND low resolutions
> 
> 
> 
> http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head/2
> 
> 290x competing with 980ti
> 
> 
> 
> 
> http://arstechnica.com/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/


Indeed, when I said 90% of the benchmarks, I meant it. Ars Technica's are the ONLY benches out there that show those numbers, and they do not really make sense alongside the rest of the benches. Read the one from computerbase.de (also the ones you posted) and the ones in the first post; all show similar numbers except for these. (When I check benchmarks I just take many and go with the averages, and seriously, the averages show the 980 competing against the 390X... and the 980 Ti competing with the Fury X.)

Also, the Fury X vs 980 Ti bench you posted shows the Fury X winning by an extremely small margin on some tests and the 980 Ti winning by an even smaller margin on some DX12 tests (pretty sure at 1080p)... all in all they show the SAME performance.


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> Oh also, theres STILL going to be investment in drivers, DX12 is CLOSER to metal, but remember its not assembly level code xD


Driver interventions, in the form of shader replacements, are not possible under DirectX 12. Therefore you do need good drivers, but you can't circumvent the shader commands being made by the Game Engine in order to replace less favorable shader commands with more favorable shader commands. The GPU Manufacturers have to work with the developers in order to ensure the optimal shader commands are used.

That results in less investment in Research and Development of drivers and more investment in GPU architecture building as well as openness and transparency when working with developers.


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> Indeed, when I said 90% of the benchmarks, I meant it. Ars Technica's are the ONLY benches out there that show those numbers, and they do not really make sense alongside the rest of the benches. Read the one from computerbase.de (also the ones you posted) and the ones in the first post; all show similar numbers except for these. (When I check benchmarks I just take many and go with the averages, and seriously, the averages show the 980 competing against the 390X... and the 980 Ti competing with the Fury X.)
> 
> Also, the Fury X vs 980 Ti bench you posted shows the Fury X winning by an extremely small margin on some tests and the 980 Ti winning by an even smaller margin on some DX12 tests (pretty sure at 1080p)... all in all they show the SAME performance.


Dani,

We've been discussing the topic of Fury-X vs 290/390x. See CrazyElf's post above. We've been attempting to figure out the likely culprit. It's also worth pointing out that while ArsTechnica may be the only website to show the Radeon R9 290x keeping up with the GTX 980 Ti, they're also the only website to bench the two cards against one another.



Based on how well a Radeon R9 390 performs relative to a GTX 980 Ti in the http://www.computerbase.de/ tests, available here: http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/2/#diagramm-normale-anzahl-an-draw-calls, we can conclude that the ArsTechnica numbers are valid.

How? If we assume a compute bottleneck and thus look at the ALU count differences between the R9 390 and R9 390x, we see a 10% shader increase with the R9 390x. If we multiply the R9 390's frames per second (48.5) by this 10%, we get 4.85 FPS. Adding 4.85 FPS to 48.5 FPS gives 53.35 FPS. This ~53.4 FPS figure is within striking distance of the GTX 980 Ti's 55.4 FPS. This is what the ArsTechnica review showed.
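The arithmetic behind this estimate can be laid out as a quick sanity check. A minimal sketch, assuming (as the post does) that performance scales linearly with shader count; the shader counts are the published Grenada specs, and the FPS figures are the ones quoted above:

```python
# Back-of-the-envelope check of the 390 -> 390X scaling argument.
# Assumes performance scales linearly with shader (ALU) count, which
# only holds if compute is the sole bottleneck.

SHADERS_R9_390 = 2560    # Grenada PRO stream processors
SHADERS_R9_390X = 2816   # Grenada XT stream processors (+10%)

fps_r9_390 = 48.5        # Computerbase.de figure quoted above
fps_gtx_980ti = 55.4     # GTX 980 Ti figure quoted above

scale = SHADERS_R9_390X / SHADERS_R9_390   # = 1.10
fps_r9_390x_est = fps_r9_390 * scale       # ~53.35 FPS

print(f"Estimated R9 390X: {fps_r9_390x_est:.2f} FPS "
      f"(GTX 980 Ti: {fps_gtx_980ti} FPS)")
```

The whole argument rests on that linear-scaling assumption; any fixed-function bottleneck (fill rate, geometry, front end) would break it.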


----------



## Dani011093

Quote:


> Originally Posted by *Mahigan*
> 
> Driver interventions, in the form of shader replacements, are not possible under DirectX 12. Therefore you do need good drivers, but you can't circumvent the shader commands being made by the Game Engine in order to replace less favorable shader commands with more favorable shader commands. The GPU Manufacturers have to work with the developers in order to ensure the optimal shader commands are used.
> 
> That results in less investment in Research and Development of drivers and more investment in GPU architecture building as well as openness and transparency when working with developers.


Less, but not nonexistent. Still, I made more important points in my last post. All in all, there will be improvements based on software investment. NV will probably invest even more in GameWorks (which is funny and irritating).

Also, referring to the benches again: if the 290X is matching or beating the 980 Ti, and the 980 Ti is EQUAL to the Fury X... then AMD just launched the biggest scam ever... (it is not like that). Just check every single other benchmark out there; the 290X DOES NOT compete with the 980 Ti or the Fury X... the Fury X and the 980 Ti are indeed FASTER than the 290X. (Also, more than 2 benches show the GTX 980 competing with similar performance to the 390X.)

DO NOT expect to stop seeing games run better on nvidia or AMD because of more software investment (now those investments will go directly to developers)... this market is pretty dirty, don't forget that. DX12 won't be a panacea, just an improvement in overall graphics (developers will add better graphics instead of focusing on performance; the industry will always want your money).

Sorry for the horrid grammar, I'm typing really fast atm... not much time left.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Dani,
> 
> We've been discussing the topic of Fury-X vs 290/390x. See CrazyElf's post above. We've been attempting to figure out the likely culprit. Also worth pointing out that ArsTechnica may be the only website to show Radeon R9 290x keeping up with the GTX 980 Ti but they're also the only website to bench the two cards against one another.
> 
> 
> 
> Based on how well a Radeon R9 390 performs, relative to a GTX 980 Ti under the http://www.computerbase.de/ tests, available here: http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/2/#diagramm-normale-anzahl-an-draw-calls, *we can conclude that the ArsTechnica numbers are valid*.


How do you figure that? The chart you just posted shows a 980 Ti 15% faster than a 390, and a 290X is roughly on par with a 390. You are putting too much faith in your theoretical compute performance.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> a 290X is roughly on par with a 390.


Not in terms of compute performance. And Asynchronous Shading is all about compute performance.

If we assume a compute bottleneck and thus look at the ALU count differences between the R9 390 and R9 390x, we see a 10% shader increase with the R9 390x. If we multiply the R9 390's frames per second (48.5) by this 10%, we get 4.85 FPS. Adding 4.85 FPS to 48.5 FPS gives 53.35 FPS. This ~53.4 FPS figure is within striking distance of the GTX 980 Ti's 55.4 FPS. This is what the ArsTechnica review showed.


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> Less, but not nonexistent. Still, I made more important points in my last post. All in all, there will be improvements based on software investment. NV will probably invest even more in GameWorks (which is funny and irritating).
> 
> Also, referring to the benches again: if the 290X is matching or beating the 980 Ti, and the 980 Ti is EQUAL to the Fury X... then AMD just launched the biggest scam ever... (it is not like that). Just check every single other benchmark out there; the 290X DOES NOT compete with the 980 Ti or the Fury X... the Fury X and the 980 Ti are indeed FASTER than the 290X. (Also, more than 2 benches show the GTX 980 competing with similar performance to the 390X.)
> 
> DO NOT expect to stop seeing games run better on nvidia or AMD because of more software investment (now those investments will go directly to developers)... this market is pretty dirty, don't forget that. DX12 won't be a panacea, just an improvement in overall graphics (developers will add better graphics instead of focusing on performance; the industry will always want your money).
> 
> Sorry for the horrid grammar, I'm typing really fast atm... not much time left.


Judging by your use of capitalized lettering as well as insults toward one particular graphics card manufacturer, I can deduce that your bad grammar, as you put it, and your fast typing are most likely caused by typing out of emotion. You are emotionally compromised. Take some time and breathe. These results will have no impact whatsoever on the quality of life you currently enjoy.

Partisanship should be left aside my friend. It is unhealthy for the community as a whole.


----------



## semitope

Could it be that the bottleneck is not with the graphics card but with whatever caused the poor performance on the 8-core AMD chips? It just happens at a higher fps on Intel systems. It could be a situation where more than just the CPU and GPU matter for DX12: the PCIe subsystem and system RAM too. AMD systems are still on PCIe 2.0 unless they are special boards, for example.

@Dani The reason a 980 and 980ti might end up at similar performance (i.e. both close to 290x/390x) would be if they have the same bottleneck regarding asynchronous shaders. Similar to the 290x/390x ending up close to fury x for whatever reason.


----------



## Dani011093

Quote:


> Originally Posted by *Mahigan*
> 
> Judging by your use of capitalized lettering as well as insults towards one particular Graphics card manufacturer I can deduce that your bad grammar, as you put it, and your typing fast, is most likely caused by your typing out of emotions. You are emotionally compromised. Take some time and breathe. These results will have no impact, whatsoever, on the quality of life you currently enjoy.
> 
> Partisanship should be left aside my friend. It is unhealthy for the community as a whole.


lol no, I was taking a break from work and got a call to come back to the office (servers have been going crazy lately). I had to run upstairs quickly back to the office, and now I'm here; I'll be back on the forums in a while.

Quick edit before I'm off: "How? If we assume a compute bottleneck and thus look to the ALU count differences between the R9 390 and R9 390x, we can see a 10% shader increase with the R9 390x. If we multiply the Frames per Second (48.5) by this 10% we achieve a number of 4.85 FPS. If we add 4.85FPS to 48.5FPS we get a result of 53.35FPS. This 53.4FPS figure is within striking distance of the GTX 980 Ti's 55.4FPS. This is what the ArsTechnica review showed."

Indeed, you are putting too much love into theoretical numbers, and this industry has shown us over the past decade and a half that theoretical specs don't make the rule (not even in consoles)...

Also, there's no CPU bottleneck in those tests, as far as I've read from your previous posts. All in all, there is no proof that a 390/390X is performing the same as a Fury X or a 980 Ti... Also, you are all drawing conclusions from the millimeter of info you have about the different architectures. No one here has a way to really back up all of these speculations. Do you?

Edit: the 390 and 290X have roughly a 500 GFLOPS difference (that doesn't take it anywhere near 15% improved performance), not even with a multiplicative factor. (Also, don't forget rasterization; it is as important as async compute.)


----------



## Mahigan

Quote:


> Originally Posted by *CrazyElf*
> 
> I still think the throughput of the Rasterizers is playing a role in the bottleneck. There's something else that we are missing here then. We didn't get the kind of scaling in the Fury X that we would expect given the sheer number of shaders and other improvements.
> 
> What I have also found interesting is that they got some pretty good gains overclocking the HBM.
> 
> See here:
> https://www.techpowerup.com/reviews/AMD/R9_Fury_X_Overvoltage/
> 
> This is what I found interesting:
> 
> 
> Why is overclocking the HBM VRAM giving performance gains? Could there somehow be some sort of bottleneck there, despite the large bandwidth? Latency perhaps?


While it is true that the Fury-X does not benefit as much as it should from the extra memory bandwidth granted by HBM (as seen below), it should still hold a clear advantage over the 390x if that were indeed the case.


Quote:


> The same logic arguably applies in the case of the 290X vs the 780Ti/Titan Black debate.
> I think that Fury X will be retired at about the same time the 290 and 290X will be as well. As I've noted, the extra shaders and TMUs simply don't scale. In theory, AMD could advocate for higher resolution displays to push the extra bandwidth (and yes, at 4k, the Fury X is more or less tied with the 980Ti or at least within a couple of percentage points), but the problem is that it still only has 4GB of VRAM. Worsening that, Nvidia judging by the TR review seems more efficient at using what VRAM is available.


Indeed, the GTX 980 Ti has better compression algorithms and quite likely a superior memory controller, as witnessed when we compare the 290x to the GTX 980 Ti in the memory bandwidth test above.
Quote:


> I think if there is a bottleneck, perhaps we should look at what they kept the same.
> 
> The front end was basically unchanged (8 ACEs, for example, were kept).
> So you think the ROPs are not the problem.
> So you think the pixel fill and rasterization are not the problem? Another possibility is that the 290X never got anywhere close to the theoretical peak.
> Either it's one of the above, or the HBM has issues?


Figuring out what the culprit is is giving me quite the headache lol


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> Indeed you are putting too much love into theoretical numbers, and this industry has shown us during the past decade and a half, that theoretical specs dont make a rule (not even in consoles)...


The industry did not have DirectX 12 and the capacity to make efficient use of the available compute resources over the past decade either (see Asynchronous Shading).

The theoreticals are only used to show a correlation between the Ars Technica numbers and the Computerbase.de numbers in relation to the 390 vs the 290x. Since the theoreticals explain the differences we are seeing, we can make the logical deduction that the Ars figures are correct.

It's all logical.

If I were emotionally compromised, I might be on the same level of thinking that you are, which is to assume a sort of quasi-conspiracy on the part of Ars Technica, for example. Lucky for me, I'm able to control my emotions.


----------



## Dani011093

Quote:


> Originally Posted by *Mahigan*
> 
> The industry did not have DirectX 12 and the capacity to make efficient use of the available compute resources over the past decade either (see Asynchronous Shading).
> 
> The theoreticals are only used in order to show a correlation between the Ars Technica numbers and the Computerbase.de numbers in relation to a 390 vs 290. Since the theoreticals explain the differences we are seeing then we can make the logical deduction that the Ars figures are correct.
> 
> It's all logical.


By "industry" I'm referring to the console industry; read my previous posts.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Not in terms of compute performance. And Asynchronous Shading is all about compute performance.
> 
> If we assume a compute bottleneck and thus look to the ALU count differences between the R9 390 and R9 390x, we can see a 10% shader increase with the R9 390x. If we multiply the Frames per Second (48.5) by this 10% we achieve a number of 4.85 FPS. If we add 4.85FPS to 48.5FPS we get a result of 53.35FPS. This 53.4FPS figure is within striking distance of the GTX 980 Ti's 55.4FPS. This is what the ArsTechnica review showed.


That would also put it right in line with the Fury X. Suffice it to say that if the 290X matches the Fury X in actual DX12 games, I'll be very surprised. AMD would have seriously screwed up the Fiji design if that ends up being the case.


----------



## Dani011093

"If I were emotionally compromised, I may be on the same level of thinking that you are. Which is to assume a sort of quasi Conspiracy on the part of Ars Technica for example. Lucky for me, I'm able to control my emotions."

wat

Edit: Conspiracy? No, just inaccurate benchmarks. In the past 10 years I've seen a variety of inaccurate and awkward benchmarks; that has shown me never to base my opinion on one benchmark, but on the average result of 5 to 10 benches...

Now I'm seriously off. Have a nice day, folks.


----------



## Mahigan

Quote:


> Originally Posted by *Dani011093*
> 
> by industry im reffering to console industry, read my previous posts.


I'm sorry but your posts are filled with emotional outbursts and a lack of self control. I have no interest in communicating with you.

I apologize if this inconveniences you.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> That would also put it right in line with the Fury X. Suffice it to say that if the 290X matches the Fury X in actual DX12 games, I'll be very surprised. AMD would have seriously screwed up the Fiji design if that ends up being the case.


And that may very well be the case. Though it could be that this is a scenario which AotS exposes which other DirectX 12 titles do not. Only time will tell on that one.

There is something which is keeping the Fiji based card from reaching its compute potential under AotS. What we would need is someone with both a Fury-X and a 290x or 390x to run a sequence of benchmarks on AotS. This could help us figure out what the limitations are (by attempting overclocks and downclocks and then running the benchmark).

I mean one likely culprit could be the Graphics Command Processor. One way to see if this is the case would be to compare the Draw Call performance on both Fiji and the R9 290x to see if Fiji overtakes the R9 290x.

Therefore it could be that the issue isn't related to the Rasterizers but rather to the ability to feed the rasterizers. If that was the case, however, we should see the GTX 980 Ti performing even more poorly in relation to the Hawaii and Fiji parts.


----------



## GorillaSceptre

With the Fury X being bottlenecked, its advantages with Async become a bit redundant. This genre of game is like a showcase for Async, but at the end of the day the performance difference between the 980 Ti and Fury X is negligible.

With the Fury being bottlenecked elsewhere, I think the Ti will perform about the same as the Fury in DX12. I don't think you can say the same for non-Maxwell 2 nVidia GPUs though. It looks like older AMD cards will start pulling ahead.

I guess both the 980 Ti and Fury will be short-lived, at least compared to other generations of GPUs. In hindsight, I should have bought a 290X; that thing's the 2600K of GPUs.

I'm going to go for a 980 Ti, and then upgrade to Volta. By that time DX12 games should be the norm.


----------



## Mahigan

*WHAT IS THE ROLE OF AN ACE (ASYNCHRONOUS COMPUTE ENGINE)?*
The ACEs are responsible for all compute shader scheduling and resource allocation. Products may have multiple ACEs, which operate independently, to scale up or down in terms of performance. Each ACE fetches commands from cache or memory and forms task queues, which are the starting point for scheduling.

Each task has a priority level for scheduling, ranging from background to real-time. The ACE will check the hardware requirements of the highest priority task and launch that task into the GCN shader array when sufficient resources are available.

Many tasks can be in-flight simultaneously; the limit is more or less dictated by the hardware resources. Tasks complete out-of-order, which releases resources earlier, but they must be tracked in the ACE for correctness. When a task is dispatched to the GCN shader array, it is broken down into a number of workgroups that are dispatched to individual compute units for execution. Every cycle, an ACE can create a workgroup and dispatch one wavefront from the workgroup to the compute units.

While ACEs ordinarily operate in an independent fashion, they can synchronize and communicate using cache, memory or the 64KB Global Data Share. This means that an ACE can actually form a task graph, where individual tasks have dependencies on one another. So in practice, a task in one ACE could depend on tasks on another ACE or part of the graphics pipeline. The ACEs can switch between task queues by stopping a task and selecting the next task from a different queue. For instance, if the currently running task graph is waiting for input from the graphics pipeline due to a dependency, the ACE could switch to a different task queue that is ready to be scheduled. The ACE will flush any workgroups associated with the old task, and then issue workgroups from the new task to the shader array.
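The scheduling behaviour described above can be illustrated with a toy model: multiple tasks with priorities, launched onto a shared pool of compute resources whenever they fit, with completions freeing resources as they finish. This is a sketch only; the task names, priorities, resource costs and durations below are invented for the example, and real ACE hardware is far more involved:

```python
import heapq
from dataclasses import dataclass, field

# Toy model of ACE-style scheduling: tasks launch strictly by priority
# when enough compute units are free; completions release resources so
# the next queued task can be dispatched. (Assumes every task fits
# within total_resources.)

@dataclass(order=True)
class Task:
    priority: int                           # lower number = higher priority
    name: str = field(compare=False)
    resources: int = field(compare=False)   # compute units required
    duration: int = field(compare=False)    # cycles to complete

def schedule(tasks, total_resources):
    ready = list(tasks)
    heapq.heapify(ready)                    # highest priority at the top
    running, free, launched, clock = [], total_resources, [], 0
    while ready or running:
        # Retire finished tasks, freeing their resources.
        for t, end in running[:]:
            if end <= clock:
                running.remove((t, end))
                free += t.resources
        # Launch the highest-priority task(s) that fit right now.
        while ready and ready[0].resources <= free:
            t = heapq.heappop(ready)
            free -= t.resources
            running.append((t, clock + t.duration))
            launched.append(t.name)
        clock += 1
    return launched

jobs = [Task(2, "post-process", 4, 3),
        Task(0, "shadow-pass", 6, 2),
        Task(1, "physics", 3, 4)]
print(schedule(jobs, total_resources=8))
```

With only 8 units free, the high-priority shadow pass launches alone; once it retires, both remaining tasks fit and dispatch together, which is the resource-driven behaviour the description above outlines.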

*ASYNCHRONOUS COMPUTING*
For many tasks in the graphics rendering pipeline, the GPU needs to know about ordering; that is, it
requires information about which tasks must be executed in sequence (synchronous tasks), and
which can be executed in any order (asynchronous tasks). This requires a graphics application
programming interface (API) that allows developers to provide this information. This is a key
capability of the new generation of graphics APIs, including Mantle, DirectX® 12, and Vulkan™.
In DirectX 12, this is handled by allowing applications to submit work to multiple queues. The API
defines three types of queues:

Graphics queues for primary rendering tasks
Compute queues for supporting GPU tasks (physics, lighting, post-processing, etc.)
Copy queues for simple data transfers
Command lists within a given queue must execute synchronously, while those in different queues
can execute asynchronously (i.e. concurrently and in parallel). Overlapping tasks in multiple queues
maximizes the potential for performance improvement.
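The queue semantics above can be captured in a toy cost model (an editorial sketch; `CmdList` and `frame_time` are invented names, and real GPU scheduling is far more involved). Command lists in the same queue serialize, while queues overlap, so the frame time is bounded by the longest per-queue serial sum rather than the total of all work:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Toy model of the DX12 queue rule: command lists in the same queue run
// back-to-back (synchronous), while different queues overlap in time.
struct CmdList {
    std::string queue;  // "graphics", "compute" or "copy"
    int cost;           // abstract time units
};

// Frame time = longest per-queue serial sum, since queues run concurrently.
int frame_time(const std::vector<CmdList>& lists) {
    std::map<std::string, int> serial;
    for (const auto& c : lists) serial[c.queue] += c.cost;
    int longest = 0;
    for (const auto& kv : serial) longest = std::max(longest, kv.second);
    return longest;
}
```

With graphics lists costing 6 and 4 units, compute lists 3 and 2, and a copy of 2, the overlapped frame takes 10 units, whereas running everything serially would take 17.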
Developers of games for the major console systems are already familiar with this idea of multiple
queues and understand how to take advantage of it. This is an important reason why those game
consoles have typically been able to achieve higher levels of graphics performance and image quality
than PCs equipped with a similar level of GPU processing power. However, the availability of new
graphics APIs is finally bringing similar capabilities to the PC platform.



*SCHEDULING*
A basic requirement for asynchronous shading is the ability of the GPU to schedule work from
multiple queues of different types across the available processing resources. For most of their
history, GPUs were only able to process one command stream at a time, using an integrated
command processor. Dealing with multiple queues adds significant complexity. For example, when
two tasks want to execute at the same time but need to share the same processing resources, which
one gets to use them first?

Consider the example below, where two streams of traffic (representing task queues) are
attempting to merge onto a freeway (representing GPU processing resources). A simple way of
handling this is with traffic signals, which allow one traffic stream to enter the freeway while the
other waits in a queue. Periodically the light switches, allowing some traffic from both streams onto
the freeway.

Representation of a simple task switching mechanism:


To get the GPU to switch from working on one task to another, a number of steps are required:

Stop submitting new work associated with the current task
Allow all calculations in flight to complete
Replace all context data from the current task with that for the new task
Begin submitting work associated with the new task
Context (also known as "state") is a term for the working set of data associated with a particular task
while it is being processed. It can include things like constant values, pointers to memory locations,
and intermediate buffers on which calculations are being performed. This context data needs to be
readily accessible to the processing units, so it is typically stored in very fast on-chip memories.
Managing context for multiple tasks is central to the scheduling problem.
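The four steps can be written out as a toy context-switch routine (hypothetical names throughout; real context lives in fast on-chip memories and is much richer than a single struct):

```cpp
#include <string>
#include <vector>

// Toy model of a task context switch. `Context` stands in for the working
// set (constants, memory pointers, intermediate buffers) of a task.
struct Context {
    std::string task;
};

struct Gpu {
    Context active;
    int in_flight = 0;               // calculations still executing
    std::vector<std::string> log;    // records the switch sequence

    void drain() {                   // step 2: let in-flight work complete
        while (in_flight > 0) {
            --in_flight;
            log.push_back("retire");
        }
    }
    void switch_to(const Context& next) {
        log.push_back("stop-submit:" + active.task);  // step 1
        drain();                                       // step 2
        active = next;                                 // step 3: replace context
        log.push_back("submit:" + active.task);        // step 4
    }
};
```

Note that the cost of step 2 grows with the amount of work in flight, which is exactly why frequent switches sap performance on heavyweight graphics tasks.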

An alternative way to handle scheduling is by assigning priorities to each queue, and allowing tasks
in higher priority queues to pre-empt those in lower priority queues. Pre-emption means that a
lower priority task can be temporarily suspended while a higher priority task completes. Continuing
with the traffic analogy, high priority tasks are treated like emergency vehicles - that is, they have
right-of-way at intersections even when the traffic light is red, and other vehicles on the road must
pull to the side to let them pass.

Pre-emption mechanism for handling high priority tasks:


This approach can reduce processing latency for the tasks that need it most; however, it doesn't
necessarily improve efficiency, since it does not allow simultaneous execution. In fact, it can actually
reduce efficiency in some cases due to context switching overhead. Graphics tasks often have a lot of context data associated with them, making context switches time consuming and sapping
performance.

A better approach would be to allow new tasks to begin executing without having to suspend tasks
already in flight. This requires the ability to perform fine-grained scheduling and interleaving of
tasks from multiple queues. The mechanism would operate like on-ramps merging onto a freeway,
where there are no traffic signals and vehicles merge directly without forcing anyone to stop and
wait.

Asynchronous compute with fine-grained scheduling:


The best case for this kind of mechanism is when lightweight compute/copy queues (requiring
relatively few processing resources) can be overlapped with heavyweight graphics queues. This
allows the smaller tasks to be executed during stalls or gaps in the execution of larger tasks, thereby
improving utilization of processing resources and allowing more work to be completed in the same
span of time.
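Under the simplifying assumption that compute work can be slotted into graphics stalls, the benefit is easy to quantify (illustrative arithmetic with invented function names, not measured data):

```cpp
#include <algorithm>

// Async case: compute work hidden inside graphics stalls is free; only
// the excess extends the frame.
int frame_time_async(int gfx_busy, int gfx_stall, int compute) {
    int gfx_total = gfx_busy + gfx_stall;          // graphics timeline length
    int spill = std::max(0, compute - gfx_stall);  // compute not hidden
    return gfx_total + spill;
}

// Serial case for comparison: the same work with no overlap at all.
int frame_time_serial(int gfx_busy, int gfx_stall, int compute) {
    return gfx_busy + gfx_stall + compute;
}
```

For 8 units of graphics work with 3 units of stalls and 2 units of compute, the async frame takes 11 units against 13 serially; once the compute exceeds the available stall time, the surplus spills past the end of the graphics work.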

*HARDWARE DESIGN*
The next consideration is designing a GPU architecture that can take full advantage of asynchronous
shading. Ideally we want graphics processing to be handled as a simultaneous multi-threaded (SMT)
operation, where tasks can be assigned to multiple threads that share available processing
resources. The goal is to improve utilization of those resources, while retaining the performance
benefits of pipelining and a high level of parallelism.

AMD's Graphics Core Next (GCN) architecture was designed to efficiently process multiple command
streams in parallel. This capability is enabled by integrating multiple Asynchronous Compute
Engines (ACEs). Each ACE can parse incoming commands and dispatch work to the GPU's processing
units. GCN supports up to 8 ACEs per GPU, and each ACE can manage up to 8 independent queues.
The ACEs can operate in parallel with the graphics command processor and two DMA engines. The
graphics command processor handles graphics queues, the ACEs handle compute queues, and the
DMA engines handle copy queues. Each queue can dispatch work items without waiting for other
tasks to complete, allowing independent command streams to be interleaved on the GPU's Shader
Engines and execute simultaneously.

This architecture is designed to increase utilization and performance by filling gaps in the pipeline,
where the GPU would otherwise be forced to wait for certain tasks to complete before working on
the next one in sequence. It still supports prioritization and pre-emption when required, but this
will often not be necessary if a high priority task is also a relatively lightweight one. The ACEs are
designed to facilitate context switching, reducing the associated performance overhead.

*USING ASYNCHRONOUS SHADERS*
The ability to perform shading operations asynchronously has the potential to benefit a broad range
of graphics applications. Practically all modern game rendering engines today make use of compute
shaders that could be scheduled asynchronously with other graphics tasks, and there is a trend
toward making increasing use of compute shaders as the engines get more sophisticated. Many
leading developers believe that rendering engines will continue to move away from traditional
pipeline-oriented models and toward task-based multi-threaded models, which increases the
opportunities for performance improvements. The following are examples of some particular cases
where asynchronous shading can benefit existing applications.

*Post-Processing Effects*
Today's games implement a wide range of visual effects as post-processing passes. These are
applied after the main graphics rendering pipeline has finished rendering a frame, and are often
implemented using compute shaders. Examples include blur filters, anti-aliasing, depth-of-field,
light blooms, tone mapping, and color correction. These kinds of effects are ideal candidates for
acceleration using asynchronous shading.

Example of a post-process blur effect accelerated with asynchronous shaders:

_Measured in AMD Internal Application - Asynchronous Compute. Test System Specifications: AMD FX 8350 CPU,
16GB DDR3 1600 MHz memory, 990 FX motherboard, AMD R9 290X 4GB GPU, Windows 7 Enterprise 64-bit_
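In 1-D miniature, a post-process blur reduces to averaging each element with its neighbors. The sketch below is a plain CPU stand-in for what such a compute shader does per pixel (an editorial illustration, not the shader AMD measured):

```cpp
#include <vector>

// Minimal 1-D box blur: each output element is the average of the input
// elements within `radius` positions, clamped at the buffer edges.
std::vector<float> box_blur(const std::vector<float>& src, int radius) {
    std::vector<float> dst(src.size());
    for (int i = 0; i < static_cast<int>(src.size()); ++i) {
        float sum = 0.0f;
        int n = 0;
        for (int j = i - radius; j <= i + radius; ++j) {
            if (j >= 0 && j < static_cast<int>(src.size())) {
                sum += src[j];
                ++n;
            }
        }
        dst[i] = sum / n;
    }
    return dst;
}
```

Because every output element is independent, a GPU can compute them all in parallel, and the whole pass can overlap with other work on an asynchronous compute queue.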

*Lighting*
Another common technique in modern games is deferred lighting. This involves performing a pre-pass
over the scene with a compute shader before it is rendered, in order to determine which light
sources affect each pixel. This technique makes it possible to efficiently render scenes with a large
number of light sources.
The following example, which uses DirectX 12 and deferred lighting to render a scene with many
light sources, shows how using asynchronous shaders for the lighting pre-pass improves
performance by 10%.

Demonstration of deferred lighting using DirectX 12 and asynchronous shaders:

_Measured in AMD internal application - D3D12_AsyncCompute. Test System Specifications: Intel i7 4960X, 16GB
DDR3 1866 MHz, X79 motherboard, AMD Radeon R9 Fury X 4GB, Windows 10 v10130_
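One common shape for such a pre-pass is tiled light culling: divide the screen into tiles and, for each tile, record which lights can reach it, so the shading pass only loops over that short list. The sketch below is a CPU stand-in with invented names (real implementations run as a compute shader and also test depth bounds):

```cpp
#include <algorithm>
#include <vector>

// A point light with a finite radius of influence, in screen units.
struct Light {
    float x, y, radius;
};

// For each tile, collect the indices of lights whose circle of influence
// overlaps the tile rectangle (closest-point-on-rectangle test).
std::vector<std::vector<int>> cull_lights(int tiles_x, int tiles_y,
                                          float tile_size,
                                          const std::vector<Light>& lights) {
    std::vector<std::vector<int>> per_tile(tiles_x * tiles_y);
    for (int ty = 0; ty < tiles_y; ++ty) {
        for (int tx = 0; tx < tiles_x; ++tx) {
            float x0 = tx * tile_size, y0 = ty * tile_size;
            float x1 = x0 + tile_size, y1 = y0 + tile_size;
            for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
                // Closest point on the tile rectangle to the light center.
                float cx = std::max(x0, std::min(lights[i].x, x1));
                float cy = std::max(y0, std::min(lights[i].y, y1));
                float dx = lights[i].x - cx, dy = lights[i].y - cy;
                if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius)
                    per_tile[ty * tiles_x + tx].push_back(i);
            }
        }
    }
    return per_tile;
}
```

A scene with hundreds of lights then costs each pixel only the handful of lights listed for its tile, which is what makes many-light scenes tractable.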

*SUMMARY*
Hardware, software and API support are all now available to deliver on the promise of asynchronous
computing for GPUs. The GCN architecture is perfectly suited to asynchronous computing, having
been designed from the beginning with this operating model in mind. This will allow developers to
unlock the full performance potential of today's PC GPUs, enabling higher frame rates and better
image quality.

Source: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Asynchronous-Shaders-White-Paper-FINAL.pdf


----------



## Mahigan

Basically,

Ashes of the Singularity exploits both the Post-Processing Effects as well as the Lighting features of Asynchronous Shading.

If we dive deeper, we come to find that many of the features found in GameWorks, by nVIDIA, are possible under asynchronous shading. Now consider this... Xbox One and PS4 are implementing these features. So far, DirectX 11 has been unable to implement them; in came GameWorks. DirectX 12 arrives, out goes GameWorks.

It seems that AMD may in fact remove many of the advantages nVIDIA currently holds. I believe the only feature left is PhysX, which AMD is tackling, in part, with TressFX (which we will see in the new Deus Ex title).

I believe AMD's overall strategy is beginning to materialize. We may see even more interesting features being exploited by asynchronous shading in upcoming DirectX 12 titles.

At first I thought this was all AMD marketing bull. But the more I read what developers are saying, the more I am beginning to understand that the entire tech review industry didn't place enough emphasis on just what is coming around the corner. We saw the articles discussing asynchronous shading, but for the most part they were simple in nature and didn't tie this feature into what ought to be the recommended graphics card purchases in their other articles. We have an industry more concerned with selling graphics card hardware, in order to retain review exclusivity, than with informing the public.

I believe the industry needs new tech review websites which are not as beholden to the hardware manufacturers. We need more people with engineering knowledge looking into these matters, and fewer sensationalist journalists looking for ad revenue. Boy do I miss David Kanter.


----------



## GorillaSceptre

That's great and all, but the 980 Ti and Fury X are pretty much even under DX12 (in this benchmark anyway), and the Fury X is quite far behind it in DX11. I just don't really understand how it's going to do much for AMD.

The upcoming games weren't built to use DX12, the games that are will probably come in 2017, and at that point they'll be contending with Volta.


----------



## Forceman

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That's great and all, but the 980 Ti and Fury X are pretty much even under DX12 (in this benchmark anyway)


That fact seems to be overlooked a lot in these proclamations about how DX12 is going to be AMD's savior.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That's great and all, but the 980 Ti and Fury X are pretty much even under DX12 (in this benchmark anyway), and the Fury X is quite far behind it in DX11. I just don't really understand how it's going to do much for AMD.
> 
> The upcoming games weren't built to use DX12, the games that are will probably come in 2017, and at that point they'll be contending with Volta.


If you look at it from the lens of one Corporation against another (partisanship lens) then evidently you will conclude that the playing field will be more or less evened.

If you look at it from the lens of a consumer sitting on 2 year old hardware, you can conclude that you will be able to play the latest and greatest DirectX 12 titles (provided you're sitting on GCN hardware).

The titles will begin releasing in the fall of 2015 and Q1 2016, not in 2017. These are titles built for DirectX 12. In 2016 we also have a new AMD architecture releasing, therefore you won't need to wait for Volta in 2017. Pascal may or may not incorporate improvements. I think Pascal will, but several people disagree with me.

Quote:


> Originally Posted by *Forceman*
> 
> That fact seems to be overlooked a lot in these proclamations about how DX12 is going to be AMD's savior.


By evening the playing field, by eliminating key nVIDIA advantages, it is AMD's savior. Not that I care about that as much as I do about being able to play these titles on a 4K screen with dual Radeon R9 290Xs using Multi-Adapter Split Frame Rendering, another key feature of DirectX 12.

I do care going forward however. An even playing field means more competition which means a drop in GPU prices. This I care about.

The only nVIDIA graphics cards capable of playing these titles with decent settings will be the GTX 970/980/980 Ti. All the other cards will be rendered obsolete, which significantly reduces the player base, prompting many people to buy new cards, on the nVIDIA side of things at least.


----------



## CrazyHeaven

I find it easy to become emotional on a thread like this, but I'm understanding things a bit more. My reply, at least this reply, will not be on the same plane as others', but I wanted to check my understanding of a few things.

Taking the 980 Ti into account, even in these benchmarks it isn't bad by any means. Assuming that the info here is all true, the 980 Ti could still rise above the 290X in other DirectX 12 games due to its increased speed. RTS is where Nvidia would be the weakest. If, say, The Witcher 3 was updated to DX12, it would still perform well; the only change is that now AMD would also see an increase in performance. And even if the 290X beats it in these games, it doesn't undermine what the 980 Ti is capable of.

I know I upgraded to the 980 Ti because I had a step-up chance and wanted to max the game I was currently playing. It does that well enough, so it meets my demand. If, say, on DX12 it gets 60 fps and the 290X gets 72, does that mean my card is now scrap metal? No, it doesn't. All that means is we are finally seeing some much-needed competition. If AMD people get a free upgrade, then let me be the first to say that I'll be happy. How long has the red team been waiting for this day?


----------



## semitope

Is this game even heavy on ACE usage? It seems it's used for the small light sources, but what else? This could well be the best-case scenario for asynchronous shader usage on Nvidia hardware. Considering they were up to date on the game code, this seems plausible. The best case for them would be not to use it at all. This also explains the reduced DX12 performance vs DX11: using async adds latency but does not improve their efficiency any more than their DX11 path does.

It may not even benefit their architecture if they've already invested in other ways to tackle efficiency (hence why a 980 Ti doesn't perform like a 290X in DX11). ACEs take advantage of idle silicon, right? If there's less of it, there's less relative performance gain. (It seems HyperQ tackled this.) For Pascal, their best hope might not be to just do this better, but to pack more power into their hardware.

The shady tactic would be to prevent the use of ACEs to improve AMD's efficiency.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> By evening the playing field, by eliminating key nVIDIA advantages, it is AMDs savior. Not that I care as much as I do being able to play these titles on a 4K screen with Dual Radeon R9 290x using Multi-Adapter Split Frame Rendering, another key feature of DirectX 12.
> 
> I do care going forward however. An even playing field means more competition which means a drop in GPU prices. This I care about.
> 
> The only nVIDIA graphics cards capable of playing these titles, with decent settings, will be the GTX 970/980/980 Ti. All the other cards will be rendered obsolete. Which significantly reduces the player base. Prompting many people to buy new cards, on the nVIDIA side of things at least.


You're extrapolating from a single, pre-release, data point. Would you look at these same tests and come to the conclusion that AMD has no ability to compete in DX11 based on the DX11 results?

Nvidia was also involved in the development of DX12, it's not like they have no idea what is coming.


----------



## Mahigan

Quote:


> Originally Posted by *CrazyHeaven*
> 
> I find it easy to become emotional on a thread like this, but I'm understanding things a bit more. My reply, at least this reply, will not be on the same plane as others', but I wanted to check my understanding of a few things.
> 
> Taking the 980 Ti into account, even in these benchmarks it isn't bad by any means. Assuming that the info here is all true, the 980 Ti could still rise above the 290X in other DirectX 12 games due to its increased speed. RTS is where Nvidia would be the weakest. If, say, The Witcher 3 was updated to DX12, it would still perform well; the only change is that now AMD would also see an increase in performance. And even if the 290X beats it in these games, it doesn't undermine what the 980 Ti is capable of.


Precisely. The news isn't so much that the GTX 980 Ti will suffer, because it won't. The news is that GCN cards will get a free boost. That's why I don't quite understand all of the hate being thrown around. In the end, we all win. Whatever happens to a corporation, or a brand, shouldn't lead people to take things personally. It does, but it shouldn't.
Quote:


> I know I upgraded to the 980 Ti because I had a step-up chance and wanted to max the game I was currently playing. It does that well enough, so it meets my demand. If, say, on DX12 it gets 60 fps and the 290X gets 72, does that mean my card is now scrap metal? No, it doesn't. All that means is we are finally seeing some much-needed competition. If AMD people get a free upgrade, then let me be the first to say that I'll be happy. How long has the red team been waiting for this day?


Couldn't agree more.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> You're extrapolating from a single, pre-release, data point.
> 
> Nvidia was also involved in the development of DX12, it's not like they have no idea what is coming.


The development of DirectX 12 is, from a programmer's standpoint, the development of AMD Mantle. Anyone with any knowledge of programming who looks at DirectX 12 and Mantle will tell you that they are inherently built on "the same" principles; even coding for the two is so similar that porting Mantle code to DirectX 12 is incredibly easy. You could almost say that AMD Mantle was ported over to DirectX. Not entirely, but in large part. Look to pages 90-93 here: http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
Quote:


> Mantle path
> All lighting draw calls use same primitive topology, shaders and rasterizer, depth stencil and
> blend states. The following Mantle API calls are made for each lighting draw call:
> grCmdBindDescriptorSet()
> grCmdBindIndexData()
> grCmdDrawIndexed()
> All shadow map draw calls use same primitive topology, shaders and rasterizer, depth stencil
> and blend states. For each shadow map draw call, the following Mantle API calls are made:
> grCmdBindDescriptorSet()
> grCmdBindIndexData()
> grCmdDrawIndexed()
> Neither lighting nor shadow map passes use tessellator or geometry shader.
> All shader constants are stored in one large constant buffer that is updated with a single
> grMapMemory() call. The memory states for the constant buffer are set with
> grCmdPrepareMemoryRegions().
> The test uses one thread for each logical CPU core. Draw call recording work is divided
> evenly between all threads for both the shadow map and lighting passes. Each thread
> records draw calls for a fixed set of geometries for both passes.


Quote:


> DirectX 12 path
> All lighting draw calls use same primitive topology and pipeline state object. The following
> DirectX 12 API calls are made, at least once, for each lighting draw call:
> SetIndexBuffer()
> SetGraphicsRootDescriptorTable()
> SetGraphicsRootConstantBufferView()
> DrawIndexedInstanced() with a single instance
> All shadow map draw calls use same primitive topology and pipeline state object. The
> following DirectX 12 API calls are made, at least once, for each shadow map draw call:
> SetIndexBuffer()
> SetGraphicsRootConstantBufferView()
> DrawIndexedInstanced() with a single instance
> Neither lighting nor shadow map passes use tessellator or geometry shader.
> The test uses one thread for each logical CPU core. Draw call recording work is divided
> evenly between all threads for both the shadow map and lighting passes. Each thread
> records draw calls for a fixed set of geometries for both passes.


I wouldn't say nVIDIA developed DirectX 12. I'd say that they were informed of the new standards and new requirements and had a role in providing input and requests. All that being said, DirectX 12 is what AMD wanted with Mantle, now realized as an industry standard. The same can be said of Vulkan.


----------



## Forceman

I didn't say developed, I said involved. And let's not start the whole Mantle caused DX12 debate again.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Forceman*
> 
> I didn't say developed, I said involved. And let's not start the whole Mantle caused DX12 debate again.


No, DX12 is very similar to Mantle, which plays a big part in AMD's DX12 performance.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> I didn't say developed, I said involved. And let's not start the whole Mantle caused DX12 debate again.


It is a fact that Mantle caused DirectX 12. If you look at the code for the two, it is as evident as the sun rising and setting. But it's not that Mantle caused it so much as that Mantle took advantage of GCN, and GCN is what powers the Xbox One. DirectX 12 was built to run on the Xbox One. Therefore, when it came to input on taking advantage of the GCN architecture of the Xbox One, Microsoft was talking predominantly with AMD. You see this in the code, manuals, SDKs, etc. DirectX 12 was thus built around GCN, evidently not ignoring nVIDIA's architectures of course. I suppose that this is one of the advantages of having your hardware in Microsoft's consoles. nVIDIA is adapting its architecture in order to add more GCN-like features (as demanded by DirectX 12). That's why Maxwell added async capabilities to its HyperQ.

Future iterations of nVIDIA's architectures will highlight this in a far more pronounced manner. I'm thinking a few ACE-like units will be present.

Compare DirectX 11...
Quote:


> DirectX 11 path
> All lighting draw calls use same primitive topology, shaders and rasterizer, depth stencil and
> blend states. The following DirectX 11 API calls are made for each lighting draw call:
> IASetIndexBuffer()
> IASetVertexBuffers()
> VSSetConstantBuffers()
> PSSetConstantBuffers()
> PSSetSamplers()
> PSSetShaderResources()
> DrawIndexed()
> All shadow map draw calls use same primitive topology, shaders and rasterizer, depth stencil
> and blend states. The following API calls are made for each shadow map draw call:
> IASetIndexBuffer()
> IASetVertexBuffers()
> VSSetConstantBuffers()
> DrawIndexed()
> Neither lighting nor shadow map passes use tessellator or geometry shader.


Even the error checking manual is almost identical...



This should not surprise anyone given the hardware running in the Xbox One.

nVIDIA's input can be seen in the DirectX 12_1 feature level, with added support for Conservative Rasterization Tier 1 and ROVs.


----------



## pengs

Mahigan,
Excellent post there.
Quote:


> *Lighting*
> Another common technique in modern games is deferred lighting. This involves performing a prepass
> over the scene with a compute shader before it is rendered, in order to determine which light
> sources affect each pixel. This technique makes it possible to efficiently render scenes with a large
> number of light sources.
> The following example, which uses DirectX 12 and deferred lighting to render a scene with many
> light sources, shows how using asynchronous shaders for the lighting pre-pass improves
> performance by 10%.
> 
> Demonstration of deferred lighting using DirectX 12 and asynchronous shaders:
> 
> _Measured in AMD internal application - D3D12_AsyncCompute. Test System Specifications: Intel i7 4960X, 16GB
> DDR3 1866 MHz, X79 motherboard, AMD Radeon R9 Fury X 4GB, Windows 10 v10130_


And given that most engines, especially those designed around the next-gen consoles, are moving from forward to deferred rendering, this advantage packs a lot of punch.

I'm not a developer, but I believe the major advantage to be had from these lighting algorithms (global illumination, voxel tracing, global shading, etc.) could be to remove the need for inaccurate and computationally bloated effects like ambient occlusion. It should also remove the need for baked light maps and any type of pre-processed lighting/shadowing, as those methods under deferred lighting are basically simplified variants of their big brother, ray tracing (specifically voxel cone tracing), and almost all of it can be done within one or two of these chosen algorithms, rendered earlier in the pipeline (which opens up the ability to manipulate it fully as the frame is rendered further).


----------



## ku4eto

Quote:


> Originally Posted by *pengs*
> 
> Mahigan,
> Excellent post there.
> And given that most engines, especially those that are designed around the next gens, are moving from forward to deferred rendering this advantage packs a lot of punch.
> I'm not a developer but believe that the major advantage from these lighting algorithms like global illumination/voxel tracing/global shading ect. are to cease the need for inaccurate and computationally bloated effects like ambient occlusion - it should also rid the need for baked light maps and any type of pre-processed lighting/shadowing as those methods under deferred lighting are basically simplex alternations of their big brother ray tracing (specifically voxel cone tracing) and almost all of it can be done within one or two of these chosen algorithms - rendered earlier in the pipe line (which opens up the ability to manipulate it fully as the frame is rendered further).



Where is this image from? I want to see the original.


----------



## Mahigan

Quote:


> Originally Posted by *pengs*
> 
> Mahigan,
> Excellent post there.
> And given that most engines, especially those that are designed around the next gens, are moving from forward to deferred rendering this advantage packs a lot of punch.
> 
> I'm not a developer but believe that the major advantage to be had from these lighting algorithms like global illumination/voxel tracing/global shading ect. could be to cease the need for inaccurate and computationally bloated effects like ambient occlusion - it should also rid the need for baked light maps and any type of pre-processed lighting/shadowing as those methods under deferred lighting are basically simplex alternations of their big brother ray tracing (specifically voxel cone tracing) and almost all of it can be done within one or two of these chosen algorithms - rendered earlier in the pipe line (which opens up the ability to manipulate it fully as the frame is rendered further).


It is the lighting we see in Ashes of the Singularity as each unit fires upon other units, and that lightning weapon as well.


----------



## Mahigan

Quote:


> Originally Posted by *ku4eto*
> 
> Where is this image from? I want to see the original.


From an AMD internal application. I obtained the image from the AMD Asynchronous Shaders White Paper found here: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Asynchronous-Shaders-White-Paper-FINAL.pdf


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> If you look at it from the lens of one Corporation against another (partisanship lens) then evidently you will conclude that the playing field will be more or less evened.
> 
> If you look at it from the lens of a consumer sitting on 2 year old hardware, you can conclude that you will be able to play the latest and greatest DirectX 12 titles (provided you're sitting on GCN hardware).
> 
> The titles will begin releasing in the fall of 2015 and Q1 2016. These are titles built for DirectX 12. Not in 2017. In 2016 we have a new AMD architecture releasing therefore you won't need to wait for Volta in 2017. Pascal may or may not incorporate improvements. I think Pascal will but several people disagree with me.
> By evening the playing field, by eliminating key nVIDIA advantages, it is AMDs savior. Not that I care as much as I do being able to play these titles on a 4K screen with Dual Radeon R9 290x using Multi-Adapter Split Frame Rendering, another key feature of DirectX 12.
> 
> I do care going forward however. An even playing field means more competition which means a drop in GPU prices. This I care about.
> 
> The only nVIDIA graphics cards capable of playing these titles, with decent settings, will be the GTX 970/980/980 Ti. All the other cards will be rendered obsolete. Which significantly reduces the player base. Prompting many people to buy new cards, on the nVIDIA side of things at least.


"The titles will begin releasing in the fall of 2015 and Q1 2016"

What titles? What games are confirmed to be using DX12? The only one I know of this year is Fable, and that's being ported to it from what I understand; they didn't build the game with it in mind.

"If you look at it from the lens of a consumer sitting on 2 year old hardware, you can conclude that you will be able to play the latest and greatest DirectX 12 titles (provided you're sitting on GCN hardware)."

I still don't know how you're arriving at that conclusion based on this benchmark. The benefits of DX12 (in this benchmark) are actually pretty disappointing; all it seems to be doing is nullifying AMD's weakness in DX11, not giving a massive performance improvement.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*


And there it is. I knew I wouldn't be disappointed. That picture makes an appearance in pretty much every DX12 thread.

I don't understand what people think that image proves. Everyone knows AMD was involved in developing DX12, along with Nvidia and Intel, so that's hardly revelatory information, and if you incorporate portions of someone's code in your standard you are probably going to include that in the documentation also. Yet somehow people take it to mean that DX12 is just Mantle renamed, or that DX12 couldn't have existed without it. If I quote the Gettysburg Address in a term paper, does that mean Lincoln wrote it?


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> And there it is. I knew I wouldn't be disappointed. That picture makes an appearance in pretty much every DX12 thread.
> 
> I don't understand what people think that image proves. Everyone knows AMD was involved in developing DX12, along with Nvidia and Intel, so that's hardly revelatory information, and if you incorporate portions of someone's code in your standard you are probably going to include that in the documentation also. Yet somehow people take it to mean that DX12 is just Mantle renamed, or that DX12 couldn't have existed without it. If I quote the Gettysburg Address in a term paper, does that mean Lincoln wrote it?


I showed you the API calls, didn't you read them?

Mantle wasn't renamed; Microsoft used GCN and Mantle as the basis for programming DirectX 12. The API calls are the same as with Mantle. This is glaringly obvious from the 3DMark white paper to anyone with any knowledge of programming.

This shouldn't shock or surprise anyone, and it shouldn't put you on the defensive. What is wrong with Microsoft designing DirectX 12 to take the most advantage of their own Xbox One console while retaining PC compatibility across a wide array of DirectX 11 graphics cards?

Also, it is not portions of the code; those are the API calls. You make calls to the DirectX 12 API in the same way you do with Mantle. Sure, the command wording is different, but the calls are the same.
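To illustrate what "same calls, different wording" means, here is a rough side-by-side sketch of the setup-and-submit sequence in each API. This is simplified pseudocode, not compilable code: the function names come from AMD's public Mantle documentation and Microsoft's D3D12 reference, but argument lists are omitted and minor lifecycle differences are glossed over.

```
// Mantle (AMD)                        // Direct3D 12 (Microsoft)
grCreateDevice(...)                    D3D12CreateDevice(...)
grGetDeviceQueue(...)                  device->CreateCommandQueue(...)
grCreateCommandBuffer(...)             device->CreateCommandList(...)
grBeginCommandBuffer(cmdBuf)           // a command list records on creation/Reset
grCmdDraw(cmdBuf, ...)                 cmdList->DrawInstanced(...)
grEndCommandBuffer(cmdBuf)             cmdList->Close()
grQueueSubmit(queue, ...)              queue->ExecuteCommandLists(...)
```

The argument being made in the thread is that the sequence and granularity of the calls match, even though the identifiers differ.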


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> "The titles will begin releasing in the fall of 2015 and Q1 2016"
> 
> What titles? What games are confirmed to be using DX12? The only one I know of this year is Fable, and that's being ported to it from what I understand; they didn't build the game with it in mind.
> 
> "If you look at it from the lens of a consumer sitting on 2 year old hardware, you can conclude that you will be able to play the latest and greatest DirectX 12 titles (provided you're sitting on GCN hardware)."
> 
> I still don't know how you're arriving at that conclusion based on this benchmark. The benefits of DX12 (in this benchmark) are actually pretty disappointing; all it seems to be doing is nullifying AMD's weakness in DX11, not giving a massive performance improvement.


Ark: Survival Evolved is getting a DirectX 12 patch
Fable Legends is releasing this Fall

This is followed by:
Q1 2016 - Deus Ex: Mankind Divided, Gears of War: Ultimate Edition, Frostbite engine (and all games releasing in 2016 which are built upon it)
Q2 2016 - Sea of Thieves

And of course there are the various console ports coming. Therefore the games start arriving in Q4 2015 and will continue to be released in Q1, Q2, Q3, and Q4 2016.

By the time 2017 arrives, there should be quite a few titles running on DirectX 12 already.

As for seeing DirectX 12 as a performance upgrade for GCN: by tapping into the dormant compute capabilities of GCN, DirectX 12 allows GCN users to play games which make use of far more post-processing effects, lighting effects, etc. than was previously possible under DirectX 11. So you're getting nicer-looking games, with better effects, at better frame rates. I'd say that this is an upgrade. It's like releasing a software patch which boosts performance by an incredible amount.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Mahigan*
> 
> I showed you the API calls, didn't you read them?
> 
> Mantle wasn't renamed; Microsoft used GCN and Mantle as the basis for programming DirectX 12. The API calls are the same as with Mantle. This is glaringly obvious from the 3DMark white paper to anyone with any knowledge of programming.
> 
> This shouldn't shock or surprise anyone, and it shouldn't put you on the defensive. What is wrong with Microsoft designing DirectX 12 to take the most advantage of their own Xbox One console while retaining PC compatibility across a wide array of DirectX 11 graphics cards?
> 
> Also, it is not portions of the code; those are the API calls. You make calls to the DirectX 12 API in the same way you do with Mantle. Sure, the command wording is different, but the calls are the same.


I'm not really sure what you are saying here, but Microsoft was working on DirectX 12 long before Mantle even existed; it was a combined effort from everybody:
Quote:


> The day after the D3D12 keynote, I got on the phone with Tony Tamasi, Nvidia's Senior VP of Content and Technology. He told me D3D12 had been in the works for "more than three years" (longer than Mantle) and that "everyone" had been involved in its development. As he pointed out, people from AMD, Nvidia, Intel, and even Qualcomm stood on stage at the D3D12 reveal keynote. Those four companies' logos are also featured prominently on the current landing page for the official DirectX blog:
> 
> 
> 
> Tamasi went on to note that, since development cycles for new GPUs span "many years," there was "no possible way" Microsoft could have slapped together a new API within six months of Mantle's public debut.


http://techreport.com/review/26239/a-closer-look-at-directx-12
Quote:


> • Microsoft's first demonstrations of working DX12 software (3DMark and a Forza demo, Forza being a port from the AMD-powered Xbox One), were running on an nVidia GeForce Titan card, not AMD (despite the Xbox One connection and the low-level API work done there).
> 
> • For these two applications to be ported to DX12, the API and drivers had to have been reasonably stable for a few months before the demonstration. Turn 10, developers of Forza, claimed that the port to DX12 was done in about 4 man-months.
> 
> • nVidia has been working on lowering CPU-overhead with things like bindless resources in OpenGL since 2009 at least.
> 
> • AMD has yet to reveal the Mantle API to the general public. Currently only insiders know exactly what the API looks like. So far AMD has only given a rough global overview in some presentations, which were released only a few months ago. And actual beta drivers have only been around since January 30th. Microsoft/nVidia could only have copied its design through corporate espionage and/or reverse engineering in an unrealistically short timeframe


https://scalibq.wordpress.com/2014/03/27/who-was-first-directx-12-or-mantle-nvidia-or-amd/


----------



## Themisseble

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> I'm not really sure what you are saying here, but Microsoft was working on DirectX 12 long before Mantle even existed; it was a combined effort from everybody:
> http://techreport.com/review/26239/a-closer-look-at-directx-12
> https://scalibq.wordpress.com/2014/03/27/who-was-first-directx-12-or-mantle-nvidia-or-amd/


Yeah sure and MS never lies.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Themisseble*
> 
> Yeah sure and MS never lies.


DirectX 12 debuted on an NVIDIA GTX Titan...


----------



## semitope

The next hitman game coming in december is also dx12

http://www.pcgameshardware.de/Hitman-Spiel-6333/Videos/Glacier-2-DirectX-12-PBR-1167179/

star wars battlefront will support mantle minimum I think. It should support dx12 as well.


----------



## sugarhell

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Direct X 12 debuted on an NVIDIA GTX Titan...


So your point?


----------



## BiG StroOnZ

Quote:


> Originally Posted by *sugarhell*
> 
> So your point?


The point is pretty clear: DirectX 12 was worked on before Mantle even existed. The sources explain this quite clearly. The API was featured on an NVIDIA graphics card, not an AMD graphics card. There is no Mantle-to-DirectX 12 conspiracy going on here.


----------



## Forceman

Quote:


> Originally Posted by *Themisseble*
> 
> Yeah sure and MS never lies.


Why would Microsoft lie about when they started work on DX12? At least three other companies know when it was (AMD, Nvidia, Intel) so lying hardly makes any sense.


----------



## semitope

DirectX 12 (or whatever it was called) surely was being worked on. But the form it has taken is what's in question.


----------



## Ganf

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Direct X 12 debuted on an NVIDIA GTX Titan...


Running an Xbox exclusive title. Microsoft's logic behind tech demos cannot be fathomed, that point is a wash for both of you.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Ganf*
> 
> Running an Xbox exclusive title. Microsoft's logic behind tech demos cannot be fathomed, that point is a wash for both of you.


But you would assume that if there was a GCN connection to the DX12 API, it would make more sense to use a GCN GPU, would it not?


----------



## sugarhell

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> The point is pretty clear: DirectX 12 was worked on before Mantle even existed. The sources explain this quite clearly. The API was featured on an NVIDIA graphics card, not an AMD graphics card. There is no Mantle-to-DirectX 12 conspiracy going on here.


Because they showcased a small test on an Nvidia GPU?

And that was Nvidia talking about how they have tried to reduce CPU overhead over time across all the DX versions.

Somehow Microsoft released the Xbone with DX11.X, a modified version of DX11, and it failed miserably. And 2 years after, they release DX12.

All those sources are not evidence; I don't believe any PR machine, especially Microsoft's.

The only evidence is here:

http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf

pages 90-93


----------



## BiG StroOnZ

Quote:


> Originally Posted by *sugarhell*
> 
> Because they showcased a small test on an Nvidia GPU?
> 
> And that was Nvidia talking about how they have tried to reduce CPU overhead over time across all the DX versions.
> 
> Somehow Microsoft released the Xbone with DX11.X, a modified version of DX11, and it failed miserably. And 2 years after, they release DX12.
> 
> All those sources are not evidence; I don't believe any PR machine, especially Microsoft's.
> 
> The only evidence is here:
> 
> http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
> 
> pages 90-93


So what does that reveal? AMD worked on DirectX 12 alongside NVIDIA, Intel, and Qualcomm. What you are seeing is AMD's contribution to the project. The project began before Mantle even started.


----------



## GorillaSceptre

So there are at least 4 games; that's more than I thought, but none of those games are "built for DX12". The real DX12 games are still a couple of years off.

In any case, AMD is now on par with NV (the Fury X anyway), but only in a benchmark of a game that's in alpha, which Nvidia said was not a true representation of DX12 performance.

Mahigan has posted some very compelling arguments, but in the end AMD's allegedly superior architecture just barely outperforms the 980 Ti. So if both Nvidia's and AMD's GPUs are bottlenecked for different reasons, how is AMD in a better position?


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Dani011093*
> 
> Indeed, when I said 90% of the benchmarks, I meant it. ArsTechnica's are the ONLY benches out there that show those numbers, and they don't really make sense against the rest of the benches out there... read the one from computerbase.de (also the ones you posted) and the ones on the first post; all show similar numbers except for these. (When I check benchmarks I just take many and go with the averages, and seriously, the averages show the 980 competing against the 390X... and the 980 Ti competing with the Fury X.)
> 
> Also, the Fury X vs 980 Ti bench you posted shows the Fury X winning by an extremely small margin on some tests and the 980 Ti winning by an even smaller margin on some DX12 tests (pretty sure 1080p)... all in all they show the SAME performance.
> 
> 
> 
> Dani,
> 
> We've been discussing the topic of Fury-X vs 290/390x. See CrazyElf's post above. We've been attempting to figure out the likely culprit. Also worth pointing out that ArsTechnica may be the only website to show Radeon R9 290x keeping up with the GTX 980 Ti but they're also the only website to bench the two cards against one another.
> 
> 
> 
> Based on how well a Radeon R9 390 performs, relative to a GTX 980 Ti under the http://www.computerbase.de/ tests, available here: http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/2/#diagramm-normale-anzahl-an-draw-calls, we can conclude that the ArsTechnica numbers are valid.
> 
> How? If we assume a compute bottleneck and thus look to the ALU count differences between the R9 390 and R9 390x, we can see a 10% shader increase with the R9 390x. If we multiply the Frames per Second (48.5) by this 10% we achieve a number of 4.85 FPS. If we add 4.85FPS to 48.5FPS we get a result of 53.35FPS. This 53.4FPS figure is within striking distance of the GTX 980 Ti's 55.4FPS. This is what the ArsTechnica review showed.

It doesn't just show the 390X close to the 980 Ti's performance, but even closer to the Fury X's, and that's what I don't understand: what's the point for AMD (which knows what AS can do and what can bottleneck it) in releasing a card like the Fury X for 650 bucks when that same card is neck and neck with the 390X???
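The back-of-the-envelope scaling in the quoted post can be reproduced in a few lines. This is only a sketch of the quoted arithmetic under its own assumption that performance scales linearly with ALU count; the figures (48.5 FPS, a ~10% shader increase, 55.4 FPS) are taken from the post, not independently measured:

```python
# Sketch of the compute-bottleneck estimate from the quoted post.
# Assumption (from the post): FPS scales linearly with shader (ALU) count.

def scaled_fps(base_fps: float, shader_increase: float) -> float:
    """Estimate FPS after a proportional increase in shader count."""
    return base_fps * (1.0 + shader_increase)

r9_390_fps = 48.5      # R9 390 result cited from the computerbase.de test
shader_delta = 0.10    # R9 390X has ~10% more shaders than the R9 390

estimate_390x = scaled_fps(r9_390_fps, shader_delta)
print(round(estimate_390x, 2))  # 53.35, within striking distance of the 980 Ti's 55.4
```

Whether linear ALU scaling is the right model is exactly what the thread is debating; the snippet only reproduces the quoted arithmetic.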


----------



## Kuivamaa

Quote:


> Originally Posted by *Forceman*
> 
> And there it is. I knew I wouldn't be disappointed. That picture makes an appearance in pretty much every DX12 thread.
> 
> I don't understand what people think that image proves. Everyone knows AMD was involved in developing DX12, along with Nvidia and Intel, so that's hardly revelatory information, and if you incorporate portions of someone's code in your standard you are probably going to include that in the documentation also. Yet somehow people take it to mean that DX12 is just Mantle renamed, or that DX12 couldn't have existed without it. If I quote the Gettysburg Address in a term paper, does that mean Lincoln wrote it?


We also have the 3DMark tech guide.

http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf

Check page 91 for DX12 path, 92 for Mantle path and compare them to DX11. The first two paths (DX12 and Mantle) are nearly identical (minus the syntax and probably some semantics). DX11 on the other hand is largely different (basically it bears no resemblance to the other two) and from my little knowledge, I must say it looks closer to OGL... So yeah, Mantle and DX12 come from the same lineage. Basically it seems that DX12 branched out from Mantle sometime before the whole Mantle trunk became Vulkan, to speak in software development terms.


----------



## Ganf

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> But you would assume that if there was a GCN connection to the DX12 API, it would make more sense to use a GCN GPU, would it not?


Microsoft has always used one or more of Nvidia's flagship cards to run Xbox demos as a brute force, take no chances method of ensuring high performance on an unoptimized console title. I also think Nvidia provides them with a custom driver specifically for these demos. For a price, of course. Microsoft likes to tuck away their demo machines out of the public eye, because Xbox demos aren't supposed to be running on a PC, so Nvidia only gets the rep when some nosy blogger pokes his head in the cabinet when no one is looking, they're not doing it for the exposure.

Less to do with GCN vs other architectures, more to do with an established relationship for a niche service.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Ganf*
> 
> Microsoft has always used one or more of Nvidia's flagship cards to run Xbox demos as a brute force, take no chances method of ensuring high performance on an unoptimized console title. I also think Nvidia provides them with a custom driver specifically for these demos. For a price, of course. Microsoft likes to tuck away their demo machines out of the public eye, because Xbox demos aren't supposed to be running on a PC, so Nvidia only gets the rep when some nosy blogger pokes his head in the cabinet when no one is looking, they're not doing it for the exposure.
> 
> Less to do with GCN vs other architectures, more to do with an established relationship for a niche service.


I think you are downplaying the issue here with a dissuasive argument.

Someone claimed "Microsoft used GCN and Mantle as the basis for programming DirectX 12," yet they debuted the DirectX 12 API on a GTX Titan, not an AMD GCN-capable graphics card. Microsoft would have had absolutely no reason to use a GPU that wasn't a GCN-architecture card if that were the case. Pretty simple logic here.


----------



## Forceman

Quote:


> Originally Posted by *Kuivamaa*
> 
> We also have the 3DMark tech guide.
> 
> http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
> 
> Check page 91 for DX12 path, 92 for Mantle path and compare them to DX11. The first two paths (DX12 and Mantle) are nearly identical (minus the syntax and probably some semantics). DX11 on the other hand is largely different (basically it bears no resemblance to the other two) and from my little knowledge, I must say it looks closer to OGL... So yeah, Mantle and DX12 come from the same lineage. Basically it seems that DX12 branched out from Mantle sometime before the whole Mantle trunk became Vulkan, to speak in software development terms.


I looked at it. It looks similar, but I'm not a programmer so I don't know how significant that is. Three lines that illustrate how a lighting or shadow call is accomplished doesn't seem like a smoking gun to me. In any case, no one is disputing that AMD worked on DX12.


----------



## Ganf

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> I think you are downplaying the issue here with a dissuasive argument.
> 
> Someone claimed "Microsoft used GCN and Mantle as the basis for programming DirectX 12," yet they debuted the DirectX 12 API on a GTX Titan, not an AMD GCN-capable graphics card. Microsoft would have had absolutely no reason to use a GPU that wasn't a GCN-architecture card if that were the case. Pretty simple logic here.


I think the irony is that someone claimed that Mantle was the source for DX12 for the 48th time and you rose to the bait for the 48th time, but that's just me.


----------



## HalGameGuru

I get the feeling, personally, that the Fury line isn't really aimed at the 290/390 or 980/980 Ti paradigm.

With its focus on shaders, ACEs, and HBM, I feel like it's more focused on VR, and VR is where I think we will see its hardware spec show more improvement versus traditional gaming GPUs. High bandwidth, asynchronous compute, and improved scheduling and power delivery suit graphics that render the same 3D space from different angles, which requires more shader and texture horsepower than raw polygons, while keeping frame latency and variance low to maintain smooth operation.

I think Fury is more of a LiquidVR-focused product than a straight gaming one. At least that would be the main reason, in my eyes, why they would make a more incremental architectural improvement while adding new tech and focusing on small form factor.

Just my POV, from my angle: I think AMD hopes VR is somewhere it can drive new market share, where there is no established brand loyalty or market-share disparity. VR looks to be a tech that will span far more than just the gaming market.


----------



## Redeemer

Quote:


> Originally Posted by *Ganf*
> 
> I think the irony is that someone claimed that Mantle was the source for DX12 for the 48th time and you rose to the bait for the 48th time, but that's just me.


Or maybe some just cannot handle the fact that AMD is more involved in the DX12 process with MS than Nvidia is; after all, GCN is in the Xbox One as well.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Ganf*
> 
> I think the irony is that someone claimed that Mantle was the source for DX12 for the 48th time and you rose to the bait for the 48th time, but that's just me.


Then perhaps more people should have corrected that person baiting for the 48th time, because it has been sourced numerous times already that Microsoft had DX12 in the works for quite some time before Mantle even began. And on top of that, yes, AMD contributed to the development of DX12, but so did NVIDIA, Intel, and Qualcomm. So it should come as no surprise to anyone to see similar code in some areas. Nevertheless, NVIDIA has been working on lowering CPU overhead, the forte of DX12, since 2009. That also happens to be the forte of Mantle. So should we start saying AMD copied NVIDIA with Mantle because they tried to lower CPU overhead?


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> I looked at it. It looks similar, but I'm not a programmer so I don't know how significant that is. Three lines that illustrate how a lighting or shadow call is accomplished doesn't seem like a smoking gun to me. In any case, no one is disputing that AMD worked on DX12.


GCN is the basis for Xbox One.

Lets look at the timeline...

How long has the Xbox One been in development (since around May 2011, based on Microsoft job postings)?
When was Graphics Core Next (GCN) announced (December 2011)?
When was AMD Mantle introduced (September 2013)?
How long has DirectX 12 been in development (since 2 years before AMD Mantle according to your source, i.e. 2011)?

Notice a pattern?

The reason why calls to the Mantle API and calls to the DirectX 12 API are so similar is that they're both tied to a project which AMD and Microsoft began work on in 2011: a project which required the use of ACEs (Asynchronous Compute Engines) in order to allow for cinematic effects, on a console, with limited hardware resources on hand. That project was the Xbox One.

What ties both DirectX 12 and Mantle together is this relationship.

Speculation: Perhaps Microsoft was not going to release this project for the PC. AMD forced their hand by releasing Mantle onto the PC platform. Once Microsoft announced DirectX 12, work on Mantle halted. Coincidence? Perhaps... but those are a heck of a lot of coincidences. Even Microsoft's own words push this idea: http://www.windowscentral.com/xboxs-phil-spencer-microsoft-has-maybe-lost-our-way-pc-gaming
Quote:


> I wanted to have the opportunity to come here, because there have been times in our past where Microsoft has maybe lost our way with PC gaming. But what people have done on PC is critical to our success, and critical to Windows' success. So it's great to get the opportunity to come here and talk directly to the fans and the press.


Quote:


> We have Windows 10 coming out in July, and one of the early moves was to make it a free upgrade. Really, we thought about that from a developers' standpoint, that as developers look at a common ecosystem, with everybody on one version of Windows, it just makes it easier for people who are developing games. Building DirectX 12 and making it common across our platform, and Xbox Live with the same API set and same service - we're just trying to make it easier for developers as they're developing Windows games.


Quote:


> There are a lot of opportunities for cross-platform, but I also think there are games that exist on a television, and there are games that exist with a keyboard and mouse on the PC. It's not our job to dictate where games are developed or the kinds of games developers want to build. But giving developers the options, the opportunity, creating the widest canvas we can for creativity - what I've seen, in my time in the gaming space, is that that leads to the best games. And I think that's why we're all here.


- head of Xbox Phil Spencer

This also happened in 2011
Microsoft and Nvidia abandon PC Gaming Alliance: http://www.pcauthority.com.au/News/248875,microsoft-and-nvidia-abandon-pc-gaming-alliance.aspx

After Mantle struck in 2013... Microsoft changed its tune
Quote:


> Microsoft Studios head Phil Spencer told an audience at the Game Developers Conference in San Francisco today that the company will be putting more effort into PC gaming, though he stopped short of offering any specific details.


http://www.neowin.net/news/microsoft-says-it-has-renewed-focus-on-pc-gaming-details-coming-this-summer
Quote:


> A renewed focus on Windows and PC gaming inside Microsoft is definitely happening. You will see more focus from us - not to go compete with what Valve has done, but because we also understand as the platform holder it's important for us to invest in the platform in a real way. We're fundamentally committed to that.


- head of Xbox Phil Spencer


----------



## Kuivamaa

Quote:


> Originally Posted by *Forceman*
> 
> I looked at it. It looks similar, but I'm not a programmer so I don't know how significant that is. Three lines that illustrate how a lighting or shadow call is accomplished doesn't seem like a smoking gun to me. In any case, no one is disputing that AMD worked on DX12.


Let's say that with lighting, shading (and naturally texturing, since light and shadow maps are more or less part of it) being largely the same, we already have half the pipeline.

It is a very, very smoking gun.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> GCN is the basis for Xbox One.
> 
> Lets look at the timeline...
> 
> How long has the Xbox One been in development (since around May 2011, based on Microsoft job postings)?
> When was Graphics Core Next (GCN) announced (December 2011)?
> When was AMD Mantle introduced (September 2013)?
> How long has DirectX 12 been in development (since 2 years before AMD Mantle according to your source, i.e. 2011)?
> 
> Notice a pattern?
> 
> The reasons why Calls to the Mantle API and Calls to the DirectX 12 API are so similar is because they're both tied to a project which AMD and Microsoft began work on in 2011. A project which required the use of ACEs (Asynchronous Compute Engines) in order to allow for cinematic effects, on a console, with limited hardware resources on hand. This project was the XBox One.
> 
> What ties both DirectX 12 and Mantle together is this relationship.
> 
> Speculation: Perhaps Microsoft was not going to release this project for the PC. AMD forced their hand by releasing Mantle onto the PC platform. Once Microsoft announced DirectX 12, work on Mantle halted. Coincidence? Perhaps... but those are a heck of a lot of coincidences. Even Microsoft's own words push this idea:http://www.windowscentral.com/xboxs-phil-spencer-microsoft-has-maybe-lost-our-way-pc-gaming
> 
> - head of Xbox Phil Spencer


Also, Mantle did not just pop into existence, lol.


----------



## Forceman

Quote:


> Originally Posted by *Kuivamaa*
> 
> Let's say that with lighting, shading (and naturally texturing, since light and shadow maps are more or less part of it) being largely the same, we already have half the pipeline.
> 
> It is a very, very smoking gun.


Again, I'm not disputing that AMD was involved in DX12 development. Obviously they were. But there are plenty of posters who seem to think Microsoft was sitting around the day Mantle was announced and suddenly said, "oh man, we'd better start making DX12". GCN may have led to the development of DX12, but the Mantle launch in October 2013 certainly did not.


----------



## Mahigan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Also, Mantle did not just pop into existence, lol.


Exactly.

It clearly stems from work Microsoft and AMD did for the API for the XBox One. DirectX 12 was also born out of the same project.

It seems that Mantle was released onto the PC because Microsoft had turned its back on the PC market in favor of the console market. That's what all the articles and words from Microsoft's own executives highlight.

Mantle and DirectX 12 are the offspring of AMD and Microsoft's collaboration on the Xbox One. They're not entirely the same; they're like siblings, born of the same genetic lineage. Both are based on the same original code which AMD and Microsoft worked on for the Xbox One console.


----------



## JunkoXan

I'd assume twins is the more accurate analogy/metaphor, as twins are identical but differ at the smallest level, which gives them their individuality?


----------



## Kuivamaa

Quote:


> Originally Posted by *Forceman*
> 
> Again, I'm not disputing that AMD was involved in DX12 development. Obviously they were. But there are plenty of posters who seem to think Microsoft was sitting around the day Mantle was announced and suddenly said, "oh man, we'd better start making DX12". GCN may have led to the development of DX12, but the Mantle launch in October 2013 certainly did not.


"Involved in DX12 development" is an understatement; this is what I am trying to say. It seems to me that MS scrapped whatever they had up until that point (sometime in early 2013, my guess), took Mantle as a basis, and started tailoring it to their needs.


----------



## Mahigan

Quote:


> Originally Posted by *JunkoXan*
> 
> I'd be assuming Twins is the correct analogy/metaphor as Twins are identical but at the smallest level they're different to give them individuality?


Twins works even better.


----------



## patrickjp93

Quote:


> Originally Posted by *Kuivamaa*
> 
> We also have the 3DMark tech guide.
> 
> http://s3.amazonaws.com/download-aws.futuremark.com/3DMark_Technical_Guide.pdf
> 
> Check page 91 for DX12 path, 92 for Mantle path and compare them to DX11. The first two paths (DX12 and Mantle) are nearly identical (minus the syntax and probably some semantics). DX11 on the other hand is largely different (basically it bears no resemblance to the other two) and from my little knowledge, I must say it looks closer to OGL... So yeah, Mantle and DX12 come from the same lineage. Basically it seems that DX12 branched out from Mantle sometime before the whole Mantle trunk became Vulkan, to speak in software development terms.


Alright, to correct the record here: DX12 is not a copy of Mantle. Mantle only had the functions usable in a Direct3D portion anyway. Between DirectCompute and everything else, there's a hell of a lot of DX12 that isn't Mantle. And even if you wanted to claim they are exactly the same, we do not yet have access to the DX12 source, and we won't until the likes of me disassemble the whole damn thing and analyze it, which will take many games and hundreds of hours of study and comparison. Further, Microsoft was developing DX12 as a whole for years. Microsoft is too prideful and, frankly, too much better at programming than AMD and the game studios to just copy Mantle and tweak the edges. Anyone arguing otherwise is operating under a delusion.

There's some fact presented here about its advised use, but that's what low-level access is in a perfect world: total control down to the metal. It's not going to look different from a pipeline perspective anyway. There was a logical order already in place from DX11 for stages to play out in. The difference, algorithmically/procedurally speaking, is not very large between DX11 and DX12. The control and depth levels have changed, but procedurally it's very much the same. Until we have hard evidence that can be independently confirmed at anyone's leisure, we cannot conclude that DX12 is Mantle with frills. Everyone on here needs to stop selling this as fact, because it's not fact, and there are plenty of good reasons, given earlier on, why it's not likely it will ever be fact. Just stop. I will admit I don't have experience programming in DX itself, but I do have plenty in analyzing APIs and programming paradigms, and in using them to build multi-thousand-line programs which complete very complex tasks in the most efficient manner possible. I analyze algorithms all day, and I can tell you that two systems having the same general shape is not conclusive proof that they are the same system or based on the same design principles. Plenty of patent-dispute lawsuits have made claims with the same level of evidence and argument you present, and they've been dismissed. This isn't fact; you're a less-than-stellar logician with a nice source but no strong synthesis of it, as it's incomplete with regard to your argument, and this really should be the end of it until you come up with actual proof, concrete evidence.


----------



## sugarhell

I think the logic behind the DX12 pipeline and how the command queues work is pure Mantle. The rendering methods and the feature levels come from all the IHVs.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> Exactly.
> 
> It clearly stems from work Microsoft and AMD did for the API for the XBox One. DirectX 12 was also born out of the same project.
> 
> It seems that Mantle was released onto the PC because Microsoft had turned its back on the PC market in favor of the console market. That's what all the articles and words from Microsoft's own executives highlight.
> 
> Mantle and DirectX 12 are the offspring of AMD and Microsoft collaboration on the Xbox One. They're not entirely the same, they're like siblings. Born out of the same genetic lineage. Both are based on the same original code which AMD and Microsoft worked on for the XBox One console.


Most likely Microsoft told AMD and Nvidia that DX12 was not coming anytime soon "because we are supporting Xbox". Nvidia was fine with DX11, while AMD took the API into their own hands and sped up the process with Mantle.


----------



## PontiacGTX

Quote:


> Originally Posted by *Forceman*
> 
> And there it is. I knew I wouldn't be disappointed. That picture makes an appearance in pretty much every DX12 thread.
> 
> I don't understand what people think that image proves. Everyone knows AMD was involved in developing DX12, along with Nvidia and Intel, so that's hardly revelatory information, and if you incorporate portions of someone's code in your standard you are probably going to include that in the documentation also. Yet somehow people take it to mean that DX12 is just Mantle renamed, or that DX12 couldn't have existed without it. If I quote the Gettysburg Address in a term paper, does that mean Lincoln wrote it?


Then why would Microsoft try to use a variation of DirectX 12 in the Xbox if they were already working on a low-level API? Is DirectX 11.X not good enough?


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Kuivamaa*
> 
> "Involved in DX12 development" is an understatement, this is what I am trying to say. It seems to me that MS scrapped whatever they had up until that point (some time in early 2013 my guess), took Mantle as basis and started tailoring it to their needs.


The only problem, as stated, is that NVIDIA was also involved in DX12 development and has been working on reducing CPU overhead since 2009:

http://developer.download.nvidia.com/opengl/tutorials/bindless_graphics.pdf

Way before Mantle, and way before Microsoft had even thought of creating an Xbox One, the hardware that was going to be inside it, or the API that was going to be used for it.

What's one of the first suggested implementations NVIDIA offers in their assessment to reduce CPU Overhead?

"Increasing the number of Draw Calls per Frame"


----------



## Mahigan

Quote:


> Originally Posted by *patrickjp93*
> 
> Alright, to correct the record here, DX 12 is not a copy of Mantle. Mantle only had the functions usable in a Direct3D portion anyway. Between DirectCompute and everything else, there's a Hell of a lot of DX 12 that isn't Mantle. And even if you wanted to claim they are exactly the same, we do not yet have access to the DX 12 source, and we won't until the likes of me disassemble the whole damn thing and analyze it, which will take many games and hundreds of hours of study and comparison. Further, Microsoft was developing DX 12 as a whole for years. It's too prideful and, frankly, too much better at programming than AMD and game studios to actually just copy it and tweak the edges. Anyone arguing otherwise is operating under a delusion.
> 
> There's some fact presented hear about the advised use of it, but that's low level access is in a perfect world: total control down to the metal. It's not going to look different from a pipeline perspective anyway. There was a logical order already in place from DX 11 for stages to play out in. The difference, algorithmically/procedurally speaking, is not very large between DX 11 and DX 12. The control and depth levels have changed, but procedurally it's very much the same. Until we have hard evidence that can be independently confirmed at anyone's leisure, we cannot conclude that DX 12 is Mantle with frills. Everyone on here needs to stop selling this like fact, because it's not fact. There are plenty of good reasons why it's not likely it will ever be fact given earlier on. Just stop. I will admit I don't have experience programming in DX itself, but I do have plenty in analyzing APIs, programming paradigms, using them to build multi-thousand line programs which complete very complex tasks in the most efficient manner possible. I analyze algorithms all day, and I can tell you just because two systems have the same general shape is not conclusive proof they are the same system or based on the same design principles. There are plenty of lawsuits which have claimed with the same level of evidence and argument you present in patent disputes, and they've been dismissed. This isn't fact; you're a less than stellar logician with a nice source but no strong synthesis of it, as it's incomplete with regards to your argument, and this really should be the end of it until you come up with actual proof, concrete evidence.


GCN is the basis for Xbox One.

Let's look at the timeline...

How long has the Xbox One been in development for (Around May of 2011 based on Microsoft job postings)?
When was Graphics Core Next (GCN) announced (December of 2011)?
When was AMD Mantle introduced (Sept of 2013)?
How long has DirectX 12 been in development for (2 years before AMD Mantle according to Microsoft or 2011)?

Notice a pattern?

The reason calls to the Mantle API and calls to the DirectX 12 API are so similar is that they're both tied to a project which AMD and Microsoft began work on in 2011: a project which required the use of ACEs (Asynchronous Compute Engines) in order to allow for cinematic effects on a console with limited hardware resources on hand. That project was the XBox One.

What ties both DirectX 12 and Mantle together is this relationship.

Speculation: Perhaps Microsoft was not going to release this project for the PC, and AMD forced their hand by releasing Mantle onto the PC platform. Once Microsoft announced DirectX 12, work on Mantle halted. Coincidence? Perhaps... but those are a heck of a lot of coincidences. Even Microsoft's own words push this idea: http://www.windowscentral.com/xboxs-phil-spencer-microsoft-has-maybe-lost-our-way-pc-gaming
Quote:


> I wanted to have the opportunity to come here, because there have been times in our past where Microsoft has maybe lost our way with PC gaming. But what people have done on PC is critical to our success, and critical to Windows' success. So it's great to get the opportunity to come here and talk directly to the fans and the press.


Quote:


> We have Windows 10 coming out in July, and one of the early moves was to make it a free upgrade. Really, we thought about that from a developers' standpoint, that as developers look at a common ecosystem, with everybody on one version of Windows, it just makes it easier for people who are developing games. Building DirectX 12 and making it common across our platform, and Xbox Live with the same API set and same service - we're just trying to make it easier for developers as they're developing Windows games.


Quote:


> There are a lot of opportunities for cross-platform, but I also think there are games that exist on a television, and there are games that exist with a keyboard and mouse on the PC. It's not our job to dictate where games are developed or the kinds of games developers want to build. But giving developers the options, the opportunity, creating the widest canvas we can for creativity - what I've seen, in my time in the gaming space, is that that leads to the best games. And I think that's why we're all here.


- head of Xbox Phil Spencer

This also happened in 2011
Microsoft and Nvidia abandon PC Gaming Alliance: http://www.pcauthority.com.au/News/248875,microsoft-and-nvidia-abandon-pc-gaming-alliance.aspx

After Mantle struck in 2013... Microsoft changed its tune
Quote:


> Microsoft Studios head Phil Spencer told an audience at the Game Developers Conference in San Francisco today that the company will be putting more effort into PC gaming, though he stopped short of offering any specific details.


http://www.neowin.net/news/microsoft-says-it-has-renewed-focus-on-pc-gaming-details-coming-this-summer
Quote:


> A renewed focus on Windows and PC gaming inside Microsoft is definitely happening. You will see more focus from us - not to go compete with what Valve has done, but because we also understand as the platform holder it's important for us to invest in the platform in a real way. We're fundamentally committed to that.


- head of Xbox Phil Spencer

Both AMD Mantle and DirectX 12 share the same basic DNA. Both were born out of the same project. Of course, each headed in a slightly different direction, but the reason they're so similar is that they share a "parental" lineage, like twins or siblings, made up of the same DNA at their core.


----------



## Mahigan

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Only problem as stated, is NVIDIA was also involved with DX12 development and has been working on Reducing CPU Overhead since 2009:
> 
> http://developer.download.nvidia.com/opengl/tutorials/bindless_graphics.pdf
> 
> Way before Mantle, and way before Microsoft even thought of creating an Xbox One or the Hardware that was going to be inside of it or the API that was going to be used for it.
> 
> What's one of the first suggested implementations NVIDIA offers in their assessment to reduce CPU Overhead?
> 
> "Increasing the number of Draw Calls per Frame"


No ACEs, and the lack of asynchronous shading prior to Maxwell's HyperQ revision, point to nVIDIA not being the likely culprit. It is far more likely that DirectX 12 spawned from a project both Microsoft and AMD were working on: the Xbox One. Evidently DirectX 12 has diverged from that point, but at its core it shares a lot of its DNA with AMD Mantle (which also spawned out of that partnership).


----------



## sugarhell

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Only problem as stated, is NVIDIA was also involved with DX12 development and has been working on Reducing CPU Overhead since 2009:
> 
> http://developer.download.nvidia.com/opengl/tutorials/bindless_graphics.pdf
> 
> Way before Mantle, and way before Microsoft even thought of creating an Xbox One or the Hardware that was going to be inside of it or the API that was going to be used for it.
> 
> What's one of the first suggested implementations NVIDIA offers in their assessment to reduce CPU Overhead?
> 
> "Increasing the number of Draw Calls per Frame"


Everyone has been working on reducing overhead since 2009. Reducing overhead doesn't mean DX12. They tried to reduce overhead with DX11 and failed. Microsoft never actually said that DX12 had been planned or in development for something like five years.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Mahigan*
> 
> No ACEs, and the lack of asynchronous shading prior to Maxwell's HyperQ revision, point to nVIDIA not being the likely culprit. It is far more likely that DirectX 12 spawned from a project both Microsoft and AMD were working on: the Xbox One. Evidently DirectX 12 has diverged from that point, but at its core it shares a lot of its DNA with AMD Mantle (which also spawned out of that partnership).


Then why do all the feature tests that highlight DX12 or Mantle performance center on draw calls, e.g. Star Swarm or the 3DMark API Overhead feature test?

NVIDIA started the push to reduce CPU overhead back in 2009, before everything you suggest.
Quote:


> Originally Posted by *sugarhell*
> 
> Everyone has been working on reducing overhead since 2009. Reducing overhead doesn't mean DX12. They tried to reduce overhead with DX11 and failed. Microsoft never actually said that DX12 had been planned or in development for something like five years.


Show me sources for "everyone."


----------



## Kpjoslee

I find it a little amusing that so many conclusions have been drawn, from how DX12 will perform on GPUs from the respective vendors to how Nvidia was left out in the cold during the development of DX12 because of its similarities with Mantle's API path, all from a single benchmark of a single game that is still about 6 months away.


----------



## p4inkill3r

Quote:


> Originally Posted by *Kpjoslee*
> 
> I find it a little amusing that so many conclusions have been drawn, from how DX12 will perform on GPUs from the respective vendors to how Nvidia was left out in the cold during the development of DX12 because of its similarities with Mantle's API path, all from a single benchmark of a single game that is still about 6 months away.


Considering there have been 65k views of this thread, the discussion produced by one single benchmark from one single game has been enriching and informative for the most part.


----------



## ToTheSun!

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> NVIDIA started the push to reduce CPU overhead back in 2009, before everything you suggest.


A line mentioning an effort in reducing CPU overhead and increasing draw call count in a slide from Nvidia does not mean they were singly and exclusively addressing the problem, and Maxwell 2's hardware being more serial in nature does not mean Nvidia didn't have a hand in DX12.

Both are assumptions.


----------



## patrickjp93

Quote:


> Originally Posted by *Mahigan*
> 
> GCN is the basis for Xbox One.
> 
> Let's look at the timeline...
> 
> How long has the Xbox One been in development for (Around May of 2011 based on Microsoft job postings)?
> When was Graphics Core Next (GCN) announced (December of 2011)?
> When was AMD Mantle introduced (Sept of 2013)?
> How long has DirectX 12 been in development for (2 years before AMD Mantle according to Microsoft or 2011)?
> 
> Notice a pattern?
> 
> The reason calls to the Mantle API and calls to the DirectX 12 API are so similar is that they're both tied to a project which AMD and Microsoft began work on in 2011: a project which required the use of ACEs (Asynchronous Compute Engines) in order to allow for cinematic effects on a console with limited hardware resources on hand. That project was the XBox One.
> 
> What ties both DirectX 12 and Mantle together is this relationship.
> 
> Speculation: Perhaps Microsoft was not going to release this project for the PC. AMD forced their hand by releasing Mantle onto the PC platform. Once Microsoft announced DirectX 12, work on Mantle halted. Coincidence? Perhaps... but those are a heck of a lot of coincidences. Even Microsoft's own words push this idea:http://www.windowscentral.com/xboxs-phil-spencer-microsoft-has-maybe-lost-our-way-pc-gaming
> 
> - head of Xbox Phil Spencer
> 
> This also happened in 2011
> Microsoft and Nvidia abandon PC Gaming Alliance: http://www.pcauthority.com.au/News/248875,microsoft-and-nvidia-abandon-pc-gaming-alliance.aspx
> 
> After Mantle struck in 2013... Microsoft changed its tune
> http://www.neowin.net/news/microsoft-says-it-has-renewed-focus-on-pc-gaming-details-coming-this-summer
> - head of Xbox Phil Spencer
> 
> Both AMD Mantle and DirectX 12 share the same basic DNA. Both were born out of the same project. Of course, each headed in a slightly different direction, but the reason they're so similar is that they share a "parental" lineage, like twins or siblings, made up of the same DNA at their core.


Utter speculation, not proof! This is circumstantial and tortuously conflated! Please tell me no one here is falling for this. Asynchronous shading was discussed as far back as 2009 by researchers at UC Berkeley, none of whom work for any of these parties now. I'm completely flabbergasted that anyone could think this is actual proof.

No, Microsoft wanted money and a reason for people to upgrade to Windows 10. It had nothing to do with AMD forcing its hand; proof of that lies in Nvidia's later improvements to its DX 11 drivers, which proceeded to beat AMD's Mantle numbers. Microsoft doesn't give a crap about AMD or an API which barely anyone supported and next to no one was going to, due to the additional development costs. AMD didn't have the market numbers, nor the potential to seize them, to actually make Mantle matter. Thinking AMD can force Microsoft to do anything without legal compulsion is blind stupidity.


----------



## pengs

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Most likely Microsoft told AMD and Nvidia that DX12 is not coming anytime soon "because we are supporting Xbox". Nvidia was fine with DX11 while AMD took API in their own hand and speed up the process with Mantle.


Yeah, it's a difficult situation to decipher because it benefits both AMD and Microsoft: AMD because a low-level API caters directly to the GCN architecture, and Microsoft because it umbrellas both the XBO and Windows for unification, which could change the market, though not without complications. AMD demonstrating Mantle with tangible results was probably what convinced Microsoft to go forward with it.
It's all tied together when you put AMD's hardware into Microsoft's machine, each having a hand in the console and desktop segments. It's a mutually beneficial relationship, and I don't doubt that AMD would hand over code or help Microsoft get a kickstart, at the very least.

There's also this little tidbit from early 2014 where AMD makes a statement about DX12 and the future of Mantle:
Microsoft hints that DirectX 12 will imitate Mantle, but AMD insists its API has a bright future

"AMD has released an official statement on the matter, saying:"
Quote:


> Originally Posted by *AMD*
> Yesterday several articles were published that reported that DirectX and OpenGL are being extended to include closer-to-metal functionality and reduced CPU overhead. AMD supports and celebrates a direction for game development that is aligned with AMD's vision of lower-level, 'closer to the metal' graphics APIs for PC gaming. While industry experts expect this to take some time, developers can immediately leverage efficient API design using Mantle, and AMD is very excited to share the future of our own API with developers at this year's Game Developers Conference.


To me this sounds sincere but also steadfast - AMD not wanting to put Mantle down at that time for whatever reason, shareholders, public relations, whatnot.

I also remember reading about Microsoft calling a conference with AMD about the future of DirectX just after Mantle was released, I've looked and looked for this article but can't find it.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *ToTheSun!*
> 
> A line mentioning an effort in reducing CPU overhead and increasing draw call count in a slide from Nvidia does not mean they were singly and exclusively addressing the problem, and Maxwell 2's hardware being more serial in nature does not mean Nvidia didn't have a hand in DX12.
> 
> Both are assumptions.


What is an assumption is claiming that Microsoft began working on DirectX 12 with AMD alone, back at the development stage of the Xbox One, even though, as shown here by Microsoft themselves, multiple parties were involved:


----------



## provost

Quote:


> Originally Posted by *GorillaSceptre*
> 
> So there are at least 4 games; that's more than I thought. But none of those games are "built for DX12". The real DX12 games are still a couple of years off.
> 
> In any case, AMD is now on par (the Fury X anyway) with NV, but only in a benchmark of a game that's in alpha, which Nvidia said was not a true representation of DX12 performance.
> 
> Mahigan has posted some very compelling arguments, but in the end AMD's allegedly superior architecture just barely outperforms the 980 Ti. So if both Nvidia's and AMD's GPUs are bottlenecked for different reasons, then how is AMD in a better position?


It's all about the longevity of the hardware, or how long the Fury will remain "optimized" versus the NV offerings. It appears that DX12 is more favorable to GCN than to any Nvidia cards at this time. So even if AMD comes out with a better card in 2016, Fury will continue to be relevant much longer than Maxwell. NV customers would have to wait until Pascal to get better DX12 performance, and thus Maxwell owners will have to upgrade. This is of course based on what I am reading in this thread, but I have no reason not to believe in this probable outcome of a longer useful life for Fury vs the 980 Ti.


----------



## Mahigan

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Then why do all the feature tests that highlight DX12 or Mantle performance have to do with draw calls? eg. Star Swarm or 3DMark API Overhead Feature Test?
> 
> NVIDIA spawned the beginning of reducing CPU overhead back in 2009, before everything you suggest.


GPUs have always been limited by draw calls; the term "draw calls" has been part of GPU technical jargon since long before 2009. nVIDIA mentioning that they worked on reducing CPU overhead for draw calls is a given.

If you do some Google searches you will find a ton of articles which discuss the topic:

http://www.bit-tech.net/hardware/graphics/2011/03/16/farewell-to-directx/2
Quote:


> 'It can vary from almost nothing at all to a huge overhead,' says Huddy. 'If you're just rendering a screen full of pixels which are not terribly complicated, then typically a PC will do just as good a job as a console. These days we have so much horsepower on PCs that on high-resolutions you see some pretty extraordinary-looking PC games, but one of the things that you don't see in PC gaming inside the software architecture is the kind of stuff that we see on consoles all the time.
> 
> On consoles, you can draw maybe 10,000 or 20,000 chunks of geometry in a frame, and you can do that at 30-60fps. On a PC, you can't typically draw more than 2-3,000 without getting into trouble with performance, and that's quite surprising - the PC can actually show you only a tenth of the performance if you need a separate batch for each draw call.


Quote:


> Now the PC software architecture - DirectX - has been kind of bent into shape to try to accommodate more and more of the batch calls in a sneaky kind of way. There are the multi-threaded display lists, which come up in DirectX 11 - that helps, but unsurprisingly it only gives you a factor of two at the very best, from what we've seen. And we also support instancing, which means that if you're going to draw a crate, you can actually draw ten crates just as fast as far as DirectX is concerned.
> 
> But it's still very hard to throw tremendous variety into a PC game. If you want each of your draw calls to be a bit different, then you can't get over about 2-3,000 draw calls typically - and certainly a maximum amount of 5,000. Games developers definitely have a need for that. Console games often use 10-20,000 draw calls per frame, and that's an easier way to let the artist's vision shine through.'


- Richard Huddy AMD March 2011

You have to remember that dealing with draw-call overhead was also a feature of DirectX 11, and DirectX 11 was released in 2009.
http://www.geeks3d.com/20100806/how-to-hack-and-speed-up-direct3d-11-render-calls/

The first GPU company to actually add hardware to their architecture to deal with this was AMD, with GCN.
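The draw-call ceiling Huddy describes can be sketched with a toy cost model (a hypothetical sketch with made-up unit costs, not measurements of any real driver or GPU): each draw call carries a fixed CPU cost, and batching/instancing amortizes that cost across many objects.

```python
# Toy model of CPU-side draw-call overhead (hypothetical unit costs,
# not measurements of any real driver or GPU).

def draw_call_cpu_ms(objects: int, per_call_overhead_ms: float,
                     instances_per_call: int = 1) -> float:
    """CPU time spent just issuing draw calls for one frame."""
    calls = -(-objects // instances_per_call)  # ceiling division
    return calls * per_call_overhead_ms

# 10,000 unique submissions (console-scale scene) vs. the same scene
# instanced in batches of 100 identical "crates" per call
naive = draw_call_cpu_ms(10_000, 0.01)
batched = draw_call_cpu_ms(10_000, 0.01, instances_per_call=100)
print(f"naive: {naive:.1f} ms, instanced: {batched:.1f} ms")
```

With these made-up numbers, 10,000 individual calls cost 100 ms of CPU time per frame, far past a 33 ms / 30 fps budget, while batching 100 instances per call brings it down to 1 ms. That is the shape of the gap between the 2-3,000-call PC ceiling and the 10-20,000-call console figures Huddy cites; instancing only helps when the draws are near-identical, which is his "variety" caveat.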


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Mahigan*
> 
> GPUs have always been limited by draw calls. The term "draw calls" has been part of the GPU technical jargon long before 2009. nVIDIA mentioning that they worked on reducing CPU overhead, for draw calls, is a given.
> 
> If you do some Google searches you will find a ton of articles which discuss the topic:
> 
> http://www.bit-tech.net/hardware/graphics/2011/03/16/farewell-to-directx/2
> 
> - Richard Huddy AMD March 2011
> 
> You have to remember that dealing with Draw Call overhead was also a feature of DirectX 11 and DirectX 11 released in 2009.
> http://www.geeks3d.com/20100806/how-to-hack-and-speed-up-direct3d-11-render-calls/
> 
> The first GPU company to actually add hardware, into their architecture, in order to deal with this was AMD with their GCN.


The first article is from 2011 and the second article is from 2010, both after what NVIDIA had published in 2009:





Also, lastly, GCN was very similar to NVIDIA's Fermi:
Quote:


> In short, they are completely abandoning their VLIW architecture, because of the inherent inefficiencies with trying to parallelize code. Instead they're going for a more straightforward SIMD-like approach, much like nVidia's Fermi architecture, but also much like Intel's Larrabee.
> 
> It has always been obvious that nVidia's approach was better for GPGPU than AMD's. AMD's approach was better for graphics (enough parallelism can be extracted from most graphics tasks to make VLIW efficient, and pack more processing power in the same die space). However, nVidia proved that their approach could be made 'good enough' for graphics, while having advantages for GPGPU. In a way, Fermi seems to be the right architecture at the right time. Sure, they had a bit of a false start with the original GTX465/470/480, but that was more to do with manufacturing than with the architecture itself, as the more refined GTX460 and the GTX500 series show. They are competitive in terms of price/performance on the graphics front, while offering the extra GPGPU benefits that AMD currently lacks.
> 
> Other new GPGPU features include support for function pointers, virtual functions, exceptions and recursion, to allow for a full C++ implementation. Again, these are feature that Fermi already supports. AMD also goes for ECC memory support, which again, nVidia's Fermi already supports.


https://scalibq.wordpress.com/2011/06/21/amd-follows-nvidias-lead-in-gpu-design/


----------



## sugarhell

So what? The term, and the whole effort to reduce draw calls, has been a thing for many years, even decades. 3D artists have long optimized their models to have a low draw-call impact, or even combined many small meshes into big ones to cut down on draw calls. OpenGL had extensions for reducing draw calls many years before Nvidia's release. Even Glide...

I can do the same:

http://blogs.technet.com/b/torgo3000/archive/2007/06/01/performance-art.aspx
Quote:


> The number one killer for us (and just about every single DX9 game out there) is draw calls. There is a really cool little app called NVPerfHUD that will let you step through the scene's draw calls as well as profile a whole bunch of different performance metrics (sorry, it works in debug only). When we load up FSX in NVPerfHUD on a decent graphics card (6800-ish) the graphics card is 100% idle most of the time. Unless you get really close to an object with an expensive shader, the graphics card is never taxed. This is because it is always waiting for the CPU to process and send the draw calls, which it handily processes with ease.


----------



## Mahigan

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> What is an assumption is claiming that Microsoft began working on DirectX 12 with AMD alone, back at the development stage of the Xbox One. Even though shown here, by Microsoft themselves, multiple parties were involved:


That is dated 2014, and of course Microsoft would take input from all the relevant technology makers in the industry when designing an API for the Windows operating system.

The basis for the direction of DirectX 12 was and is the same basis at the heart of AMD Mantle. It was derived out of the AMD and Microsoft partnership for the XBox One.

The idea of tapping into Asynchronous Compute Engines in order to efficiently utilize the available compute resources is something that the XBox One (or any console) needs. We wonder why consoles perform better than their hardware would suggest; well, one of the reasons is their ability to get closer to the metal.

The reason the DirectX API calls are so similar to AMD Mantle's is that they were made from the same DNA at their heart. nVIDIA adding HyperQ support so late in the game (well after 2011, in 2014) is a testament to this. AMD had ACEs in GCN starting with the very first GCN part, released in December of 2011 (http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review). The XBox One started development in 2011 and is built around GCN. DirectX 12 began development in 2011 for the XBox One.

We can tell by the time frame just which architecture formed the basis for DirectX 12. We can also tell that the XBox One, and consoles in general, was the primary focus of Microsoft (who had abandoned PC Gaming, for the most part, until 2014).

I'm not sure why this is shocking to anyone. The XBox One was built for DirectX 12. Its hardware was built for DirectX 12.


----------



## Mahigan

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> The first article is from 2011, the second article is from 2010. After what NVIDIA had published in 2009:
> 
> 
> 
> 
> 
> Also lastly GNC was very similar to NVIDIA's Fermi:
> https://scalibq.wordpress.com/2011/06/21/amd-follows-nvidias-lead-in-gpu-design/


DirectX 12 wasn't in development in 2010. It started development as an API for the next-generation Xbox console (the XBox One) in 2011. It likely would never have made it onto the PC as an API if it weren't for AMD Mantle threatening its market dominance (as many developers, such as Crytek, Epic, and EA, began to talk of jumping ship).


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Mahigan*
> 
> That is dated in 2014 and of course Microsoft would take input from all the relevant technology makers in the industry when designing an API for the Windows Operating system.
> 
> The basis for the direction of DirectX 12 was and is the same basis at the heart of AMD Mantle. It was derived out of the AMD and Microsoft partnership for the XBox One.
> 
> The idea of tapping into Asynchronous Compute Engines in order to efficiently utilize the available compute resources on hand is something that the XBox One (or any console) needs. We wonder why consoles perform better than their hardware would suggest, well one of the reasons is their ability to be closer to metal.
> 
> The reason the DirectX API calls are so similar to AMD Mantle is because they were made from the same DNA, at their heart. The reason for nVIDIA adding HyperQ support so late in the game (well after 2011 and in 2014) is a testament to this. AMD had ACEs in GCN starting with the very first GCN part which was released in June of 2012 (announced in 2011). XBox One started development in 2011 and is built around GCN. DirectX 12 began development in 2011 for the XBox One.
> 
> We can tell by the time frame just which architecture formed the basis for DirectX 12. We can also tell that the XBox One, and consoles in general, was the primary focus of Microsoft (who had abandoned PC Gaming, for the most part, until 2014).
> 
> I'm not sure why this is shocking to anyone. The XBox One will run DirectX 12 as well. It was built for it.


They didn't just take input they worked on it collectively. It was a group effort.

You have no proof whatsoever that DirectX 12 came from Mantle directly, nor that it was derived out of AMD's and Microsoft's partnership on the Xbox One. Microsoft even came out themselves and said not to expect drastic performance increases from DirectX 12 on the Xbox One.

Consoles don't perform better than their hardware would suggest; their performance is on par with what their hardware suggests:





Dips into 19-20 fps

The reason some of it is similar is that AMD worked on DirectX 12 with Microsoft, but that is no indication that it stemmed from Mantle. DirectX 12, as said numerous times and proven, was a collective effort among numerous companies.

You are making ridiculous connections without any solid proof; it's nothing more than speculation. Your time frames mean nothing in regard to DirectX 12.

Sure, it runs DirectX 12, but that by no means translates into the amazing performance increase you are trying to make it seem will happen:
Quote:


> Xbox head Phil Spencer appeared to dump cold water on the idea that DX12 would make a major difference for the console, writing: "It will help developers on XBOX One. *It's not going to be a massive change* but will unlock more capability for devs."


http://www.extremetech.com/gaming/184768-head-of-xbox-warns-gamers-not-to-expect-dramatic-improvements-from-dx12
Quote:


> "I got asked early on by people if DirectX12 is going to dramatically change the graphics capability of Xbox One, *I said it would not,*" said Mr. Spencer in an interview with The Inner Circle.


http://www.kitguru.net/gaming/anton-shilov/microsoft-directx-12-will-not-dramatically-improve-xbox-one/


----------



## Kpjoslee

Quote:


> Originally Posted by *p4inkill3r*
> 
> Considering there have been 65k views of this thread, the discussion produced by one single benchmark from one single game has been enriching and informative for the most part.


This is pretty much a rehash of the AMD technical presentations we saw back at the Hawaii and Fiji launches, mostly preaching how superior AMD's asynchronous compute shaders are compared to Nvidia's. Treating what is actually an apples-to-oranges comparison as apples-to-apples leads to misleading conclusions.
Concluding that "8x8=64, thus it is superior to Nvidia's 32 compute queues" is misleading. AMD's GCN is like 8 cores with 8-way hyperthreading, while Nvidia's (Maxwell 2) is like 32 simpler cores. So theoretical performance might not be that different; it will depend heavily on how the workload is coded.

Let me quote from what I found in Beyond3D forum
Quote:


> Asynchronous compute is useful in similar cases where hyperthreading is useful.
> 
> The first use case is stalls. A queue needs to wait for something (for example, the end of the raster operations and a ROP cache flush before it can start sampling that render target as a texture in the following post-process shader). On the CPU side, a core can stall if, for example, it waits for some work to finish on a mutex or semaphore. In this scenario one command stream is stalled while the other runs at full rate. Hyperthreading / asynchronous compute keeps the CPU / GPU fed (it doesn't need to idle), unless of course both instruction streams stall at the same time.
> 
> The second use case is workload that is bound by resource limits or fixed function hardware. On GPU, there is a fixed maximum primitive setup rate, maximum fill rate, maximum texture filtering rate, etc. Bandwidth and LDS work memory is also limited. When any of these things are the bottleneck, the shader cannot run at maximum speed, meaning that some instructions slots are unused. Asynchronous compute can use these instruction slots (if that shader has different bottlenecks). Similarly on CPU side, the instruction stream might have execution bubbles for various reasons. The instruction mix might overutilize some CPU execution ports or there might be cache misses (memory bottleneck). Hyperthreading can fill these small bubbles with instructions from the other thread.
> 
> Obviously the CPU and GPU are quite different, but both hyperthreading and asynchronous compute give the execution units more options (more TLP) to fill the execution pipelines with steady instruction stream.


So basically, it's too early to conclude anything at this point.
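The hyperthreading analogy above can be sketched with a toy model (purely illustrative: the 30% stall rate and the assumption that the second queue always has work ready are made up, not measured from any GPU):

```python
# Toy model (not real GPU code): two instruction streams share one set of
# execution slots, the way the quote describes hyperthreading and async
# compute. Stream A stalls on some cycles; stream B can fill the bubbles.

def utilization(stall_pattern, fill_with_second_stream):
    """Fraction of cycles doing useful work.
    stall_pattern: list of bools, True = primary stream is stalled that cycle."""
    busy = 0
    for stalled in stall_pattern:
        if not stalled:
            busy += 1          # primary stream issues work
        elif fill_with_second_stream:
            busy += 1          # async queue fills the bubble
    return busy / len(stall_pattern)

# Primary stream stalls 30% of the time (e.g. waiting on a ROP flush).
pattern = [i % 10 < 3 for i in range(1000)]

print(utilization(pattern, False))  # serial submission: 0.7
print(utilization(pattern, True))   # with a second async queue: 1.0
```

The point is only that an independent second queue can convert stall bubbles into useful work; how often that actually happens on real hardware is exactly what is still unknown.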


----------



## Mahigan

Here is some proof that Draw Calls have been an emphasis for some time throughout various DirectX APIs.

ftp://download.nvidia.com/developer/cuda/seminar/TDCI_DX10perf_DX11preview.pdf

From nVIDIA no less and discussing some of the improvements in DirectX 11:
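For a sense of why draw-call overhead keeps resurfacing across DirectX versions, here is a back-of-the-envelope budget (the per-call CPU costs are illustrative assumptions for the arithmetic, not figures from the NVIDIA deck):

```python
# Rough draw-call budget: how many draws fit in a frame if submission costs
# a fixed amount of CPU time per call. The 40 us and 4 us figures below are
# assumptions chosen to illustrate a high- vs low-overhead submission path.

def max_draws_per_frame(cpu_us_per_draw, frame_budget_ms, cpu_fraction=0.5):
    """Draw calls that fit if `cpu_fraction` of the frame goes to submission."""
    budget_us = frame_budget_ms * 1000 * cpu_fraction
    return int(budget_us // cpu_us_per_draw)

# 16.7 ms frame (60 FPS), half the frame spent submitting draws:
high_overhead = max_draws_per_frame(40.0, 16.7)  # assumed 40 us per draw
low_overhead = max_draws_per_frame(4.0, 16.7)    # assumed 4 us per draw
print(high_overhead, low_overhead)
```

Under these assumptions a 10x drop in per-call cost buys roughly 10x the draws per frame, which is why every API revision keeps chipping away at it.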


----------



## ToTheSun!

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ToTheSun!*
> 
> A line mentioning an effort in reducing CPU overhead and increasing draw call count in a slide from Nvidia does not mean they were singly and exclusively addressing the problem, and Maxwell 2's hardware being more serial in nature does not mean Nvidia didn't have a hand in DX12.
> 
> Both are assumptions.
> 
> 
> 
> What is an assumption is claiming that Microsoft began working on DirectX 12 with AMD alone, back at the development stage of the Xbox One. Even though shown here, by Microsoft themselves, multiple parties were involved:

Like i said, both are assumptions.

The only thing is Mahigan's assumption is more plausible than yours.

BUT, AGAIN, BOTH ASSUMPTIONS.


----------



## sugarhell

The point of DX12 on the Xbox One is that developers complained a lot about the semi-low-level DX11.X API they had to use pre-release. They had to go through DX11-style programming on a console, which is pointless compared to the PS4's GNM API, which was wonderful: developers could just send commands with simple QWORDs. The hardware limitations of the Xbox One are all well known and are not going to change. What will change is the time developers spend debugging their games and shaders.

http://wccftech.com/ps4-api-graphics-programmers-love-specific-gpu-optimizations-improve-performance/


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Mahigan*
> 
> DirectX 12 wasn't in development in 2010. It started development as an API for the next generation Xbox console (XBox One) in 2011. It likely would have never made it as an API onto the PC if it weren't for AMD Mantle threatening its market dominance (as many developers began to talk of jumping ship such as Crytek, Epic, EA etc).


This is your nonsense speculation, which you are making up without any factual proof whatsoever. AMD Mantle threatening market dominance? Are you kidding right now? You have to be joking.







I can count on my hands how many games featured Mantle.

As pointed out, Microsoft worked on Direct X 12 for many years, before Mantle with numerous companies:



See, NVIDIA, AMD, Intel and Qualcomm.


----------



## Mahigan

Nobody is saying that DirectX 12 came from Mantle. DirectX 12, at its heart, came from the development of the XBox One console. Mantle, at its heart, came from the development of the XBox One console. They both come from the same project.

They were both designed to make more efficient use of compute resources. This is a fact. Whether you want to believe it or not is your own prerogative.

End of discussion.


----------



## Mahigan

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> This is your nonsense speculation, that you are making up without any FACTUAL proof whatsoever. AMD Mantle threatening market dominance? Are you kidding right now, you have to be joking.
> 
> 
> 
> 
> 
> 
> 
> I can count on my hands how many games featured Mantle.
> 
> As pointed out, Microsoft worked on Direct X 12 for many years, before Mantle with numerous companies:
> 
> 
> 
> See, NVIDIA, AMD, Intel and Qualcomm.


It's not the games that featured Mantle... it's the interest it piqued in the big game engine developers.




Quote:


> Tim Sweeney of Epic: there's some good ideas in Mantle we really liked the idea of having low overhead access to the GPU, if you look back at DX/OGL, there's a lot of overhead in those APIs and the fact that they date back to the old SGI model of rendering which is very different than the current model, potentially unified memory, good ideas there I hope it really helps the OGL community and MS evolve their APIs.


Quote:


> Today, AMD announced that three more developers are working on incorporating Mantle into their upcoming games. First up, Square Enix's Eidos-Montréal studio has partnered with AMD for Mantle in future titles, including its upcoming game Thief, which is set for release in February 2014.
> 
> Cloud Imperium Games, currently working on the crowd-funded Star Citizen, is the second developer to partner with AMD. Chris Roberts, CEO at the studio, said that "Mantle is vitally important for a game like Star Citizen, which is being designed with the need for massive GPU horsepower", indicating that AMD and the studio have modified Crytek's fourth-gen CryEngine to make use of the API in this game.
> 
> The final developer to join the Mantle bandwagon is Oxide Games, who are currently developing a new 64-bit, multi-platform, multi-core game engine called 'Nitrous'. The engine is in the early stages of development, so on release, it should be optimized fully for Mantle.


http://www.techspot.com/news/54573-three-more-developers-hop-on-amds-mantle-api-bandwagon.html
Quote:


> "Crytek prides itself on enabling CRYENGINE with the latest and most impressive rendering capabilities," said Cevat Yerli, Founder, CEO & President of Crytek. "By integrating AMD's new Mantle API, CRYENGINE will gain a dimension of 'lower level' hardware access that enables extraordinary efficiency, performance and hardware control."


And the smoking gun is from Carmack (October 18, 2013)...
Quote:


> John Carmack: *Mantle only became interesting because of their dual console wins*, the landscape does matter that they have both major console wins with similar architectures, not a stupid thing that AMD is doing at this point, could have some implications for Steam, if MS and Sony embraced it that would be very powerful for AMD but it doesn't look like they're going to (at least MS), if I was still doing all of the major tech coding I probably would not be embracing Mantle right now but there would be days where it would be extremely tempting


Do you understand what Carmack just said... and MS did embrace it.

Microsoft announced DirectX 12 for the PC shortly after in March of 2014.

AMD killed Mantle, officially, in 2015.
Quote:


> Why this matters: If this is the end of Mantle 1.0 as we know it, it will have achieved what it needed to do-and that's get Microsoft and OpenGL's Khronos to finally move forward and faster on graphics APIs that have been holding back PC gaming.


http://www.pcworld.com/article/2891672/amds-mantle-10-is-dead-long-live-directx.html?null

Once Microsoft announced DirectX 12, everyone quit working on Mantle and ported over to DirectX 12.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Mahigan*
> 
> Here is some proof that Draw Calls have been an emphasis for some time throughout various DirectX APIs.
> 
> ftp://download.nvidia.com/developer/cuda/seminar/TDCI_DX10perf_DX11preview.pdf
> 
> From nVIDIA no less and discussing some of the improvements in DirectX 11:


So you use NVIDIA as your source?









But, I was using NVIDIA
Quote:


> Originally Posted by *ToTheSun!*
> 
> Like i said, both are assumptions.
> 
> The only thing is Mahigan's assumption is more plausible than yours.
> 
> BUT, AGAIN, BOTH ASSUMPTIONS.


My assumptions aren't simply assumptions; they are based on facts. Most of what Mahigan posts is pulled from his rear, accompanied by out-of-context information.
Quote:


> Originally Posted by *Mahigan*
> 
> Nobody is saying that DirectX 12 came from Mantle. DirectX 12, at its heart, came from the development of the XBox One console. Mantle, at its heart, came from the development of the XBox One console. they both come from the same project.
> 
> They were both designed to tap into more efficient use of Compute Resources. This is a fact. Whether you want to believe it or not is your own prerogative.
> 
> End of discussion.


You know what the end of discussion is? The fact that you have no factual proof for your theories, yet you still spin them as factual. That's what signifies the end of discussion.

Nobody said DirectX 12 came from Mantle? Actually, you basically did; you claimed they were born out of the same project. "Mantle and DirectX 12 are the offspring of AMD and Microsoft collaboration on the Xbox One."

Quote:


> Originally Posted by *Mahigan*
> 
> It's not the games that featured Mantle... it is the interest it peaked in the big Game Engine developers.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> http://www.techspot.com/news/54573-three-more-developers-hop-on-amds-mantle-api-bandwagon.html
> Once Microsoft announced DirectX 12, everyone quit working on Mantle and ported over to DirectX 12


So what if 100 developers say they are interested in Mantle but only 4 actually use it? Who gives a crap? It's meaningless. More speculation without any proof.

They quit working on Mantle because it is proprietary. Obviously a developer will support an API usable across multiple hardware vendors over an exclusive API that only works on AMD hardware.


----------



## maltamonk

This gonna be good...lol


----------



## pengs

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> This is your nonsense speculation, that you are making up without any FACTUAL proof whatsoever. AMD Mantle threatening market dominance? Are you kidding right now, you have to be joking.
> 
> 
> 
> 
> 
> 
> 
> I can count on my hands how many games featured Mantle.
> 
> As pointed out, Microsoft worked on Direct X 12 for many years, before Mantle with numerous companies:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> See, NVIDIA, AMD, Intel and Qualcomm.


The problem is that Mantle could have crossed over to Linux:
http://www.pcworld.com/article/2364760/amd-wants-to-improve-gaming-in-linux-and-steam-boxes-with-its-mantle-tools.html

Now you ARE threatening Microsoft's market dominance and the company's ability to maintain itself as the gaming platform for the PC; maybe not immediately, but Microsoft whacked the mole as soon as it appeared.

Also, what does it matter whether Microsoft used DX12 or DX11.x when the consoles are all x86? Microsoft could have easily used something other than a low-level API.

It lends credence to the theory that, with Mantle demonstrated as operable, publicly released, and featured in a few games, Microsoft feared a potential loss (any loss) and plausibly coaxed a head start out of AMD.

Now why would AMD turn them down? GCN is in the XBO; a congruent low-level API shared between the consoles and the PC would show very significant performance gains on Radeon GPUs and help fight NVIDIA's Maxwell architecture.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Nobody is saying that DirectX 12 came from Mantle


You may not be, but lots of people are/do. In fact, many people claim that Microsoft had no intention of developing DX12 until after Mantle was announced.
Quote:


> Originally Posted by *pengs*
> 
> The problem is that Mantle could had crossed over onto Linux
> http://www.pcworld.com/article/2364760/amd-wants-to-improve-gaming-in-linux-and-steam-boxes-with-its-mantle-tools.html
> 
> Now you ARE threatening Microsoft's market dominance and the company's ability to maintain itself as the gaming platform for the PC, maybe not immediately but Microsoft whacked the mole as soon as it appeared.


I doubt Microsoft is concerned about Linux on the desktop stealing market share, or about AMD/Mantle causing people to switch to Linux for gaming. Tablets, handhelds yes, Linux no.
Quote:


> Originally Posted by *pengs*
> 
> It lends to the theory that with *Mantle being demonstrated as being operable, the public release of it and the few games which featured with it* Microsoft feared a potential loss (any loss) and plausibly coaxed a head start from AMD.


That again implies that Microsoft hadn't started working on DX12 when Mantle was announced in the fall of 2013, and we know that isn't true.


----------



## Mahigan

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> So you use NVIDIA as your source?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> But, I was using NVIDIA
> My assumptions aren't simply assumptions; they are based on facts. Most of what Mahigan posts is pulled from his rear, accompanied by out-of-context information.
> You know what the end of discussion is? The fact that you have no factual proof for your theories, yet you still spin them as factual. That's what signifies the end of discussion.
> 
> Nobody said DirectX 12 came from Mantle? Actually, you basically did; you claimed they were born out of the same project. "Mantle and DirectX 12 are the offspring of AMD and Microsoft collaboration on the Xbox One."
> So what if 100 developers say they are interested in Mantle but only 4 actually use it? Who gives a crap? It's meaningless. More speculation without any proof.
> 
> They quit working on Mantle because it is proprietary. Obviously a developer will support an API usable across multiple hardware vendors over an exclusive API that only works on AMD hardware.


They quit working on Mantle because they achieved their goal: to get MS to support all of the fundamental aspects of AMD Mantle on the PC. MS did so; therefore there was no longer a need for Mantle.

Saying that AMD Mantle and DirectX 12 were born out of the same project, the XBox One, isn't the same as saying DirectX 12 came from Mantle (which, unlike others, I am not saying). Both were born in 2011 as a means to tap into the performance of the XBox One's GCN architecture. AMD created Mantle out of the lessons learned from this project.

That's why it is easier to port from Mantle to Direct3D 12 than from Direct3D 11 to Direct3D 12. We see this when comparing the Mantle API calls and the DirectX 12 API calls; anyone with programming knowledge will tell you this. https://community.amd.com/community/gaming/blog/2015/05/12/mantle-the-start-of-a-low-overhead-future
Quote:


> Above all, *Mantle will present developers with a powerful shortcut to DirectX® 12, as the lingual similarities between APIs will make it easy to port a Mantle-based render backend to a DirectX® 12*-based one if needed or desired. In addition, Mantle developers that made the bold decision to support our historic API will be well-educated on the design principles DirectX® 12 also promises to leverage. Finally, we will ensure that tomorrow's game engines have an easy time of supporting a Mantle render backend, just as talented devs are comfortable with supporting multiple backends today to better address the needs of gamers.




AMD learned how to build a low-level API by working with Microsoft on the XBox One. The project allowed AMD to walk away with the foundations for an API that they thought would work well on the PC. The lingual similarities between AMD Mantle and DirectX 12 are irrefutable. Both APIs have the same basic foundation.

Microsoft was reluctant to release DirectX 12 on the PC. At first, what we now call DirectX 12 was meant as an API for the Xbox One. As Microsoft's own chief of Xbox development mentioned, Microsoft had neglected the PC. So Microsoft began work on bringing this API to the PC under the name DirectX 12. Mantle achieved the goal of demonstrating a large degree of interest in a low-level API on the PC. So with DirectX 12 on the way, AMD cancelled Mantle; they don't need it anymore. DirectX 12 does everything Mantle did and then some.


----------



## ZealotKi11er

One thing is true: Nvidia wants you to be able to use any of their cards with whatever CPU you have, drawing the least amount of power. They want you to spend your entire budget on their GPU instead of spreading it thin and having to upgrade your CPU and PSU. If a GTX 980 Ti could run the same on an Intel Pentium G as on a 6700K, Nvidia would be happy.


----------



## PontiacGTX

Quote:


> Originally Posted by *ZealotKi11er*
> 
> One thing is true: Nvidia wants you to be able to use any of their cards with whatever CPU you have, drawing the least amount of power. They want you to spend your entire budget on their GPU instead of spreading it thin and having to upgrade your CPU and PSU. If a GTX 980 Ti could run the same on an Intel Pentium G as on a 6700K, Nvidia would be happy.


I don't think a dual core would be useful in a multiplayer game, since the trend seems to be improving support for more cores (because Intel has been holding back the release of a 6-core at a mainstream price).


----------



## pengs

Quote:


> Originally Posted by *Forceman*
> 
> You may not be, but lots of people are/do. In fact, many people claim that Microsoft had no intention of developing DX12 until after Mantle was announced.
> I doubt Microsoft is concerned about Linux on the desktop stealing market share, or about AMD/Mantle causing people to switch to Linux for gaming. Tablets, handhelds yes, Linux no.


I don't think you're applying enough pressure to the executive part of your brain







There is a reason Microsoft started in the late '70s and continues to retain 90%+ market share 35 years after its inception. It sounds a bit cynical, but the goal at the end of the day is to make money, and some companies do it exponentially, especially with larger assets and resources; more money equals more money if the right people are making decisions.

It's not just Linux. AMD is liberal with its assets, giving them away at any chance; Mantle may have ended up on something that threatened Microsoft (which it all does if it's not Microsoft, tbh).


----------



## ToTheSun!

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> My assumptions aren't simply assumptions they are based on facts, most of what Mahigan posts is pulled from his rear, accompanied by out of context information.


Saying Nvidia were pushing for something DX12-esque or that they had initial involvement in DX12 because of a slide in 2009 saying "we want more draw calls" is an assumption.

In comparison, saying AMD were pushing for something DX12-esque or that they had initial involvement in DX12 because DX12's code is, basically, Mantle's and because Khronos publicly thanked AMD for the code is also an assumption, but a vastly more plausible one.

And i'm not even considering the fact that it's AMD's hardware that's inside the first consoles that will support DX12 and Vulkan in my argument.


----------



## NightAntilli

Ok, I've been following this thread for quite a while, and it has been quite informative. I do think it's been too much on the GPU side of things. The CPUs were mentioned briefly, but everyone ignored that. I think we can agree that the most interesting parts of the GPUs have already been discussed; so much so that now people are arguing over who contributed to which API and so on.

So, if possible, can anyone with good knowledge of CPUs explain why the FX-8370 loses to the i3? If Ashes really uses all available threads on the CPU, and we take the raw performance of the CPUs from available multithreaded benchmarks, an FX-4 should be on par with an i3, an FX-6 should beat it, and an FX-8 should wipe the floor with it. And yet a Phenom II X6 is beating an FX-8, so what's going on?


----------



## gamervivek

Quote:


> Originally Posted by *Forceman*
> 
> And there it is. I knew I wouldn't be disappointed. That picture makes an appearance in pretty much every DX12 thread.
> 
> I don't understand what people think that image proves. Everyone knows AMD was involved in developing DX12, along with Nvidia and Intel, so that's hardly revelatory information, and if you incorporate portions of someone's code in your standard you are probably going to include that in the documentation also. Yet somehow people take it to mean that DX12 is just Mantle renamed, or that DX12 couldn't have existed without it. If I quote the Gettysburg Address in a term paper, does that mean Lincoln wrote it?


Somehow Mantle's 'design philosophy' ends up being copied for DX12, with the same phrases and words in its documentation.

But no, DX12 is not based on Mantle. How could it be?


----------



## mtcn77

Quote:


> Originally Posted by *NightAntilli*
> 
> Ok I've been following this thread for quite a while, and it has been quite informative. I do think that it's been too much on the GPU side of things. The CPUs were mentioned shortly but everyone ignored that. I think we can agree that the most interesting part of the GPUs have already been discussed. So much so that now people are arguing who contributed to which API and so on.
> 
> So if possible, anyone that has good knowledge with CPUs, can you explain why the FX-8370 loses to the i3? If Ashes really uses all available threads on the CPU, and we take raw performance of the CPUs based on available multithreaded benchmarks, an FX-4 should be on par with an i3, and an FX-6 should beat it and an FX-8 should wipe the floor with it. And now a Phenom II X6 is beating an FX-8, so, what's going on?


Possible candidate reasons are:

The test uses SSE2 for compatibility (AMD chips would gain a lot of performance with XOP, as Intel chips would with AVX),
Cache contention,
...there might be others.


----------



## Mahigan

Quote:


> Originally Posted by *NightAntilli*
> 
> Ok I've been following this thread for quite a while, and it has been quite informative. I do think that it's been too much on the GPU side of things. The CPUs were mentioned shortly but everyone ignored that. I think we can agree that the most interesting part of the GPUs have already been discussed. So much so that now people are arguing who contributed to which API and so on.
> 
> So if possible, anyone that has good knowledge with CPUs, can you explain why the FX-8370 loses to the i3? If Ashes really uses all available threads on the CPU, and we take raw performance of the CPUs based on available multithreaded benchmarks, an FX-4 should be on par with an i3, and an FX-6 should beat it and an FX-8 should wipe the floor with it. And now a Phenom II X6 is beating an FX-8, so, what's going on?


I emailed Oxide in order to get more details about the CPU optimizations used in AotS as well as potential bottlenecks. I received a response from one of the developers. It's posted somewhere in the 100 pages of this thread lol.

I'll reiterate the points that were brought up which could explain some of the performance issues with the AMD FX 83x0 systems (important to note that since this is an RTS, you can game comfortably at 30FPS, so you won't have an issue playing the game; this was mentioned by the developer).

1. Oxide has used SSE2 optimizations, for compatibility reasons as mentioned. What is an eye-opener is the way both the AMD FX 8350 and FX 4350 perform under SSE2-optimized code. Pay special attention to the FX 4350 running at 5.1GHz and the FX 8350 running at 5GHz. You don't see the performance doubling (it's up 17.5%), which is what we've been witnessing, as seen below:


2. Memory bandwidth is an issue. You hit close to peak memory bandwidth during 40% of the frame on an i7 3770 running on a Z77 motherboard. That means you hit around 19-20GB/s of use. Given that AMD FX processors are memory-bandwidth-starved to begin with, and that a 990X/990FX motherboard can only muster about 19GB/s with DDR3-1600 RAM, this will result in a memory bottleneck on AMD systems.

That's all the info that is available thus far.
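The bandwidth point can be sanity-checked with quick arithmetic, using the ballpark figures from the developer's reply (treat the result as an order of magnitude, not a measurement):

```python
# Quick arithmetic behind the memory-bandwidth point: how much data moves
# while the bus runs near peak for part of a frame. Figures are the rough
# numbers quoted in the post (20 GB/s peak, 40% of a 30 FPS frame).

def bytes_touched(bandwidth_gbs, frame_ms, busy_fraction):
    """Data moved while the memory bus runs near peak for part of a frame."""
    return bandwidth_gbs * 1e9 * (frame_ms / 1000.0) * busy_fraction

frame_ms = 1000.0 / 30.0                        # ~33.3 ms per frame at 30 FPS
gb_per_frame = bytes_touched(20.0, frame_ms, 0.4) / 1e9
print(round(gb_per_frame, 2))                   # ~0.27 GB streamed in the bandwidth-bound 40%
```

If the platform's sustainable bandwidth sits at or below that 19-20 GB/s peak, that 40% slice of the frame simply stretches out, which is consistent with the FX results.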


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> I emailed Oxide in order to get more details about the CPU Optimizations used in AotS as well as potential bottlenecks. I received a response from one the of the developers. It's posted somewhere in the 100 pages of this thread lol.
> 
> I'll re-iterate the points that were brought up which could explain some of the performance issues with the AMD FX 83x0 systems (important to note that since this is an RTS, you can game comfortably at 30FPS so you won't have an issue playing the game, this was mentioned by the developer).
> 
> 1. Oxide have used SSE2 optimizations, for compatibility reasons as mentioned. What is an eye opener is the way both the AMD FX 8350 and a 4350 perform under SSE2 optimized code. Pay special attention to the FX 4350 running 5.1GHz and the FX 8350 running 5GHz. You don't see the performance doubling (up 17.5%) which is what we've been witnessing, as seen below:
> 
> 
> 2. Memory bandwidth is an issue. You hit close to peak memory bandwidth during 40% of the frame on an i7 3770 running on a Z77 motherboard. That means you hit around 19-20GB/s of use. Given that AMD FX processors are memory-bandwidth-starved to begin with, and that a 990X/990FX motherboard can only muster about 19GB/s with DDR3-1600 RAM, this will result in a memory bottleneck on AMD systems.
> 
> That's all the info that is available thus far.


So wouldn't something like X79 and X99 with quad-channel memory just crush dual-channel setups? My 3770K gets a big boost in memory bandwidth with 2400MHz RAM over 1600MHz. Maybe this benchmark is limited to dual channel, hence the 6700K with super-fast DDR4 tops even X99.
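For reference, theoretical peak bandwidth is just channels x 8 bytes (64-bit bus) x transfers per second; real-world sustained numbers land well below these peaks:

```python
# Theoretical peak memory bandwidth per configuration:
# channels x 8 bytes per transfer (64-bit channel) x MT/s.

def peak_gbs(channels, mt_per_s):
    """Peak bandwidth in GB/s for a given channel count and transfer rate."""
    return channels * 8 * mt_per_s / 1000.0

print(peak_gbs(2, 1600))  # dual channel at 1600 MT/s (DDR3-1600): 25.6 GB/s
print(peak_gbs(2, 2400))  # dual channel at 2400 MT/s: 38.4 GB/s
print(peak_gbs(4, 2133))  # quad channel at 2133 MT/s (X99 DDR4): ~68.3 GB/s
```

So on paper a quad-channel X99 setup should roughly double a dual-channel one, which makes the 6700K topping the 5960X the interesting anomaly here.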


----------



## Mahigan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> So wouldn't something like X79 and X99 with Quad Channel just crush Dual Channel setups? My 3700K with 2400MHz get bit boost in memory bandwidth over 1600MHz. Maybe this benchmark is limited to Dual Channel hence 6700K is super fast DDR4 tops even X99.


Well...







Looks like memory bandwidth plays a role in performance.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> Well...


Why is the 6700K faster than the 5960X, then?


----------



## mtcn77

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Considering that most of the gains are because of AMD's poor DX11 overhead, the performance difference between 11 and 12 is pretty underwhelming tbh.
> 
> Was this game built for DX12 or was it updated from 11?


OK, found it. Ironically, it turned up just when I was searching for the XBox article.
Quote:


> For the PC, Wardell points out that "Ashes of the Singularity" (a new title from StarDock, his studio) has received a 70 percent boost in performance on rendering the same scene. The big news here though is there's still backwards compatibility with DX11 - what that means according to Wardell is they're losing performance because of this. He's pretty convinced that if they'd have designed the project from the ground up with only DX12 in mind, their performance would have been higher.


----------



## mtcn77

I think we are making the mistake of judging the impact of Asynchronous Shaders solely on PC benchmarks. Where it will shine, imo, will still be the XBox. The primary reason for that is the system arrangement. PCI Express is a GPU link to the CPU "in series". Now, series means (by this reckoning) you get their geometric mean. So when a GPU has 400 GB/s of VRAM bandwidth - such as an overclocked 980 Ti - it effectively has 80 GB/s of memory bandwidth once you also reckon in the 16 GB/s PCI Express link to the CPU.
By partnering up with AMD and licensing their APU technology, Microsoft overcame that: APUs can share pointers between the CPU and GPU components - that means working "_in parallel_". You may check the XBox's layout. The GPU has separate links to DDR3 and ESRAM, all in parallel with the APU, which is just confined to a prefetching unit at this point.
The GPU IMC can access any of:

DDR3 - discrete 42 GB/s R/W of bandwidth,
APU - 25 GB/s of coherent bandwidth,
ESRAM - 102 GB/s of pure awesomeness.
In total the GPU has both 37 GB/s of DDR3 access and 102 GB/s of ESRAM access. Supposing the APU and GPU are still working in series, but not ferrying data among themselves, that makes it what? Working in parallel? I am not good at these things...
Eventually, consoles could beat the living crap out of every sub-1TB/s GPU there is, should Microsoft bring the competition to the confines of what PCs can do. They just need to turn on megatextures and out come the goodies.

[RedTechGaming]
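For what it's worth, the 80 GB/s serial-link figure above happens to equal the geometric mean of 400 and 16; a more conservative first-order model simply caps a serial chain at its slowest link, while independent parallel paths add. A quick sketch of both, using the numbers quoted in the post:

```python
import math

def geometric_mean(a_gbs, b_gbs):
    """The combination used in the post for two links in series."""
    return math.sqrt(a_gbs * b_gbs)

def bottleneck(a_gbs, b_gbs):
    """A more conservative model: a serial chain is only as fast as its slowest link."""
    return min(a_gbs, b_gbs)

# Discrete GPU: 400 GB/s VRAM behind a ~16 GB/s PCIe 3.0 x16 link to the CPU.
print(geometric_mean(400, 16))  # 80.0 -- matches the figure in the post
print(bottleneck(400, 16))      # 16  -- if every byte must cross PCIe

# Xbox One GPU: DDR3 and ESRAM paths in parallel simply add.
print(37 + 102)                 # 139 GB/s aggregate
```

Which model applies depends on how much of the working set actually has to cross the CPU link each frame; for GPU-resident data, neither penalty applies at all.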


----------



## anubis44

Me, too. It's just one of the reasons I hold a grudge against nVidia, that they knocked 3Dfx out of business.


----------



## diggiddi

Quote:


> Originally Posted by *ToTheSun!*
> 
> Saying Nvidia were pushing for something DX12-esque or that they had initial involvement in DX12 because of a slide in 2009 saying "we want more draw calls" is an assumption.
> 
> In comparison, saying AMD were pushing for something DX12-esque or that they had initial involvement in DX12 because DX12's code is, basically, Mantle's and because Khronos publicly thanked AMD for the code is also an assumption, but a vastly more plausible one.
> 
> And i'm not even considering the fact that it's AMD's hardware that's inside the first consoles that will support DX12 and Vulkan in my argument.


There is a video in this thread with Khronos bigwigs and an AMD guy, and someone on the panel says Vulkan is Mantle.


----------



## Serios

Quote:


> Originally Posted by *Forceman*
> 
> Why would Microsoft lie about when they started work on DX12? At least three other companies know when it was (AMD, Nvidia, Intel) so lying hardly makes any sense.


But Microsoft hasn't said anything about when they started working on DX12.


----------



## Kuivamaa

Quote:


> Originally Posted by *patrickjp93*
> 
> Alright, to correct the record here, DX 12 is not a copy of Mantle. Mantle only had the functions usable in a Direct3D portion anyway. Between DirectCompute and everything else, there's a Hell of a lot of DX 12 that isn't Mantle. And even if you wanted to claim they are exactly the same, we do not yet have access to the DX 12 source, and we won't until the likes of me disassemble the whole damn thing and analyze it, which will take many games and hundreds of hours of study and comparison. Further, Microsoft was developing DX 12 as a whole for years. It's too prideful and, frankly, too much better at programming than AMD and game studios to actually just copy it and tweak the edges. Anyone arguing otherwise is operating under a delusion.
> 
> There's some fact presented here about the advised use of it, but that low-level access is, in a perfect world, total control down to the metal. It's not going to look different from a pipeline perspective anyway. There was a logical order already in place from DX 11 for stages to play out in. The difference, algorithmically/procedurally speaking, is not very large between DX 11 and DX 12. The control and depth levels have changed, but procedurally it's very much the same. Until we have hard evidence that can be independently confirmed at anyone's leisure, we cannot conclude that DX 12 is Mantle with frills. Everyone on here needs to stop selling this like fact, because it's not fact. There are plenty of good reasons, given earlier on, why it's not likely ever to become fact. Just stop. I will admit I don't have experience programming in DX itself, but I do have plenty in analyzing APIs and programming paradigms, and in using them to build multi-thousand-line programs which complete very complex tasks in the most efficient manner possible. I analyze algorithms all day, and I can tell you that just because two systems have the same general shape is not conclusive proof they are the same system or based on the same design principles. There are plenty of patent lawsuits which have claimed with the same level of evidence and argument you present, and they've been dismissed. This isn't fact; you're a less-than-stellar logician with a nice source but no strong synthesis of it, as it's incomplete with regard to your argument, and this really should be the end of it until you come up with actual proof, concrete evidence.


You took my initial post and jumped to conclusions. Look at my other post that came later about MS obviously tailoring whatever they took from AMD to their ecosystem. Mantle was just the root. As for procedural similarities, you could say the same about DX11 and OGL, and from a quick look here it seems to me that those two are more closely related in that respect. Let's not forget how similar in hardware GeForces and Radeons are, after all (Nvidia and AMD have a patent agreement).


----------



## Noufel

GCN 2.0 will be a monster of an arch; money-saving mode activated for 2 full Arctic Islands GPUs.


----------



## delboy67

The best thing Mahigan has said is about the 'emotion' in people's posts; you can really tell fanboys by the emotion! Take Olivon (for no reason other than that he posted above; there are better examples): he actually seems genuinely offended by AMD PR, or by AMD being put in a good light. It's crazy, because this isn't a sport or a sports team. The only reason I can see for this logic is perhaps: 'I'm smart, I decided this was the best way to spend my money; if another person spends his a different way and argues it was best, it's an insult to my intelligence.'


----------



## Dudewitbow

Quote:


> Originally Posted by *Olivon*
> 
> Mahigan =


He's actually the only user who is directly messaging Oxide questions about the benchmark (which is a rare thing in general). TBH, it's his earlier posts and thoughts that created such a large discussion that everyone is picking up on.


----------



## delboy67

Why does anyone care about nv/amd, dx12 is supposed to bring bigger and better games, when can I actually play AotS?


----------



## anubis44

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Then perhaps more people should have corrected that person baiting for the 48th time, because it has been sourced numerous times already that Microsoft has had DX12 in the works for quite some time, before Mantle even began. And on top of that, yes, AMD contributed to the development of DX12, but so did NVIDIA, Intel, and Qualcomm. So it should come as no surprise to anyone to see similar code in some areas. Nevertheless, NVIDIA has been doing work on lowering CPU overhead since 2009, the forte of DX12. Which also happens to be the forte of Mantle. So should we start saying AMD copied NVIDIA with Mantle because they tried to lower CPU overhead?


How do you know when Mantle 'began'? From the sound of the architectural details being offered about GCN by Mahigan, a former AMD GPU engineer himself, Mantle must have been in development since at least 2011 or so. Obviously, AMD would have been working on a low-level API from the time they first considered competing for the chips in the Xbox One and PlayStation 4. I'm sorry that it's so galling to you, an nVidia user, to learn that Microsoft's DirectX 12 was essentially built by AMD. It must be at least as irritating as it was to find folders labelled 'AMD64*' in 'C:\Windows\WinSxS' on your Intel CPU computer.


----------



## anubis44

Quote:


> Originally Posted by *Mahigan*
> 
> Exactly.
> 
> It clearly stems from work Microsoft and AMD did for the API for the XBox One. DirectX 12 was also born out of the same project.
> 
> *It seems that Mantle was released onto the PC because Microsoft had turned its back on the PC market in favor of the console market. That's what all the articles and words from Microsoft's own executives highlight.*
> 
> Mantle and DirectX 12 are the offspring of AMD and Microsoft collaboration on the Xbox One. They're not entirely the same, they're like siblings. Born out of the same genetic lineage. Both are based on the same original code which AMD and Microsoft worked on for the XBox One console.


And if that's true, that Mantle sort of pushed Microsoft's hand in the direction of releasing a DX12 that is basically Mantle itself in order to boost PC gaming, then ironically, nVidia has AMD to thank for providing that shot in the arm to PC gaming--something that's very important to nVidia's bottom line.


----------



## GorillaSceptre

Not Mantle vs DX12 again









You'd have to be a special kind of ignorant to think that Mantle forced MS to make DX12. It's naive to think that these corporations don't know what's going on at other companies years before we find out.

It has been stated numerous times that it was a collaborative effort between all the vendors. AMD obviously tried to use Mantle to give themselves the upper hand before DX12 dropped, why do you think nVidia didn't bat an eye when they announced it? NV probably thought putting more money into GameWorks would benefit them more in the short term, and AMD went the API route.

Unfortunately for AMD, flashy graphics and physics is easier to sell to consumers than "closer to the metal" is.


----------



## SpeedyVT

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Not Mantle vs DX12 again
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You'd have to be a special kind of ignorant to think that Mantle forced MS to make DX12. It's naive to think that these corporations don't know what's going on at other companies years before we find out.
> 
> It has been stated numerous times that it was a collaborative effort between all the vendors. AMD obviously tried to use Mantle to give themselves the upper hand before DX12 dropped, why do you think nVidia didn't bat an eye when they announced it? NV probably thought putting more money into GameWorks would benefit them more in the short term, and AMD went the API route.
> 
> Unfortunately for AMD, flashy graphics and physics is easier to sell to consumers than "closer to the metal" is.


As influential as one may have been, it's irrelevant, as they are totally different breeds of rendering at this point in development and application.


----------



## anubis44

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Only problem as stated, is NVIDIA was also involved with DX12 development and has been working on Reducing CPU Overhead since 2009:
> 
> http://developer.download.nvidia.com/opengl/tutorials/bindless_graphics.pdf
> 
> Way before Mantle, and way before Microsoft even thought of creating an Xbox One or the Hardware that was going to be inside of it or the API that was going to be used for it.
> 
> What's one of the first suggested implementations NVIDIA offers in their assessment to reduce CPU Overhead?
> 
> "Increasing the number of Draw Calls per Frame"


Posting an nVidia memo from 2009 that vaguely mentions a goal of 'increasing the number of draw calls per frame' as proof that nVidia was working on DX12 before AMD was working on Mantle is like saying that Leonardo Da Vinci was the inventor of the first practical helicopter, not Igor Sikorsky, because he drew a sketch of something in the 15th century that LOOKS something like a helicopter.


----------



## Noufel

Quote:


> Originally Posted by *anubis44*
> 
> Quote:
> 
> 
> 
> Originally Posted by *BiG StroOnZ*
> 
> Only problem as stated, is NVIDIA was also involved with DX12 development and has been working on Reducing CPU Overhead since 2009:
> 
> http://developer.download.nvidia.com/opengl/tutorials/bindless_graphics.pdf
> 
> Way before Mantle, and way before Microsoft even thought of creating an Xbox One or the Hardware that was going to be inside of it or the API that was going to be used for it.
> 
> What's one of the first suggested implementations NVIDIA offers in their assessment to reduce CPU Overhead?
> 
> "Increasing the number of Draw Calls per Frame"
> 
> 
> 
> 
> 
> Posting an nVidia memo from 2009 that vaguely mentions a goal of 'increasing the number of draw calls per frame' as proof that nVidia was working on DX12 before AMD was working on Mantle is like saying that Leonardo Da Vinci was the inventor of the first practical helicopter, not Igor Sikorsky, because he drew a sketch of something in the 15th century that LOOKS something like a helicopter.









Making Da Vinci look like a child drawing a sketch







People are awesome


----------



## infranoia

Stroonz, Force, Gorilla...

It's a very different thing to suggest that Mantle's release convinced MS to direct its efforts with DX12 onto the PC (i.e. *release* XBOne API as PC DX12) versus "Mantle forced MS to *develop* DX12." One of those statements is plausible. The other not so much.

Anyone trying to argue the latter should rightly be tarred and feathered, but I'm not seeing that argument here.


----------



## infranoia

And by the way, before dismissing a DX12 / Mantle shared origin, consider the implications.

It means that AMD's DX12 performance for FPS games will look like Thief and BF4. For RPGs it will look like DA:I. Nvidia made Mantle irrelevant in each case.

The surprise that AMD is performing well in Ashes is only because we haven't yet seen a Mantle-based RTS. These games are massively parallel, and a best-case scenario for GCN. And even after all that, look at Star Swarm. There should be plenty of Pascal wins for DX12 titles.

The thread is fascinating and Mahigan's research is compelling, but the only possible conclusion at this point is that GCN will excel at games with thousands of light sources and units. So if you're not on a Fury X for "Sims: Fairy Forest" then you might feel some buyer's remorse.


----------



## Silent Scone

I vote locadile.


----------



## Mahigan

Quote:


> Originally Posted by *infranoia*
> 
> And by the way, before dismissing a DX12 / Mantle shared origin, consider the implications.
> 
> It means that AMD's DX12 performance for FPS games will look like Thief and BF4. For RPGs it will look like DA:I. Nvidia made Mantle irrelevant in each case.
> 
> The surprise that AMD is performing well in Ashes is only because we haven't yet seen a Mantle-based RTS. These games are massively parallel, and a best-case scenario for GCN. And even after all that, look at Star Swarm. There should be plenty of Pascal wins for DX12 titles.
> 
> The thread is fascinating and Mahigan's research is compelling, but the only possible conclusion at this point is that GCN will excel at games with thousands of light sources and units. So if you're not on a Fury X for "Sims: Fairy Forest" then you might feel some buyer's remorse.


Bingo.

Any DirectX 12 title which is bottlenecked on the compute front will show GCN in a favorable light. This will likely change once nVIDIA releases Pascal, because Pascal is said to improve on nVIDIA's compute capabilities.


----------



## Luciferxy

rofl on the "amd's track record is quite clean"

seriously, where have you been ?


----------



## Mahigan

If your GPU architecture is both compute heavy and massively parallel, and you have a game which is compute heavy (Post Processing Effects, Lights and Physics), and you use a massively parallel API which makes efficient use of the compute capabilities of your architecture, then evidently the in-game results will translate into an increase in Frames per Second.

Of course this is assuming that the game engine itself is not bottlenecked on other fronts (Fill Rate, Memory Bandwidth, Texture Mapping, Geometry etc).

This is what we see with Ashes of the Singularity.

The easiest way to derive a comparison of the theoretical compute capabilities of various architectures (theoretical assumes efficient use):

*GeForce GTX 980 Ti*
CUDA Cores: 2816
Boost Clock (MHz): 1075
1075 MHz × (2816 × 2) = 6,054,400 MFLOPS ≈ 6.05 TFLOPS

*Radeon R9 290X*
Stream Cores: 2816
Clock (MHz): 1000
1000 MHz × (2816 × 2) = 5,632,000 MFLOPS ≈ 5.63 TFLOPS
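The arithmetic above can be sketched in a few lines (the factor of 2 assumes, as the calculation does, one fused multiply-add, i.e. 2 floating-point ops, per core per cycle):

```python
def theoretical_gflops(cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Peak single-precision throughput: cores x clock x ops-per-cycle.
    cores * MHz * 2 yields MFLOPS; divide by 1000 for GFLOPS."""
    return cores * clock_mhz * flops_per_cycle / 1000.0

gtx_980_ti = theoretical_gflops(2816, 1075)  # ~6054 GFLOPS (~6.05 TFLOPS)
r9_290x    = theoretical_gflops(2816, 1000)  # ~5632 GFLOPS (~5.63 TFLOPS)
print(gtx_980_ti / r9_290x)  # ~1.075: the 980 Ti's theoretical edge
```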

If we look at the relative compute performance of the two architectures we can conclude that, theoretically, the GeForce GTX 980 Ti should have an edge (and a noticeable one at that). My findings, from looking at the GCN 1.1 and Maxwell 2 architectures, explain why we do not see this result in the various AotS benchmarks published (8 Asynchronous Compute Engines with 8 queues each vs 1 Grid Management Unit working with 1 Work Distributor, 32 queues total). I've explained why, architecturally, nVIDIA's HyperQ implementation is not as parallel as AMD's asynchronous shading implementation (32 queues vs 64 compute plus 1 graphics). This lack of parallelism is one aspect of HyperQ's limitations. The other aspect is the hierarchical nature of HyperQ. Under GCN, the ACEs communicate directly with the various Compute Units (CUs). Under Maxwell 2, the Grid Management Unit (capable of holding thousands of pending grids) communicates through a Work Distributor, which then communicates with the various SMMs (Maxwell Streaming Multiprocessors).

Maxwell 2 is a less parallel compute architecture than Hawaii.

How can we verify this?
Quote:


> Like Fermi and Kepler, GM204 is composed of an array of Graphics Processing Clusters (GPCs), Streaming
> Multiprocessors (SMs), and memory controllers. GM204 consists of four GPCs, 16 Maxwell SMs (SMM),
> and four memory controllers. GeForce GTX 980 uses the full complement of these architectural
> components (if you are not well versed in these structures, we suggest you first read the Kepler and
> Fermi whitepapers).


nVIDIA recommends we look at the Kepler whitepapers, so let's do that...




But where did I get the idea that nVIDIA's HyperQ solution worked hierarchically? Well... right from the horse's mouth:


Why would hierarchy matter? One answer to this... LATENCY.

Therefore you have both latency and Maxwell 2's ability to handle only half as many queues as Hawaii to thank for its lack of compute efficiency relative to Hawaii.

Now who are these "professionals" who are laughing at me? Would that be Anandtech? Who published erroneous information? Or perhaps all the other publications who simply sourced Anandtech rather than doing their own work? Anandtech pretty much concluded "nVIDIA can do Asynchronous shading too" and then went on to post information which was not well researched. Ryan Smith has also refused to correct his information. Why? Who knows.

As for the CPU side of things, I emailed Oxide using my sources (I used to work in this industry, btw). I was put in touch with a developer. He answered my query and I shared the information here. The idea that system memory bandwidth plays a role was obtained directly from the horse's mouth, right from one of the guys working on the game. How did I figure out that AotS uses SSE2 optimizations? Same source.

All scientific inquiry starts with the formation of a hypothesis and then testing this hypothesis. This is what I've been doing, in conjunction with everyone who has participated here, in order to formulate a final conclusion. The first statements I made were a Hypothesis. As the thread went on, this hypothesis was tested and if it failed it was changed to take new information into consideration. One shouldn't be afraid of being wrong. Failure has the potential of leading to success. One shouldn't be ashamed of being ignorant, as ignorance can be cured with knowledge. One should be afraid of being wrong and refusing to accept empirical evidence as it is presented (willful ignorance).

I think it is high time a new Tech Website comes into play. One which is not as beholden to marketing as well as GPU Manufacturing exclusivity. One that focuses less on benchmarks and more on information. Perhaps then we wouldn't have so many people surprised to see an R9 290X keeping up with a GTX 980 Ti in compute-heavy workloads under a parallel API. I suppose I understand it a bit better now: people like you, Olivon, are the reason tech websites are afraid to speculate or participate in investigative journalism. If a benchmark doesn't go your way... you scream "BIAS!!!". I suppose this is the result of a generation of kids who focus on benchmarks rather than the architectures at play. We have consoles to thank for that, bringing mainstream folks into the PC field. It has its ups (increased sales) and its downs (extreme partisanship).

Source: http://www.microway.com/download/whitepaper/NVIDIA_Kepler_GK110_GK210_Architecture_Whitepaper.pdf


----------



## thegreatsquare

Consoles are already "to the metal" with GCN graphics and can't hold a candle to my laptop's 980m in DX11, much less GPUs like the 980 Ti, Fury X and the upcoming Pascal, so none of this should really make or break the ability to play 90%+ of titles at 1080p for the life of the PS4/XB1 ...right?

I mean, when everything is going "to the metal", the consoles' raw abilities are already the "performance floor".

I also wonder if PC game customization menus will eventually change to reflect settings in terms of their processing costs, like with how GTA reflects memory usage taken a step further.


----------



## ToTheSun!

Quote:


> Originally Posted by *Mahigan*
> 
> I think it is high time a new Tech Website comes into play. One which is not as beholden to marketing as well as GPU Manufacturing exclusivity.


We're all looking at you and thinking the same thing.

It surely is more productive than pointlessly arguing with people here.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Maxwell 2 is a less parallel compute architecture than Hawaii.
> 
> But where did I get the idea that nVIDIA's HyperQ solution worked hierarchically? Well... right from the horse's mouth:


But that's based on Kepler's workflow, how do we know they didn't change the way it works with Maxwell 2?

I'm not trying to be argumentative, i'm genuinely interested. You're posting some great stuff, keep going


----------



## Mahigan

Quote:


> Originally Posted by *ToTheSun!*
> 
> We're all looking at you and thinking the same thing.
> 
> It surely is more productive than pointlessly arguing with people here.


I would be lying if I said the thought hadn't crossed my mind, or that I wasn't in talks to make it happen.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> But that's based on Kepler's workflow, how do we know they didn't change the way it works with Maxwell 2?
> 
> I'm not trying to be argumentative, i'm genuinely interested. You're posting some great stuff, keep going


This is the part which Anandtech got right











Prior to Maxwell 2, HyperQ could handle either 1 graphics queue or 32 compute queues; it couldn't do both in parallel. With Maxwell 2 there was an increase in parallelism: Maxwell 2 can now handle 1 graphics and 31 compute queues in parallel (32 compute queues when no graphics task requires queuing, such as in OpenCL or other pure GPGPU compute scenarios).
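A minimal tally of the concurrent queue counts discussed in this thread (the figures below are as stated in these posts, not vendor-verified):

```python
# Concurrent hardware queues per architecture, per this thread's figures:
# Hawaii (GCN 1.1) exposes 8 ACEs with 8 compute queues each plus 1 graphics
# queue; Maxwell 2's HyperQ exposes 1 graphics + 31 compute in parallel;
# pre-Maxwell-2 HyperQ runs 1 graphics OR 32 compute, never both at once.
ARCHS = {
    "Hawaii (GCN 1.1)":   {"graphics": 1, "compute": 8 * 8},
    "Maxwell 2 (HyperQ)": {"graphics": 1, "compute": 31},
    "Kepler (HyperQ)":    {"graphics": 1, "compute": 0},  # no concurrent compute alongside graphics
}

for name, q in ARCHS.items():
    total = q["graphics"] + q["compute"]
    print(f"{name}: {q['graphics']} graphics + {q['compute']} compute = {total} concurrent queues")
```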

As for my inspiration in my younger years... that would be David Kanter from http://www.realworldtech.com/.
https://www.linkedin.com/in/kanterd

He now works as an IT consultant for the Linley Group. Some of his work consists of creating white papers for various tech companies. He's the author of AMD's LiquidVR whitepapers and some of their asynchronous shading content with a focus on VR.
http://www.linleygroup.com/


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> This is the part which Anandtech got right


But how do we know that's accurate? Any other sources? I know, i'm lazy


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> But how do we know that's accurate? Any other sources? I know, i'm lazy


Well,

Anandtech reached out to nVIDIA (they didn't, it would appear, reach out to AMD) for information on Maxwell 2, and nVIDIA provided those figures to Anandtech. I'm inclined to believe nVIDIA; I have no reason not to in this case, as the benchmark results suggest this information is accurate. If Maxwell 2 couldn't simultaneously process graphics and compute, we would see some rather poor results under Ashes of the Singularity relative to AMD's Hawaii. The reason I say they didn't reach out to AMD is that I doubt AMD would withhold such information from the public, as it highlights one of GCN's architectural strengths.


----------



## CasualCat

Quote:


> Originally Posted by *GorillaSceptre*
> 
> So there's at least 4 games, it's more than i thought, but none of those games are "built for DX12". The real DX12 games are still a couple years off.
> 
> In any case, AMD is now on par (the Fury X anyway) with NV, but only in a benchmark of a game that's in alpha, which Nvidia said was not a true representation of DX12 performance.
> 
> Mahigan has posted some very compelling arguments, but in the end AMD's allegedly superior architecture just barely outperforms the 980 Ti. So if both Nvidias and AMD's GPU's are bottle-necked for different reasons, then how is AMD in a better position?


I think better here might be out of context. It is better relative to where they were not necessarily better as in superior. Meaning they're now more on even footing for "free" instead of having to optimize drivers like with DX11 where Nvidia seemed to have an advantage in terms of resources and speed to market.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Well,
> 
> Anandtech reached out to nVIDIA (they didn't, it would appear, reach out to AMD) for information on Maxwell 2, and nVIDIA provided those figures to Anandtech. I'm inclined to believe nVIDIA; I have no reason not to in this case, as the benchmark results suggest this information is accurate. If Maxwell 2 couldn't simultaneously process graphics and compute, we would see some rather poor results under Ashes of the Singularity relative to AMD's Hawaii. The reason I say they didn't reach out to AMD is that I doubt AMD would withhold such information from the public, as it highlights one of GCN's architectural strengths.


I'll try to dig around for some more info on it. No disrespect to Anandtech, but if they messed up AMD's figures so badly, then they could have messed up Nvidia's too.


----------



## CasualCat

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Why is 6700K faster then 5960X then?


Could that be clock differences? I wonder how they'd compare if the 6700K was underclocked or the 5960X overclocked?


----------



## ku4eto

Quote:


> Originally Posted by *CasualCat*
> 
> Could that be clock differences? I wonder how they'd compare if the 6700K was underclocked or the 5960X overclocked?


The game doesn't use all cores well, and Skylake has somewhat better IPC (and as such better single-threaded performance) than the 5960X.


----------



## Mahigan

Quote:


> Originally Posted by *thegreatsquare*
> 
> Consoles are already "to the metal" with GCN graphics and can't hold a candle to my laptop's 980m in DX11, much less GPUs like the 980 Ti, Fury X and the upcoming Pascal, so none of this should really make or break the ability to play 90%+ of titles at 1080p for the life of the PS4/XB1 ...right?
> 
> I mean, when everything is going "to the metal", the consoles' raw abilities are already the "performance floor".
> 
> I also wonder if PC game customization menus will eventually change to reflect settings in terms of their processing costs, like with how GTA reflects memory usage taken a step further.


Consoles are close to the metal, indeed: the DirectX 11-like API used on the Xbox One is "close to metal". That being said, this API did not make use of the available ACEs, as asynchronous shading was not a feature of DirectX 11, in the API or otherwise. This is why Tim Sweeney of Epic Games announced that a 20% boost was on the way for the Xbox One once the console is updated for DirectX 12 and running games developed for DirectX 12.

As for not being able to hold a candle to your laptop's 980m in DX11, this is likely true because nVIDIA's architecture (the hierarchical nature of the Grid Management Unit and the Work Distributor) allows nVIDIA to derive more compute performance, relative to AMD's GCN, under a serial API such as DX11. Therefore we can conclude that Maxwell 2 is a hybrid architecture which sits between DX11 and DX12 (serial and parallel), skilled in both trades. This is why we don't see an enormous performance boost going from DirectX 11 to DirectX 12 on nVIDIA's GPUs.

I am of the belief that Pascal will incorporate far more parallelism in its architecture than GCN. This could be wrong-headed of me for one obvious reason: Pascal is set to be used in Obama's announced supercomputer initiative. If Pascal becomes too parallel, it will require many CPU cores to feed it, which would result in a rather large supercomputer mainframe (more heat, more power usage, etc). Therefore it is entirely plausible, as others have suggested, that Pascal maintains its serialized architecture. One thing nVIDIA could do with Pascal, to counter this lack of parallelism relative to AMD's GCN, is to increase the number of compute queues in its Work Distributor, from 32 to, say, 64 or more. This wouldn't resolve the latency issue, but it would resolve the lack of parallelism on one level relative to AMD's GCN. The issue here is that AMD's upcoming architecture will be improved on ALL fronts (KitGuru as source). Therefore Pascal won't be competing with Hawaii or Fiji; it will be competing with a parallel monster from AMD.

nVIDIA has a choice, focus on gaming or focus on HPC. I believe the HPC market brings in more money. Therefore it is highly likely that nVIDIA will retain the hierarchical nature of its HyperQ.

As for games ported over from the console. It is not the games themselves that matter to most of us, it is the engines upon which they run. Take Frostbite 3 or Unreal 4 as examples. Games built on those engines are usually scaled down for consoles, not scaled up for PCs. Therefore any title running on the consoles, using these engines, is scaled down for the console. Once it releases on the PC, we PC Gamers get the full version. This translates into far more action, far more post processing effects, lighting, physics etc on the PC version. This is what we are going to be seeing with Fable Legends running on the PC (relative to it running on the XBox One console as both variants use the Unreal Engine 4). Therefore we shouldn't assume that the XBox One variant, scaled down, will be the variant we will find over on the PC.

Just a few tidbits... I may have gone way beyond the intended question.


----------



## CasualCat

Quote:


> Originally Posted by *ku4eto*
> 
> Game doesn't use well all cores, and Skylake has some better IPC (and as such better Single threaded) performance than the 5960x.


That thread of conversation was regarding why the 6700K is faster than the 5960X if, as was suggested, memory bandwidth is important and an issue. My point was that memory bandwidth isn't the only difference between those two processors, since the 6700K, even barring any IPC improvements, is clocked higher.


----------



## Randomdude

Sorry to butt in like this. I recently tried to upgrade my computer: I tried doing the LGA 771 -> 775 hardware mod, bought a Xeon, the whole shebang. Well, I screwed it up and have been without a PC for a while now. Decided I would do a decent build after all these years, got the top-of-the-line Z97 components with extra attention to case and PSU... Now, because of this thread and especially your posts, Mahigan, I decided to detach myself from what I want and look at what I need. Logic trumps emotion in hardware purchases. And I have the same views you have shared here regarding corporations, longevity of your hardware, and voting with your money, as I've previously expressed through threads and posts here.

I have a question. I already bought the 4790k, even though I fear Zen will be amazing, but I held off buying a Fury X, even though it was the part I wanted the most. Instead I opted for a 290. Do you believe this decision will pay off in the future? I can upgrade my Z97X-Gaming Etc black edition in 3 years for the equivalent board then.

My question is, can someone completely emotionally detached tell me if even the 4790k is worth it given the situation right now? Less CPU overhead, DX12 around the corner, and so on. I can return the 4790k, keep the board, and get that big upgrade that I want in 3 years with the big fat chip and new processor, thanks to Gigabyte's motherboard program. I am having an extremely hard time deciding, especially considering -anything- is a huge jump from a 1950/e6300 LGA 775 platform. Please, someone rational, tell me what to do. Damn.


----------



## CrazyElf

I don't like to take sides in these, but I will note that nobody has offered a good competing hypothesis against Mahigan's hypothesis that Nvidia's current architecture is not good at parallel workloads and that AMD has had a huge influence over the development of DX12 and Vulkan. So far, the attacks have been "you're wrong" or "you're an AMD fan", but beyond that, what's the game plan for current Nvidia users? From where I am standing, assuming the hypothesis is correct (and I put a strong emphasis on the "assuming", as this is still a hypothesis and we need more data), DX12 and Vulkan simply cannot be as driver-optimized as Nvidia's DX11 path historically was, because of their closer-to-the-metal design.

That doesn't mean Nvidia is doomed. I disagree with Mahigan here: Nvidia is still flush with cash, still has huge mind share, and has the financial resources to recover. It also retains considerable advantages in GPU design: tessellation, better compression algorithms, better rasterization, and possibly a better memory controller. I would agree that with either Pascal or Volta, Nvidia will bounce back on parallelization in a big way. To use an analogy, it's like when AMD outperformed Nvidia at Bitcoin mining: the way the GPUs were designed favored AMD. That was a lopsided but temporary advantage; eventually such advantages disappear, either because the competition catches up or because something else displaces them (in the case of coin mining, ASICs).

So far the only "good" criticism I have seen of Mahigan's hypothesis is from Digidi, who noted that the Fury X had a higher draw call count (and by extension higher polygon output), implying that rasterization is not the bottleneck on the Fury. That is valid criticism that goes beyond "I don't like you because this makes AMD look good". The question of what is causing the Fury X bottleneck remains unsolved; it may even poke holes in Mahigan's hypothesis and require a thorough revision. But I am disappointed that we are not seeing more discussion here, and instead seeing personal attacks.

At the moment though:

- AMD is definitely behind on memory compression, and that's going to hurt AMD when Nvidia adopts HBM2, assuming they cannot close the gap; right now Nvidia is more efficient in this regard.
- If rasterization is the bottleneck, they need to address that next generation in a big way.
- I would also advocate for more ROPs next generation, and probably more ACEs as well (even more parallelism).
- Shaders, TMUs, and the other parts do not appear to be the bottleneck (otherwise the Fury X would have performed much better relative to the 290X).

That does, however, give us an idea of where the transistor budget should go.

We know that HBM2 is going to solve the 4GB VRAM issue as well, but that will benefit both sides equally.

Financially, AMD is in trouble too, and unless they can regain mind/market share, that isn't going to change. Just remember, though: it's in every enthusiast's best interest to have at least two companies competing, or else we'll get high prices for poor performance.

Quote:


> Originally Posted by *Mahigan*
> 
> This is the part which Anandtech got right
> 
> 
> Prior to Maxwell 2, HyperQ could only handle tasking 1 Graphics or 32 compute queues. HyperQ couldn't do both in parallel. With Maxwell 2 there was an increase in parallelism. Maxwell 2 can now handle tasking both 1 Graphics and 31 Compute queues in parallel (32 compute queues when no graphics task requires queuing such as in OpenCL or other pure computational GPGPU scenarios).
> 
> As for my inspiration in my younger years... that would be David Kanter from http://www.realworldtech.com/.
> https://www.linkedin.com/in/kanterd
> 
> He now works as an IT consultant for the Linley Group. Some of his work consists of creating white papers for various tech companies. He's the author of AMDs LiquidVR whitepapers and some of their Asynchronous Shading content with a focus on VR.
> http://www.linleygroup.com/


David Kanter still shares his thoughts sometimes, although not in writing.

These are his thoughts on Fury X





The latest is his podcast on Skylake (it's over 90 minutes long, but I think it's worth a listen):






But yes, nobody writes articles as in-depth as these:
http://www.realworldtech.com/jaguar/

One question I do have for Mahigan - what does Vulkan mean for Linux gaming? Given that both Vulkan and DX12 seem to share many similarities with Mantle, would it be easier to port over games from Windows to Linux? What implications does this have for Linux gaming? Or for that matter, Steam Machines?

Quote:


> Originally Posted by *Randomdude*
> 
> Sorry to butt in like this. I recently tried to upgrade my computer, I tried doing the LGA 771 -> 775 hardware mod, bought a Xeon, the whole shebang. Well, I screwed it up and have been without a PC for a while now. Decided I would do a decent build after all these years, got the top of the line Z97 components with extra detail for case and PSU... Now, because of this thread and especially your posts Mahigan, I decided to detach myself from what I want and look at what I need. Logic trumps emotion in hardware purchases. And I have the same views you have shared here regarding corporations and longevity of your hardware and voting with your money - as I've previously expressed through threads and posts here. I have a question, I already did buy the 4790k, even though I fear Zen will be amazing, but I held off buying a Fury X. Even though it was the part I wanted the most. Instead I opted for a 290. Do you believe this decision will pay off in the future? I can upgrade my Z97X-Gaming Etc black edition in 3 years for the equivalent board then. My question is, can someone completely emotionally detached, tell me if even the 4790k is worth it given the situation right now. Less CPU overhead, dx12 behind the corner and so on. I can return the 4790k, keep the board and get that big upgrade that I want in 3 years with the big fat chip and new processor, thanks to Gigabyte's motherboard program. I am having an extremely hard time deciding. Especially considering -anything- is a huge jump from a 1950/e6300 lga 775 platform. Please someone rational tell me what to do. Damn.


Keep the 4790K and buy a Z97 motherboard. Or sell it and buy a 6700K. Either should last several years.

Zen, according to AMD's marketing, will be 40% faster than Steamroller. That assumes the marketing holds up and AMD actually delivers across the board, which is a huge "if". There is precedent for such a jump: Conroe was a huge leap in performance over Prescott and everything that came before it, because the NetBurst architecture was hot and inefficient (and in many ways, Bulldozer and Steamroller are inefficient too).

But let's keep in mind that a 4790K is about 60% faster clock for clock than Steamroller; with Skylake, it's more like ~70%. So even if AMD delivers a 40% IPC gain, Intel will still have a considerable lead. And even that assumes AMD can pair the IPC gain with aggressive clock speeds.
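Plugging the thread's rough numbers in (a back-of-the-envelope sketch; all figures are the hand-wavy estimates quoted above, normalized to Steamroller = 1.0, not measured data):

```python
# Back-of-the-envelope relative-IPC comparison using the rough figures above.
# All values are forum estimates, normalized to Steamroller = 1.0.
steamroller = 1.00
haswell = 1.60            # "about 60% faster clock for clock"
skylake = 1.70            # "more like ~70%"
zen = steamroller * 1.40  # AMD's claimed 40% uplift over Steamroller

print(f"Zen vs Haswell: {haswell / zen - 1:+.0%}")  # Haswell still ~14% ahead
print(f"Zen vs Skylake: {skylake / zen - 1:+.0%}")  # Skylake still ~21% ahead
```

Even granting AMD the full 40% claim, this arithmetic leaves Intel with a mid-teens clock-for-clock lead, which is the point being made above.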


----------



## Mahigan

Quote:


> Originally Posted by *Randomdude*
> 
> Sorry to butt in like this. I recently tried to upgrade my computer, I tried doing the LGA 771 -> 775 hardware mod, bought a Xeon, the whole shebang. Well, I screwed it up and have been without a PC for a while now. Decided I would do a decent build after all these years, got the top of the line Z97 components with extra detail for case and PSU... Now, because of this thread and especially your posts Mahigan, I decided to detach myself from what I want and look at what I need. Logic trumps emotion in hardware purchases. And I have the same views you have shared here regarding corporations and longevity of your hardware and voting with your money - as I've previously expressed through threads and posts here. I have a question, I already did buy the 4790k, even though I fear Zen will be amazing, but I held off buying a Fury X. Even though it was the part I wanted the most. Instead I opted for a 290. Do you believe this decision will pay off in the future? I can upgrade my Z97X-Gaming Etc black edition in 3 years for the equivalent board then. My question is, can someone completely emotionally detached, tell me if even the 4790k is worth it given the situation right now. Less CPU overhead, dx12 behind the corner and so on. I can return the 4790k, keep the board and get that big upgrade that I want in 3 years with the big fat chip and new processor, thanks to Gigabyte's motherboard program. I am having an extremely hard time deciding. Especially considering -anything- is a huge jump from a 1950/e6300 lga 775 platform. Please someone rational tell me what to do. Damn.


This is a tough question to answer. I'd say that a Core i7 4790K is a solid purchase; even in a massively parallel engine such as Nitrous, it performs well. Given that Nitrous is unusually demanding of CPU performance, I'd be inclined to conclude that a Core i7 4790K will be more than sufficient for DX12 gaming.

The hard part is the GPU. With an nVIDIA Maxwell 2 GPU (GTX 980/980 Ti), DX11 performance today translates directly into DX12 performance tomorrow. You're not going to get more frames per second under DX12 than you currently enjoy under DX11. I feel very confident in making this statement based on the architecture involved.

On the AMD front, an R9 290 is a good buy, as its DX12 performance will likely see a boost under compute-heavy workloads. What we can bank on is that in any DX12 title, the R9 290 will make efficient use of its compute resources. Since this is what held Hawaii back in DX11 titles, we should expect a boost across ALL DX12 titles. Provided you don't plan on gaming beyond 1080p/1440p, the R9 290 should allow for some great DX12 gaming. I think it was a good idea, on your part, not to purchase a Fury X or a GTX 980 Ti.

Since the R9 290 is relatively inexpensive, can play any DX11 title, and is set to receive a boost under DirectX 12, I'd say it was a very logical purchase. Come 2016, you'll be able to choose between AMD's Greenland and nVIDIA's Pascal parts. (I usually recommend going high-end but not the top model: rather than a Fury X, the Fury; rather than a 290X, the 290; rather than a GTX 980 Ti, a GTX 980; and so on. This usually gets you both longevity and bang for your buck.)

I am quite certain that DirectX 12's main feature, asynchronous shading, will be utilized in all the upcoming DX12 titles. AMD GCN's draw call advantage also plays into this. Compute performance will likely be more important than any other aspect of a GPU moving forward. Geometry performance (in the form of tessellation) will remain a factor mainly in TWIMTBP nVIDIA-sponsored titles. Texturing-wise, an R9 290 is better than a GTX 980. Pixel fill rate-wise, a GTX 980 is better than an R9 290 (one of the most important aspects for gaming at higher resolutions), but not by much. On nearly every other metric, the two are close enough in performance.

AMD's Achilles heel was its compute performance under DX11. With this rectified under DX12, and with new cards not coming until late 2016, I'd say you've made the most logical choice: save some money.

And for those who think it is absurd to recommend an R9 290 over a GTX 980: as more DX12 titles begin to surface, I think your opinions will change. I'm not basing this on AotS alone. I'm basing it on the fact that what was holding the 290/290X back in terms of performance was its compute capabilities under DX11. It doesn't matter to what extent asynchronous shading is used; the fact that it is used at all is what matters. *Any usage of asynchronous shading will benefit AMD's compute capabilities more so than nVIDIA's, as it pertains to their currently released GPU architectures*.

It now makes sense why AMD re-released Hawaii with a larger frame buffer (DirectX 12 4K gaming). It also makes sense why they invested so much die space in compute on the Fiji parts rather than bolstering the rest of the pipeline. What doesn't make sense is Fiji's performance under AotS; this I admit. I think we're all about to see why AMD made the choices they did, in a relatively short period of time.


----------



## GorillaSceptre

There's been a lot of discussion happening on more tech savvy forums.

I'd link them but i'm not sure what the rules are about that on here.

After reading some of the posts by the "experts" over there, the only conclusion seems to be that Mahigan's caused a storm in a teacup.


----------



## Randomdude

Thank you very much for the detailed input; it has definitely made it easier to make up my mind. I will keep the 4790K, and using Gigabyte's two-for-one program (exclusive to black edition Z97 boards), in 3 years I will exchange this Gaming G1 board for its equivalent, sell the 4790K, the RAM, and the R9 290, and buy the big fat 2017-2018 GPU. I will also have the choice of whether to sell the board then and move to the enthusiast socket, since the new board will still have 2 years of warranty, be brand new and unopened, and thus be easy to get off my hands. Perhaps I will invest the savings in something long-term as well. Food for thought, haha. It's definitely hard to justify a high-end GPU for me at this point in time, coming from someone who really wants a big upgrade and has had a problem with patience. Either way, I will not get to keep the 4790K in 3 years' time, hence I still have doubts about that particular nitpick.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> There's been a lot of discussion happening on more tech savvy forums.
> 
> I'd link them but i'm not sure what the rules are about that on here.
> 
> After reading some of the posts by the "experts" over there, the only conclusion seems to be that Mahigan's caused a storm in a tea cup


Developers or GPU gurus?

I dealt with so-called experts in the Anand forums; they had no clue what they were talking about. Some claimed that Maxwell had better compute capabilities and linked me to DX11 benchmarks as proof. I don't see how having better compute capabilities under DX11 translates to the same under DX12. A few game developers chimed in and bolstered my arguments.


----------



## Bauxno

I think we will see this perf increase in a game this Friday (which in Ark dev time means Saturday), with Ark: Survival Evolved getting the DX12 patch. The only thing I don't know is how well the game is going to use the strong features of AMD cards and those ACEs.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Developers or GPU guru's?
> 
> I dealt with so-called experts in the Anand forums, they had no clue what they were talking about. Some claimed that Maxwell had better compute capabilities. They linked me to DX11 benchmarks as proof. I don't see how having better compute capabilities, under DX11, translates to the same, under DX12. A few game developers chimed in and bolstered my arguments.


Supposedly both. But who knows who's real on the internet.

They were using fancy words though


----------



## Mahigan

Quote:


> Originally Posted by *Bauxno*
> 
> I think we will see this perf increase on game this friday( ark dev means saturday) with ark survival evolved getting the dx 12 patch. The only thing I dont know its how good is this game gonna use the strong feature of amd cards and those ACES


Ark: Survival Evolved, right?

I suppose the wild card, for its performance, will be whether it includes async shading, as well as the fact that it includes GameWorks. I wouldn't be surprised if GameWorks titles sacrifice post-processing effects and place an emphasis on PhysX effects instead (as well as non-beneficial tessellation levels). That's the card nVIDIA can play.

I mean, you can't have both immense cinematic effects through post-processing and PhysX, because both tax the available compute resources.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Supposedly both. But who knows who's real on the internet.
> 
> They were using fancy words though


Do you have a link? I'd like to chime in


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Do you have a link? I'd like to chime in


Going back a few pages, I saw you were already in there.









The other one has been locked.

The next couple years are going to be interesting to say the least. More competition means beefier cards for us


----------



## PontiacGTX

@Mahigan do you think future games could improve their performance by taking advantage of extra cores in the near future? Because, as you can see there, Skylake is ahead of a 5960X, even if DX12 scaling goes up to 6 cores or more.

And BTW, you used PhotoWorxx as a comparison for the 6700K in SSE/2/3 and other instructions, but it keeps showing the 5960X ahead, so the game engine must lean on something Skylake has improved well beyond Haswell-E.

AVX2 can now be used with DirectXMath; is the same true for Mantle/DX12?
http://blogs.msdn.com/b/chuckw/archive/2015/06/03/directxmath-avx2.aspx


----------



## thegreatsquare

Quote:


> Originally Posted by *Mahigan*
> 
> Consoles are close to metal, indeed, the DirectX 11-like API being used on the XBox One is "close to metal". That being said, this API did not make use of the available ACEs as Asynchronous shading was not a feature of DirectX 11, on the API or otherwise. This is why Tim Sweeney, of Epic Games, announced a 20% boost was on the way for the XBox One when updating the console for DirectX 12 and running games which are developed for DirectX 12.
> 
> As far as not being able to hold a candle to your Laptop's 980m in DX11, this is likely true because of the fact that nVIDIAs architecture (the hierarchical nature of a Grid Management Unit and the Work Distributor, allows nVIDIA to derive more Compute performance, relative to AMD GCN, under a serial API such as DX11. Therefore we can conclude that Maxwell 2 is sort of a hybrid architecture which sits between DX11 and DX12 (serial and parallel). Skilled in both trades. This is why we don't see an enormous performance boost going from DirectX 11 to DirectX 12 on nVIDIAs GPUs.
> 
> I am of the belief that Pascal will incorporate far more parallelism, in its architecture, than GCN. This could be wrong headed of me for one obvious reason. Pascal is set to be used in Obama's announced Super Computer initiative. If Pascal becomes too parallel, it will require many CPU cores to feed it. This would result in a rather large super computer mainframe (more heat, more power usage etc). Therefore it is entirely plausible, as others have suggested, that Pascal maintains its serialized architecture. One of the things nVIDIA could do with Pascal, to counter this lack of parallelism relative to AMD GCN, is to increase the amount of compute queues in its work distributor. From 32 to say 64 or more. This wouldn't resolve the latency issue but it would resolve the lack of parallelism on one level relative to AMD GCN. The issue here is that AMDs upcoming architecture will be improved on ALL fronts (Kitguru as source). Therefore Pascall won't be competing with Hawaii or Fiji. It will be competing with a parallel monster from AMD.
> 
> nVIDIA has a choice, focus on gaming or focus on HPC. I believe the HPC market brings in more money. Therefore it is highly likely that nVIDIA will retain the hierarchical nature of its HyperQ.
> 
> As for games ported over from the console. It is not the games themselves that matter to most of us, it is the engines upon which they run. Take Frostbite 3 or Unreal 4 as examples. Games built on those engines are usually scaled down for consoles, not scaled up for PCs. Therefore any title running on the consoles, using these engines, is scaled down for the console. Once it releases on the PC, we PC Gamers get the full version. This translates into far more action, far more post processing effects, lighting, physics etc on the PC version. This is what we are going to be seeing with Fable Legends running on the PC (relative to it running on the XBox One console as both variants use the Unreal Engine 4). Therefore we shouldn't assume that the XBox One variant, scaled down, will be the variant we will find over on the PC.
> 
> Just a few tidbits... I may have gone way beyond the intended question.


That was all good, ...I think. You make some good observations. On the other hand, [...in Zoidberg voice] "I'm not hearing a no." Even if it is scaled up for PC, that's what playing with the settings is for... and if that doesn't cut it, then that's what the MXM slot is for.


----------



## Mahigan

Shots fired over at HardOCP









We'll see how well received the final product is. Great work, guys/gals.









http://hardforum.com/showthread.php?t=1873640


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> @Mahigan do you think future games could improve their performance by taking advantage of extra cores in the near future? Because, as you can see there, Skylake is ahead of a 5960X, even if DX12 scaling goes up to 6 cores or more.
> 
> And BTW, you used PhotoWorxx as a comparison for the 6700K in SSE/2/3 and other instructions, but it keeps showing the 5960X ahead, so the game engine must lean on something Skylake has improved well beyond Haswell-E.
> 
> AVX2 can now be used with DirectXMath; is the same true for Mantle/DX12?
> http://blogs.msdn.com/b/chuckw/archive/2015/06/03/directxmath-avx2.aspx


I think it is a mix of improved SIMD performance, a higher clock, and improved caching performance, but I really don't know; I'm better versed in GPU architectures than CPU architectures.

What I do know is that the Core i7 6700K is superior to the 5960X in terms of caching and per-thread IPC. The 6700K also has a decent amount of memory bandwidth at its disposal for a dual-channel configuration (thanks to DDR4 support). Of course, once you add in all of the 5960X's extra cores and threads, its aggregate multi-threaded performance is higher.
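The "decent amount of memory bandwidth" point can be put in numbers with simple peak-bandwidth arithmetic (assuming DDR4-2133 on both platforms; real sustained bandwidth is lower):

```python
# Theoretical peak DRAM bandwidth: transfer rate x channels x bus width.
def peak_bw_gbs(mt_per_s: int, channels: int, bus_bits: int = 64) -> float:
    return mt_per_s * channels * (bus_bits / 8) / 1000  # GB/s

print(peak_bw_gbs(2133, channels=2))  # 6700K, dual channel:  ~34.1 GB/s
print(peak_bw_gbs(2133, channels=4))  # 5960X, quad channel:  ~68.3 GB/s
```

So the 5960X still has roughly twice the peak bandwidth; the 6700K's figure is simply "decent" for a mainstream dual-channel part.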


----------



## Randomdude

Ashes of the Singularity, "The Future is Fusion" - funny correlation. You know, maybe AMD really has played the long-term hand, perhaps more so than we can currently grasp. I would love competition again, and I do hope you are right, Mahigan, and that this is not some personal agenda of yours. Having made a site, however, I can see how you'd benefit from being correct in this hypothesis. Best of luck to all of us.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> I think it is a mix of the improved SIMD performance, a higher clock and improved caching performance but I really don't know. I'm more well versed on GPU architectures than CPU architectures.
> 
> What I do know is that the Core i7 6700K is superior to the 5960x in terms of caching and IPC (per thread). The 6700K also has a decent amount of memory bandwidth at its disposal for a dual channel configuration (thanks to DDR4 support). Of course once you add all of the 5960x's extra threads and cores the IPC (multi-thread) performance is higher.


It is interesting that other graphics engines under DirectX 12 scaled better with more cores/threads. A reason this time might be that the game uses 6 threads/cores, whichever is available first, so 3 cores + 3 hyper-threads of a newer architecture might achieve similar results to 6 cores/threads from an older architecture.

Also, Mantle had wider multi-core support than DX12, and Microsoft didn't take advantage of this...


----------



## Themisseble

Yes... but nobody said that Vulkan won't.


----------



## anubis44

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Not Mantle vs DX12 again
> 
> 
> You'd have to be a special kind of ignorant to think that Mantle forced MS to make DX12. It's naive to think that these corporations don't know what's going on at other companies years before we find out.


Well, Mahigan is a retired AMD GPU engineer. He actually understands the code in Mantle and DX12. In addition, he presented relevant quotes from luminaries in the gaming software world coming right out and saying how influential Mantle was.

By contrast, what qualifies you to have an opposing opinion on the subject? If you don't have any evidence to the contrary, you don't have an argument.

Please provide details about why it's impossible that AMD forced Microsoft to make the DX12 we now have.


----------



## Noufel

Why aren't the gains from DX12 the same for the 390X and the Fury X, when the latter has more theoretical compute power?
Has Oxide coded their game to run better on GCN 1.1 under DX12 than on GCN 1.2?


----------



## NightAntilli

Quote:


> Originally Posted by *Noufel*
> 
> Why the gains from dx12 aren't the same for the 390X and for the furyX when the last one has more theorical compute power?


Same number of ROPs, most likely.


----------



## Exilon

Quote:


> Originally Posted by *anubis44*
> 
> Well, Mahigan is a retired AMD GPU engineer.


Wut?


----------



## PontiacGTX

Quote:


> Originally Posted by *Noufel*
> 
> Why the gains from dx12 aren't the same for the 390X and for the furyX when the last one has more theorical compute power?
> Has oxide coded their game to run better on GCN 1.1 with dx12 than the GCN 1.2 ?


Similar ACE count? Similar ROP count, similar tessellation performance, similar pixel fill rate.


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> Why the gains from dx12 aren't the same for the 390X and for the furyX when the last one has more theorical compute power?
> Has oxide coded their game to run better on GCN 1.1 with dx12 than the GCN 1.2 ?


There is something holding Fury-X back from achieving its true potential. I have not yet been able to discern what is causing this odd behavior on the part of Fiji. Fiji shares most of the same architecture as Hawaii. Therefore Fiji, High Bandwidth Memory and Compute Units aside, is quite close to both Hawaii and Tonga architecturally. I'd have to purchase a Fiji card in order to figure out just what the issue is.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> There is something holding Fury-X back from achieving its true potential. I have not yet been able to discern what is causing this odd behavior on the part of Fiji. Fiji shares most of the same architecture as Hawaii. Therefore Fiji, High Bandwidth Memory and Compute Units aside, is quite close to both Hawaii and Tonga architecturally. I'd have to purchase a Fiji card in order to figure out just what the issue is.


But it's GCN 1.2. In some cases the R9 285 is faster than the R9 280X.


----------



## Noufel

Quote:


> Originally Posted by *PontiacGTX*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Noufel*
> 
> Why the gains from dx12 aren't the same for the 390X and for the furyX when the last one has more theorical compute power?
> Has oxide coded their game to run better on GCN 1.1 with dx12 than the GCN 1.2 ?
> 
> 
> 
> similar ACE count?similar rop count,similar tessellation,similar pixel fillrate

Knowing AMD was working on a low-level API (Mantle), and knowing how parallelism works and what resources it needs, why would AMD gimp their new Fiji GPUs by keeping the same ACE count, ROPs, pixel fill rate, etc.? It would have been more appropriate for them to make a GPU with fewer SPs (~3000) and more ACEs and ROPs.


----------



## CasualCat

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-5
Quote:


> Yeah read the first 5 paragraphs of that and its clear that he is just mashing all of the operations in one lump and trying to make a reason behind the performance disparity lol, added to what you just stated the *ACE's don't have anything to do with the CPU draw calls and overhead.*


----------



## Slaughterem

@Mahigan One thing I noticed about the architectures is that Fiji has 2 DMA engines while Maxwell 2 has 1. Can you shed any light on the advantages or disadvantages of having two versus one DMA engine? Will this be something that VR will be able to leverage?


----------



## sugarhell

Quote:


> Originally Posted by *CasualCat*
> 
> https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-5


Who said that ACEs have anything to do with draw calls?


----------



## CasualCat

Quote:


> Originally Posted by *sugarhell*
> 
> Who said that ACEs has anything to do with draw calls?


Seems Mahigan is:
Quote:


> Originally Posted by *Mahigan*
> 
> Now compare this with AMD GCN 1.1/1.2 which is composed of 8 *A*synchronous *C*ompute *E*ngines each able to queue 8 Compute tasks for a total of 64 coupled with 1 Graphic task by the Graphic Command Processor. See bellow:
> 
> Each ACE can also apply certain Post Processing Effects without incurring much of a performance penalty. This feature is heavily used for Lighting in Ashes of the Singularity. Think of all of the simultaneous light sources firing off as each unit in the game fires a shot or the various explosions which ensue as examples.
> 
> *This means that AMDs GCN 1.1/1.2 is best adapted at handling the increase in Draw Calls* now being made by the Multi-Core CPU under Direct X 12.


edit: separately this full disclosure would have been nice in this thread:
Quote:


> Originally Posted by *Mahigan*
> 
> Hello Everyone,
> 
> I'm obviously new here. I'm a Canadian living in Morocco over the past two years with my Wife. I'm an old Veteran, *having worked for* Dell, HP, Compaq, *ATi* and others in the tech industry. I'm a Technical Trainer by trade. I travel the world teaching various work forces on topics such as IT Troubleshooting, IT Engineering, Fibre Optics Engineering and Maintenance amongst others.
> 
> Currently working on immigrating my Wife and I back to Canada
> 
> 
> Pleased to meet you all,
> 
> Mahigan


----------



## Mahigan

Quote:


> Originally Posted by *CasualCat*
> 
> Seems Mahigan is:
> edit: separately this full disclosure would have been nice in this thread:


Quote:


> Now compare this with AMD GCN 1.1/1.2 which is composed of 8 Asynchronous Compute Engines each able to queue 8 Compute tasks for a total of 64 coupled with *1 Graphic task by the Graphic Command Processor*. See bellow:
> 
> Each ACE can also apply certain Post Processing Effects without incurring much of a performance penalty. This feature is heavily used for Lighting in Ashes of the Singularity. Think of all of the simultaneous light sources firing off as each unit in the game fires a shot or the various explosions which ensue as examples.
> 
> This means that AMDs GCN 1.1/1.2 is best adapted at handling the increase in Draw Calls now being made by the Multi-Core CPU under Direct X 12.


The Graphics Command Processor is decoupled from the ACEs. The ACEs handle the compute tasks while simultaneously working in conjunction with the Graphics Command Processor; they are separate units, and all 8 ACEs plus the single Graphics Command Processor can function in parallel.

This is made possible by DirectX 12 and its ability to make use of multi-core processors.

Thus AMD's GCN 1.1/1.2 is best adapted to handling the increase in draw calls now being made by the multi-core CPU under DirectX 12, while handling the compute tasks separately. Under DirectX 11, the Graphics Command Processor handled both the compute tasking and the graphics tasking, so you had to use pre-emption to prioritize important workloads in a serial fashion.

Did they miss the "Graphics Command Processor" in my statement?
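The serial-versus-parallel distinction described above can be sketched with a toy timing model (purely illustrative numbers and a hypothetical workload; real GPUs interleave work at much finer granularity and contend for shared shader resources):

```python
# Toy model: DX11-style serial submission vs DX12-style async compute.
graphics_tasks = [4, 4, 4]             # arbitrary time units
compute_tasks = [2, 2, 2, 2, 2, 2]

# DX11-style: one command processor runs everything back to back,
# using pre-emption only to reorder work, never to overlap it.
serial_time = sum(graphics_tasks) + sum(compute_tasks)

# DX12-style: compute work is spread across independent queues
# (ACE-like) that overlap with the graphics queue, so total time is
# bounded by the longest stream (assuming idle shader capacity).
num_compute_queues = 3
compute_stream = sum(compute_tasks) / num_compute_queues
parallel_time = max(sum(graphics_tasks), compute_stream)

print(serial_time)    # 24
print(parallel_time)  # 12.0
```

In this sketch the parallel path hides all the compute work behind the graphics stream; how much of that hiding a real GPU achieves depends on how much idle execution capacity the graphics workload leaves behind.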


----------



## CasualCat

Quote:


> Originally Posted by *Mahigan*
> 
> The Graphic Command Processor is decoupled from the ACEs. The ACEs handle the Compute tasks while simultaneously working in conjunction with the Graphics Command Processor. The ACEs are separate units from the Graphics Command Processor. All 8 ACEs and the single Graphics Command Processor can function in parallel.
> 
> This is made possible by DirectX 12 and its ability to make use of Multi-core processors.
> 
> Thus AMDs GCN 1.1/1.2 is best adapted at handling the increase in Draw Calls now being made by the Multi-Core CPU under DirectX 12.
> 
> Did they miss the "Graphics Command Processor" in my statement?


I don't know, but given the level of technical knowledge I'm seeing on that forum, and knowing you've created multiple recent profiles on other forums anyway, it might be more productive to engage them directly in conversation.


----------



## mav451

I'm seeing strong words and thinly veiled antagonism on Beyond3D.
And yet I'm not seeing any kind of fleshed out rebuttal.

So easy to be dismissive







Ah well.

If they want to add input, this thread is open for their statements.


----------



## Forceman

Quote:


> Originally Posted by *CasualCat*
> 
> I don't know but the level of technical knowledge I'm seeing on that forum, and knowing you've created multiple recent profiles on other forums anyhow, it might be more productive to engage them directly in conversation.


It's probably also the best place to have a technical discussion of DX12. Lots of smart (and industry) people post there.


----------



## CasualCat

Quote:


> Originally Posted by *Forceman*
> 
> It's probably also the best place to have a technical discussion of DX12. Lots of smart (and industry) people post there.


That is the impression I got after perusing their forums.


----------



## Mahigan

Quote:


> Originally Posted by *CasualCat*
> 
> I don't know but the level of technical knowledge I'm seeing on that forum, and knowing you've created multiple recent profiles on other forums anyhow, it might be more productive to engage them directly in conversation.


The only forums I've posted in have been Anandtech, HardOCP, Hexus and Overclock.net.

I've stayed away from Tomshardware and other such forums.

I must admit that my first statements were riddled with errors, hence why I have revised them as new information came to light about the architectures at play. When I last worked for ATi, the Radeon X800 series was brand new. That was a LONG time ago. Everything has changed since then. We moved from fixed-function units in GPUs to the highly parallel architectures we see today.

AMD is not ATi. And I haven't been following architectures for quite some time; I lost interest prior to Cayman being released. I moved on.

The only reason my interest has returned is because of DirectX 12. When I saw what Ashes of the Singularity was capable of achieving I started reading again. Catching up on everything.

I started to post in forums and was met with rather hostile people. It's not the same environment it was back then, that's for sure. I can't make a single statement or speculate without being bombarded by angry posters.


----------



## CasualCat

Quote:


> Originally Posted by *Mahigan*
> 
> The only forums I've posted in have been Anandtech, HardOCP, Hexus and Overclock.net.
> 
> I've stayed away from Tomshardware and other such forums.
> 
> I must admit that my first statements were riddled with errors. Hence why I have revised them as new information came to light about the architectures at play. When I last worked for ATi the Radeon x800 series were brand new. That was a LONG time ago. Everything has changed since then. We moved from fixed function units, in GPUs, to the highly parallel architectures we see today.
> 
> AMD is not ATi. And I haven't been following architectures for quite some time, lost interest prior to Cayman being released. I moved on.
> 
> The only reason my interest has returned is because of DirectX 12. When I saw what Ashes of the Singularity was capable of achieving I started reading again. Catching up on everything.
> 
> I started to post in forums and was met with rather hostile people. It's not the same environment it was back then that's for sure. I can't make a single statement or speculate without being bombarded with angry posters.


Fair enough. I'm interested in seeing a discussion in the place I linked, as they actually seem to have the technical expertise to confirm, debunk, or slightly correct what you've been writing about. They certainly seem more technically oriented than any of the others you've listed, including OCN, and most definitely Tom's.


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> The only forums I've posted in have been Anandtech, HardOCP, Hexus and Overclock.net.
> 
> I've stayed away from Tomshardware and other such forums.
> 
> I must admit that my first statements were riddled with errors. Hence why I have revised them as new information came to light about the architectures at play. When I last worked for ATi the Radeon x800 series were brand new. That was a LONG time ago. Everything has changed since then. We moved from fixed function units, in GPUs, to the highly parallel architectures we see today.
> 
> AMD is not ATi. And I haven't been following architectures for quite some time, lost interest prior to Cayman being released. I moved on.
> 
> The only reason my interest has returned is because of DirectX 12. When I saw what Ashes of the Singularity was capable of achieving I started reading again. Catching up on everything.
> 
> I started to post in forums and was met with rather hostile people. It's not the same environment it was back then that's for sure. I can't make a single statement or speculate without being bombarded with angry posters.


Well, keep doing what you are doing.









The more we can demystify the "performance" riddle through more information, arguments, and cross-arguments, the better educated we will be as consumers.


----------



## Kpjoslee

Your thoughts? @Mahigan









http://hardforum.com/showpost.php?p=1041818052&postcount=29
Quote:


> The benchmark for AOS is probably valid, but async shader performance is sensitive to how the shaders are written for the architecture (much more so than serially written shaders); there are many variables for this.
> 
> He is incorrect on many assumptions of how the GCN and Maxwell 2 architectures work, from the ALUs, ACEs, AWSs, ROPs, hull shader units, and many more.
> 
> GCN and Maxwell 2 ALU structures are very different. Maxwell 2 does dual issue and GCN can do 4- or 5-way co-issue. Maxwell 2 has 32 async wrap schedulers which he didn't account for; in this regard Maxwell 2 should be more capable than GCN with 8 ACE's (if we follow the OP's logic), but again, as above, it depends on how the shaders are written. He has lumped GCN ACE performance together with API overhead due to draw calls (from another post on another forum; I have only glanced over this current post of his), one mistake. ACE's and AWS's don't interact with the CPU; they were made to reduce latency by doing out-of-order instructions within the GPU. If they had to interact with the CPU, that would defeat the purpose of latency reduction.
> 
> He has linked to Kepler's white paper as an example of Maxwell 2's HyperQ workflow, but Maxwell 2's HyperQ workflow is very different, because the individual units are different, another mistake. Maxwell 2's HyperQ does work on grids, with child grids for compute if the parent grid needs a child grid, and does so in serial; this is the SAME for GCN. Within the grids is where ACE's and AWC's are "asynchronous"; the reason for this is that if they weren't, you would have rendering issues and other problems. Think of this as a critical path problem: to get from X to Y you have 4 operations, A B C D, with C and D first (C is a child set for A, and D is a different grid that needs to be used for B to complete). Each of A B C D can be done separately, but C must be done for A to complete and D must be done first for B to complete. Within each of the sets they can be done in any order, so some parts of A can be done concurrently with C, just as parts of B can be done as D is being done. After C and D are done, A and B are complete, and then X can become Y with the results of A and B.
> 
> There is another mistake with the Execution of Compute and Graphics computations vs Execution in Queue; those are two different things. The first is actually being done in real time; the second is storing work for computation as ALUs get freed up. Maxwell 2's execution in queue is dependent on whether there is register space and cache available to store. Not sure about GCN, whether there is a variable amount due to the same limitation, but AMD stated a set amount of 64. Anandtech's table is correct in what they stated, not incorrect as the OP claimed, so any conclusions from that are incorrect.
> 
> ROPs don't control tessellation performance; the hull shaders, and how often they are fed if they are bottlenecked by the amount of procedural geometry being created, do. Another mistake, which he posted in another forum; seems like he is learning not to lump all his conclusions into one.
> 
> Microsoft's specifications were carefully made to reduce pressure on the tessellation pipeline going from DX10 to DX11. This is because in DX10 the geometry shaders were being bottlenecked by the shader array, since the new geometry had to be sent into the shader array again (from another one of his posts about async shaders on another forum).
> 
> These are just a few of the mistakes; there are many more. I would suggest the best way to understand these things is to go through the CUDA and OpenCL handbooks from each respective IHV to get a better understanding of async shaders and how they can work optimally for each of the architectures.
> 
> I just posted a paraphrased version of how and why async shaders work on B3D, so let's see where it goes; the rest of what I posted is easily googleable in white papers or found in handbooks as above.
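
Razor1's X-to-Y critical-path example can be sketched as a tiny dependency graph (a hypothetical Python illustration of the argument, not anyone's actual scheduler): C must finish before A completes, D before B, and Y (the finished X) only after both A and B.

```python
# Hypothetical sketch (my own illustration) of the critical-path example
# above. Parts of the work may run concurrently; this models only the
# order in which tasks are allowed to *complete*.
deps = {"A": {"C"}, "B": {"D"}, "C": set(), "D": set(), "Y": {"A", "B"}}

def completion_order(deps):
    """Return an order in which tasks can retire: a task may only
    complete once everything it depends on has completed."""
    done, order = set(), []
    while len(done) < len(deps):
        for task, needs in deps.items():
            if task not in done and needs <= done:
                order.append(task)
                done.add(task)
    return order

order = completion_order(deps)
print(order)  # ['C', 'D', 'A', 'B', 'Y']
```

Whether the units inside each set run serially or in parallel, the retirement order is constrained exactly as the quote describes: C before A, D before B, Y last.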


----------



## Mahigan

I've replied. I'll post my reply here...
Quote:


> Maxwell 2 has 32 async wrap schedulers which he didn't account for; in this regard Maxwell 2 should be more capable than GCN with 8 ACE's (if we follow the OP's logic), but again, as above, it depends on how the shaders are written.


I will address this piece by piece.

1. Maxwell does have 4 Warp Schedulers per SMX (I assume this is what you mean by Wrap schedulers). These Warp Schedulers are fed MPI tasks by what nVIDIA terms HyperQ. HyperQ consists of a Grid Management Unit (holding 1000s of grids) and the Work Distributor (up to 32 active grids, or compute queues). Each set of 4 Warp Schedulers functions as part of its respective SMX. They can thus be characterized as dependent, since they can only access the compute resources from within the SMX in which they reside. You cannot claim that Maxwell has 32 ACEs, as the Warp Schedulers in one SMX cannot access the compute resources of another SMX. The ACEs found in GCN run as independent schedulers that can schedule to any available Compute Unit(s) in any of the 4 shader engines in GCN. The ACEs can also sync, running dependently of one another if the need arises (for example, when the tasks in one ACE are dependent on the tasks in another ACE), by communicating through cache, memory and/or the Global Data Share. ACEs are more flexible than nVIDIA's HyperQ solution because of these features, and this flexibility allows for a more efficient use of the available compute resources. Another limitation of the HyperQ solution is that the Work Distributor can only queue up either 1 graphics and 31 compute tasks, or 32 compute tasks. GCN's ACEs have a queue depth of 8 (64 queues total) and run independently of the Graphics Command Processor, therefore you can queue up more tasks and prioritize more work per cycle. The rest of my thoughts on this are mentioned in the original post in this thread. With the information I am giving you about how ACEs function, I believe you will understand just why AMD's Asynchronous Compute implementation is superior to nVIDIA's HyperQ implementation.

As for the shaders, nVIDIA supplied their own HyperQ-optimized shader code for AotS. Therefore Maxwell 2 is not more capable, as the results show.
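
The queue arithmetic in the point above can be written out explicitly (a back-of-the-envelope Python sketch; the figures are the ones claimed in this thread, not independently verified numbers):

```python
# Back-of-the-envelope bookkeeping for the queue counts claimed above
# (figures come from the posts in this thread, not vendor docs).

ACES = 8            # GCN 1.1/1.2 Asynchronous Compute Engines
QUEUES_PER_ACE = 8  # queue depth of each ACE
gcn_compute_queues = ACES * QUEUES_PER_ACE  # plus an independent graphics queue

# Maxwell 2's HyperQ Work Distributor holds 32 active grids total,
# in one of two configurations: (graphics grids, compute grids).
maxwell_modes = {"graphics+compute": (1, 31), "compute-only": (0, 32)}
maxwell_max_compute = max(c for _, c in maxwell_modes.values())

print(gcn_compute_queues)   # 64
print(maxwell_max_compute)  # 32
```
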
Quote:


> He has lumped GCN ACE performance together with API overhead due to draw calls (from another post on another forum; I have only glanced over this current post of his), one mistake.


I was responding to another poster who brought up draw calls. I stated, and I quote:
"Now compare this with AMD GCN 1.1/1.2, which is composed of 8 Asynchronous Compute Engines, each able to queue 8 compute tasks for a total of 64, coupled with *1 graphics task by the Graphics Command Processor*. See below:

*Each ACE can also apply certain Post Processing Effects without incurring much of a performance penalty. This feature is heavily used for Lighting in Ashes of the Singularity. Think of all of the simultaneous light sources firing off as each unit in the game fires a shot or the various explosions which ensue as examples.*

This means that AMD's GCN 1.1/1.2 is best adapted to handling the increase in Draw Calls now being made by the Multi-Core CPU under DirectX 12."

Nowhere did I state that the ACEs handle draw calls. The Graphics Command Processor takes care of draw calls. Because it runs independently from, and in parallel with, the ACEs, the CPU does not need to wait a cycle in order to submit more work. Both the compute tasks and the graphics tasks can be submitted in parallel. This means that your draw call count is not affected by the post-processing work the ACEs are sending to the Compute Units. Prior to DirectX 12, the Command Processor handled both compute tasks and graphics tasks. You could use pre-emption for higher-priority tasks, but you could not have the GPU work on both in parallel. You misread my statement.

Quote:


> He has linked to Kepler's white paper as an example of Maxwell 2's HyperQ workflow, but Maxwell 2's HyperQ workflow is very different, because the individual units are different, another mistake.


This is a misrepresentation, because I linked to the Kepler white paper in order to showcase the general idea behind HyperQ. I then linked to the Anandtech chart, which clearly highlighted the difference in how both Compute and Graphics tasks are handled in Maxwell/2 and Kepler.
Quote:


> Maxwell 2's HyperQ does work on grids, with child grids for compute if the parent grid needs a child grid, and does so in serial; this is the SAME for GCN. ACE's and AWC's are "asynchronous"; the reason for this is that if they weren't, you would have rendering issues and other problems.


This is true for AWC's but not for ACE's. ACE's can run independently of, or dependently on, one another and sync through cache, memory and the Global Data Share. This alleviates rendering issues and "other problems". ACEs thus have the capacity for more parallelism than AWCs. This is what I mentioned in my original post. Are you well versed in AMD's ACE implementation?
Quote:


> There is another mistake with the Execution of Compute and Graphics computations vs Execution in Queue; those are two different things. The first is actually being done in real time; the second is storing work for computation as ALUs get freed up.


I did make this mistake in one of my first posts on the topic. This mistake was pointed out to me by another user. I researched his claim and redacted my statements in subsequent posts; e.g., this "mistake" is not found in the post above.
Quote:


> Anandtech's table is correct in what they stated, not incorrect as the OP stated so any conclusions from that are incorrect.


Anandtech's table has "Queues" in the title, and speaking of queues, that would make the graph incorrect. If the graph were about asynchronous compute units, then the title would have mentioned this. All of the information above and below the graph is discussing queues. Each ACE has a queue depth of 8, and there are 8 ACEs. This means a total queue depth of 64, not 8. See the AMD information below:

Quote:


> ROPs don't control tessellation performance, the Hull shaders and how often they are fed if they are bottlenecked due to the amount of procedural geometry is being created do, another mistake which he posted in another forum seems like he is learning not to lump all his conclusions into one.


I don't remember making this statement. Could you please link me to it?


----------



## PontiacGTX

@Mahigan
Did you notice the R9 Fury Nano has 2 HWS instead of 4 additional ACEs?
What is the difference?
Quote:


> Originally Posted by *Noufel*
> 
> Knowing AMD was working on a low-level API, aka Mantle, and knowing how parallelism works and what resources it needs, why would AMD gimp their new Fiji GPUs by using the same ACEs, ROPs, pixel fillrate, etc.? It would have been more appropriate for them to make a GPU with fewer SPs (~3000) and more ACEs and ROPs.


That would reach the performance of a GTX 980 OC, but games still need more power than 3072 GCN 1.2 SPs can provide.


----------



## resonance spark

@Mahigan I've tabulated the data you have provided so people can see it better. Hopefully it will help people understand your explanations. Highlighted in purple are the theoretical winners for each category though higher numbers may not always be the actual winners since each architecture may have specific instructions allowing them to carry out an operation more efficiently. Feel free to refer to them.


----------



## error-id10t

I'll admit, every time I come here my head hurts afterwards from the way too technical stuff that I understand. I just want to know which lolly tastes nicer!


----------



## diggiddi

Quote:


> Originally Posted by *error-id10t*
> 
> I'll admit, every time I come here my head hurts afterwards from the way too technical stuff that I understand. I just want to know which lolly tastes nicer!


LOL, needs a TLDR statement after every technical explanation


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> The only forums I've posted in have been Anandtech, HardOCP, Hexus and Overclock.net.
> 
> I've stayed away from Tomshardware and other such forums.
> 
> I must admit that my first statements were riddled with errors. Hence why I have revised them as new information came to light about the architectures at play. When I last worked for ATi the Radeon x800 series were brand new. That was a LONG time ago. Everything has changed since then. We moved from fixed function units, in GPUs, to the highly parallel architectures we see today.
> 
> AMD is not ATi. And I haven't been following architectures for quite some time, lost interest prior to Cayman being released. I moved on.
> 
> The only reason my interest has returned is because of DirectX 12. When I saw what Ashes of the Singularity was capable of achieving I started reading again. Catching up on everything.
> 
> I started to post in forums and was met with rather hostile people. It's not the same environment it was back then that's for sure. I can't make a single statement or speculate without being bombarded with angry posters.


That's because a lot of people don't want to talk about technology today unless it involves them describing their epeen and how it's marvelously better. I like talking about progression in technology. I don't understand fanboyism from any faction; it downright frustrates me.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> The only forums I've posted in have been Anandtech, HardOCP, Hexus and Overclock.net.
> 
> I've stayed away from Tomshardware and other such forums.
> 
> I must admit that my first statements were riddled with errors. Hence why I have revised them as new information came to light about the architectures at play. When I last worked for ATi the Radeon x800 series were brand new. That was a LONG time ago. Everything has changed since then. We moved from fixed function units, in GPUs, to the highly parallel architectures we see today.
> 
> AMD is not ATi. And I haven't been following architectures for quite some time, lost interest prior to Cayman being released. I moved on.
> 
> The only reason my interest has returned is because of DirectX 12. When I saw what Ashes of the Singularity was capable of achieving I started reading again. Catching up on everything.
> 
> I started to post in forums and was met with rather hostile people. It's not the same environment it was back then that's for sure. I can't make a single statement or speculate without being bombarded with angry posters.


Where did you work for ATi? Canada? My teacher also worked for ATi back in the day in Markham.


----------



## Mahigan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Where did you work for ATi? Canada? My teacher also worked for ATi back in the day in Markham.


That's where I worked


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> That's where I worked


Did you retire to Morocco? I heard that was a popular place for expats to just vanish off the face of the earth and enjoy life.


----------



## Mahigan

The exchange, with Razor1 (the GPU Guru at Beyond3D), was very enlightening.

He walked away learning things he didn't know about GCN and I walked away learning things I did not know about Maxwell 2.

That's the way it should be. We're both the better for it.

Worth reading: http://hardforum.com/showthread.php?p=1041818853&posted=1#post1041818853


----------



## Mahigan

Quote:


> Originally Posted by *SpeedyVT*
> 
> Did you retire to Morocco. I heard that was a popular place for expats to just vanish off the face of the earth and enjoy life.


I'm not retired yet.

I left ATi and joined HP/Compaq, then Dell, then Bell Canada.

Bell is what brought me to Morocco. I met my wife, fell in love, and married.









Heading back to Canada, with my wife, in September.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> The exchange, with Razor1 (the GPU Guru at Beyond3D), was very enlightening.
> 
> He walked away learning things he didn't know about GCN and I walked away learning things I did not know about Maxwell 2.
> 
> That's the way it should be. We're both the better for it.
> 
> Worth reading: http://hardforum.com/showthread.php?p=1041818853&posted=1#post1041818853


Interesting read







+rep


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> I'm not retired yet.
> 
> I left ATi and joined HP/Compaq, then Dell then Bell Canada.
> 
> Bell is what brought me to Morocco. I met my Wife, I fell in love and married.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Heading back to Canada, with my wife, in September.


Wonderful! I'm going to Thailand to meet my GF's parents and see it next year.

I'm not rich, so it's a good thing Thailand isn't expensive.


----------



## Mahigan

*Asynchronous Compute Engines*

*Out of Order*

ACEs can function independently. That means that tasks can complete out of order, and the ACE will track the tasks for correctness. This is a big deal. I've stressed it before, and not many people have understood what it meant. By allowing tasks to complete out of order, you can make more efficient use of the compute resources on tap by mitigating stalls (where some CUs stop working, waiting for other CUs to complete their work, due to dependencies).
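
A toy model of why out-of-order completion mitigates stalls (hypothetical Python; the timings are invented for illustration): with in-order retirement, a long-running task at the head of the queue holds up every result behind it, while out-of-order retirement lets short independent tasks retire as soon as they finish.

```python
# Toy model (my own illustration; timings are made up, not measured):
# four independent tasks dispatched to separate compute units, finishing
# at the times below. One long task sits in front of three short ones.
finish_times = [9, 1, 2, 3]

# Out-of-order completion: each result is usable the moment its task ends.
ooo_ready = list(finish_times)

# In-order completion: a result cannot retire before every earlier one has,
# so the long task at the head stalls everything behind it.
inorder_ready, latest = [], 0
for t in finish_times:
    latest = max(latest, t)
    inorder_ready.append(latest)

print(ooo_ready)      # [9, 1, 2, 3]
print(inorder_ready)  # [9, 9, 9, 9]
```

In the out-of-order case the three short results are available at times 1, 2 and 3; in the in-order case nothing retires until the long task finishes at time 9, which is exactly the stall described above.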


*In Order*

ACEs can also synchronize by communicating through the L2 Cache, Global Data Share or memory. The ACEs will form task graphs, like nVIDIA's HyperQ solution does, and can operate in parallel, out of order, from within these graphs (syncing with another ACE or with part of the graphics pipe).


afaik, reading through the CUDA documentation, HyperQ does not have this flexibility. HyperQ is tied to "in order" ops according to the documentation.

This was part of the discussion I had over at HardOCP


----------



## GorillaSceptre

Razor1 from HardOCP:

"He has linked to Kepler's white paper as an example of Maxwell 2's HyperQ workflow, but Maxwell 2's HyperQ workflow is very different, because the individual units are different, another mistake. Maxwell 2's HyperQ does work on grids, with child grids for compute if the parent grid needs a child grid, and does so in serial; this is the SAME for GCN. Within the grids is where ACE's and AWC's are "asynchronous"; the reason for this is that if they weren't, you would have rendering issues and other problems. Think of this as a critical path problem: to get from X to Y you have 4 operations, A B C D, with C and D first (C is a child set for A, and D is a different grid that needs to be used for B to complete). Each of A B C D can be done separately, but C must be done for A to complete and D must be done first for B to complete. Within each of the sets they can be done in any order, so some parts of A can be done concurrently with C, just as parts of B can be done as D is being done. After C and D are done, A and B are complete, and then X can become Y with the results of A and B."

My brain!


----------



## Xuper

gah I'm confused!

Originally Posted by *Razor1* from HardOCP



> Well, executing in real time, GCN can only do 8+1 compute + graphics from what I've seen from AMD's slides; by the same measure Maxwell 2 can do 31+1, this is from the CUDA programming guide.


What does it mean? I think he's wrong, or is it because of this?

Quote:


> There is another mistake with the *Execution of Compute and Graphics computations* vs *Execution in Queue*; those are two different things. The first is actually being done in real time; the second is storing work for computation as ALUs get freed up. Maxwell 2's execution in queue is dependent on whether there is register space and cache available to store. Not sure about GCN, whether there is a variable amount due to the same limitation, but AMD stated a set amount of 64. Anandtech's table is correct in what they stated, not incorrect as the OP stated, so any conclusions from that are incorrect.


GCN can do what? 8 queues or something? Is there another word besides queue? Why does he say: executing in real time, GCN can do 8? What does "*8*" mean?

Can I say this: queue = task? Does 8 mean 8 queues, or 8 tasks executing per cycle?

@Mahigan Said :



> GCN ACEs work in reverse of how AWSs work.
> 
> In Maxwell 2 the work distributor can prioritize 32 tasks, every cycle, which it then sends to the various AWSs. The AWSs then send the tasks to the available compute resources. You have 32 tasks executing per cycle.
> 
> In GCN 1.1 290 series/1.2 every cycle the 8 ACEs fetch tasks from cache, or memory, and create a work group of 8 tasks each (so it receives 64 tasks per cycle). Then the ACEs send the tasks off to any available compute units. You have 64 tasks executing per cycle.
> 
> An ACE is like a Hyperthreaded CPU Core.


And he replied, "Interesting, I will look into that tomorrow, it's getting late for me here". Well, perhaps he didn't know?


----------



## airfathaaaaa

I've been lurking around here for some years now (and other forums as well), trying to figure out why AMD did so poorly under DX11, especially in certain games (I mostly blamed GameWorks by default in some of them).

But then I heard about Mantle and DX12, and it got me thinking quite a lot.

I started to dig up performance benchmarks from 2010 through 2013, and I realised something:
After Q2 2012, AMD failed to keep up with nVidia at the driver level (by "keep up" I mean their drivers were really just polish and nothing more, with some drivers for GCN being the exception), and I thought that maybe, because they have a relatively small driver team, they focused on the Xbox back then and, at some point after that, on Mantle.

BUT

How possible is it that AMD intentionally stopped working on DX11 drivers and focused 100% on Mantle behind the scenes, throwing baits at devs to accept it and thus forcing MS to make DX12 possible for PCs? (I imagine this API was at some point only for the Xbox One; I can totally see this...)

And I also really want to know: is it possible for DX12 to render TressFX and GameWorks obsolete, and give the dev the responsibility to create a beautiful and optimised game without throwing in useless off-screen renders?


----------



## umeng2002

The level of conspiracy that would require doesn't exist.

nVidia simply had the money to optimize their drivers to recompile shaders to run better on their GPUs (they got source code from games in development and rewrote shaders to run better on their GPUs; that kind of cooperation costs money, which is also what paid for their logo being shown at the beginning of a lot of games). AMD doesn't have the money to throw at devs to optimize for their GPUs, but their architecture was better designed for parallelism. The APIs weren't there to really use it, so they made Mantle, which really lit a fire in the industry to switch to a multi-threaded, asynchronous API that can take advantage of more than one or two threads and can allow the GPU to work on more than one frame at a time to avoid wasting time idling sections of the GPU.

Also, TressFX and GameWorks have essentially been obsolete since they were made. Those things could have been done with plain DX11 and DirectCompute. But AMD and nVidia simply made the software and sold it to developers so they wouldn't have to spend more manpower making their own solutions.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Razor1 from HardOCP:
> 
> "He has linked to Kepler's white paper as an example of Maxwell 2's HyperQ workflow, but Maxwell 2's HyperQ workflow is very different, because the individual units are different, another mistake. Maxwell 2's HyperQ does work on grids, with child grids for compute if the parent grid needs a child grid, and does so in serial; this is the SAME for GCN. Within the grids is where ACE's and AWC's are "asynchronous"; the reason for this is that if they weren't, you would have rendering issues and other problems. Think of this as a critical path problem: to get from X to Y you have 4 operations, A B C D, with C and D first (C is a child set for A, and D is a different grid that needs to be used for B to complete). Each of A B C D can be done separately, but C must be done for A to complete and D must be done first for B to complete. Within each of the sets they can be done in any order, so some parts of A can be done concurrently with C, just as parts of B can be done as D is being done. After C and D are done, A and B are complete, and then X can become Y with the results of A and B."
> 
> My brain!


Well,

What he's saying, in layman's terms, is: "if an operation requires the result of another operation, then it must be paused, giving time for the other operation to complete". He is arguing that, like the Asynchronous Warp Schedulers, ACEs operate "in order", or there would be rendering errors, etc.

ACEs reside outside of the Shader Engines; this flexibility allows them to communicate with all the various elements within the GPU, such as the CUs, Rasterizers, ROPs, etc., by communicating through the R/W L2 Cache, Global Data Share and the GPU Memory (GDDR5/HBM). There is less of a bandwidth constraint in doing so because of the various communication mediums. In doing so, ACEs can check for errors and correct as necessary (re-issue). The AWS's in Maxwell 2 can use the L2 Cache to communicate with the other SMMs (a single communication medium, thus placing bandwidth constraints), but it does so in order to maintain coherency (sync) across dependent workloads. Therefore AWS's can operate asynchronously (in parallel) when *working on the same work load (large work loads which are split amongst *themselves), but stall (pause) if the *work one AWS is working on requires a result from the *work being worked on by another AWS. From what I'm reading, AWS's do not have the capacity to complete work "out of order" and then check for errors and correct as necessary.

*_Now evidently the AWSs and the ACEs aren't the units doing the work. They assign the work to compute cores and keep an eye on them until the compute cores are done with their work. I just wanted to simplify the paragraph above._


----------



## SpeedyVT

Quote:


> Originally Posted by *airfathaaaaa*
> 
> I've been lurking around here for some years now (and other forums as well), trying to work out why AMD did so badly on DX11, especially in certain games (I mostly blamed GameWorks by default in some of them).
> 
> But then I heard about Mantle and DX12, and this got me thinking quite a lot.
> 
> I started digging up performance benches etc. from 2010 to 2013 and I realised something:
> after Q2 2012, AMD failed to keep up with Nvidia on the driver level (by "keep up" I mean that their drivers were really just for show and nothing more, with some GCN drivers being the exception), and I thought that maybe, because they have a relatively small driver team, they focused on the Xbox back then and at some point after that on Mantle.
> 
> BUT
> 
> How possible is it that AMD intentionally stopped working on DX11 drivers and focused 100% on Mantle behind the scenes, throwing out bait for devs to accept it and thus forcing MS to make DX12 possible for PCs? (Because I imagine this API was at some point only for the Xbox One; I can totally see this...)
> 
> And I also really want to know: is it possible for DX12 to render TressFX and GameWorks obsolete, and give the dev the responsibility to create a beautiful and optimised game without throwing in useless off-screen renders?


Quote:


> Originally Posted by *umeng2002*
> 
> The level of conspiracy that would require doesn't exist.
> 
> nVidia simply had the money to optimize their drivers to recompile shaders to run better on their GPUs. They got source code from games in development and rewrote shaders to run better on their GPUs; that kind of cooperation costs money, which is also what paid for their logo being shown at the beginning of a lot of games. AMD doesn't have the money to throw at devs to optimize for their GPUs. But their architecture was better designed for parallelism, and the APIs weren't there to really use it, so they made Mantle. That really lit a fire in the industry to switch to a multi-threaded/asynchronous API that can take advantage of more than one or two threads and can allow the GPU to work on more than one frame at a time, to avoid wasting time idling sections of the GPU.
> 
> Also, TressFX and GameWorks have essentially been obsolete since the day they were made. Those things could have been done with just normal DX11 with DirectCompute. But AMD and nVidia simply made software and sold it to developers so they wouldn't have to spend the manpower making their own solutions.


It's possible AMD played the long game, since building a very serialized design would only have hurt them with console adoption and future hardware. So it's possible, although unlikely. Best to wait and see if AMD lays claim to that.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Well,
> 
> What he's saying, in layman's terms, is: "if an operation requires the result of another operation, it must be paused, giving the other operation time to complete". He is arguing that, like the Asynchronous Warp Schedulers, the ACEs operate "in order", or there would be rendering errors etc.
> 
> ACEs reside outside of the Shader Engines; this flexibility allows them to communicate with the various elements within the GPU, such as the CUs, Rasterizers, ROPs etc., through the R/W L2 Cache, Global Data Share and the GPU memory (GDDR5/HBM). In doing so, ACEs can check for errors and correct as necessary. The AWS's in Maxwell 2 can use the L2 Cache to communicate with the other SMMs, but they do so in order to maintain coherency (sync) for dependent workloads. Therefore AWS's can operate asynchronously (in parallel) when working on the same workload (large workloads which are split amongst themselves), but stall (pause) if the work one AWS is working on requires a result from the work being worked on by another AWS. From what I'm reading, AWS's do not have the capacity to submit or complete work "out of order" and then check for errors and correct as necessary.


Thanks

Yeah, I finally got what he was saying after I read it about 20 times


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> What does it mean? Do you think he's wrong, or is it because of this?


He means that, on a per-clock basis, the Asynchronous Warp Schedulers can execute (assign) work to the compute units. Since there are 4 AWS's per SMM (8 SMMs total), on a per-clock basis you have 32 compute executions for pure compute workloads (or 31 compute executions and 1 graphics execution for gaming workloads).

Quote:


> GCN can do what? 8 queues or something? Is there any word other than "queue"? Why does he say: executing in real time, GCN can do 8? What does "*8*" mean?


Queues are pending work tasks. Execution in real time is how many tasks are executed per cycle. The Anandtech article essentially confused everyone.
Quote:


> Can I say this: queue = task? Does 8 mean 8 queues, or 8 tasks executing per cycle?


Not exactly. Queues are pending tasks (to be executed when resources are available). What is executed in real time varies with the number of AWSs and ACEs.
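The queue-versus-execution distinction comes down to simple arithmetic. A sketch using the figures tossed around in this thread (64 queued tasks and 8 compute tasks issued per cycle for GCN; 4 AWS's per SMM across 8 SMMs for Maxwell 2); these numbers are the thread's claims, used purely for illustration, not verified hardware specs:

```python
# Queue depth is how many tasks can *wait*; the issue rate is how many are
# *executed per cycle*. The figures below are the ones claimed in this thread.

def cycles_to_issue(pending_tasks, issued_per_cycle):
    """Cycles needed to issue all pending tasks at a fixed rate (ceiling)."""
    return -(-pending_tasks // issued_per_cycle)

# GCN, as described here: up to 64 queued tasks, 8 compute tasks issued/cycle
print(cycles_to_issue(64, 8))   # 8 cycles to begin all 64 queued tasks

# Maxwell 2, as described here: 4 AWS's per SMM x 8 SMMs = 32 issued/cycle
aws_per_smm, smms = 4, 8
print(aws_per_smm * smms)       # 32
```

So a deep queue says nothing by itself about throughput; it only says how much work can sit pending while the fixed number of per-cycle slots drains it.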
Quote:


> @Mahigan Said :
> 
> and he replied, "Interesting, I will look into that tomorrow, it's getting late for me here". Well, perhaps he didn't know?


There are things we both did not know. GCN and Maxwell 2 are very complicated architectures to grasp.


----------



## Mahigan

The difference can be summed up like this:

*AMD GCN Asynchronous Compute Solution*:

Each ACE fetches commands from the R/W L2 Cache, Global Data Share Cache and on-board memory (GDDR5/HBM) and forms task queues
64 Total task queues can be formed by the ACEs (prioritizing pending tasks to be executed).
8 Compute tasks/cycle and 1 Graphics task/cycle are executed in real time
Each ACE can operate independently and complete tasks out of order
Each ACE can synchronize tasks with the other ACEs and the Graphics Command Processor in order to operate "in order" but also in parallel
ACEs can communicate through the R/W L2 Cache, Global Data Share Cache and on-board memory (GDDR5/HBM), allowing for an incredible amount of memory bandwidth

*nVIDIA HyperQ*:

Grid Management Unit receives thousands of pending tasks
Grid Management Unit sends 32 tasks to the Work Distributor (32 compute tasks, or 31 compute and 1 graphics task)
Work Distributor assigns the 32 tasks across 32 AWS's.
32 Compute tasks/cycle, or 31 Compute tasks/cycle and 1 Graphics task/cycle, are executed in real time
Each AWS cannot operate independently to complete tasks out of order
Each AWS must synchronize tasks with the other AWSs and complete these tasks "in order", though also operating in parallel
If one task is dependent on another task, HyperQ incurs a pipeline stall (pause)
AWS's can communicate through a single medium, the L2 Cache, which is limited in memory bandwidth as it is shared with all elements of the graphics and compute pipeline

Therefore the fact that HyperQ can execute more tasks per clock is negated by the fact that it is limited in memory bandwidth, and because HyperQ must complete these tasks in order, it is prone to pipeline stalls. nVIDIA's HyperQ lacks the flexibility of AMD's Asynchronous Compute solution. This is likely the cause of the AotS performance figures we are seeing.

This also translates into more latency for HyperQ in VR, compared to AMD's LiquidVR solution.

And what is AMD pushing? Asynchronous Compute and LiquidVR. They're aware of their advantages in these areas.
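The in-order versus out-of-order contrast drawn above can be put in a toy model. This only encodes the behavioral claim made in this post; real GCN and Maxwell 2 scheduling is far more complex, and both functions are invented for the sketch:

```python
# Toy contrast: an in-order queue stalls everything behind a dependent task,
# while an out-of-order queue lets independent tasks finish around it.
# Purely illustrative of the claim above, not a hardware simulation.

def finish_in_order(tasks, blocked):
    """In-order: once a blocked task is reached, everything behind it waits."""
    done = []
    for t in tasks:
        if t in blocked:
            break  # pipeline stall: nothing behind this task completes yet
        done.append(t)
    return done

def finish_out_of_order(tasks, blocked):
    """Out-of-order: independent tasks complete; blocked ones wait alone."""
    return [t for t in tasks if t not in blocked]

tasks = ["t0", "t1", "t2", "t3"]
blocked = {"t1"}  # t1 is waiting on a result from another queue

print(finish_in_order(tasks, blocked))      # ['t0']              - stall
print(finish_out_of_order(tasks, blocked))  # ['t0', 't2', 't3']  - no stall
```

Under this model, the cost of a dependency in the in-order case is every task queued behind it, which is the stall behavior the post attributes to HyperQ.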


----------



## Kpjoslee

Thanks for your analysis as always @Mahigan, it made me more interested in looking deeper into how GCN works. Razor1 on hardforum actually linked the entire AMD GCN slide deck you provided him, so I will probably look at both the slides and the CUDA documentation to educate myself on how they differ, after I am done with work.


----------



## MegaBoyX7

> 64 Total task queues can be formed by the ACEs (prioritizing pending tasks to be executed).

Shouldn't it be: the ACEs can queue a total of 64 tasks (prioritizing pending tasks to be executed)?

As the ACEs have a queue depth of 8, that would mean each can queue 8 tasks, not form 8 task queues. I'm not that good with English, so maybe I understood it wrongly.
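That reading matches the arithmetic usually cited for GCN. A one-line check, assuming 8 ACEs each with a queue depth of 8 (the figures under discussion in this thread):

```python
# 8 ACEs x 8 queued tasks each = 64 tasks pending in total, consistent with
# reading "64" as total queued tasks rather than 64 queues formed per ACE.
# These figures are the ones under discussion in the thread.
aces, queue_depth_per_ace = 8, 8
total_queued = aces * queue_depth_per_ace
print(total_queued)  # 64
```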


----------



## Klocek001

There's one thing that strikes me just now, although it should've struck me the first time I saw these charts. I know there must have been serious discussion of it in some previous posts, but I just don't have time to read 110 pages.
Why is everyone cheering AMD so much if the 390X on DX12 still loses to the 980 on DX11 in the majority of those charts?


----------



## Kpjoslee

Quote:


> Originally Posted by *Klocek001*
> 
> There's one thing that strikes me just now, although it should've struck me the first time I saw these charts. I know there must have been serious discussion of it in some previous posts, but I just don't have time to read 110 pages.
> Why is everyone cheering AMD so much if the 390X on DX12 still loses to the 980 on DX11 in the majority of those charts?


Because it is finally allowing GCN GPUs to perform the way they were intended to, as they are no longer bound by the CPU overhead problem that severely limited their DX11 driver. For DX11, Nvidia's driver seems very well optimized, so that is probably the reason we are seeing good numbers there. For DX12, it could be either how the code is optimized for Nvidia's architecture or a problem in Nvidia's DX12 driver, or a little bit of both. I don't think their DX12 performance should fall behind DX11's if neither the code nor the driver is the problem.
The game is still a work in progress. I think there is room for improvement in DX12 performance on both sides. It is hard to draw anything conclusive at this point.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> The difference can be summed up like this:
> 
> *AMD GCN Asynchronous Compute Solution*:
> 
> Each ACE fetches commands from the R/W L2 Cache, Global Data Share Cache and on-board memory (GDDR5/HBM) and forms task queues
> 64 Total task queues can be formed by the ACEs (prioritizing pending tasks to be executed).
> 8 Compute tasks/cycle and 1 Graphics task/cycle are executed in real time
> Each ACE can operate independently and complete tasks out of order
> Each ACE can synchronize tasks with the other ACEs and the Graphics Command Processor in order to operate "in order" but also in parallel
> ACEs can communicate through the R/W L2 Cache, Global Data Share Cache and on-board memory (GDDR5/HBM), allowing for an incredible amount of memory bandwidth
> 
> *nVIDIA HyperQ*:
> 
> Grid Management Unit receives thousands of pending tasks
> Grid Management Unit sends 32 tasks to the Work Distributor (32 compute tasks, or 31 compute and 1 graphics task)
> Work Distributor assigns the 32 tasks across 32 AWS's.
> 32 Compute tasks/cycle, or 31 Compute tasks/cycle and 1 Graphics task/cycle, are executed in real time
> Each AWS cannot operate independently to complete tasks out of order
> Each AWS must synchronize tasks with the other AWSs and complete these tasks "in order", though also operating in parallel
> If one task is dependent on another task, HyperQ incurs a pipeline stall (pause)
> AWS's can communicate through a single medium, the L2 Cache, which is limited in memory bandwidth as it is shared with all elements of the graphics and compute pipeline


Doesn't that actually show that Maxwell 2 is more capable than GCN?

Edit:

Just ignore that


----------



## Themisseble

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Doesn't that actually show that Maxwell 2 is more capable than GCN?


Nope


----------



## Klocek001

Quote:


> Originally Posted by *Kpjoslee*
> 
> Because it is finally allowing GCN GPUs to perform the way they were intended to, as they are no longer bound by the CPU overhead problem that severely limited their DX11 driver. For DX11, Nvidia's driver seems very well optimized, so that is probably the reason we are seeing good numbers there. For DX12, it could be either how the code is optimized for Nvidia's architecture or a problem in Nvidia's DX12 driver, or a little bit of both. I don't think their DX12 performance should fall behind DX11's if neither the code nor the driver is the problem.
> The game is still a work in progress. I think there is room for improvement in DX12 performance on both sides. It is hard to draw anything conclusive at this point.


Yeah, but I don't see people burying Nvidia if it still performs better on the old API. People quoting a 60% performance gain for AMD are obviously confused; this 60% gain still leaves the 390X behind a 980 running on an old API.


----------



## airfathaaaaa

So let me get this straight:

The ACEs can actually work in the "background", in a way pre-buffering something, so that when asked for it they can provide it with no latency at all?
But the Nvidia ones, in the same position, will have to keep completing tasks until one engine gets free so that they can process the call?


----------



## mtcn77

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Doesn't that actually show that Maxwell 2 is more capable than GCN?
> 
> 
> Edit:
> 
> Just ignore that


About as much as a 32-core CPU with no cache structure competes with an 8-core, 8-registers-per-core CPU.


----------



## GorillaSceptre

@Mahigan Haven't seen you respond to this?

From razor1.

"Slide 56 tells you what it can do: 9 devices, 8+1 (compute and graphics respectively), and 64+ in queue

Yeah, it has the same limitations as Maxwell 2 when it comes to register pressure, slide 34

Quote:
However, the new architecture is more susceptible to: register pressure
using too many registers with a shader can reduce the maximum waves per SIMD

This is a direct quote from that slide

then it gives a table to clarify this

Maxwell 2's AWS's and AMD's ACEs are set up similarly; I think the slides talking about the cache, which works like a CPU's cache, are what misled you. The reason for this is that the APUs would need it.

This slide deck should be available online for everyone.

http://www.slideshare.net/DevCentral...e-by-layla-mah "

"In any case, the Fury X seems to come in at around 980 Ti performance in this early bench from AoS, so unless there is some magic sauce for the R9 290X, let's put this down to the Fiji and Maxwell 2 drivers not being as mature in DX12 yet. And AMD's DX11 drivers were just forgotten about."


----------



## orlfman

Outside of the DX12 benchmarks, how is the actual game itself? Is it fun for an RTS? Is it as fun as Supreme Commander: Forged Alliance was? Is it worth $45-$50 in game value?

Ever since EA butchered the Command & Conquer franchise, there has really been a lack of good RTSes of late. Sins of a Solar Empire was fun, but Ashes reminds me of SC-FA, which I enjoyed.


----------



## Themisseble

@Mahigan

Anything new about CPU performance?


----------



## semitope

Quote:


> Originally Posted by *orlfman*
> 
> Outside of the DX12 benchmarks, how is the actual game itself? Is it fun for an RTS? Is it as fun as Supreme Commander: Forged Alliance was? Is it worth $45-$50 in game value?
> 
> Ever since EA butchered the Command & Conquer franchise, there has really been a lack of good RTSes of late. Sins of a Solar Empire was fun, but Ashes reminds me of SC-FA, which I enjoyed.


They are still working on the gameplay. They sorted out the engine first, I think. Tech first, gameplay next. Though both still happen at the same time on different levels.

My hope is they don't overdo it with unit counts at the expense of fun. They could have units with automated, AI-controlled swarms if they want that. Beef up the graphics, use the draw calls to offer interesting dynamics.

There's another RTS coming in September, I think, similar to the Act of War game from way back. "Act of Aggression" or something.


----------



## Anna Torrent

Quote:


> Originally Posted by *Kpjoslee*
> 
> Because it is finally allowing GCN GPUs to perform the way they were intended to, as they are no longer bound by the CPU overhead problem that severely limited their DX11 driver. For DX11, Nvidia's driver seems very well optimized, so that is probably the reason we are seeing good numbers there. For DX12, it could be either how the code is optimized for Nvidia's architecture or a problem in Nvidia's DX12 driver, or a little bit of both. I don't think their DX12 performance should fall behind DX11's if neither the code nor the driver is the problem.
> The game is still a work in progress. I think there is room for improvement in DX12 performance on both sides. It is hard to draw anything conclusive at this point.


Just a note - it's not just the API overhead; it seems like the AoS people used some GCN features that previously went unutilized. I guess some parallelism and/or some way of utilizing the GPU better. The API overhead by itself does not explain some of the DX11 vs DX12 benchmark numbers.


----------



## Kpjoslee

Quote:


> Originally Posted by *Anna Torrent*
> 
> Just a note - it's not just the API overhead; it seems like the AoS people used some GCN features that previously went unutilized. I guess some parallelism and/or some way of utilizing the GPU better. The API overhead by itself does not explain some of the DX11 vs DX12 benchmark numbers.


I was just stating DX12's biggest advantage. Of course AoS probably utilized GCN features, since that is one of the benefits of DX12: more direct access to the hardware. AMD probably didn't bother optimizing the DX11 driver for AoS since it looks like they are going all-in with DX12, which is why their DX11 performance is very poor.
I am just saying that well-optimized DX12 code should perform better than the DX11 path, and currently that is not showing on Nvidia GPUs. So I am guessing that either the code or Nvidia's DX12 driver is the culprit, rather than it being bound by pure hardware performance under DX12.

Nvidia has to cover a lot of ground with Fermi, Kepler, and Maxwell 1 and 2 under DX12, since they differ quite a bit in the way they handle parallel workloads, while AMD only has to deal with GCN. That could be why they are behind AMD on DX12 performance. Of course, you can disregard my last comment if other DX12 games show different results.


----------



## Anna Torrent

Quote:


> Originally Posted by *Kpjoslee*
> 
> I was just stating DX12's biggest advantage. Of course AoS probably utilized GCN features, since that is one of the benefits of DX12: more direct access to the hardware. AMD probably didn't bother optimizing the DX11 driver for AoS since it looks like they are going all-in with DX12, which is why their DX11 performance is very poor.
> I am just saying that well-optimized DX12 code should perform better than the DX11 path, and currently that is not showing on Nvidia GPUs. So I am guessing that either the code or Nvidia's DX12 driver is the culprit, rather than it being bound by pure hardware performance under DX12.
> 
> Nvidia has to cover a lot of ground with Fermi, Kepler, and Maxwell 1 and 2 under DX12, since they differ quite a bit in the way they handle parallel workloads, while AMD only has to deal with GCN. That could be why they are behind AMD on DX12 performance. Of course, you can disregard my last comment if other DX12 games show different results.


I'm on your side, buddy!
I hadn't thought about the last point - true. About the previous points - my conclusions as well. This seems to be a vastly unoptimized scenario.

BTW, benchmarks show that the 980 Ti 6GB version actually has a nice advantage over the Fury X / 290X in DX12, unlike the 4GB version. About 25% more. I think it was pcgameshardware.de or something.


----------



## Forceman

Quote:


> Originally Posted by *Anna Torrent*
> 
> BTW, benchmarks show that the 980 Ti 6GB version actually has a nice advantage over the Fury X / 290X in DX12, *unlike the 4GB version*. About 25% more. I think it was pcgameshardware.de or something


?

What 4GB 980 Ti? You sure you weren't looking at a 980?


----------



## Anna Torrent

Quote:


> Originally Posted by *Forceman*
> 
> ?
> 
> What 4GB 980 Ti? You sure you weren't looking at a 980?


Oh, I confused them! Yeah, it is a GTX 980, non-Ti.
But according to this review, the 980 Ti pulls ahead, while the 980 equals the 290/390 in the AoS benchmark in DX12.
It confuses me, because according to Ars Technica, the 980 Ti and 290X are almost the same in DX12 mode - link

Maybe a TAA difference?


----------



## Kpjoslee

Quote:


> Originally Posted by *Anna Torrent*
> 
> Oh, I confused them! Yeah, it is a GTX 980, non-Ti.
> But according to this review, the 980 Ti pulls ahead, while the 980 equals the 290/390 in the AoS benchmark in DX12.
> It confuses me, because according to Ars Technica, the 980 Ti and 290X are almost the same in DX12 mode - link
> 
> Maybe a TAA difference?


Oh, they did another benchmark under different settings, without MSAA, and it certainly shows different results. Let me link it again:

http://www.pcgameshardware.de/Ashes-of-the-Singularity-Spiel-55338/Specials/Benchmark-DirectX-12-DirectX-11-1167997/

It shows DX12 performance improving on the 980 Ti and Titan X while taking a small step back on the 980 and below on the Nvidia side. The AMD side still makes the expected jump from DX11 to DX12.
Yeah, it is better to wait until the game is complete, since it is hard to draw anything conclusive at this point.

*This is not an updated benchmark from a new version or different driver; they just used different settings from the first benchmark they did. The first benchmark is linked in the first post.


----------



## umeng2002

AMD's hardware designs have always run up against software that wasn't friendly to their CPUs and GPUs. Look at the ACEs in GCN: they needed DX12/Mantle/Vulkan to really work. Look at the issues with process scheduling in Windows with the FX chips. As I've said before, AMD didn't/doesn't have the money to throw around the software industry to get them to support all their new features.

I'd bet MOST DX12 games other than RTSes won't really take advantage of the ACEs' edge over nVidia's Warp Schedulers (or whatever they're called), since MOST gamers are still using nVidia GPUs.

I would look to the 2nd or 3rd round of DX12/Vulkan games before they really push a game design to where it needs something like AMD's ACEs. By that time, nVidia would have "caught up" in that department, though.


----------



## HalGameGuru

I'm not so sure about that. As Mahigan and a few others have mentioned, asynchronous shaders offer a lot of benefits in other areas. Where you used to do a lot of acrobatics to get ambient occlusion and decent shadows, DX12 and parallel computation will allow for better quality shadows and lighting with less overhead and more detail.

I think this may wind up being used in quite a few genres: FPS, adventure, immersive RPGs, and ESPECIALLY VR. It may not saturate quite as much as an RTS with bespoke light and shadow sources, but that tech will have performance benefits in many instances.


----------



## umeng2002

Quote:


> Originally Posted by *HalGameGuru*
> 
> I'm not so sure about that. As Mahigan and a few others have mentioned, asynchronous shaders offer a lot of benefits in other areas. Where you used to do a lot of acrobatics to get ambient occlusion and decent shadows, DX12 and parallel computation will allow for better quality shadows and lighting with less overhead and more detail.
> 
> I think this may wind up being used in quite a few genres: FPS, adventure, immersive RPGs, and ESPECIALLY VR. It may not saturate quite as much as an RTS with bespoke light and shadow sources, but that tech will have performance benefits in many instances.


What I meant was not pushing the ACEs and Warp Schedulers as hard, like nVidia $elling devs on not putting too much of that in games until their new GPU launches.


----------



## HalGameGuru

I suppose it's POSSIBLE you could bribe a dev into making their own product look like schmutz, but I'd like to think most WANT to make their games look as good as possible (although I'd probably PREFER they focused a bit more on making them PLAY as well as possible, and put a little more effort into story or mechanics, but I digress). I don't doubt there may be some pressure to tone down the DX12 exuberance until their arch better supports it, but I don't think nVidia would want that kind of thing coming to light. I'd wager more of a focus on guaranteeing "backwards compatibility": making sure DX11 remains concurrently supported until their new hardware drops.


----------



## umeng2002

It's not really bribes. nVidia simply offers to pay a dev X amount of money to design their engine to run well on nVidia GPUs and not take advantage of AMD's architecture, and then they also put the nVidia logo on the start menu.

It's like saying Nintendo bribes Retro to make Nintendo games; it's not illegal, just the way the industry can work.

This is probably why the Ashes of the Singularity demo favored AMD: the devs have been working closely with the GCN design and took advantage of it, even if it meant nVidia's GPUs couldn't handle it the way they designed it... or it could just be bad drivers on nVidia's part.

It's still up in the air, tbh.


----------



## HalGameGuru

Yeah, but in this case the API, by default, favors one over the other at the moment, simply due to differences in design philosophy. I doubt nVidia could convince any dev to ****** their game's development in DX12 nearly as easily as they could convince them to maintain DX11 support concurrently in the hardware interim.


----------



## Serandur

Quote:


> Originally Posted by *Kpjoslee*
> 
> Oh, they did an updated benchmark and it certainly shows different results. Let me link it again
> 
> http://www.pcgameshardware.de/Ashes-of-the-Singularity-Spiel-55338/Specials/Benchmark-DirectX-12-DirectX-11-1167997/
> 
> It shows DX12 performance improving on the 980 Ti and Titan X while taking a small step back on the 980 and below on the Nvidia side. The AMD side still makes the expected jump from DX11 to DX12.
> Yeah, it is better to wait until the game is complete, since it is hard to draw anything conclusive at this point.


Screenshots:

1920x1080



3840x2160


----------



## Xuper

Alright, now it's faster than the Fury X! Nvidia said:

Quote:


> *NVIDIA: We Don't Believe AotS Benchmark To Be A Good Indicator Of DX12 Performance*
> 
> http://wccftech.com/nvidia-we-dont-believe-aots-benchmark-a-good-indicator-of-dx12-performance/


and New benchmark.... Please Nvidia....


----------



## Klocek001

"AMD sides still making expected jump from DX11 to DX12."
this is such BS. Multiply 20 frames by 1.5x and you're still behind the green team in DX12.


----------



## ku4eto

Quote:


> Originally Posted by *Klocek001*
> 
> "AMD sides still making expected jump from DX11 to DX12."
> this is such BS. Multiply 20 frames by 1.5x and you're still behind the green team in DX12.


Your comment contributed so much to this thread... Really, a 290X outperforms a 780 Ti. And where does the green team lead? Only with the 980 Ti. The Titan X is behind the Fury X, where one costs $1,000+ and the other ~$700. Oh, and the Titan X has 12GB while the Fury X has 4GB (even though it is HBM).


----------



## Wishmaker

Quote:


> Originally Posted by *ku4eto*
> 
> Your comment contributed so much to this thread... Really, a 290X outperforms a 780 Ti. And where does the green team lead? Only with the 980 Ti. The Titan X is behind the Fury X, where one costs $1,000+ and the other ~$700. Oh, and the Titan X has 12GB while the Fury X has 4GB (even though it is HBM).


So all of a sudden, after one test, the new tech is better than the old one (Fury X vs Titan X). When the DX11 tests were out, it was "guys, new tech needs time to mature". *rofl*


----------



## Anna Torrent

Quote:


> Originally Posted by *Wishmaker*
> 
> So all of a sudden, after one test, new tech is better than old one (Fury X vs Titan X). When DX 11 tests were out, guys new tech, needs time to mature. *rofl*


A lot of marketing, of course, and a lot of bored people like me who try out DX12 benchmarks to fill their emptiness. The consumption-based society is really depressing.
But really, it's quite annoying that you don't know anything. AMD and NV could tell us what's going on, get us into the tech a little bit. AMD had GPUs that were good for APIs like DX12 four years ago? Why weren't they saying that?


----------



## sinholueiro

These benchmarks don't use TXAA as the AA? Isn't that an Nvidia-developed AA?


----------



## Assirra

Quote:


> Originally Posted by *sinholueiro*
> 
> This benchmarks don't use a TXAA as AA? Isn't this AA a Nvidia developed AA?


From what I can see here it uses MSAA, and the ExtremeTech article actually says that is to AMD's advantage.


----------



## Devnant

Quote:


> Originally Posted by *Serandur*
> 
> Screenshots:
> 
> 1920x1080
> 
> 
> 
> 3840x2160


Now THIS starts making more sense. But are you sure this is an updated benchmark? MSAA is OFF.


----------



## Xuper

Gotcha!

3840x2160, TAA (Temporal Anti-Aliasing), DirectX 12 (average FPS)

So MSAA is off. Where is that updated benchmark? I think perhaps the previous benchmark couldn't use TAA?

Update:

Here: Integrated Benchmark, MSAA off, TAA High, Point Lights High, Shading Samples High, Shadow Quality High, Texture Quality High.

Quote:


> *Updated on 08/20/15:*
> 
> The mudslinging between NVIDIA and Stardock/Oxide goes even further. While NVIDIA has accused the developers of implementing MSAA antialiasing incorrectly on GeForce GPUs, according to Stardock CEO Brad Wardell (alias "Frogboy") it is actually a bug in NVIDIA's own driver. According to him, it can be neglected in the benchmark results anyway because of the small differences; our results should be correct as stated. Ultimately, it is a new API, and accordingly the drivers will still contain errors at the beginning. Over time, the whole thing should mature.


I thought "updated bench" meant they fixed MSAA. Please do not confuse people! There is no updated bench.


----------



## ku4eto

Quote:


> Originally Posted by *Assirra*
> 
> From what I can see here it uses MSAA, and the ExtremeTech article actually says that is to AMD's advantage.


From what I can actually see in those screenshots, MSAA is not enabled. If it were enabled, I wonder whether the nVidia results would be the same (the famous "MSAA bug").
So I guess this "new" bench is just the same engine version, with MSAA disabled and new nVidia drivers specifically for this alpha game?


----------



## CasualCat

Quote:


> Originally Posted by *Kpjoslee*
> 
> Oh, they did an updated benchmark and it certainly shows different results. Let me link it again
> 
> http://www.pcgameshardware.de/Ashes-of-the-Singularity-Spiel-55338/Specials/Benchmark-DirectX-12-DirectX-11-1167997/
> 
> It shows DX12 performance improving on the 980 Ti and Titan X while taking a small step back on the 980 and below on the Nvidia side. The AMD side still makes the expected jump from DX11 to DX12.
> Yeah, it is better to wait until the game is complete, since it is hard to draw anything conclusive at this point.


One thing I wish I'd see: if they're going to have factory-OC'd cards, that's fine, but if they're also comparing them to cards running stock (Fury X and Titan X), they should probably also include stock versions of the OC'd cards.

edit: and where is the 1440p bench?

edit2: Separately, I kinda wish Oxide would release a standalone benchmark without the game, but I suspect they're hoping to get some sales from people just wanting the benchmark, regardless of how well the game itself does.


----------



## Kpjoslee

Quote:


> Originally Posted by *Klocek001*
> 
> "AMD sides still making expected jump from DX11 to DX12."
> this is such BS. Multiply 20 frames by 1.5x and you're still behind the green team in DX12.


It still shows them vastly improving performance in the jump from DX11 to DX12, so I don't see a problem with my statement there.
Quote:


> Originally Posted by *Xuper*
> 
> Gotcha!
> 
> 3840x2160, TAA (Temporal Anti-Aliasing), DirectX 12 (average FPS)
> 
> So MSAA is off. Where is that updated benchmark? I think perhaps the previous benchmark couldn't use TAA?
> 
> Update:
> 
> Here: Integrated Benchmark, MSAA off, TAA High, Point Lights High, Shading Samples High, Shadow Quality High, Texture Quality High.
> 
> I thought "updated bench" meant they fixed MSAA. Please do not confuse people! There is no updated bench.


I will revise the wording if that creates confusion, lol. I meant to say that they posted updated benchmark results under different settings from what they used last time.
Quote:


> Originally Posted by *CasualCat*
> 
> One thing I wish I'd see: if they're going to have factory OC'd cards, that's fine, but if they're also comparing them to cards running stock (Fury X and Titan X), they should probably also include stock versions of the OC'd cards.
> 
> edit: and where is the 1440p bench?
> 
> edit2: Separately, I kinda wish Oxide would release a standalone benchmark without the game, but I suspect they're hoping to get some sales from people just wanting the benchmark, regardless of how well the game itself does.


I think only reference versions are available for the Titan X and Fury X. Since they used non-reference, factory-overclocked cards for everything other than those two, I don't think that invalidates their results.


----------



## CasualCat

Quote:


> Originally Posted by *Kpjoslee*
> 
> I think only reference versions are available for the Titan X and Fury X. Since they used non-reference, factory-overclocked cards for everything other than those two, I don't think that invalidates their results.


I didn't say it was invalid; it would just make a nicer apples-to-apples comparison. (Also, there are factory-overclocked Titan X models, or at least there were.) Sure, have some OC'd 980, 980 Ti, and 290X cards in there, but throw in stock-clocked ones too.


----------



## sinholueiro

Quote:


> Originally Posted by *Assirra*
> 
> From what I can see here it uses MSAA, and the ExtremeTech article actually says that this is to AMD's advantage.


Quote:


> Originally Posted by *Xuper*
> 
> Gotcha!
> 
> 3840x2160, TAA (Temporal Anti-Aliasing), DirectX 12 (average FPS)
> 
> So MSAA is off. Where is that updated benchmark? I think perhaps the previous benchmark couldn't use TAA?
> 
> Update:
> 
> Here: Integrated Benchmark, MSAA off, TAA High, Point Lights High, Shading Samples High, Shadow Quality High, Texture Quality High.
> 
> I thought "updated bench" meant they fixed MSAA. Please do not confuse people! There is no updated bench.


http://www.geforce.com/hardware/technology/txaa
TXAA is an anti-aliasing technology created by Nvidia. It's like GameWorks. It is implemented in some games, like Batman: Arkham Origins, and it performs worse on AMD. That's why I see a 980 beating a Fury, I think.


----------



## Kpjoslee

Quote:


> Originally Posted by *CasualCat*
> 
> I didn't say it was invalid; it would just make a nicer apples-to-apples comparison. (Also, there are factory-overclocked Titan X models, or at least there were.) Sure, have some OC'd 980, 980 Ti, and 290X cards in there, but throw in stock-clocked ones too.


Oh yeah, there were some for the Titan X; I completely forgot about them. Yeah, it would have been much better if they had also used reference versions of those.


----------



## CasualCat

Quote:


> Originally Posted by *sinholueiro*
> 
> http://www.geforce.com/hardware/technology/txaa
> TXAA is an anti-aliasing technology created by Nvidia. It's like GameWorks. It is implemented in some games, like Batman: Arkham Origins, and it performs worse on AMD. That's why I see a 980 beating a Fury, I think.


Nvidia's TXAA doesn't run on AMD. That doesn't mean Oxide can't create and use their own temporal AA, though. CD Projekt RED did their own in The Witcher 3, I believe.


----------



## Kpjoslee

Quote:


> Originally Posted by *sinholueiro*
> 
> http://www.geforce.com/hardware/technology/txaa
> TXAA is an anti-aliasing technology created by Nvidia. It's like GameWorks. It is implemented in some games, like Batman: Arkham Origins, and it performs worse on AMD. That's why I see a 980 beating a Fury, I think.


I think the TAA in AoS is Oxide's own implementation. There is no mention of TAA being a problem on AMD GPUs.


----------



## sinholueiro

So, my bad. Sorry for that.


----------



## Klocek001

Quote:


> Originally Posted by *ku4eto*
> 
> Your comment contributed so much to this thread... Really, a 290X outperforms a 780 Ti. And where does the green team lead? Only with the 980 Ti. The Titan X is behind the Fury X, where one costs $1,000+ and the other ~$700. Oh, and the Titan X has 12GB while the Fury X has 4GB (even though it's HBM).


I meant the 390X loses to the 980 - where is that "massive gain from DX12"? I never meant to include Kepler in my statement; I didn't even see the 780 Ti was there, but that's a good point you raised. Putting the Titan X anywhere as a reference is just wrong. Why didn't you use an aftermarket 980 Ti for this comparison? It's there in the test and doing better than the Fury X. It's a bit faster, a bit cheaper, and has a bit more OC headroom than the Fury X.
There are 115 pages of total chaos here, but in the end it'll just be the status quo: 980 Ti -> Fury X - - -> 980 -> 390X, even in DX12, and this whole discussion will be hundreds of pages about how things were supposed to change but haven't.


----------



## Mahigan

Quote:


> Originally Posted by *Serandur*
> 
> Screenshots:
> 
> 1920x1080
> 
> 
> 
> 3840x2160


I was waiting for someone to post this...

I will respond the same way I responded in a PM.

I figured out what pcgameshardware.de did. They disregarded Oxide's reviewer's guide.

1. They disabled one post-processing effect (Glare). In other words, they disabled a lot of async shading.
2. They set the TAA to 12x (who uses 12x TAA? That would blur the image like crazy and put unnecessary strain on your GPU).
3. They increased terrain shading samples from 8 to 16 million (putting extra strain on the rasterizers (the small-triangle issue) for no extra image quality).
4. They only posted the results for the heavy batches (averaged, but all cards end up with a low FPS).

This would undoubtedly show the Fury X faster than the 290X by a long shot, as it is better able to handle tessellation. And since the test is no longer compute-limited, you would see nVIDIA pulling away (though all the cards would be outputting unplayable frame rates).

The settings ArsTechnica (and everyone else used):


The settings pcgameshardware used:


The results from PCGamesHardware actually bolster the arguments I've made in this thread. When post-processing effects (such as Glare and lighting) are utilized, you tap into the compute performance of the GPUs, and in doing so AMD's GCN gains an advantage from asynchronous shading (DirectX 12).

As for over tessellation:




TAA is considered obsolete because of its tendency to blur the image. You only need a small factor (6), and you should mix it with MSAA to reduce the blurring; anything higher and the image blurs up. TAA has one great benefit: it reduces aliasing when objects, or the camera, are moving around a scene. That is why Oxide uses it. Increasing the setting to a factor of 12, though, is beyond crazy.

This is a negative side effect of TAA when it is overly used (blurring during a cut scene):


This is why you use TAA (a moving camera while reducing "jaggies"):


This is the effect of using a TAA factor higher than 6 (notice the blurring in the background?):


Now that's why their results are so odd.

The kicker is that doing this has no discernible image-quality advantage. Disabling Glare and boosting TAA to 12 actually drives down image quality, and the extra tessellation leads to no discernible image-quality improvement.

Therefore I can conclude one of two things: either the person doing the benchmarking over at pcgameshardware didn't know what he was doing, or those benchmarks were tailored to produce a particular result and misinform readers.

This is why it is important to have informed readers, and why conversations such as the one we're having in this thread are of the utmost importance.

When I left ATi, it was right at the time the x800 series introduced Temporal AA. This was back in 2004.

On the subject of GameWorks, I posted a response over at HardOCP on the matter. The perceived lack of support for Kepler by nVIDIA is not true; Kepler has enjoyed many driver updates. You can read below:
http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/70125-gtx-780-ti-vs-r9-290x-rematch.html

I share Joel Hruska's opinion over at ExtremeTech on a particular matter which I believe is the culprit behind the perceived abandonment of Kepler by nVIDIA; it has to do with GameWorks:
Quote:


> Tessellation perf on Fury X is better than any previous AMD card.
> 
> Bear in mind two things:
> 
> 1). DX11 tessellation is actually terrible. Not terrible in GameWorks, not terrible on AMD or NV hardware, terrible, period. This is something I've been learning about over the past few weeks.
> 
> http://sebastiansylvan.com/2010/04/18/the-problem-with-tessellation-in-directx-11/
> 
> http://www.gdcvault.com/play/1020038/Advanced-Visual-Effects-with-DirectX
> 
> Tessellation bits on Page 9 and forward. It's hard to dig up concrete links on this because most discussions of it are either old articles on the launch of DX11 or discussion of tessellation implementations in specific games. Still, good reading there.
> 
> Regardless, I haven't seen Fury X taking a hit from tessellation in any non-GW title that would be out of band. It's a feature that always hits AMD cards a touch heavier than NV, but not in a dramatic fashion.
> 
> Regarding yield and scalability? I continue to hear that HBM yields are fine and that there are no issues there. Nonetheless, there's undoubtedly a bring-up period.
> 
> I am less concerned about GPU profitability because GPUs have never been very profitable for AMD. I once put together a chart of AMD's net profit on GPUs from 2006 - 2011:
> 
> Here's a simplified version of that chart from 2008 - 2011:
> 
> AMD doesn't make much money on GPUs. It never has. So the chance that Fury X will change that is minimal. I once did these charts updated through 2012, before AMD changed its reporting, and the end impact was still the same. Even at the height of its popularity when HD 5000 was making hash of the GT200 family, the profits just weren't there.
> 
> I don't see any reason to think that's changed in the intervening years.


Quote:


> So let's talk about tessellation, because this is a good example of how a feature can get twisted.
> 
> *Nvidia's tessellation engine is much more powerful than AMD's, no question*. *Some GameWorks or NV-optimized titles use insane amounts of tessellation*. AMD says that its lower level of tessellation performance is a better fit for real-world titles. Meanwhile, some developers have said that software tessellation, which allows for custom implementations, is a much better method of using the feature than DX11 allows.
> 
> The only way to weigh whether or not tessellation is a critical impact for AMD is to compare game performance. It's kinda hard to find hard figures on this, but here's what I've come up with after some googling:
> 
> vs
> 
> In Civ V, when it was new, the Radeon 5000 series lost 10% perf with tessellation enabled, while the GTX 480 lost 6%. Not game-breaking in either case.
> 
> http://www.tomshardware.com/reviews/deus-ex-human-revolution-performance-benchmark,3012-5.html
> 
> Tessellation was no problem in Human Revolution. We can't draw a single-point comparison because THG went from DX9 at lowest details to DX11 at medium details (including tessellation). But we can see that in this comparison, the GTX 550 Ti went from 59.3 FPS in DX9 to 62 FPS in DX11, while the 5770 went from 62 FPS to 51.5 FPS. To be sure, the GTX 550 looks better in this comparison, but the 5770 delivered a perfectly playable frame rate (minimum frame rate of 45 FPS).
> 
> http://www.tomshardware.com/reviews/radeon-hd-7970-benchmark-tahiti-gcn,3104-6.html
> 
> In Arkham City, turning tessellation on vs. off barely impacted performance at all on the HD 7970.
> 
> Final point: Look at Tessmark x32 performance from the GTX 680 launch:
> 
> and compare against Fury X:
> 
> If we assume that the GTX 680 / GTX 770 still represent a midrange GPU (and I would argue that they do), then AMD's tessellation performance on Fury X remains competitive with what a midrange graphics card offers today. There is no doubt that NV has a huge theoretical advantage in this area. There is also no proof that less tessellation performance has meaningfully held AMD back. *More to the point, I have never seen a problem with tessellation in a game that wasn't a GW title since that program was initiated.*
> 
> Even in Arkham Origins, which has a *systemic* tilt towards NV (as shown by the fact that the R9 290X merely ties a GTX 770 as opposed to beating the pants off it), the impact of tessellation was small. From my own data:
> 
> Arkham Origins: Tessellation Enabled
> 
> GTX 770: 145 FPS
> R9 290X: 148 FPS
> 
> Tessellation Disabled:
> 
> GTX 770: 148 FPS
> R9 290X: 154 FPS
> 
> The GTX 770 paid a 2% penalty for tessellation in that game. The R9 290X paid a 4% penalty.
> 
> Tessellation performance isn't just "crap AMD pulls." The fact is, *NV's theoretical tessellation advantage only shows itself when huge amounts of tessellation are loaded into titles to tilt them in that direction*. Arkham Origins showed systemic bias, and yes, it used high levels of tessellation (the *total* benchmark was less skewed than the tessellation-sensitive section of the benchmark, because the R9 290X outperformed the GTX 770 in some shadow and lighting tests).
> 
> I'll speak to the rest in a separate post.
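
The percentage penalties in the quote above can be reproduced with a quick calculation. This is just an illustrative sketch (the FPS values are Hruska's own Arkham Origins figures; the helper function name is mine):

```python
def tessellation_penalty(fps_on: float, fps_off: float) -> float:
    """Percent performance lost with the feature enabled vs. disabled."""
    return (fps_off - fps_on) / fps_off * 100

# Hruska's Arkham Origins figures (tessellation on vs. off):
print(f"GTX 770: {tessellation_penalty(145, 148):.1f}% penalty")  # -> 2.0%
print(f"R9 290X: {tessellation_penalty(148, 154):.1f}% penalty")  # -> 3.9%
```

The 3.9% figure rounds to the "4%" cited, so the numbers check out.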


This is sort of what I assume we will continue to see, going into DirectX 12, in GameWorks titles. The problem is that this over-the-top use of tessellation in GameWorks titles affects nVIDIA's older hardware too. Therefore, if you play a GameWorks title, you're going to come away with the impression that nVIDIA has abandoned Kepler.

The reason nVIDIA does this is quite clear, and there is little use in denying a fact that is as clear as night and day: they use GameWorks as a means to play to their architectural advantages while hampering performance on their competitors' products. The problem is that they end up hampering the performance of their own older products in the process, since they need to keep making GameWorks titles heavier to maintain their lead as AMD attempts to catch up with each new GPU release. There is no discernible visual advantage to using that degree of tessellation. You'd be better off saving the GPU resources and concentrating on post-processing effects, lighting, and other in-game visuals which improve the overall cinematic feel of the game.

The catch is that if you were to do this, you'd be playing to some of GCN's architectural advantages, which is especially true going into DirectX 12, as Ashes of the Singularity highlights. People are surprised because they did not expect a card released back in Q4 2013 to keep up with a GTX 980 Ti. In compute-heavy workloads that make use of asynchronous shading, you will undoubtedly see this result repeat time and time again.

You can claim that Oxide is playing to AMD's architectural advantages; I believe it's the other way around. AMD provided Oxide what they wanted in order to make the game they wanted to make: a new breed of RTS, one that is much more pleasing visually as well as in the number of units you can have on screen without dragging performance down into single-digit land. The ability to use multiple light sources and other post-processing effects adds to the visual quality of a game; gamers benefit from post-processing effects. Ashes of the Singularity also uses a good amount of hardware tessellation in its engine, which is why the maps look clean, and Oxide makes use of shading and terrain shading samples which play to nVIDIA's architectural advantages. Perhaps that even explains, to a degree, the results we see with the Fury X in that title. The point is that if a game is compute-heavy, by way of asynchronous shading, it may tilt the balance in AMD GCN's favor. We can call this a bias, but the end result is a better cinematic gaming experience; as a gamer, you walk away having had a better experience.

Aside from my opinions regarding upcoming DirectX 12 titles, which people can ignore, the GameWorks information mentioned in this post has been well documented by various sources.


----------



## bvsbutthd101

Quote:


> Originally Posted by *Mahigan*
> 
> Spoiler: Warning: Spoiler!


Please use spoilers for images so your post doesn't take up the whole page. Thank you


----------



## mtcn77

Quote:


> Originally Posted by *bvsbutthd101*
> 
> Please use spoilers for images so your post doesn't take up the whole page. Thank you.


Why don't you block him and be done with it?


----------



## bvsbutthd101

Quote:


> Originally Posted by *mtcn77*
> 
> Why don't you block him and be done with it?


Why would I? I didn't say I don't like reading his posts. Some of us don't run high-res monitors, and using spoilers on the images helps reduce the clutter on each page.


----------



## ku4eto

Quote:


> Originally Posted by *Klocek001*
> 
> I meant the 390X loses to the 980 - where is that "massive gain from DX12"? I never meant to include Kepler in my statement; I didn't even see the 780 Ti was there, but that's a good point you raised. Putting the Titan X anywhere as a reference is just wrong. Why didn't you use an aftermarket 980 Ti for this comparison? It's there in the test and doing better than the Fury X. It's a bit faster, a bit cheaper, and has a bit more OC headroom than the Fury X.
> There are 115 pages of total chaos here, but in the end it'll just be the status quo: 980 Ti -> Fury X - - -> 980 -> 390X, even in DX12, and this whole discussion will be hundreds of pages about how things were supposed to change but haven't.


Read the new Mahigan post and think again. Also, as someone else said, why are there OC'd GPUs next to stock ones?!


----------



## Mahigan

By the end of this, people will think I work for AMD. I must stress that I do NOT work for AMD and have absolutely no contact with anyone who does. What I am posting is based entirely on my concerns about where the PC gaming industry is headed. I don't like what I'm seeing with all the partisanship and misinformation circulating. I do not know everything, but I know enough to be highly concerned.

Initiatives such as GameWorks concern me. Why would we, as PC gamers, accept a reduction in cinematic effects in return for no discernible image-quality improvement? It makes absolutely no sense.

We're sitting on massively parallel graphics hardware. We should be able to get an "honest" return on our investment. I think this makes sense if we set partisanship aside. It is both reasonable and rational to expect developers to make better use of our hardware in order to produce better-looking games.

We should be moving forward, just as we were before all this marketing and partisanship arrived on the scene. We should make our graphics card purchases based solely on logical, rational criteria. That creates a market incentive which pushes the hardware and software makers in a direction that serves our collective interests.

That being said,

Game on.

As for using spoilers, I'll do so in the future. I honestly am not yet used to all of the features on this forum. I apologize if my posts have inconvenienced anyone.


----------



## Themisseble

A little off topic, but what is the main difference between TressFX and HairWorks, and how do they work?


----------



## ku4eto

Quote:


> Originally Posted by *Themisseble*
> 
> A little off topic, but what is the main difference between TressFX and HairWorks, and how do they work?


TressFX is purely AMD, while HairWorks is purely nVidia.
It is about game optimization: how the game responds to certain GPU types/architectures. They lower the driver impact on that game, since drivers are usually released to fix performance in specific titles. Those two approaches are pre-driver-release: the hardware manufacturers (AMD/nVidia) work with the game developers to provide better performance (and eye candy) before the game's release.


----------



## KyadCK

Quote:


> Originally Posted by *ku4eto*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Themisseble*
> 
> A little off topic, but what is the main difference between TressFX and HairWorks, and how do they work?
> 
> 
> 
> *TressFX is purely AMD, while HairWorks is purely nVidia.*
> It is about game optimization: how the game responds to certain GPU types/architectures. They lower the driver impact on that game, since drivers are usually released to fix performance in specific titles. Those two approaches are pre-driver-release: the hardware manufacturers (AMD/nVidia) work with the game developers to provide better performance (and eye candy) before the game's release.
Click to expand...

No and no.

TressFX is DirectCompute based.

Hairworks is Tessellation based.

Both work on both sides, and TressFX hits both teams for the same performance impact after a patch. Hairworks _would_ favor nVidia, but AMD's drivers have Tessellation control, so they can manually change the settings lower to lessen the impact while nVidia users previously (and possibly still) could not.


----------



## p4inkill3r

Quote:


> Originally Posted by *KyadCK*
> 
> No and no.
> 
> TressFX is DirectCompute based.
> 
> Hairworks is Tessellation based.
> 
> Both work on both sides, and TressFX hits both teams for the same performance impact after a patch. Hairworks _would_ favor nVidia, but AMD's drivers have Tessellation control, so they can manually change the settings lower to lessen the impact while nVidia users previously (and possibly still) could not.


Interestingly enough, [H] discusses some of the differences they see between TressFX and HairWorks in their latest Witcher 3 performance review.

http://www.hardocp.com/article/2015/08/25/witcher_3_wild_hunt_gameplay_performance_review/9#.VeDBnyVVgSU
Quote:


> Let's talk about NVIDIA HairWorks, and let's separate HairWorks from the greater overall debate on GameWorks itself. The discussion about the relevancy, fairness and business practices associated with GameWorks as a whole is a different topic. I want to focus in on the specific graphical feature called NVIDIA HairWorks as it is implemented in this game. HairWorks was the cause of a lot of controversy and debate.
> 
> The first thing to recognize about NVIDIA HairWorks is that this is a DX11 DirectCompute feature. Therefore, it is possible to run on NVIDIA and AMD hardware, there are no vendor lockouts like PhysX. Second, HairWorks is not the only hair technology that exists in the world. AMD offers a competitive feature called TressFX.
> 
> TressFX was used quite successfully in the 2013 game Tomb Raider. When comparing the image quality of HairWorks versus TressFX, Tomb Raider is often the game people cite as an example because it had a great implementation of TressFX. You can see a video demonstration of TressFX here. TressFX has improved a lot since then, with the current version now being TressFX 3.0. In fact, the new TressFX 3.0 will be used in Deus Ex: Mankind Divided. Now, naturally, using a GameWorks feature, CD Projekt RED chose to use HairWorks.
> 
> All the controversy and allegations of unfair tessellation factors aside, we just need to boil it all down to the facts as these relate to performance and image quality. How does HairWorks perform? How does HairWorks look? Does the performance demand justify the visual improvement? So let's tackle each one now based on our testing we have done in this game.


----------



## ku4eto

Quote:


> Originally Posted by *KyadCK*
> 
> No and no.
> 
> TressFX is DirectCompute based.
> 
> Hairworks is Tessellation based.
> 
> Both work on both sides, and TressFX hits both teams for the same performance impact after a patch. Hairworks _would_ favor nVidia, but AMD's drivers have Tessellation control, so they can manually change the settings lower to lessen the impact while nVidia users previously (and possibly still) could not.


I meant that they come from AMD/nVidia, not that they work only on AMD or nVidia. I don't think there are even any options that are not available to the other party (excluding Mantle and GeForce Experience stuff).
I have not played many games with TressFX, nor did I have the chance to test it on multiple GPU configurations, so I guess you are right about TressFX performance. As for HairWorks, I only have knowledge from The Witcher 3 (not experience, knowledge).


----------



## KyadCK

Quote:


> Originally Posted by *ku4eto*
> 
> Quote:
> 
> 
> 
> Originally Posted by *KyadCK*
> 
> No and no.
> 
> TressFX is DirectCompute based.
> 
> Hairworks is Tessellation based.
> 
> Both work on both sides, and TressFX hits both teams for the same performance impact after a patch. Hairworks _would_ favor nVidia, but AMD's drivers have Tessellation control, so they can manually change the settings lower to lessen the impact while nVidia users previously (and possibly still) could not.
> 
> 
> 
> I meant that they come from AMD/nVidia, not that they work only on AMD or nVidia. I don't think there are even any options that are not available to the other party (excluding Mantle and GeForce Experience stuff).
> I have not played many games with TressFX, nor did I have the chance to test it on multiple GPU configurations, so I guess you are right about TressFX performance. As for HairWorks, I only have knowledge from The Witcher 3 (not experience, knowledge).
Click to expand...

Fair enough.

Also, yes there are. Worms Revolution, for example: the version of AA they used is nVidia-only. If you use AMD, you get jaggies and the box is greyed out.


----------



## mtcn77

Gerbalt 1
Gerbalt 2

__
https://www.reddit.com/r/36jpe9/how_to_run_hairworks_on_amd_cards_without/


----------



## ku4eto

Quote:


> Originally Posted by *mtcn77*
> 
> Gerbalt 1
> Gerbalt 2
> 
> https://www.reddit.com/r/36jpe9/how_to_run_hairworks_on_amd_cards_without/


Mmm, this cracked me up; the comments are amazing. Also, the difference between 4x and 8x is hardly noticeable, and the difference between 8x and 16x cannot be seen at all (unless I am blind), even though the multiplier is the same (x2). It doesn't seem worth it.

And one of the comments made much sense:
Quote:


> The tesselation options in CCC were born from games like Crysis 2 that used ridiculous tesselation that crippled AMD cards performance. So since AMD can't force developers to use reasonable amounts of tesselation, they instead used a driver hack. Nvidia has never really had a reason to have this setting since the games that use heavy tesselation are optimized to run well on Nv cards in the first place.


+
Quote:


> It's a rare example where defensive coding actually resulted in something useful?


I know that the nVidia control panel and AMD CCC both allow application override settings for 3D applications, but I have not bothered with them much, as I have not run into such killer settings due to old hardware.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> @Mahigan
> 
> Anything new about CPU performance?


What if it wasn't the AMD FX CPU? What if it was the PCI Express 2.0 bus being the bottleneck?

If someone has an Asus Sabertooth 990fxgen3 r2 motherboard... I'd be curious to see their results...

I say this because the CPU frame rate is 42.8FPS here using an AMD FX-9370 @4.7GHz and an R9 290 @1050MHz core and 1350MHz memory. We know the R9 290 can do better. Why such a low frame rate?

System Specs:
AMD FX-9370 @4.7GHz
ASUS Sabertooth 990FX Rev 1.0
CMZ8GX3M2A1866C9 8GB Corsair Ram 9-10-9-27
Sapphire R9 290
Seagate 2TB HD
Seagate 3TB HD
S27A950D 27" 120 Hz 3D Samsung

Result:


Now we know an R9 290x can do 48.9FPS at these same settings:


Since there's so much information being transferred over the PCIe bus... 2.0 might be the bottleneck.


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Themisseble*
> 
> @Mahigan
> 
> Anything new about CPU performance?
> 
> 
> 
> What if it wasn't the AMD FX CPU. What if it was the PCI Express 2.0 bus being the bottleneck?
> 
> If someone has an Asus Sabertooth 990fxgen3 r2 motherboard... I'd be curious to see their results...

I don't think it will help; there are several latency problems with the PCIe 3.0 controller of the Sabertooth Gen3.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> What if it wasn't the AMD FX CPU. What if it was the PCI Express 2.0 bus being the bottleneck?
> 
> If someone has an Asus Sabertooth 990fxgen3 r2 motherboard... I'd be curious to see their results...


PCIe 2.0 is fine for a single GPU.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> PCIe 2.0 is fine for a single GPU


Factor in the number of draw calls increasing with the full use of multi-core CPUs sending traffic over the PCIe 2.0 bus in DX12.

Previous tests were conducted using DX11.


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *PontiacGTX*
> 
> PCIe 2.0 is fine for a single GPU
> 
> 
> 
> 
> 
> Factor in the amount of draw calls increasing by the full use of Multi-Core CPUs sending traffic over the PCIe 2.0 bus though.
> 
> Previous tests were conducted using DX11

So with an SLI or CFX configuration on Z77 and above, with only 16 PCIe 3.0 lanes and 8 lanes per GPU (which equals 16 PCIe 2.0 lanes), will we see a similar bottleneck?


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Factor in the amount of draw calls increasing by the full use of Multi-Core CPUs sending traffic over the PCIe 2.0 bus in DX12.
> 
> Previous tests were conducted using DX11


Should be easy to test since most Intel boards (edit: or at least the Gigabyte ones) let you change the PCIe speed. Switch it to gen 2 and test to see the difference - if draw calls are that big an impact even the Futuremark API bench should show it. I'd check it but I'm off to work.


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> I don't think it will help; there are several latency problems with the PCIe 3.0 controller of the Sabertooth Gen3.


Hmm... well, that doesn't help.

If I were in Canada, I'd test it on my Intel rig with PCIe 2.0 vs PCIe 3.0.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> Factor in the amount of draw calls increasing by the full use of Multi-Core CPUs sending traffic over the PCIe 2.0 bus in DX12.
> 
> Previous tests were conducted using DX11


- OneB1T wrote that





Here is something:
Saw Phenom X4/X6 doing better than FX CPUs, despite PCIe 2.0 and a worse memory controller...
see this
[email protected]

==Shot long shot 3 ==================================
Total Time: 5.008700
Avg Framerate : 32.314194 ms (30.946156 FPS)
Weighted Framerate : 32.396713 ms (30.867331 FPS)
CPU frame rate (estimated framerate if not GPU bound): 24.420563 ms (40.949097 FPS)
Percent GPU Bound: 99.271950%
*Driver throughput (Batches per ms): 3960.520752*
Average Batches per frame: 47504.597656

==Shot high vista ==================================
Total Time: 4.970572
Avg Framerate : 35.759510 ms (27.964590 FPS)
Weighted Framerate : 35.872990 ms (27.876127 FPS)
CPU frame rate (estimated framerate if not GPU bound): 26.463301 ms (37.788181 FPS)
Percent GPU Bound: 99.365089%
*Driver throughput (Batches per ms): 3962.454102*
Average Batches per frame: 50500.582031

[email protected]

==Shot long shot 3 ==================================
Total Time: 4.985974
Avg Framerate : 33.462917 ms (29.883827 FPS)
Weighted Framerate : 33.534695 ms (29.819862 FPS)
CPU frame rate (estimated framerate if not GPU bound): 29.043478 ms (34.431137 FPS)
Percent GPU Bound: 93.482292%
*Driver throughput (Batches per ms): 5927.758789*
Average Batches per frame: 47757.062500

==Shot high vista ==================================
Total Time: 4.998882
Avg Framerate : 33.776230 ms (29.606619 FPS)
Weighted Framerate : 33.864346 ms (29.529583 FPS)
CPU frame rate (estimated framerate if not GPU bound): 28.254892 ms (35.392101 FPS)
Percent GPU Bound: 98.633514%
*Driver throughput (Batches per ms): 6525.858398*
Average Batches per frame: 49477.691406

See the batches-per-ms figure: ~4k for the i5 vs ~6k for the FX.



Driver throughput? 50% more for the FX? Why?
Also, an i7 5960X shows the same ~4k batches per ms, and a benchmark on an old AMD CPU (not the FX line) also showed ~4k batches per ms.
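Figures like these can be pulled straight out of the benchmark's text output. A minimal parsing sketch, assuming the exact "Driver throughput (Batches per ms):" wording shown in the dumps above (the sample excerpt here is abridged):

```python
import re

# Abridged sample of the Ashes benchmark output quoted above.
LOG = """\
==Shot long shot 3 ==================================
Total Time: 5.008700
CPU frame rate (estimated framerate if not GPU bound): 24.420563 ms (40.949097 FPS)
Driver throughput (Batches per ms): 3960.520752
Average Batches per frame: 47504.597656
"""

def driver_throughput(text):
    """Return every 'Batches per ms' figure found in a log dump,
    one entry per benchmark shot."""
    return [float(m) for m in re.findall(
        r"Driver throughput \(Batches per ms\):\s*([\d.]+)", text)]

print(driver_throughput(LOG))  # -> [3960.520752]
```

Run over the two full dumps above, this makes the ~4k-vs-~6k batches-per-ms gap between the i5 and the FX easy to tabulate.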


----------



## KyadCK

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Themisseble*
> 
> @Mahigan
> 
> Anything new about CPU performance?
> 
> 
> 
> What if it wasn't the AMD FX CPU. What if it was the PCI Express 2.0 bus being the bottleneck?
> 
> If someone has an Asus Sabertooth 990fxgen3 r2 motherboard... I'd be curious to see their results...
> 
> I say this because the CPU Frame rate is 42.8FPS here using an AMD FX-9370 @4.7 and an R9 290 @1050 core and 1350 memory. We know the R9 can do better. Why such a low frame rate.
> 
> System Specs:
> AMD FX-9370 @4.7GHz
> ASUS Sabertooth 990FX Rev 1.0
> CMZ8GX3M2A1866C9 8GB Corsair Ram 9-10-9-27
> Sapphire R9 290
> Seagate 2TB HD
> Seagate 3TB HD
> S27A950D 27" 120 Hz 3D Samsung
> 
> Result:
> 
> 
> Now we know an R9 290x can do 48.9FPS at these same settings:
> 
> 
> Since there's so much information being transferred over the PCIe bus... 2.0 might be the bottleneck.

HyperTransport. Add it to your list of things to read up on, even if it will go away with Zen. It's incredibly important to the current AM chipsets; PCIe isn't the only thing standing in the way.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> Factor in the amount of draw calls increasing by the full use of Multi-Core CPUs sending traffic over the PCIe 2.0 bus in DX12.
> 
> Previous tests were conducted using DX11


If that's the issue, doesn't Mantle running on BF4 increase the draw calls as well? And people get a noticeable improvement even on PCIe 2.0.


----------



## sugarhell

Quote:


> Originally Posted by *PontiacGTX*
> 
> If that's the issue, doesn't Mantle running on BF4 increase the draw calls as well? And people get a noticeable improvement even on PCIe 2.0.
> It would be interesting to test OCing the HT on that bench.


No, it doesn't increase the draw calls. How can an API increase the draw calls?


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> If that's the issue, doesn't Mantle running on BF4 increase the draw calls as well? And people get a noticeable improvement even on PCIe 2.0.
> It would be interesting to test OCing the HT on that bench.


Check it out









Specs:
AMD FX-9370 @4.7GHz
ASUS Sabertooth 990FX Rev 1.0
CMZ8GX3M2A1866C9 8GB Corsair Ram 9-10-9-27
Sapphire R9 290 @ 1050MHz
Seagate 2TB HD
Seagate 3TB HD
S27A950D 27" 120 Hz 3D Samsung

Ars Technica settings result in this:


PCGamesHardware settings result in this:


PCGamesHardware either:

1. Did not perform the test in an objective manner, or
2. Didn't know what they were doing.


----------



## sugarhell

Who the heck uses a TAA duration of 12? It's so blurry. It's impractical; no one uses an amount that huge.


----------



## Themisseble

Quote:


> Originally Posted by *sugarhell*
> 
> Who the heck uses a TAA duration of 12? It's so blurry. It's impractical; no one uses an amount that huge.


Especially for a strategy game... just use res scaling or MSAA.

@Mahigan, maybe you could reply to Brad (or whoever from Stardock you were talking to) with those two benchmarks and show that in the benchmark the FX-8350 clearly bottlenecks the R9 290X.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> Especially for a strategy game... just use res scaling or MSAA.
> 
> @Mahigan, maybe you could reply to Brad (or whoever from Stardock you were talking to) with those two benchmarks and show that in the benchmark the FX-8350 clearly bottlenecks the R9 290X.


I don't think it is CPU related.

PontiacGTX shared this link with me and I believe he is onto something: http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-hypertransport-bus/4/
Quote:


> HyperTransport 3.0 adds the following new clock rates, keeping compatibility with HT1 and HT2 rates (transfer rates assuming 16-bit links, which is the configuration used by AMD processors):
> 1,800 MHz = 3,600 MT/s = 7,200 MB/s
> 2,000 MHz = 4,000 MT/s = 8,000 MB/s
> 2,400 MHz = 4,800 MT/s = 9,600 MB/s
> 2,600 MHz = 5,200 MT/s = 10,400 MB/s
> Sometimes you will see the MT/s numbers published as MHz, as already discussed.
> Socket AM2+ and AM3 processors and their companion chipsets, however, are limited to the 8,000 MB/s transfer rate. Only socket AM3+ CPUs and chipsets are capable of using all the speeds published above. Of course, all CPUs and chipsets are compatible with the lower transfer rates available.
> Keep in mind that socket AM2+ processors can still be installed on socket AM2 motherboards, however, their HyperTransport bus will be limited to HT2 speeds.
> Once again, the transfer rates announced by the HyperTransport consortium are highly exaggerated. They announce HyperTransport 3.0 as having a maximum transfer rate of 41.6 GB/s. To reach this number they considered 32-bit links (and not 16-bit links) and doubled the number found by two because there are two links available. The math used was 2,600 MHz x 32 x 2 / 8 x 2 links. As we have already explained, AMD processors use 16-bit links, not 32-bit ones, and we don't agree with the methodology of doubling the transfer rate, done because there is one link for transmitting and another for receiving data. We would only agree with this if the links were in the same direction.


Now, granted, the AMD 990FX uses a 3200MHz HT 3.1 link, which results in 6,400 MT/s or 12,800 MB/s. Now look at the schematic below:


The AMD FX Processor communicates with the 990FX Northbridge at 12.8GB/s which talks to the PCIe 2.0 ports at 16GB/s. Therefore, for all intents and purposes, the AMD FX Processor talks to the Graphics card at 12.8GB/s, even if the Graphics card is running on a PCIe 2.0 x16 port.

Now we know that a PCIe 2.0 x8 slot (8 GB/s) bottlenecks an AMD R9 290 under Ashes of the Singularity. Therefore, the culprit for poor AMD performance could very well be the HyperTransport link.

Take Battlefield 4; it's a DX11 title that is heavy on draw calls (for a DX11 game):

PCIe 2.0 x8 is saturated already (8 GB/s). Now imagine having all those CPU cores, now available in DX12, making draw calls on top of the textures etc. travelling over the bus. For an AMD system, this is further compounded by the slow HT 3.1 link (12.8GB/s), and that's in the best-case scenario (990FX chipset). If you're using a 970 chipset, you're knocked down to HT 3.0, or 10.4 GB/s. The 3DMark API Overhead test isn't sending textures either (or any other heavy command); it's only sending draw calls. So it really wouldn't show up on that test.

Again... just a theory.
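The HyperTransport arithmetic in the quote and the post above is mechanical enough to check in a couple of lines. A sketch, assuming 16-bit links and double-data-rate signalling as the quoted article describes:

```python
def ht_bandwidth_mb_s(clock_mhz, link_bits=16):
    """HyperTransport one-way bandwidth: DDR signalling doubles the
    clock (MT/s = 2 x MHz), and each transfer moves link_bits/8 bytes."""
    mt_s = 2 * clock_mhz
    return mt_s * link_bits // 8

# Figures from the quoted HT 3.0 table:
assert ht_bandwidth_mb_s(1800) == 7_200
assert ht_bandwidth_mb_s(2600) == 10_400   # the HT 3.0 / 970-chipset-class figure
# The 3,200 MHz HT 3.1 figure used for the 990FX argument above:
assert ht_bandwidth_mb_s(3200) == 12_800
```

Whether the 990FX actually runs its link at 3,200 MHz or 2,600 MHz is disputed later in the thread; the formula itself is the same either way.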


----------



## Serandur

@Mahigan

This is just my opinion/takeaway based on a lot of what you said across these forums as well as some others (on the partisanship thing, on theoretical capabilities, and on the role of DX12 in Maxwell vs GCN).

First off on the partisanship thing; I agree initiatives like Gameworks are harmful and detrimental... for users of any GPU vendor. Regardless of whether I've been using flagship Radeon or GeForce parts, a lot of Gameworks effects seem unduly demanding (obviously geared more towards the former, however at least Radeons get the benefit of a CCC tesselation slider that can lessen the insanity). Sometimes, even PhysX (on older titles) gives me wonky performance for little/no discernible reason on GeForce cards. Marketing is nasty too imo, but that's always kind of been there. It can be somewhat countered by educated reasoning and discussions on sites such as these, but the tech media will often glance over major issues if it threatens a favored hardware manufacturer.

However, if you're talking partisanship on the level of consumers; it cannot be eradicated imho. Even a perfectly neutral poster pointing out scenarios where there is an objective disparity between two products can be lending ammo for one biased party or be perceived as a biased threat by the other (as I'm sure you've already noticed; it's kind of an instinct everyone has to some degree). To change that, you'd have to alter the fundamental nature of human psychology in which one viewpoint (that person's own) is most salient and their own interests perceived as most important. The mere perception of partisanship (by one self-classified group regarding another) is enough to invoke or strengthen it on the other side as a defense.

On theoretical capabilities and DX12, I'm going to first note that extremely in-depth technical examination (especially over the course of dozens of pages) can make for a convoluted discussion that most people will not fully follow. It is with great pride for my hobby that I say microprocessors are easily among the most sophisticated technologies mankind produces. And so most people even on a site like this will not get much out of seeing a lot of technical jargon explaining obscure specifications considering this is largely not a professional microarchitecture engineering forum.

There's been a lot of discussion and speculation over a still-limited set of data to simply and effectively say ACEs are effective towards properly feeding GCN's shaders and making efficient use of GCN's cycles whereas Maxwell is fairly well-designed to do so regardless. Now here's where the theoretical capabilities come in.

One very important and quintessential factor in any discussion involving GCN vs Maxwell is clock speed and I feel like the comparisons made in this thread are unfair towards understanding the distinction on both a theoretical and realistic level:

If there is any area where I feel both AMD and Nvidia's marketing teams have screwed up in recent years, it would be the reference coolers on Hawaii and GM200 respectively, given the effect it seems to have on how people judge the chips' capabilities. There is reason to believe, given the 980 Ti's poor scaling in the computerbase.de DX12 benchmark, that the reference model wasn't even maintaining its default boost speed (which is already quite conservative). Pretty much every 980 Ti (particularly the ones educated enthusiasts tend to buy, i.e. aftermarket/custom models) can consistently do significantly more than the 1076 MHz the reference model may fall back on by default. ~1450-1500 MHz is both a realistic goal and limit for the 980 Ti on air, whereas Hawaii and Fiji both have a more conservative realistic range on air of about 1100-1180 MHz. Naturally, many things are amplified by that specification, so it is fairly important.

From experience, aftermarket 980 Ti's (such as one pcgameshardware.de used) often consistently maintain ~1350-1400 MHz out of the box. Mine certainly does (technically, 1405-1418 MHz with Gigabyte's OC mode and 1367-1380 MHz without). Aftermarket Hawaii cards tend to be around 1050-1100 MHz out of the box (with a significant memory overclock) and Fiji XT is simply 1050 MHz. How this affects theoretical capabilities is huge:

390X:

(2816 shaders x 2)*1.05 = ~5914 GFLOPS

4 rasterizers x 1.05 = 4.2 Gtri/s

64 ROPs x 1.05 = ~67 Gpixel/s

176 TMUs x 1.05 = ~185 Gtexel/s (~93 or half that with fp16/int16)

Fury X:

(4096 x 2)*1.05 = ~8601 GFLOPS

4 rasterizers x 1.05 = 4.2 Gtri/s

64 ROPs x 1.05 = ~67 Gpixel/s

256 TMUs x 1.05 = ~269 Gtexel/s (~135 or half that with fp16/int16)

980 Ti:

(2816 shaders x 2)*1.35 = ~7603 GFLOPS (~29% higher than 390X, ~12% less than Fury X)

6 rasterizers x 1.35 = 8.1 Gtri/s (~93% more or nearly double the 390X/Fury X)

96 ROPs x 1.35 = ~130 Gpixel/s (~93% more or nearly double the 390X/Fury X)

176 TMUs x 1.35 = ~238 Gtexel/s (maintains same rate at fp16/int16)
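Every line in the three lists above is the same formula: functional units x rate per clock x clock. A sketch of the shader-FLOPS case (2 FLOPs per shader per clock, i.e. one fused multiply-add), using the clocks assumed in the post:

```python
def gflops(shaders, clock_ghz, flops_per_clock=2):
    """Peak single-precision throughput in GFLOPS: each shader
    retires flops_per_clock FLOPs (FMA = 2) per cycle."""
    return shaders * flops_per_clock * clock_ghz

print(round(gflops(2816, 1.05)))  # 390X at 1050 MHz   -> 5914
print(round(gflops(4096, 1.05)))  # Fury X at 1050 MHz -> 8602
print(round(gflops(2816, 1.35)))  # 980 Ti at 1350 MHz -> 7603
```

(The Fury X figure rounds to 8602 rather than the truncated ~8601 above; same arithmetic.) The rasterizer, ROP, and TMU lines substitute their own units-per-clock rates into the same product.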

Visual proof of GCN vs Kepler/Maxwell's fp16/int16 texture filtering rates; not sure about relevance:











Additional overclocking capability of all three parts tends to be about another ~10% more than that, so their relative positioning stays more-or-less consistent going even further. I know it's a long post and I thank anyone for reading the entire thing, but this is why I find it highly difficult to believe that even with both at theoretical peak performance, GCN will show any significant advantage that would allow these cards to outlast their Maxwell competitors in any significant way. Even the PCPer review supposedly utilizing the ACEs did not show an unspecified 390X (for which there is no reference model) more than ~5-13% ahead of a conservatively-clocked reference 980 (against which the 980 Ti has between 37.5% and 50% more of everything while being able to clock nearly every bit as high; in other words, it should be untouchable against the 390X).



Accordingly, even an unspecified 390 with DX12 in the computerbase.de review is only exhibiting about ~10% more than a 970 with DX11 (which should really not be faster than DX12 for the card) and the 280X exhibits a similarly small lead over GTX 770.



Speculation is all well and good, but drawing conclusions about GCN and Maxwell's longevity based on these is pretty weak and that's where some of the disagreement is coming from, imo. The only conclusions I can see from this are inconclusiveness... that, and don't buy reference 980 Tis if you can avoid it; aftermarket ones are in a whole other league when the reference ones choke on their own heat.


----------



## Kpjoslee

Interesting info and all that... but the game is still in an alpha state. Right now the game runs pretty badly on AMD CPUs, but that might improve later on. We just gotta take the benchmark for what it currently is and wait until they release another benchmark in a more mature state. We definitely need more samples before we can find out what the limiting factor is, CPU or GPU.


----------



## Mahigan

While I agree with you that overclocking and switching out reference coolers is something some of us do, the majority of the market does not. Reviewers generally test reference cards first, to establish their performance as recommended by the manufacturer, moving on to overclocking and other factors later in the review cycle (reviewing individual factory-overclocked products as well). Overclocking, though amusing (I often run my 290Xs at 1,250MHz each) and capable of yielding considerable results, is not based on the recommended manufacturer settings.

I acknowledge your point, but given that overclocking returns vary, one cannot use overclocked cards to give the majority of consumers an idea of the performance they can expect.

I understand what you're saying, but I was attempting to explain the results people were seeing in the reviews they were reading about the Ashes of the Singularity DX12 benchmark. The reviewers weren't using overclocked cards, so what I did was explain what caused the performance levels people were seeing.

As for going forward, I cannot factor in overclocked cards for the reasons I mentioned earlier in this post. You just cannot predict, with any degree of certainty, what overclock a user will achieve. Therefore it is far more prudent to go by the manufacturer's recommended settings. That means quoting nVIDIA and AMD. They designed their GPUs with certain clock speeds in mind; those are the clock speeds upon which one ought to recommend a product. You can mention that one card has a propensity to overclock higher than another, as reviewers do, but you can't promise a degree of performance based on how well your particular card overclocks.

One thing is for certain: I would have liked to see pcgameshardware test those factory-overclocked cards at the recommended benchmark settings rather than attempt to derive a particular result. It would have added to the discussion, rather than render their results unusable.


----------



## KyadCK

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Themisseble*
> 
> especially for strategy game... just res scaling or MSAA.
> 
> @ Mahigan maybe you could reply to Brad or who ever you were talking from stardock those two benchmakrs and show that in benchmark FX 8350 clearly bottleneck R9 290X.
> 
> 
> 
> I don't think it is CPU related.
> 
> PontiacGTX shared this link with me and I believe he is onto something: http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-hypertransport-bus/4/
> Quote:
> 
> 
> 
> HyperTransport 3.0 adds the following new clock rates, keeping compatibility with HT1 and HT2 rates (transfer rates assuming 16-bit links, which is the configuration used by AMD processors):
> 1,800 MHz = 3,600 MT/s = 7,200 MB/s
> 2,000 MHz = 4,000 MT/s = 8,000 MB/s
> 2,400 MHz = 4,800 MT/s = 9,600 MB/s
> 2,600 MHz = 5,200 MT/s = 10,400 MB/s
> Sometimes you will see the MT/s numbers published as MHz, as already discussed.
> Socket AM2+ and AM3 processors and their companion chipsets, however, are limited to the 8,000 MB/s transfer rate. Only socket AM3+ CPUs and chipsets are capable of using all the speeds published above. Of course, all CPUs and chipsets are compatible with the lower transfer rates available.
> Keep in mind that socket AM2+ processors can still be installed on socket AM2 motherboards, however, their HyperTransport bus will be limited to HT2 speeds.
> Once again, the transfer rates announced by the HyperTransport consortium are highly exaggerated. They announce HyperTransport 3.0 as having a maximum transfer rate of 41.6 GB/s. To reach this number they considered 32-bit links (and not 16-bit links) and doubled the number found by two because there are two links available. The math used was 2,600 MHz x 32 x 2 / 8 x 2 links. As we have already explained, AMD processors use 16-bit links, not 32-bit ones, and we don't agree with the methodology of doubling the transfer rate, done because there is one link for transmitting and another for receiving data. We would only agree with this if the links were in the same direction.
> 
> 
> Now granted the AMD 990FX uses a 3200MHz HT 3.1 link which results in 6,400 MT/s or 12,800 MB/s now look at the schematic below:
> 
> 
> The AMD FX Processor communicates with the 990FX Northbridge at 12.8GB/s which talks to the PCIe 2.0 ports at 16GB/s. Therefore, for all intents and purposes, the AMD FX Processor talks to the Graphics card at 12.8GB/s, even if the Graphics card is running on a PCIe 2.0 x16 port.
> 
> Now we know that the a PCIe 2.0 x8 slot (8 GB/s) bottlenecks an AMD R9 290 under Ashes of the Singularity. Therefore the culprit for poor AMD performance could very well be the Hypertransport Link.
> 
> Take Battlefield 4, it's a DX11 title that is heavy on draw calls (for a DX11 game):
> 
> PCIe 2.0 x8 is saturated already (8 GB/s). Now imagine having all those CPU cores, now available in DX12, making draw calls ontop of the textures etc travelling over the bus? For an AMD system, this is further compounded by the slow HT 3.1 link (12.8GB/s) and that's in the best case scenario (990/FX chipset). If you're using a 970 chipset, you're knocked down to HT 3.0 or 10.4 GB/s. The 3D Mark Overhead API test isn't sending textures either (or any other heavy command), it's only sending draw calls. So it really wouldn't show up on that test.
> 
> Again... just a theory.

Some incorrect math in there.

990FX stock HT clock is 2.6GHz, but on some motherboards it can be overclocked to ~3GHz. They are not stock-clocked at 3.2GHz. They are also 16-bit links in each direction, not 32-bit: 10.4GB/s unidirectional.

PCI-e 2.0 is 500MB/s/lane pre-encode. That's 8GB/s pre-encoded on an x16, and 6.4GB/s post encode (8/10 encode rate). PCI-e 3.0 is 1GB/s per lane and 128/130 encode rate for just under 16GB/s.

The 990FX board with 3.0 adds a PLX chip on top of that. It takes 32 lanes of 2.0 and splits them into either x16 or x8/x8 of 3.0, including the encoding changes. Hence the latency.
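For reference, the spec-sheet arithmetic behind the commonly quoted per-direction figures: PCIe 2.0 signals 5 GT/s per lane with 8b/10b encoding, and PCIe 3.0 signals 8 GT/s per lane with 128b/130b. A sketch of how the encoding overhead feeds into the x16 numbers (figures are one-way):

```python
def pcie_effective_gb_s(gt_per_s, lanes, payload_bits, symbol_bits):
    """Effective one-way bandwidth in GB/s: each symbol_bits on the
    wire carries payload_bits of actual data; 8 bits per byte."""
    return gt_per_s * lanes * payload_bits / symbol_bits / 8

# PCIe 2.0 x16: 5 GT/s per lane, 8b/10b encoding
assert pcie_effective_gb_s(5, 16, 8, 10) == 8.0
# PCIe 3.0 x16: 8 GT/s per lane, 128b/130b ("just under 16GB/s")
assert abs(pcie_effective_gb_s(8, 16, 128, 130) - 15.754) < 0.001
```

Note this applies the encoding overhead once, to the raw signalling rate; the posts above differ on whether the 500MB/s/lane figure is pre- or post-encode.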


----------



## PontiacGTX

Quote:


> Originally Posted by *sugarhell*
> 
> No it doesnt increase the draw calls. How can an API increase the draw calls?


I thought that draw calls were set by the API/driver, because AMD suggested the bottleneck on DX11 was the draw calls. But I had seen AMD suggest that Dying Light required 40k to 70k draw calls.

And now tell this


It might be based on the graphics engine code, the driver, and maybe the API?


----------



## Mahigan

Quote:


> Originally Posted by *KyadCK*
> 
> Some incorrect math in there.
> 
> 990FX stock HT clock is 2.6Ghz but can on some motherboards be overclocked to ~3Ghz. They are not stock clocked at 3.2Ghz. They are also 16-bit links in each direction, not 32-bit. 10.4GB/s unidirectional.
> 
> PCI-e 2.0 is 500MB/s/lane pre-encode. That's 8GB/s pre-encoded on an x16, and 6.4GB/s post encode (8/10 encode rate). PCI-e 3.0 is 1GB/s per lane and 128/130 encode rate for just under 16GB/s.
> 
> The 990FX board with 3.0 adds a PLX chip on top of that. It takes 32 lanes of 2.0 and splits them into either x16 or x8/x8 of 3.0, including the encoding changes. Hence the latency.


Well, if anyone did the math incorrectly, it would be AMD. I took the 6.4 GT/s 990FX figure from their PR material. Based on Hardware Secrets' information, I simply did 6.4 x 2 for 12.8 GB/s. If the boards only function at 2600MHz, rather than the 3200MHz in the PR material, then I am not sure why they would have placed this slide in their information. It would therefore appear to be rather dishonest on their part.


----------



## HalGameGuru

It could explain some of the performance back and forths we see with APU's and Athlon X4's in relation to the FX CPUs.


----------



## Mahigan

Quote:


> Originally Posted by *HalGameGuru*
> 
> It could explain some of the performance back and forths we see with APU's and Athlon X4's in relation to the FX CPUs.


Yes... very good point.


----------



## KyadCK

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *KyadCK*
> 
> Some incorrect math in there.
> 
> 990FX stock HT clock is 2.6Ghz but can on some motherboards be overclocked to ~3Ghz. They are not stock clocked at 3.2Ghz. They are also 16-bit links in each direction, not 32-bit. 10.4GB/s unidirectional.
> 
> PCI-e 2.0 is 500MB/s/lane pre-encode. That's 8GB/s pre-encoded on an x16, and 6.4GB/s post encode (8/10 encode rate). PCI-e 3.0 is 1GB/s per lane and 128/130 encode rate for just under 16GB/s.
> 
> The 990FX board with 3.0 adds a PLX chip on top of that. It takes 32 lanes of 2.0 and splits them into either x16 or x8/x8 of 3.0, including the encoding changes. Hence the latency.
> 
> 
> 
> Well if anyone did the math incorrectly it would be AMD. I took the 6.4 GT/s 990FX shot from their PR material. Based on Hardware secrets information, I simply did a 6.4*2 for 12.8 GB/s. If the boards do only function at 2600MHz, rather than 3200MHz according to the PR material, then I am not sure why they would have placed this slide in their information. It would therefore appear to be rather dishonest on their part.

They do use 3.2GHz HT, just not on 990FX. More of a server thing.

Just correcting some variables in your equations. I had to do all this math before for other reasons; it helps to have correct numbers, and I have quite a bit of experience with the chipset.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> While I agree with you that overclocking and switching out reference coolers is something some of us do, the majority of the market does not. Reviewers generally test reference cards first. To establish their performance as recommended by the manufacturer moving onto overclocking and other factors later on in the review cycle (reviewing individual factory overclocked products as well). Overclocking, though something which is amusing (I often run my 290x's at 1,250MHz each) and can yield considerable results, is not based on the recommended manufacturer settings.
> 
> I acknowledge your point, but given that overlocking returns vary, one cannot use overclocked cards in order to give the majority of consumers an idea of the performance they can expect.
> 
> I understand what you're saying, but I was attempting to explain the results people were seeing in the reviews they were reading about the Ashes of the Singularity DX12 benchmark. The reviewers weren't using overclocked cards. Therefore explaining what caused the performance levels people were seeing is what I did.
> 
> As for going forward. I cannot factor overclocked cards for the reasons I mentioned prior in this post. You just cannot predict, with any degree of certainty, what overclock a user will achieve. Therefore it is far more prudent to go by the manufacturers recommended settings. That means quoting nVIDIA and AMD. They designed their GPUs with certain clock speeds in mind. Those are the clock speeds upon which one ought to recommend a product. You can mention one card has a propensity to overclock higher than another, as reviewers do, but you can't promise a degree of performance based on how well your particular card overclocks.
> 
> One thing is for certain, I would have liked to see pcgameshardware test those factory overclocked cards at the recommended benchmark settings rather than attempt to derive a particular result. It would have added to the discussion, rather than render their results unusable..


I apologize if it came across that way, but I wasn't talking about altering the cards or even actually overclocking (my 980 Ti G1 for instance literally comes out of the box as is doing 1367-1380 MHz, no personal OC applied). I mean custom models by the AIB partners like the G1 Gaming, ACX, Strix, Tri-X/Vapor-X, etc. that come out of the factory with significantly higher clock speeds (about +150 MHz), coolers to maintain them, and full-fledged warranties (legal guarantees of stability and longevity at those speeds). Going by the number of reviews per model on Newegg, custom models seem to be the majority of these.

From the perspective of any consumer, there's no reason to fear the custom models whatsoever. Sometimes, they're the only models available (initially including the market-dominating GTX 970, the R9 Fury, and the R9 390/X).

High clocks really are guaranteed with Maxwell even for the non-overclocking consumer if they pick the right model. Which is of course different from explaining results with a reference one and I agree with you on the rest of your post.


----------



## CrazyHeaven

Quote:


> Originally Posted by *Serandur*
> 
> High clocks really are guaranteed with Maxwell even for the non-overclocking consumer if they pick the right model. Which is of course different from explaining results with a reference one and I agree with you on the rest of your post.


High clocks are never guaranteed. In fact, anything overclocked is not guaranteed. It doesn't matter how many cards have overclocked past a certain point; the next one someone buys may not OC even a little. Poorly overclocking cards can easily be sold at a loss and replaced with another. It is similar to playing the lottery.

Me, I OC as much as I can get and then I'm happy with whatever that is. A card that OCs higher isn't worth the price premium of trying to get one. I'd be better off just going SLI if I really need the extra boost. That way I'd get guaranteed improvements.


----------



## GorillaSceptre

The DX12 update for ARK: Survival Evolved was delayed due to driver problems; it should be done sometime next week.

I'm looking forward to seeing who comes out on top in this title. Let's see if DX12 actually gives Nvidia a boost this time.


----------



## mtcn77

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> I don't think it is CPU related.
> 
> PontiacGTX shared this link with me and I believe he is onto something: http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-hypertransport-bus/4/
> Now granted, the AMD 990FX uses a 3200 MHz HT 3.1 link, which results in 6,400 MT/s, or 12,800 MB/s. Now look at the schematic below:
> 
> 
> The AMD FX Processor communicates with the 990FX Northbridge at 12.8GB/s which talks to the PCIe 2.0 ports at 16GB/s. Therefore, for all intents and purposes, the AMD FX Processor talks to the Graphics card at 12.8GB/s, even if the Graphics card is running on a PCIe 2.0 x16 port.
> 
> 
> *Now we know that a PCIe 2.0 x8 slot (8 GB/s) bottlenecks an AMD R9 290 under Ashes of the Singularity.* Therefore the culprit for poor AMD performance could very well be the HyperTransport link.
> 
> Take Battlefield 4, it's a DX11 title that is heavy on draw calls (for a DX11 game):
> 
> PCIe 2.0 x8 is saturated already (8 GB/s).
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Now imagine having all those CPU cores, now available in DX12, making draw calls on top of the textures etc. travelling over the bus. For an AMD system, this is further compounded by the slow HT 3.1 link (12.8 GB/s), and that's in the best-case scenario (990FX chipset). If you're using a 970 chipset, you're knocked down to HT 3.0, or 10.4 GB/s. The 3DMark API Overhead test isn't sending textures either (or any other heavy commands), it's only sending draw calls. So it really wouldn't show up on that test.
> 
> Again... just a theory.


Can you please elucidate the part about the 290 with direct pointers? I cannot tell whether there was a prior 290 example that I missed, or whether the GTX 980 implies a similar phenomenon.
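For what it's worth, the link-rate arithmetic in the quoted theory can be sanity-checked. The sketch below is not from the thread; the constants (HyperTransport's double-pumped 16-bit links, PCIe 2.0's 500 MB/s per lane per direction) come from the public specs:

```python
# Rough per-direction bandwidth figures for the links discussed above.

def ht_bandwidth_mb_s(clock_mhz, width_bits=16):
    """HyperTransport is double-pumped: transfers/s = 2 x clock.
    A 16-bit link moves 2 bytes per transfer, per direction."""
    transfers_per_s = 2 * clock_mhz * 1_000_000
    return transfers_per_s * (width_bits // 8) / 1_000_000  # MB/s, one direction

def pcie2_gb_s(lanes):
    """PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 500 MB/s per lane, per direction."""
    return lanes * 0.5  # GB/s, one direction

print(ht_bandwidth_mb_s(3200))  # HT 3.1 on 990FX: 12800.0 MB/s, matching the quoted 12.8 GB/s
print(ht_bandwidth_mb_s(2600))  # clock implied by the quoted 10.4 GB/s HT 3.0 figure: 10400.0
print(pcie2_gb_s(16), pcie2_gb_s(8))  # 8.0 and 4.0 GB/s per direction
```

Note that the quoted 16 GB/s for PCIe 2.0 x16 counts both directions, while the 12.8 GB/s HT figure is per direction, so the two numbers aren't directly comparable; per direction, a 3200 MHz HT 3.1 link (12.8 GB/s) is actually wider than PCIe 2.0 x16 (8 GB/s).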


----------



## Themisseble

Quote:


> Originally Posted by *GorillaSceptre*
> 
> The Dx12 update for ARK Survival Evolved was delayed due to driver problems, it should be done sometime next week.
> 
> I'm looking forward to seeing who comes out on top in this title, lets see if Dx12 actually gives Nvidia a boost this time.


It's very simple. As Mahigan explained, AMD has advantages in DX12:
- API overhead (no more CPU bottleneck at 1080p)
- Better parallelism
- Async shaders

NVIDIA will always be better at tessellation, whether on DX9, DX11, or DX12. Tessellation takes a huge toll on rasterizer efficiency; with better tessellation and 50% more ROPs, NVIDIA has huge advantages.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> Yes... very good point.


What about other games like GTA V, Watch Dogs, and AC: Unity, which are also very high on draw calls, yet you can see a big difference between the FX 4300 and the FX 8350?


----------



## GorillaSceptre

Quote:


> Originally Posted by *Themisseble*
> 
> It's very simple. As Mahigan explained, AMD has advantages in DX12:
> - API overhead (no more CPU bottleneck at 1080p)
> - Better parallelism
> - Async shaders
> 
> NVIDIA will always be better at tessellation, whether on DX9, DX11, or DX12. Tessellation takes a huge toll on rasterizer efficiency; with better tessellation and 50% more ROPs, NVIDIA has huge advantages.


I think you quoted the wrong post?

As far as Mahigan's theory goes, well... I won't use the word debunked, but people on the more tech-savvy forums disagree with his reasoning. The consensus seems to be that the biggest differences will come from how a game is programmed; it's not as simple as X is better than Y.


----------



## Anna Torrent

Quote:


> Originally Posted by *Themisseble*
> 
> It's very simple. As Mahigan explained, AMD has advantages in DX12:
> - API overhead (no more CPU bottleneck at 1080p)
> - Better parallelism
> - Async shaders
> 
> NVIDIA will always be better at tessellation, whether on DX9, DX11, or DX12. Tessellation takes a huge toll on rasterizer efficiency; with better tessellation and 50% more ROPs, NVIDIA has huge advantages.


You will still have API overhead, but to a lower extent. You can see that in the AoS benchmark (which is not a rule, of course).
Parallelism is better, but the work is not yet shared equally by all cores, so faster cores still have the upper hand in many cases, and Intel cores are way faster in terms of IPC.
ACE - indeed. Maxwell II has something similar, but more limited (only 2), and I'm not sure how well it is implemented vs. AMD's ACEs.


----------



## Anna Torrent

Quote:


> Originally Posted by *GorillaSceptre*
> 
> I think you quoted the wrong post?
> 
> As far as Mahigan's theory goes, well... I won't use the word debunked, but people on the more tech-savvy forums disagree with his reasoning. The consensus seems to be that the biggest differences will come from how a game is programmed; it's not as simple as X is better than Y.


Agreed about the second point. Even in DX11 we can see games that are considered "heavy" running without a CPU limitation, even BF4 and Crysis 3, at least at 1080p (link)


----------



## Noufel

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Themisseble*
> 
> It's very simple. As Mahigan explained, AMD has advantages in DX12:
> - API overhead (no more CPU bottleneck at 1080p)
> - Better parallelism
> - Async shaders
> 
> NVIDIA will always be better at tessellation, whether on DX9, DX11, or DX12. Tessellation takes a huge toll on rasterizer efficiency; with better tessellation and 50% more ROPs, NVIDIA has huge advantages.
> 
> 
> 
> I think you quoted the wrong post?
> 
> As far as Mahigan's theory goes, well... I won't use the word debunked, but people on the more tech-savvy forums disagree with his reasoning. The consensus seems to be that the biggest differences will come from how a game is programmed; it's not as simple as X is better than Y.
Click to expand...

Not good if true. I think the majority of devs will optimize their games for Nvidia GPUs (market share and all), and AMD will continue to have performance problems against Nvidia, though less than they did on DX11.


----------



## semitope

Quote:


> Originally Posted by *Noufel*
> 
> Not good if true. I think the majority of devs will optimize their games for Nvidia GPUs (market share and all), and AMD will continue to have performance problems against Nvidia, though less than they did on DX11.


Most AAA games are on consoles. It's unlikely the programming will favor Nvidia when GameWorks is not involved. Also, I doubt it's just down to how things are programmed. Sure, you might reduce Nvidia's hardware issues by making things harder on the devs, but there are hardware aspects all the same. Doing things in a way that might help Nvidia in this case probably won't hurt AMD's performance if it's still standard DX12 usage.


----------



## Dudewitbow

Quote:


> Originally Posted by *Noufel*
> 
> Not good if true. I think the majority of devs will optimize their games for Nvidia GPUs (market share and all), and AMD will continue to have performance problems against Nvidia, though less than they did on DX11.


Just keep in mind that the AAA games in question tend to be on consoles too, which are GCN-powered; market share is relatively equal if consoles are counted in AMD's favor. A dev would probably be more prone to optimizing for Nvidia GPUs (disregarding AMD or Nvidia influences) if it's a standalone PC title, as the developers don't necessarily have to work with GCN hardware directly at any point in development.


----------



## GorillaSceptre

Quote:


> Originally Posted by *semitope*
> 
> I suggested looking at other parts of the system many posts ago.
> 
> I want a copy of mgs5
> Most AAA games are on consoles. It's unlikely the programming will favor Nvidia when GameWorks is not involved. Also, I doubt it's just down to how things are programmed. Sure, you might reduce Nvidia's hardware issues by making things harder on the devs, but there are hardware aspects all the same. Doing things in a way that might help Nvidia in this case probably won't hurt AMD's performance if it's still standard DX12 usage.


That's assuming there are hardware issues, and most disagree with that assessment.

We need more evidence than a single benchmark from an alpha game, from a studio that has been heavily involved in promoting Mantle. Nvidia also came out and said the results aren't representative.

I'm not saying anyone's lying or that Stardock is bought off, just that we need more testing. Not to mention that it isn't exactly a landslide win for AMD; they are within a couple of frames of each other, with Nvidia sometimes coming out on top.

With all the talk of DX12 and its performance benefits, Nvidia actually LOSING performance seems a bit strange.


----------



## KSIMP88

Oh please let TES6 be DX12


----------



## semitope

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That's assuming there are hardware issues, and most disagree with that assessment.
> 
> We need more evidence than a single benchmark from an alpha game, from a studio that has been heavily involved in promoting Mantle. Nvidia also came out and said the results aren't representative.
> 
> I'm not saying anyone's lying or that Stardock is bought off, just that we need more testing. Not to mention that it isn't exactly a landslide win for AMD; they are within a couple of frames of each other, with Nvidia sometimes coming out on top.
> 
> With all the talk of DX12 and its performance benefits, Nvidia actually LOSING performance seems a bit strange.


Losing performance is not strange when your hardware is more suited to another API. The actual framerate loss is probably bigger, since it wipes out aspects of DX12 that benefit Nvidia; their hardware limitation might be more significant than the results suggest. Sucking at an API feature is not something new. Supporting it is not the same as doing it well. My guess is that a "representative" DX12 game will be one that does not use a feature they suck at.


----------



## GorillaSceptre

Quote:


> Originally Posted by *semitope*
> 
> Losing performance is not strange when your hardware is more suited to another API. The actual framerate loss is probably bigger, since it wipes out aspects of DX12 that benefit Nvidia; their hardware limitation might be more significant than the results suggest. Sucking at an API feature is not something new. Supporting it is not the same as doing it well. My guess is that a "representative" DX12 game will be one that does not use a feature they suck at.


You're obviously taking Mahigan's theory as fact. We may as well leave this discussion here then.


----------



## semitope

Quote:


> Originally Posted by *GorillaSceptre*
> 
> You're obviously taking Mahigan's theory as fact. We may as well leave this discussion here then.


Actually, that wasn't the first source for this information. I got it from a developer talking about VR, IIRC. Nvidia's hardware is not as robust currently.

It makes all kinds of sense, and I expect things might get worse. A Fury X merely matching a 980 Ti means it's still not being fully utilized.


----------



## provost

Quote:


> Originally Posted by *Noufel*
> 
> Not good if true. I think the majority of devs will optimize their games for Nvidia GPUs (market share and all), and AMD will continue to have performance problems against Nvidia, though less than they did on DX11.


Plus rep, as you have hit the gist of the counterargument on the head.
Mahigan's theory appeals to me because he has gone to great lengths to research and share his opinions as to why AMD's architecture works better than Nvidia's if developers properly utilize the benefits of DX12 to reduce overhead. All I have seen by way of counterargument is why his theory won't hold due to yet-to-be-seen optimizations for Nvidia, which I interpret as follows:

a) until Nvidia catches up with the Pascal architecture, or
b) until developers have been incentivized enough to code away from consumer-friendly DX12, putting PC gamers in the same position as they were in with DX11 - i.e., there ain't no such thing as free (lunch) performance; if you want more performance, you've got to pay for it.

But no one has proposed a detailed alternative theory that demystifies the DX12 performance riddle of the GPU makers.


----------



## PhantomTaco

By all means correct me if I'm wrong, but there are a few things I don't understand. For starters, are these theories based on the single Ashes of the Singularity benchmark? IIRC the game was developed with AMD helping the dev out. Would it be crazy to assume there were choices made that specifically improved performance for AMD? I'm not saying they necessarily made choices that hampered NVIDIA intentionally, or even directly, but if true I'd assume some choices would specifically benefit AMD while not helping, or potentially hurting, NVIDIA hardware. Assuming this is all still based on Ashes alone, that's a single engine. There are at least half a dozen other engines out there that either have DX12 support or have it coming, and they won't necessarily behave the same way, so doesn't it seem a bit too early to draw any conclusions from a sample size of 1?


----------



## HalGameGuru

AMD does not have nVidia's track record of making their tech inaccessible or inefficient for competitors in a premeditated fashion. When TressFX came out and nVidia had trouble, they released the code to let nVidia optimize for it. Most AMD tech is made available for the industry as a whole to use and optimize: TressFX, FreeSync, Mantle, etc. nVidia could have made use of Mantle if they wished.


----------



## ku4eto

Quote:


> Originally Posted by *PhantomTaco*
> 
> By all means correct me if I'm wrong but there's a few things I don't understand. For starters are these theories based on the single Ashes of the Singularity benchmark? IIRC the game was developed with AMD helping the dev out. Would it be crazy to assume there were choices made that specifically improved performance for AMD? I'm not saying they necessarily actively made choices that hampered NVIDIA intentionally or even directly, but if true I'd assume some choices made would specifically benefit AMD while not helping, or potentially hurting NVIDIA hardware. Assuming this is all still based on Ashes alone, that's a single engine. There's at least half a dozen other engines out there that either have dx12 support or have it coming that are not necessarily going to behave the same way, so doesn't it seem a bit too early to draw any conclusions based on a sample size of 1?


The game uses DirectX 12 and makes good use of asynchronous compute. Those are the benefits for AMD; for nVidia, the only benefit is DirectX 12 itself, more or less. They have not done anything to hinder either AMD or nVidia.


----------



## semitope

Quote:


> Originally Posted by *PhantomTaco*
> 
> By all means correct me if I'm wrong but there's a few things I don't understand. For starters are these theories based on the single Ashes of the Singularity benchmark? IIRC the game was developed with AMD helping the dev out. Would it be crazy to assume there were choices made that specifically improved performance for AMD? I'm not saying they necessarily actively made choices that hampered NVIDIA intentionally or even directly, but if true I'd assume some choices made would specifically benefit AMD while not helping, or potentially hurting NVIDIA hardware. Assuming this is all still based on Ashes alone, that's a single engine. There's at least half a dozen other engines out there that either have dx12 support or have it coming that are not necessarily going to behave the same way, so doesn't it seem a bit too early to draw any conclusions based on a sample size of 1?


You should read this http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/
Quote:


> Our code has been reviewed by Nvidia, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft specifically designed to validate against incorrect usages. *All IHVs have had access to our source code for over year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months.* Fundamentally, the MSAA path is essentially unchanged in DX11 and DX12. Any statement which says there is a bug in the application should be disregarded as inaccurate information.


Quote:


> Often we get asked about fairness, that is, usually if in regards to treating Nvidia and AMD equally? Are we working closer with one vendor then another? The answer is that we have an open access policy. Our goal is to make our game run as fast as possible on everyone's machine, regardless of what hardware our players have.
> 
> To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback. *For example, when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code.*
> 
> We only have two requirements for implementing vendor optimizations: We require that it not be a loss for other hardware implementations, and we require that it doesn't move the engine architecture backward (that is, we are not jeopardizing the future for the present).


There is Nvidia code in there. This will likely be one of the most fair benchmarks we will have. When Nvidia gets their hands in a DX12 game, things might look different, and other times the IHVs won't have had as much access to the game. This particular benchmark has had everyone involved for a long time. If it's not performing as might be desired, it's likely because of the graphics card (and maybe the driver).


----------



## PhantomTaco

Quote:


> Originally Posted by *HalGameGuru*
> 
> AMD does not have the track record of nVidia on making their tech inaccessible or inefficient to their competitors in a premeditated fashion. When TressFX came out and nVidia had trouble they released the code to let nVidia optimize for it. Most AMD tech is made available for the industry as a whole to make use of and optimize, TressFX, FreeSync, Mantle, etc. nVidia could have made use of Mantle if they wished.


This doesn't really prove anything; past performance isn't an indicator of anything. It may hold merit sometimes, but without evidence it's nothing more than suspicion.
Quote:


> Originally Posted by *ku4eto*
> 
> The game uses DirectX 12 and makes good use of asynchronous compute. Those are the benefits for AMD; for nVidia, the only benefit is DirectX 12 itself, more or less. They have not done anything to hinder either AMD or nVidia.


This doesn't say much either.
Quote:


> Originally Posted by *semitope*
> 
> You should read this http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/
> 
> There is nvidia code in there. This will likely be one of the most fair benchmarks we will have. When nvidia gets their hands in a dx12 game things might look different. Other times thee IHVs won't have had as much access to the game. This particular benchmark has had everyone involved for a long time. If its not performing as might be desired, its likely because of the graphics card (and maybe driver).


This, though, does say something. I'm interested to see when UE4 based Ark launches the DX12 patch next week to get some more data points to add. While it is nice to know that they did open the source code up, it doesn't entirely mean it is unbiased. As I recall Oxide games was one of the first to work with AMD on mantle, meaning they had a past track record with AMD working on developing their engine. In that respect it makes me wonder whether or not they still did make choices that specifically benefitted AMD back with mantle that were repeated with Ashes. It also means (in theory at least), that AMD has had more than the past year working with Oxide on this title, whereas Intel and Nvidia have had a year working on it. I'm not calling foul play, but I am still questioning the data until more titles are launched based on different engines.


----------



## CrazyElf

Quote:


> Originally Posted by *PhantomTaco*
> 
> By all means correct me if I'm wrong but there's a few things I don't understand. For starters are these theories based on the single Ashes of the Singularity benchmark? IIRC the game was developed with AMD helping the dev out. Would it be crazy to assume there were choices made that specifically improved performance for AMD? I'm not saying they necessarily actively made choices that hampered NVIDIA intentionally or even directly, but if true I'd assume some choices made would specifically benefit AMD while not helping, or potentially hurting NVIDIA hardware. Assuming this is all still based on Ashes alone, that's a single engine. There's at least half a dozen other engines out there that either have dx12 support or have it coming that are not necessarily going to behave the same way, so doesn't it seem a bit too early to draw any conclusions based on a sample size of 1?


As the other poster has indicated, the issue is that Nvidia had access to the DX12 code for over a year:
http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/
Quote:


> *Our code has been reviewed by Nvidia*, Microsoft, AMD and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft specifically designed to validate against incorrect usages. *All IHVs have had access to our source code for over year,* and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months. Fundamentally, the MSAA path is essentially unchanged in DX11 and DX12. Any statement which says there is a bug in the application should be disregarded as inaccurate information.
> 
> ...
> 
> Often we get asked about fairness, that is, usually if in regards to treating Nvidia and AMD equally? Are we working closer with one vendor then another? The answer is that we have an open access policy. Our goal is to make our game run as fast as possible on everyone's machine, regardless of what hardware our players have.
> 
> To this end, we have made our source code available to Microsoft, Nvidia, AMD and Intel for over a year. We have received a huge amount of feedback*. For example, when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized shader that made things faster which we integrated into our code.*
> 
> We only have two requirements for implementing vendor optimizations: We require that it not be a loss for other hardware implementations, and we require that it doesn't move the engine architecture backward (that is, we are not jeopardizing the future for the present).


This would suggest to me that Nvidia knew what was coming, that there hasn't been excessive favoritism here, and that Nvidia even had the opportunity to contribute improvements for their own hardware.

What Mahigan is saying is that historically, Nvidia has relied heavily on driver-based optimizations. That has paid handsome dividends for DX11 performance. However, the way they have designed their architecture (serial-heavy) means that it will not do as well on DX12, which is more parallel-intensive.

The other point, of course, is the close relationship between Mantle, DX12, and Vulkan. AMD must have planned this and built their architecture around it, even sacrificing DX11 performance (less money spent on DX11 drivers). In other words, if Mahigan's hypothesis is right, they played the long game.

Quote:


> Originally Posted by *PhantomTaco*
> 
> This, though, does say something. I'm interested to see when UE4 based Ark launches the DX12 patch next week to get some more data points to add. While it is nice to know that they did open the source code up, it doesn't entirely mean it is unbiased. As I recall Oxide games was one of the first to work with AMD on mantle, meaning they had a past track record with AMD working on developing their engine. In that respect it makes me wonder whether or not they still did make choices that specifically benefitted AMD back with mantle that were repeated with Ashes. It also means (in theory at least), that AMD has had more than the past year working with Oxide on this title, whereas Intel and Nvidia have had a year working on it. I'm not calling foul play, but I am still questioning the data until more titles are launched based on different engines.


Same here. I would like a bigger sample size to draw a definitive conclusion. See my response to Provost below for my full thoughts - I think that Mahigan's hypothesis is probable, but there are some mysteries.

Quote:


> Originally Posted by *Mahigan*
> 
> Take Battlefield 4, it's a DX11 title that is heavy on draw calls (for a DX11 game):
> 
> PCIe 2.0 x8 is saturated already (8 GB/s). Now imagine having all those CPU cores, now available in DX12, making draw calls on top of the textures etc. travelling over the bus. For an AMD system, this is further compounded by the slow HT 3.1 link (12.8 GB/s), and that's in the best-case scenario (990FX chipset). If you're using a 970 chipset, you're knocked down to HT 3.0, or 10.4 GB/s. The 3DMark API Overhead test isn't sending textures either (or any other heavy commands), it's only sending draw calls. So it really wouldn't show up on that test.
> 
> Again... just a theory.


The full review on TPU
https://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/

I suppose there's a process of elimination. What is the Bulldozer/Steamroller architecture very weak at? Well, there's raw single-threaded performance, and the module design isn't good at floating point, but there's got to be something specific.

The question is, what communicates between the GPU and CPU? That may be a good place to start. Another may be, what has Intel done decisively better?

Quote:


> Originally Posted by *provost*
> 
> Plus rep, as you have hit the gist of the counterargument on the head.
> Mahigan's theory appeals to me because he has gone to great lengths to research and share his opinions as to why AMD's architecture works better than Nvidia's if developers properly utilize the benefits of DX12 to reduce overhead. All I have seen by way of counterargument is why his theory won't hold due to yet-to-be-seen optimizations for Nvidia, which I interpret as follows:
> 
> a) until Nvidia catches up with the Pascal architecture, or
> b) until developers have been incentivized enough to code away from consumer-friendly DX12, putting PC gamers in the same position as they were in with DX11 - i.e., there ain't no such thing as free (lunch) performance; if you want more performance, you've got to pay for it.
> 
> But no one has proposed a detailed alternative theory that demystifies the DX12 performance riddle of the GPU makers.


+Rep

This is basically where we are at:

We know that something is causing the DX12 leap in AMD's arch. We don't know what, but Mahigan's hypothesis is that it's the design of AMD's architecture, which they optimized for DX12, perhaps at the expense of DX11.
At the moment, AMD is at a disadvantage and needs that market/mind-share. Combined with GCN consoles, they may have narrowed the gap in their ability to drive future games development.
The opportunity for driver-based optimizations is far more limited in DX12, due to its "close to the metal" nature.
Nvidia can and will catch up. They have the money and mindshare to do so. The question is when? Pascal? Or if it's very compute-centric, they may go with Volta.
I would agree that there hasn't been any well-researched, well-thought-out alternative hypothesis. That is not to say that Mahigan's ideas are infallible - they are not, as we still do not have a conclusive explanation as to why the Fury X does not scale very well (and apparently a second mystery now - the AMD CPUs' poor performance). Left unresolved, that may require a substantial modification to any hypothesis. Personally, I accept that it's the most probable explanation right now.

I think that in the short term, this may help stem the tide for AMD, perhaps for a generation or two. But in the long run, they are still at a disadvantage. They have been cutting R&D money for GPUs and focusing mostly on Zen, for example. AMD simply does not have the kind of money to spend; Nvidia is outspending them. In the long run, I fear there will be a reversal if they cannot come up with something competitive.

For AMD though, it's very important they figure out what is the problem, because they need to know where the transistor budget should go for the next generation (although admittedly, if the rumors are true, it's already taped out - it's important to keep in mind that GPUs are designed years in advance).

Remember, everyone: it's best to have two GPU vendors that are very competitive with each other. That's when the consumer wins. We want the best performance at a competitive price. For that reason, I'm hoping that AMD actually wins the next GPU round, and that Zen is a success (IMO, an Intel monopoly is also bad for us). A monopoly is a loss for us.


----------



## Kuivamaa

Quote:


> Originally Posted by *PontiacGTX*
> 
> I thought that draw calls were set by the API/driver, because AMD suggested the bottleneck on DX11 was draw calls. But I had seen that AMD suggested Dying Light required 40k to 70k draw calls
> 
> And now tell this
> 
> 
> It might be based on the graphics engine code, the driver, and maybe the API?


BF4 is the exact same game regardless of graphics API - there are no extra things drawn under Mantle.


----------



## Forceman

Quote:


> Originally Posted by *CrazyElf*
> 
> What Mahigan is saying is that historically, Nvidia has relied heavily on driver-based optimizations. That has paid handsome dividends for DX11 performance. However, the way they have designed their architecture - serial-heavy - *means that it will not do as well on DX12, which is more parallel-intensive.*


And yet even in this benchmark, which would appear to be well-suited to AMD's hardware, the Fury X is still neck and neck with the 980 Ti. The real issue the benchmark highlights, to me anyway, is not that AMD's DX12 performance is so good, but that their DX11 performance (particularly in this example) is so bad. I don't see how you can look at this one benchmark and draw the conclusion that AMD has a better architecture for the future, when their premier card is still tied with Nvidia's (likewise for the 980 and 390X).

Edit: And if anyone is curious, here are the PCIe Gen 3/Gen 2 results for the 3DMark API Overhead test on a 290X (draw calls per second):

| Test | Gen 2 | Gen 3 |
| --- | --- | --- |
| DX11 ST | 1,298,473 | 1,265,204 |
| DX11 MT | 1,324,209 | 1,311,353 |
| Mantle | 18,476,350 | 18,917,266 |
| DX12 | 17,774,164 | 20,542,867 |

So Mantle and DX12 are faster, but somehow DX11 is slower in Gen 3.

www.3dmark.com/3dm/8388481
www.3dmark.com/3dm/8388578


----------



## Kollock

Quote:


> Originally Posted by *PhantomTaco*
> 
> This doesn't really prove anything; past performance isn't an indicator of anything. It may hold merit sometimes, but without evidence it's nothing more than suspicion.
> This doesn't say much either.
> This, though, does say something. I'm interested to see when the UE4-based Ark launches its DX12 patch next week, to get some more data points to add. While it is nice to know that they did open the source code up, it doesn't entirely mean it is unbiased. As I recall, Oxide Games was one of the first to work with AMD on Mantle, meaning they had a past track record with AMD working on developing their engine. In that respect it makes me wonder whether or not they still made choices that specifically benefited AMD back with Mantle that were repeated with Ashes. It also means (in theory at least) that AMD has had more than the past year working with Oxide on this title, whereas Intel and Nvidia have had a year working on it. I'm not calling foul play, but I am still questioning the data until more titles are launched based on different engines.


Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The benchmark exists primarily for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.

Certainly I could see how one might think that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel (and 0 from Microsoft, but they never come visit anyone ;( ). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code check-ins, you'd draw the conclusion we were working closer with Nvidia rather than AMD.

As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles, as they have also lined up a few other D3D12 games.

If you use this metric, however, given Nvidia's promotions with Unreal (and integration with GameWorks), you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic.)

Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only vendor-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose it. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.

From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is far, far lower than DX11 on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is the minimum frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application-controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance rather than software variance. We'll have to dig into this one.

I suspect that one thing helping AMD on GPU performance is that D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic: we just took a few compute tasks we were already doing and made them asynchronous. Ashes really isn't a poster child for advanced GCN features.

Our use of Async Compute, however, pales in comparison to some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% extra GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and -optimized engines start coming to the PC. I don't think Unreal titles will show this very much, though, so likely we'll have to wait to see. Has anyone profiled Ark yet?

In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so. (Complain, anyway; we would have still done it.)

--
P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.


----------



## PhantomTaco

Quote:


> Originally Posted by *CrazyElf*
> 
> As the other poster has indicated.
> 
> The issue is that Nvidia had access to the DX12 code for over a year.
> http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/
> This would suggest to me that Nvidia knew what was coming, that there hasn't been excessive favoritism here, and that Nvidia even had the opportunity to contribute to improve performance for their hardware.
> 
> What Mahigan is saying is that historically, Nvidia has relied heavily on driver-based optimizations. That has paid handsome dividends for DX11 performance. However, the way they have designed their architecture - serial-heavy - means that it will not do as well on DX12, which is more parallel-intensive.
> 
> The other of course is that there is a close relationship between Mantle, compared with DX12 and Vulkan. AMD must have planned this together and built their architecture around that, even sacrificing DX11 performance (less money spent on DX11 drivers). In other words, if Mahigan's hypothesis is right, they played the long game.
> Same here. I would like a bigger sample size to draw a definitive conclusion. See my response to Provost below for my full thoughts - I think that Mahigan's hypothesis is probable, but there are some mysteries.
> The full review on TPU
> https://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/
> 
> I suppose there's process of elimination. What is the Bulldozer/Steamroller architecture very weak at? Well there's raw single threaded performance and the module design isn't good at floating point, but there's got to be something specific.
> 
> The question is, what communicates between the GPU and CPU? That may be a good place to start. Another may be, what has Intel done decisively better?
> +Rep
> 
> This is basically where we are at:
> 
> We know that something is causing the DX12 leap in AMD's arch. We don't know what, but Mahigan's hypothesis is the design of AMD's architecture, which they optimized for DX12, perhaps at the expense of DX11.
> At the moment, AMD is at a disadvantage and needs that market/mind-share. Combined with GCN consoles, they may have narrowed the gap in their ability to drive future games development.
> The opportunity for driver-based optimizations is far more limited in DX12, due to its "close to metal" nature.
> Nvidia can and will catch up. They have the money and mindshare to do so. The question is when: Pascal? Or, if the answer is very compute-centric, they may have to wait for Volta.
> I would agree that there hasn't been any well-researched, well-thought-out alternative hypothesis. That is not to say that Mahigan's ideas are infallible - they are not, as we still do not have a conclusive explanation as to why the Fury X does not scale very well (and apparently a second mystery now - the AMD CPUs' poor performance). Left unresolved, that may require a substantial modification to any hypothesis. Personally, I accept that it's the most probable explanation right now.
> 
> I think that in the short term, this may help stem the tide for AMD, perhaps for a generation or maybe two. But in the long run, they are still at a disadvantage. They have been cutting R&D money for GPUs and focusing mostly on Zen, for example. AMD simply does not have the kind of money to spend; Nvidia is outspending them. In the long run, I fear there will be a reversal if they cannot come up with something competitive.
> 
> For AMD though, it's very important that they figure out what the problem is, because they need to know where the transistor budget should go for the next generation (although admittedly, if the rumors are true, it's already taped out - it's important to keep in mind that GPUs are designed years in advance).
> 
> Remember everyone - it's best to have two GPU vendors that are very competitive with each other. That's when the consumer wins: we get the best performance for a competitive price. For that reason, I'm hoping that AMD actually wins the next GPU round - and that Zen is a success (IMO, an Intel monopoly is also bad for us). A monopoly is a loss for us.


Thank you for a very illuminating post. It's better than the other responses I got, which threw something out there with little background info or little logic/support; rep for that alone. Having access for the past year, like I said in my post, is great, but AMD has been working with Oxide for far longer than that. We know this, and this isn't new information by any means.

Let me preface what I'm going to say next: this is all pure conjecture and speculation, but it's one reason why I still have my doubts. Let me also preface with: I am not remotely an expert in APIs, merely following logic. When you design something, anything really, you have a groundwork, a basis upon which you build everything else. If you started this groundwork in collaboration with someone, there is every reason to believe that both worked together to set up and establish this framework: how it runs at a very basic level, what is chosen to execute which types of commands, how they are executed, etc.

Now if you are doing so with someone who ALSO happens to be one of the people that will be making use of this going forward, profiting off of it (indirectly) and working as a company to make it as good as possible for themselves, there is going to be opportunity. Opportunity to really show off what your product is capable of, opportunity to shape the way things are done going forward. There's also opportunity to potentially make decisions that will SPECIFICALLY benefit YOUR PRODUCTS at a fundamental level. Am I saying it was done with ill intention? Not remotely; you want it to work well, and you want people that buy your product to feel like they made the right decision in choosing you. No harm, no foul on that whatsoever. There is also a darker side to it, potentially: you could also ACTIVELY choose to lay in a framework that INTENTIONALLY benefits your products OVER your competitor's.
And if it's something that's been laid in at the onset, before anyone else had access to it, then by the time it's opened up to others it may be something that can no longer be changed without tearing down the entire foundation. Am I saying this is the case? No, not at all. Am I saying it is possible? Maybe. Intentional? I don't know, and whether or not it was intentional doesn't prove anything remotely important. But this illustrates my point, because AMD has been tied in with Oxide since at least 2013, according to a Google News search. That means (in theory) they've had at minimum a year's worth of time working with Oxide before NVIDIA or Intel got access to their new engine.

Yes, NVIDIA was given access, and they do nicely point out a specific example of where NVIDIA made changes to help improve performance on their hardware. That still doesn't account for the full year, minimum, that AMD had helping develop and lay the groundwork with Oxide. And yes, before anyone points it out, I see where it says that their requirements include not being a loss for other hardware implementations and not driving the engine architecture backward. As I've understood it, DX12 and Mantle are both reinventing the way games are developed and rendered, providing developers with options and power previously untapped and unconsidered, the full extent of which is going to take a good while to fully comb through and make use of. Who's to say that at the onset the choices made were obvious? That it was obvious that going with option A instead of option B would end up costing other competitors potential performance opportunities? Who's to say that Oxide would have been able to tell, or even AMD, had it not been intended as such?

Provost's post is very thought-provoking and well laid out, but not without potential flaws/questions. Why is it that he only interprets optimizations as meaning either new hardware or incentivizing developers? It may be something far simpler, it may be something far more complex; there is most certainly always another alternative. Most games are based on other makers' engines (UE4, Frostbite, etc.). It comes down to those engine developers making active decisions regarding the future of their engines based on the market. The market right now is very heavily NVIDIA-saturated, for better or worse (I'd argue for worse). So, if you're a developer making a new engine, and you want it to run seamlessly on the majority of the market, which vendor would you choose to work with to ensure that? Probably the one that has the most market share, meaning your customers' products will run fantastically on their customers' hardware. So when a game comes out on an engine that happens to run better on the less established platform, it raises questions (not skepticism, just questions). Is this because that platform happens to be better suited for this new engine and API? If so, fantastic; that's a boon for everyone, because it brings that competitor back into the spotlight and lets them try to reclaim market share. But if it's because you've been working with them extensively for several years prior to even launching the game or the engine, then more questions arise.

As you've said, I've said, and I'm sure countless dozens of others have said, we are basing this all on a single pre-release game and a single engine. It's the first foray into DX12, and we're already trying to draw conclusions, claim victories, and stem losses. I'm not in favor of any of this garbage. I'm most certainly with you in that I want AMD to succeed; I was on the verge of buying a Fury X and holding off on a 980 Ti until it came out, but it personally didn't appeal to me enough. That being said, I still see AMD as a very viable option, and will basically recommend AMD at every level but the flagship, purely because of price/performance. But all we have is a single data point, a single source to draw conclusions from, and that in and of itself invalidates any conclusions we can make from it; it's barely a stone's throw away from conjecture. All I'm adding to it is more conjecture saying let's hold off, let's wait for more to come out, and then we can start drawing conclusions.


----------



## Forceman

Quote:


> Originally Posted by *Kollock*
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only vendor-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose it. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.


Wait, so all this analysis and conclusions about how async compute is going to make AMD's architecture the better one in the future, and this benchmark doesn't even use async compute on the Nvidia side?


----------



## PhantomTaco

Quote:


> Originally Posted by *Kollock*
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The benchmark exists primarily for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might think that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel (and 0 from Microsoft, but they never come visit anyone ;( ). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code check-ins, you'd draw the conclusion we were working closer with Nvidia rather than AMD.
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles, as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with GameWorks), you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic.)
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only vendor-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose it. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is far, far lower than DX11 on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is the minimum frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application-controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance rather than software variance. We'll have to dig into this one.
> 
> I suspect that one thing helping AMD on GPU performance is that D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic: we just took a few compute tasks we were already doing and made them asynchronous. Ashes really isn't a poster child for advanced GCN features.
> 
> Our use of Async Compute, however, pales in comparison to some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% extra GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and -optimized engines start coming to the PC. I don't think Unreal titles will show this very much, though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so. (Complain, anyway; we would have still done it.)
> 
> --
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.


I hate to double post, but I wanted to thank you for posting this. It is probably the single most illuminating thing I've read on the subject to date. For the record, I don't believe a single engine is ever a good measure of anything. Yes, UE4 and its prior iterations are among the most popularly used engines, but like you said, they definitely have had a bias, and I wouldn't call a single benchmark from a single engine enough to draw any conclusions, regardless of whose engine it is. Your agreement with AMD is what keeps me questioning, and like I said in my last post, I don't mark it as a sign of foul play or otherwise, merely something that raises questions. What I'd honestly love is a post from an NVIDIA representative talking more from their perspective on these performance numbers, as well as more input from them as other DX12-enabled games launch (such as Ark), to better explain the choices they made at a hardware level and what their impact is.


----------



## PontiacGTX

Quote:


> Originally Posted by *Kollock*
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The benchmark exists primarily for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might think that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel (and 0 from Microsoft, but they never come visit anyone ;( ). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code check-ins, you'd draw the conclusion we were working closer with Nvidia rather than AMD.
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles, as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with GameWorks), you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic.)
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only vendor-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose it. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is far, far lower than DX11 on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is the minimum frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application-controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance rather than software variance. We'll have to dig into this one.
> 
> I suspect that one thing helping AMD on GPU performance is that D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic: we just took a few compute tasks we were already doing and made them asynchronous. Ashes really isn't a poster child for advanced GCN features.
> 
> Our use of Async Compute, however, pales in comparison to some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% extra GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and -optimized engines start coming to the PC. I don't think Unreal titles will show this very much, though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their rights to do so. (Complain, anyway; we would have still done it.)
> 
> --
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.


What about Ashes of the Singularity's CPU performance? It is supposed to take advantage of more cores/threads (since Star Swarm had a good gain with 6 cores), yet here the FX-83xx CPUs are as weak as they are on DirectX 11. Does this game use something that the AMD architecture/platform is weak at, or does it take advantage of fewer than 6 cores? And what about the results from PCPer showing that a 6700K is faster on DX12?


----------



## Kollock

Quote:


> Originally Posted by *Forceman*
> 
> Wait, so all this analysis and conclusions about how async compute is going to make AMD's architecture the better one in the future, and this benchmark doesn't even use async compute on the Nvidia side?


AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than not to.

Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.


----------



## Kuivamaa

So basically Mahigan was spot on?


----------



## PostalTwinkie

Quote:


> Originally Posted by *HalGameGuru*
> 
> AMD does not have the track record of nVidia on making their tech inaccessible or inefficient to their competitors in a premeditated fashion. When TressFX came out and nVidia had trouble they released the code to let nVidia optimize for it. Most AMD tech is made available for the industry as a whole to make use of and optimize, TressFX, FreeSync, Mantle, etc. nVidia could have made use of Mantle if they wished.

AdaptiveSync is the open standard that is "free". FreeSync is AMD's closed, proprietary implementation with its own private validation process. FreeSync itself is anything but open and free; there are still costs associated with it. AMD just doesn't charge themselves a licensing fee to display the "FreeSync" logo. They claim "free" because they simply ignore all the costs the manufacturer incurs to get the "FreeSync Approved" stamp.

Your statements really don't mean much.

EDIT:

On topic: It sounds like there is going to be huge potential for games to really, and I mean really, favor one side or the other, at least if Nvidia takes one methodology in their architectures and AMD takes another. You would then have publishers able to show an extreme bias toward one vendor.

At least that's a scary thought/possibility: further segmentation of the hardware market and games.


----------



## caswow

Quote:


> Originally Posted by *PostalTwinkie*
> 
> AdaptiveSync is the open standard that is "free". FreeSync is AMD's closed, proprietary implementation with its own private validation process. FreeSync itself is anything but open and free; there are still costs associated with it. AMD just doesn't charge themselves a licensing fee to display the "FreeSync" logo. They claim "free" because they simply ignore all the costs the manufacturer incurs to get the "FreeSync Approved" stamp.
> 
> Your statements really don't mean much.
> 
> EDIT:
> 
> On topic: It sounds like there is going to be huge potential for games to really, and I mean really, favor one side or the other. At least if Nvidia takes one methodology in their architectures, and AMD takes another. You will then have publishers that will be able to show one an extreme bias.
> 
> At least that is a scary thought/possibility; further segmentation of the hardware market and games.


What methodology do devs need to use if they want to make use of Nvidia's "advanced" architecture? And I mean something really useful, not over-tessellation. And what segmentation do you mean? Don't you think Nvidia will implement more async compute in their next architecture to *boost* perf? Because I think I know what direction you are heading...


----------



## black96ws6

Yeah, but Nvidia actually LOSES performance in DX12; that makes no sense at all.

I look at it this way -

*Serial (DX11)*
*Nvidia* - Has 1 guy who has to carry a large rock 100 yards. Trains a lot for this.
*AMD* - Has 1 guy who has to carry a large rock 100 yards. Doesn't train as much as Nvidia's guy.

*Result*: Nvidia's guy is faster.

*Parallel (DX12)*
*Nvidia* - Still has 1 guy who has to carry a large rock 100 yards.
*AMD* - Now has 2 guys who carry a large rock 100 yards.

*Result*: AMD's team is now on par with Nvidia's guy.

That just doesn't make sense from an Nvidia point of view. It's great that AMD's arch works better with DX12; that's good for all of us. But Nvidia shouldn't LOSE performance just because DX12 is more efficient. If anything they should GAIN, or at least stay the same.

Even if their current architecture causes DX12 to have to wait on Nvidia's GPU while AMD's guys are passing them, it still shouldn't be SLOWER than DX11. Their single guy is still going to carry that rock just as fast. So they don't have 2 guys to carry the rock; so what? But if DX12 is slower, or even the same, that would mean DX12 is a failure for Nvidia cards, which again doesn't make sense. It's supposed to be faster for everyone.

I realize this is an extremely simple scenario but hopefully you get the point I'm trying to make.
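One way to formalize the rock-carrying analogy, and to show where a DX12 regression could still come from: under DX11 the driver does much of the optimization work (and Kollock notes Nvidia's DX11 driver saw huge recent CPU improvements), while under DX12 the frame is only as fast as the application's own submission path. The numbers below are invented purely for illustration, not real benchmarks.

```python
# Invented efficiency figures formalizing the analogy. Higher
# "efficiency" means a faster path (driver tuning under DX11,
# app tuning plus hardware features under DX12).

WORK = 100.0  # abstract units of per-frame work

def frame_time(work, efficiency):
    """Time to finish the frame at a given path efficiency."""
    return work / efficiency

nvidia_dx11 = frame_time(WORK, efficiency=2.0)  # heavily tuned driver path
nvidia_dx12 = frame_time(WORK, efficiency=1.8)  # app path not yet as tuned
amd_dx11 = frame_time(WORK, efficiency=1.4)     # weaker DX11 driver path
amd_dx12 = frame_time(WORK, efficiency=2.0)     # async compute overlap helps

# Nvidia regresses slightly; AMD leaps forward to parity.
print(nvidia_dx11, nvidia_dx12, amd_dx11, amd_dx12)
```

The point of the sketch: "DX12 is more efficient" is not a property of the API alone — it only holds if the application-side path matches the years of driver-side tuning it replaces, which is exactly the gap the benchmark results suggest.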


----------



## Mahigan

Quote:


> Originally Posted by *Kuivamaa*
> 
> So basically Mahigan was spot on?


Oops

Basically, Anandtech misled everyone when they stated that Maxwell supported Async Compute. That's why Razor1 and I had a hard time on the Hardforums trying to figure out the difference between AMD's ACEs and Maxwell's AWSs.

AWSs are not independent. They don't function asynchronously. This is why I marked them as not having the capability of working "out of order" with error checking. They stall on pipeline dependencies, which is why Oxide disabled the code on nVIDIA's hardware at their request.

I feel vindicated from all the hate mail I've received.


----------



## dogen1

Quote:


> Originally Posted by *Kuivamaa*
> 
> So basically Mahigan was spot on?


Kind of, sort of right.


----------



## semitope

Quote:


> Originally Posted by *Kollock*
> 
> AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than not to.
> 
> Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.


Quote:


> Originally Posted by *Kollock*
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher than we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might think that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code check-ins, you'd draw the conclusion we were working closer with Nvidia rather than AMD.
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) and AMD for Ashes. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles, as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark, since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely different topic.)
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia, as the only 'vendor'-specific code is for Nvidia, where we had to shut down async compute. By vendor-specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature as functional, but attempting to use it was an unmitigated disaster in terms of performance and conformance, so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute, so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia falls into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor-specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is far, far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is the minimum frame times, with the 290X beating out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application-controlled feature, so I was expecting them to be close to identical. This would appear to be GPU-side variance rather than software variance. We'll have to dig into this one.
> 
> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
> 
> Our use of Async Compute, however, pales in comparison to some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and -optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway; we would have still done it.)
> 
> --
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.


Thanks for stopping by.
Quote:


> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.


Figured as much. Other games, especially console ports, might use much more. Honestly, I want a cookie for this. Not the browser kind, either. Feels good when wild, ignorant speculation turns out to be correct.
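Kollock's description of the one vendor-specific path (look at the Vendor ID and shut async compute off, rather than trust the capability the driver reports) boils down to logic like the following sketch. The PCI vendor IDs are the real ones, but the function and parameter names are made up here for illustration — this is not Oxide's actual code.

```python
# Sketch of the vendor-specific fallback Kollock describes: the driver
# *reports* async compute as usable, but the engine overrides that for
# one vendor because actually using it was slower in practice.
# PCI vendor IDs are real; everything else is hypothetical naming.

VENDOR_NVIDIA = 0x10DE
VENDOR_AMD = 0x1002

def use_async_compute(vendor_id, driver_reports_async):
    """Decide whether to submit work on a separate compute queue,
    regardless of the capability bit the driver exposes."""
    if not driver_reports_async:
        return False
    if vendor_id == VENDOR_NVIDIA:
        # Reported as functional, but performance and conformance were
        # poor in practice, so it is force-disabled (per Oxide).
        return False
    return True
```

Note the distinction Kollock draws: this check keys off the Vendor ID and is therefore "vendor-specific", whereas responding to the binding tier the driver reports (Tier 2 vs Tier 3) is not — the latter is just honoring advertised capabilities.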


----------



## Mahigan

Quote:


> Originally Posted by *Kollock*
> 
> AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than not to.
> 
> Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.


Thank you very much for the clarifications. I look forward to playing your game. It's been a long time since a good RTS title was released. Keep up the good work


----------



## semitope

Quote:


> Originally Posted by *black96ws6*
> 
> Yeah but Nvidia actually LOSES performance in DX12, that makes no sense at all.
> 
> I look at it this way -
> 
> *Serial (DX11)*
> *Nvidia* - Has 1 guy who has to carry a large rock 100 yards. Trains a lot for this.
> *AMD* - Has 1 guy who has to carry a large rock 100 yards. Doesn't train as much as Nvidia's guy.
> 
> *Result*: Nvidia's guy is faster.
> 
> *Parallel (DX12)*
> *Nvidia* - Still has 1 guy who has to carry a large rock 100 yards.
> *AMD* - Now has 2 guys who carry a large rock 100 yards.
> 
> *Result*: AMD's team is now on par with Nvidia's guy.
> 
> That just doesn't make sense from an Nvidia point of view. It's great that AMD's arch works better with DX12, that's great for all of us. But, Nvidia shouldn't LOSE performance just because DX12 is more efficient. That makes no sense. If anything they should GAIN, or at least stay the same.
> 
> Even if their current architecture causes DX12 to have to wait for Nvidia's GPU while the AMD guys are passing them, it's still not going to be SLOWER than DX11. Their single guy is still going to carry that rock just as fast. So they don't have 2 guys to carry the rock, so what? But if it's slower or even the same, that would mean DX12 is a complete failure for Nvidia cards, which again, doesn't make sense. It's supposed to be faster for everyone.
> 
> I realize this is an extremely simple scenario but hopefully you get the point I'm trying to make.


Only if you think about it in a simplistic way: it's supposed to be more efficient, so it should be faster.

A way to look at it: think of AMD and Nvidia as building amphibious cars. Previously these cars had to drive on a winding road, and Nvidia built theirs to take on that road really well. Then it turns out there is a faster, straight path across a body of water, and AMD, having anticipated this, put much more effort into being able to take on water. They reach the water, and Nvidia's super car is slower.

Basically it's a different challenge, and Nvidia's hardware may just not be as good at it as at the previous one. The body of water may be a more direct and faster path IF the vehicle was built right. Maybe Nvidia's current cards won't be happy at all with DX12, and DX11 would be recommended for them till later on.


----------



## ToTheSun!

Quote:


> Originally Posted by *Mahigan*
> 
> I feel vindicated from all the hate mail I've received.


Keep doing God's work!


----------



## HalGameGuru

Quote:


> Originally Posted by *PostalTwinkie*
> 
> AdaptiveSync is the open standard that is "free". FreeSync is AMD's closed, proprietary implementation with its own private validation process. FreeSync itself is anything but open and free; there are still costs associated with it. AMD just doesn't charge themselves a licensing fee to display the "FreeSync" logo. They claim "free" because they simply ignore all the costs the manufacturer incurs to get the "FreeSync Approved" stamp.
> 
> Your statements really don't mean much.


Any monitor manufacturer can make a FreeSync-supporting monitor: if it has Adaptive Sync and an AMD GPU can make use of it, FreeSync will work. FreeSync is merely AMD's implementation of Adaptive Sync, which they pushed to have included in the VESA spec rather than pushing a bespoke piece of hardware. Anyone can make use of Adaptive Sync; FreeSync is merely what Adaptive Sync is called when an AMD GPU is making use of it. There is no added cost for a monitor manufacturer to put out a DP-spec-compliant monitor that will work with an AMD GPU under FreeSync, and Intel's future with Adaptive Sync will only push the tech further and make it more ubiquitous and inexpensive.
https://techreport.com/news/28865/intel-plans-to-support-vesa-adaptive-sync-displays#metal

TressFX and Mantle stand on their own. And both have media history on their accessibility and the avenues left open to the other manufacturers as to their implementation.

I'm seeing a lot of inductive reasoning being taken with quite a lot of acceptance, but AMD's history of putting out specs and tech that the industry as a whole can make use of is a bridge too far?


----------



## Noufel

Quote:


> Originally Posted by *Kollock*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Forceman*
> 
> Wait, so all this analysis and conclusions about how async compute is going to make AMD's architecture the better one in the future, and this benchmark doesn't even use async compute on the Nvidia side?
> 
> 
> 
> AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than not to.
> 
> Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.

Thanks for posting here on OCN.

I have a question that bothers me: why aren't the gains on the Fury X the same as the ones on the 390X with DX12?


----------



## PostalTwinkie

Quote:


> Originally Posted by *caswow*
> 
> what methodology do devs need to use if they want to make use of nvidias "advanced" architecture? and i mean something really usefull not overtesslation? and what segmentation do you mean? dont you think nvidia will implement more async compute in their next architecture to *boost* perf? because i think i know what direction you are heading...


Well, that is the possible concern. As it appears now, Nvidia isn't invested heavily in this Async Compute situation. We don't know what Pascal will be yet, so we can't say what they are going to do. However, it has been argued that developers will have more control over the actual performance of the game, and less control is given to the GPU manufacturer. So if Nvidia decides they want to take another approach, whatever those options might be (who knows), we could have two very different philosophies. Maybe even more so than now...

Why that could all be a potential concern: if AMD goes heavy on Async Compute support, and Nvidia does XYZ SomethingForUs, you now have developers with two very clear paths. Do they have the funding to support full development and optimization for both of those unique paths? Did Nvidia, with their trucks of cash, flat-out buy a developer?

If Nvidia and AMD can't make huge impacts with drivers, what happens if two clear paths emerge and a developer takes just one? This isn't even a path of two different APIs going to war, but different paths within a single API.

It leaves an extreme amount of room for developer bias, if DX12 locks the GPU manufacturer out of performance tuning as much as some claim. We think we see heavy bias now in games; I can't imagine what it would look like if a developer didn't give equal treatment, _and the left-out party couldn't make extreme driver improvements on their own_.

Quote:


> Originally Posted by *HalGameGuru*
> 
> Any monitor manufacturer can make a freesync supporting monitor, if it has adaptive sync and an AMD GPU can make use of it freesync will work. Freesync is merely the AMD implementation of adaptive sync, which they pushed to have included in the VESA spec rather than push for a bespoke piece of hardware. Anyone can make use of Adaptive Sync, FreeSync is merely what adaptive sync is called when an AMD GPU is making use of it. There is no added cost for a monitor manufacturer to put out a monitor that is DP Spec compliant that will work with an AMD GPU under FreeSync and Intel's future with adaptive sync will only push the tech further and make it more ubiquitous and inexpensive.
> https://techreport.com/news/28865/intel-plans-to-support-vesa-adaptive-sync-displays#metal
> 
> TressFX and Mantle stand on their own. And both have media history on their accessibility and the avenues left open to the other manufacturers as to their implementation.
> 
> I'm seeing a lot of inductive reasoning going on taken with quite a lot of acceptance, but, AMD's history of putting out spec's and tech's that the industry as a whole can make use of is the bridge too far?


Actually AMD has their own validation requirements and tests they run specific to FreeSync, as Freesync is specific to AMD. What the default DP spec for AdaptiveSync does isn't enough for FreeSync to work as FreeSync is marketed. It requires a hell of a lot of R&D and tuning to get done; Nixeus has commented on this heavily.

So while Intel picking up AdaptiveSync and going with their own VRR will be great, it won't really directly impact FreeSync specifically, as FreeSync is an entirely separate product/offering/process.

EDIT:

You have the umbrella of VRR. Under that you have the different offerings.


G-Sync
FreeSync
In-Sync* (as referred to here on OCN).


----------



## sirroman

Quote:


> Originally Posted by *black96ws6*
> 
> Yeah but Nvidia actually LOSES performance in DX12, that makes no sense at all.
> 
> I look at it this way -
> 
> *Serial (DX11)*
> *Nvidia* - Has 1 guy who has to carry a large rock 100 yards. Trains a lot for this.
> *AMD* - Has 1 guy who has to carry a large rock 100 yards. Doesn't train as much as Nvidia's guy.
> 
> *Result*: Nvidia's guy is faster.
> 
> *Parallel (DX12)*
> *Nvidia* - Still has 1 guy who has to carry a large rock 100 yards.
> *AMD* - Now has 2 guys who carry a large rock 100 yards.
> 
> *Result*: AMD's team is now on par with Nvidia's guy.
> 
> That just doesn't make sense from an Nvidia point of view. It's great that AMD's arch works better with DX12, that's great for all of us. But, Nvidia shouldn't LOSE performance just because DX12 is more efficient. That makes no sense. If anything they should GAIN, or at least stay the same.
> 
> Even if their current architecture causes DX12 to have to wait for Nvidia's GPU while the AMD guys are passing them, it's still not going to be SLOWER than DX11. Their single guy is still going to carry that rock just as fast. So they don't have 2 guys to carry the rock, so what? But if it's slower or even the same, that would mean DX12 is a complete failure for Nvidia cards, which again, doesn't make sense. It's supposed to be faster for everyone.
> 
> I realize this is an extremely simple scenario but hopefully you get the point I'm trying to make.


Nvidia's driver replaces shaders built to work in parallel with serial shaders.


----------



## HalGameGuru

Quote:


> To take advantage of the benefits of AMD FreeSync™ technology, users will require: a monitor compatible with DisplayPort Adaptive-Sync, a compatible AMD Radeon™ GPU with a DisplayPort connection, and a compatible AMD Catalyst™ graphics driver.
> 
> - Project FreeSync will utilize DisplayPort Adaptive-Sync protocols to enable dynamic refresh rates for video playback, gaming and power-saving scenarios.


According to AMD, FreeSync will work anywhere AdaptiveSync is available and to spec, certification notwithstanding.

All the sources I have read claim parity, aside from certification, which is not necessary for the VRR tech to work. FreeSync uses what is in AdaptiveSync; if it's standards-compliant, FreeSync will work with it. And anyone else can do the same.


----------



## provost

Quote:


> Originally Posted by *CrazyElf*
> 
> As the other poster has indicated.
> 
> The issue is that Nvidia had access to the DX12 code for over a year.
> http://www.oxidegames.com/2015/08/16/the-birth-of-a-new-api/
> This would suggest to me that Nvidia knew what was coming and there hasn't been excessive favoritism here; Nvidia even had the opportunity to contribute to improve performance for their hardware.
> 
> What Mahigan is saying is that, historically, Nvidia has relied heavily on driver-based optimizations. That has paid handsome dividends for DX11 performance. However, the way they have designed their architecture, serial-heavy, means it will not do as well on DX12, which is more parallel-intensive.
> 
> The other point, of course, is that there is a close relationship between Mantle and both DX12 and Vulkan. AMD must have planned this and built their architecture around it, even sacrificing DX11 performance (less money spent on DX11 drivers). In other words, if Mahigan's hypothesis is right, they played the long game.
> Same here. I would like a bigger sample size to draw a definitive conclusion. See my response to Provost below for my full thoughts - I think that Mahigan's hypothesis is probable, but there are some mysteries.
> The full review on TPU
> https://www.techpowerup.com/reviews/NVIDIA/GTX_980_PCI-Express_Scaling/
> 
> I suppose there's process of elimination. What is the Bulldozer/Steamroller architecture very weak at? Well there's raw single threaded performance and the module design isn't good at floating point, but there's got to be something specific.
> 
> The question is, what communicates between the GPU and CPU? That may be a good place to start. Another may be, what has Intel done decisively better?
> +Rep
> 
> This is basically where we are at:
> 
> We know that something is causing the DX12 leap in AMD's arch. We don't know what, but Mahigan's hypothesis is the design of AMD's architecture, which they optimized around for DX12, perhaps at the expense of DX11.
> At the moment, AMD is at a disadvantage and needs that market/mind-share. Combined with GCN consoles, they may have narrowed the gap in their ability to drive future games development.
> The opportunity for driver-based optimizations is far more limited in DX12, due to its "close to metal" nature.
> Nvidia can and will catch up. They have the money and mindshare to do so. The question is when? Pascal? Or is it very compute centric, in which case they may go with Volta.
> I would agree that there hasn't been any well researched, well thought out alternative hypothesis. That is not to say that Mahigan's ideas are infallible - they are not, as we still do not have a conclusive explanation as to why the Fury X does not scale very well (and apparently a second mystery now - the AMD CPU's poor performance). Left unresolved, that may require a substantial modification to any hypothesis. Personally I accept that it's the most probable explanation right now.
> 
> I think that in the short term, this may help stem the tide for AMD, perhaps for a generation or maybe two. But in the long run, they are still at a disadvantage. They have been cutting R&D money for GPUs and focusing mostly on Zen, for example. AMD simply does not have the kind of money to spend; Nvidia is outspending them. In the long run, I fear there will be a reversal if they cannot come up with something competitive.
> 
> For AMD though, it's very important they figure out what is the problem, because they need to know where the transistor budget should go for the next generation (although admittedly, if the rumors are true, it's already taped out - it's important to keep in mind that GPUs are designed years in advance).
> 
> 
> 
> Remember everyone: it's best to have two GPU vendors that are very competitive with each other. That's when the consumer wins. We want the best performance for a competitive price. For that reason, I'm hoping that AMD actually wins the next GPU round, and that Zen is a success (IMO, an Intel monopoly is also bad for us). A monopoly is a loss for us.


A lot of good posts and a ton of information in this thread, even since the last time I checked in. Great contribution to the community.

You raise a good point about Intel. We have been talking about two players in this game while ignoring the one who is truly the 800-pound gorilla in the desktop PC market.
The gains from DX12 from lowering CPU overhead are really a zero-sum game; the benefit for Nvidia and AMD comes at the expense of Intel, doesn't it... Lol

Unless Intel doesn't care about the consumer PC market at all, one would think there ought to be a reaction from Intel to defend its turf. And whether that action (or lack thereof) is defensive or offensive would tell us a lot about how the PC gaming industry evolves over the next few years.

Quote:


> Originally Posted by *PhantomTaco*
> 
> I hate to double post, but I wanted to thank you for posting this. It is probably the single most illuminating thing I've read on the subject to date. For the record, I don't believe a single engine is ever a good measure of anything. Yes, UE4 and its prior iterations are among the most popularly used engines, but like you said, they definitely have had a bias, and I wouldn't call a single benchmark from a single engine alone enough to draw any conclusions, regardless of whose engine it is. Your agreement with AMD is what keeps me questioning, and like I said in the last post I put up, I don't mark it as a sign of foul play or otherwise, merely something that draws questions. What I'd honestly love is a post from an NVIDIA representative to talk more from their perspective on these performance numbers, as well as more input from them as other DX12-enabled games launch (such as Ark) to better explain the choices they made at a hardware level and what their impact is.


Don't you think that if Nvidia had a substantive counterargument to make, it would have already put one forward by getting out in front of this debate, especially given its marketing savvy and the depth of its PR machine?

The fact that this has not happened in itself lends credence to Mahigan's arguments, even if only at a basic, intuitive level.


----------



## Remij

Quote:


> Originally Posted by *provost*
> 
> Don't you think if Nvidia had a substantive counter argument to make it would have already put forward such an argument by getting out in front of this debate, especially given its marketing savvy and depth of the PR machine?
> 
> The fact that has not happened in itself lends credence to mahigan's arguments, even if it's at a basic intuitive level.


No. It's too early. People aren't running out and buying AMD GPUs at breakneck speed based off this one benchmark. The people claiming this early victory for AMD are more than likely AMD fanboys. I remember well when Mantle was gonna destroy Nvidia in games that supported both Mantle and DX11, and we saw how that turned out. Nvidia has already said to expect the same thing with DX12 that happened with DX11.

But I'm sure it will come full circle. In the near future, once DX12 is out and Nvidia is ahead again, people will cite all the technical reasons why it shouldn't be so and claim Nvidia sabotages their competitors' performance with proprietary features/code and their stranglehold on the market.

It would be cool to see AMD smash the hell out of Nvidia and show them they aren't invincible, but even these early tests aren't painting that picture, so I wouldn't expect it; I'd rather be pleasantly surprised if it does happen.


----------



## Mahigan

I think that I may have opened a big can of worms for nVIDIA.
Quote:


> ARK DirectX 12 Delay
> 
> Hello Survivors,
> 
> It's been a long week here at Studio Wildcard as the programming team has been grinding to get the DX12 version ready for release. It runs, it looks good, but unfortunately we came across some driver issues that we can't entirely tackle ourselves. We've reached out to both NVIDIA and AMD and will be working with them to get it resolved as soon as possible! Once that's tackled, we'll need to do more solid testing across a range of hardware with the new fixes. Sadly, we're gonna have to delay its release until some day next week in order to be satisfied with it. It's disappointing to us too, and we're sorry for the delay; we really thought we'd have it nailed today, but we wouldn't want to release ARK DX12 without the care it still needs at this point. Hang in there, and when it's ready for public consumption, it should be worth the wait!


I think that nVIDIA supports async shading on paper, but if your architecture takes a performance hit from using the feature, relative to DX11 performance, then you don't really support it. And if you don't support async shading, you don't support one of the most widely praised features of DX12.

Instead, nVIDIA has to work with developers to avoid using async shading. That's rather curious.
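To make that distinction concrete, here is a toy Python scheduling model. All timings are invented for illustration (not measured from any real GPU); it just contrasts no async compute, "paper" support that ends up serializing the work with a context-switch penalty, and genuine async shading that fills idle shader cycles:

```python
# Toy model of three ways a GPU might handle a graphics pass plus an
# independent compute pass. Every number here is hypothetical.
GRAPHICS_MS = 10.0  # assumed duration of the graphics workload
COMPUTE_MS = 4.0    # assumed duration of an independent compute pass
SWITCH_MS = 1.5     # assumed context-switch penalty

def serial():
    """No async compute: the compute pass waits for graphics to finish."""
    return GRAPHICS_MS + COMPUTE_MS

def paper_async():
    """'Supported' by the driver, but serialized with a switch each way."""
    return GRAPHICS_MS + 2 * SWITCH_MS + COMPUTE_MS

def true_async(idle_fraction=0.5):
    """Compute fills idle shader cycles while graphics is in flight."""
    absorbed = min(COMPUTE_MS, GRAPHICS_MS * idle_fraction)
    return GRAPHICS_MS + COMPUTE_MS - absorbed

print(f"serial:      {serial():.1f} ms")
print(f"paper async: {paper_async():.1f} ms")  # slower than not using it
print(f"true async:  {true_async():.1f} ms")
```

Under these made-up numbers, "paper" async is slower than simply not using the feature, which matches the behavior being described: a driver that exposes the capability but regresses when it is actually used.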


----------



## CrazyElf

Quote:


> Originally Posted by *PhantomTaco*
> 
> Thank you for a very illuminating post. It's better than the other responses I got that threw something out there with little background info or little logic/support, rep for that alone. Having access for the past year like I said in my post is great, but AMD has been working with Oxide for far longer than that. We know this, and this isn't new information by any means. Let me preface what I'm going to say next with this is all pure conjecture and speculation, but it's one reason why I still have my doubts. Let me also preface with I am not remotely an expert in APIs, merely following logic. When you design something, anything really, you have a groundwork. You have a basis upon which you build everything else upon. If you started this groundwork in collaboration with someone, there is every reason to believe that both worked together to set up and establish this framework; how it runs at a very basic level, what is chosen to execute which types of commands, how they are executed, etc. Now if you are doing so with someone who ALSO happens to be one of the people that will be making use of this going forward, profiting off of it (indirectly) and working as a company to make it as good as possible for themselves, there is going to be opportunity. Opportunity to really show off what your product is capable, opportunity to meld the way things are done going forward. There's also opportunity to potentially make decisions that will SPECIFICALLY benefit YOUR PRODUCTS at a fundamental level. Am I saying it was done with ill intention? Not remotely, you want it to work well, and you want people that buy your product to feel like they made the right decision in choosing you. No harm, no foul on that whatsoever. There is also a darker side to it, potentially. You could also ACTIVELY choose to lay in a framework that INTENTIONALLY benefits your products OVER your competitor. 
And if it's something that's been laid in at the onset, before anyone else had access to it, by the time it's opened up to others, it may be something that can no longer be changed without tearing down the entire foundation. Am I saying this is the case? No, not at all. Am I saying it is possible? Maybe. Intentional? I don't know, and it doesn't serve to prove anything remotely important had it been intentional or not. But this illustrates my point, because AMD has been tied in with Oxide since at least 2013 according to a Google News search. That means (in theory) they've had at minimum a year's worth of time working with Oxide before NVIDIA or Intel got access to their new engine.


+Rep to you for your thoughts. Neither side really has a dominant advantage right now.

I'll repost this from earlier:

AMD Advantages

- More parallelism
- Their cards do seem to age better
- Crossfire also scales better
- They are ahead on HBM, and I suspect that next generation they will probably have a better memory controller than Nvidia

Nvidia Advantages

- Tessellation and complex geometry
- Better optimization of memory bandwidth via color compression
- Rasterization
- Net result: more power efficiency on DX11 and more OC headroom, although Nvidia cards don't scale linearly with overclocks (AMD ones generally do)

Overall I'd give the upper hand to Nvidia. They've got money, market share, mind share, and the influence in the gaming industry. Even though AMD now has the console market, and DX12, they still have huge problems.

*AMD Weak Points*
Let me explain in depth.

1. Rasterization (from TechReport):

The AMD Fury X has worse rasterization throughput than a 780 Ti! (I will note that the Fury X can support more draw calls, but this is still a serious problem.) It didn't improve between the 290X and the Fury X.

The two architectures compared (290X vs Fury X):


2. Nvidia is currently more efficient at managing the VRAM (and of course the bandwidth) that is available. https://techreport.com/blog/28800/how-much-video-memory-is-enough

This may be one reason the massive bandwidth of HBM did not prove to be the advantage it should have been on paper. In future GPU generations, AMD will be at a disadvantage unless it can develop equally good compression algorithms or a substantially better memory controller, because Nvidia will be better at managing whatever bandwidth it has - and next generation, Nvidia will get HBM2.
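The basic intuition behind delta color compression can be sketched in a few lines of Python. This is only the underlying idea (neighboring pixels are usually similar, so deltas are cheap to store); the actual block-based formats Nvidia and AMD ship are proprietary and far more sophisticated:

```python
# Toy illustration of the idea behind delta color compression:
# neighbouring framebuffer pixels are usually similar, so storing one
# anchor value plus pixel-to-pixel differences needs fewer bits than
# storing raw values. Not any vendor's actual algorithm.

def delta_encode(scanline):
    """Anchor pixel followed by pixel-to-pixel differences."""
    return [scanline[0]] + [b - a for a, b in zip(scanline, scanline[1:])]

def bits_needed(values):
    """Total bits to store each value as a signed integer."""
    return sum(v.bit_length() + 1 for v in map(abs, values))

# A smooth gradient, as in a sky or an evenly lit surface.
scanline = [100 + i // 4 for i in range(64)]
raw_bits = bits_needed(scanline)
delta_bits = bits_needed(delta_encode(scanline))
print(raw_bits, delta_bits)  # the delta stream is far cheaper
```

The point for GPUs is that less data moved per frame means the same physical bus behaves like a wider one, which is why a card with a narrower memory interface can punch above its paper bandwidth.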

3. Tessellation. Again, refer to the chart. AMD introduced hardware tessellation back when the 5870 launched, but Nvidia has since overtaken them in a massive way. This is useful for complex geometry. It's not a huge advantage, but it exists.

They can exploit it too.
https://techreport.com/review/21404/crysis-2-tessellation-too-much-of-a-good-thing

GameWorks (and HairWorks) caused considerable controversy for this reason: they basically added something of no value to gamers in an attempt at vendor lock-in and to make AMD look bad.

4. Drivers. AMD's drivers were not well optimized at all, and Nvidia has historically relied on driver optimization more heavily than AMD. This won't be much of an advantage under DX12 (closer to the metal). It did come at the expense of CPU overhead, but of course, most games are simply not CPU bound.

Anyway, the net result is that at DX11, Nvidia was able to build GPUs with better overall performance, better power efficiency, and, with Maxwell, better performance per mm² than AMD. Points 1 and 2 are big problems for AMD.

Quote:


> Originally Posted by *Kollock*
> 
> Certainly I could see how one might see that we are working closer with one hardware vendor then the other, but the numbers don't really bare that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer then AMD was, If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.


I'd like to thank you for taking the time to respond on behalf of the entire OCN community.

For what it's worth, I am very much looking forward to your RTS. I think that RTS development has stagnated. We haven't seen anything that truly makes me go "wow". Supreme Commander 1 modded was the closest I got to seeing the envelope pushed.

On the 4X front, there's Galactic Civilizations 3 (yeah, I know, that's Stardock's flagship baby and you probably don't have much to do with it); I am hoping that in the coming years they'll beef it up with some solid expansions. There are a few other games I have high hopes for too (too long to list here).

Anyways, good luck with your game.

Quote:


> Originally Posted by *Forceman*
> 
> And yet even in this benchmark, which would appear to be well-suited to AMD's hardware, the Fury X is still neck and neck with the 980 Ti. The real issue the benchmark highlights, to me anyway, is not that AMD's DX12 performance is so good, but that their DX11 performance (particularly in this example) is so bad. I don't see how you can look at this one benchmark and draw the conclusion that AMD has a better architecture for the future, when their premier card is still tied with Nvidia's (likewise for the 980 and 390X).
> 
> 
> 
> Edit: And if anyone is curious, here are the Gen 3/Gen 2 results for the 3DMark API test on a 290X: (Gen 2 / Gen 3)
> 
> DX11 ST: 1,298,473 / 1,265,204
> DX11 MT: 1,324,209 / 1,311,353
> Mantle: 18,476,350 / 18,917,266
> DX12: 17,774,164 / 20,542,867
> 
> So Mantle and DX12 are faster, but somehow DX11 is slower in Gen 3.
> 
> www.3dmark.com/3dm/8388481
> www.3dmark.com/3dm/8388578


If you had been reading my and Mahigan's posts closely (judging by this response, you've just skimmed them), you'd know we've been having a very lively conversation about exactly why the Fury X fails to scale up very well.

Anyway, to repeat myself: neither the Fury X nor the 980 Ti is a good value-for-money purchase. The 980 Ti for the reasons discussed (it's not parallel, plus driver optimizations for DX12 may be much more limited), and the Fury X because it simply does not scale very well. The Fury X is clearly bottlenecked somewhere. We are debating where, but for deciding what to buy, it doesn't matter. It's not a future-proof investment. Then there are other problems, like the 4GB of VRAM.

So far, the consensus is to either buy a 290 or 290X used, or get the 390/390X. It has been suggested that the 8GB of VRAM might be of use later; in that case, the Fury X will not age well at all. Sell the 290X (or whatever you bought) when the 16nm GPUs come out.

To reiterate, the next generation or two of GPUs might be the biggest revolution since the 8800 GTX.

- First, there is a full node shrink, which means a much larger transistor budget, even for a 300mm² chip
- HBM2 is coming to Nvidia which, combined with its superior compression algorithms, should compound its bandwidth advantage
- We could see gains comparable to the Kepler SMX to Maxwell SMM transition, where the main gains came from the larger cache, the crossbar architectural changes, and more efficient use of transistor switching
- Of lesser importance, the introduction of NVLink may drastically improve SLI scaling

For AMD

- Likewise, a bigger transistor budget is expected
- HBM2 will improve GPU bandwidth
- GCN is expected to undergo a major revision - this has HUGE implications. The 7970, being GCN, saw pretty good driver gains because it shared the same architecture as the 290X and Fury X; that's why it aged better than the 680, and the 290X has aged very well too. Depending on what happens, the bulk of future optimization might go to GCN 2.0. Much as Kepler got "left behind" in drivers when Maxwell came out, I suspect the same may happen to GCN 1.x. Certainly, the VLIW architectures stopped getting optimized after GCN came out.
- AMD will unquestionably try to address the bottleneck hampering the Fury X with the extra transistors as well

Either way, this makes buying a Fury X or a 980 Ti a poor investment for future-proofing.

I suppose you could always sell the 980 Ti when DX12 gets bigger and take a loss of a few hundred dollars. It's hard to say whether the card will depreciate faster because of these developments. It's a bit like buying a DX9 GPU such as the 7900 GTX when you know the 8800 GTX is about to come out.

Quote:


> Originally Posted by *Mahigan*
> 
> Oops
> 
> Basically Anandtech misled everyone when they stated that Maxwell supported Async compute. That's why Razor1 and I had a hard time on the Hardforums trying to figure out the difference between AMDs ACEs and Maxwell AWSs.
> 
> AWSs are not independent. They don't function Asynchronously. This is why I marked them as not having the capability of working "out of order" with error checking. They stall on pipeline dependencies, hence why Oxide disabled the code on nVIDIAs hardware at their request.
> 
> I feel vindicated from all the hate mail I've received.


So far the evidence is supporting your hypothesis more and more.

We still need to determine what is causing the Fury X bottleneck and come up with definitive evidence. I am hoping that future DX12 titles will provide a bigger sample size to further prove (or disprove) your hypothesis and offer an opportunity to identify the real bottleneck.

Finally, there is the matter of AMD CPUs. The bottleneck there remains unknown. For now, I hate to say it, but AMD CPUs do not offer much for serious gaming. This is especially serious in CPU-bound games: a Haswell CPU is about 60% faster clock for clock (70% now with Skylake). Let's hope that Zen succeeds; an Intel monopoly is bad for us.

Quote:


> Originally Posted by *provost*
> 
> A lot of good posts and a ton of information in this thread, even since the last time I checked in . Great contribution to the community.
> 
> You raise a good point about Intel. We have been taking about two players in this game while ignoring the one player who is truly the 800 pound gorilla in the desktop pc market.
> The gains from Dx12 by lowering the CPU head is really a zero-sum game; benefit for Nvidia and AMD comes at the expense of Intel, doesn't it... Lol
> 
> Unless Intel doesn't care about the consumer pc market at all, one would think that there ought to be a reaction from Intel to defend its turf. And, whether that action (or lack thereof) is defensive or offensive , would tell us a lot about how the pc gaming industry evolves over the next few years.


You raise an interesting point. If you think about it, Intel has been giving enthusiasts the shaft for a long time.

Overclocking used to be about getting more performance from a cheaper CPU. Now, with everything locked except the $200+ and $300+ USD CPUs (6600K and 6700K), and with only the expensive HEDT CPUs unlocked (server parts being locked), we really are locked out. I guess you could consider the G3258 a bone of sorts, but it lacks AVX, and there are a ton of benchmarks that reflect that.

But yeah, they really haven't done much for consumers. We're such a small part of the total market that, for a monopolist, I guess catering to us is irrational. They are profit-maximizing at our expense. Now, with the PC market in decline, I think they are slowly changing (e.g., they made BCLK overclocking much better in Skylake and binned aggressively), but it's still not what we enthusiasts want.

It actually may get better for AMD in one regard: if DX12 makes multicore scaling better, then guess what? Steamroller/Bulldozer will look better relative to now (they are terrible at single-core work, but at "embarrassingly parallel" workloads they aren't that bad, aside from their high power consumption). Even better, if Zen delivers and AMD can offer more cores at the same price but with weaker single-threaded performance (remember, Skylake is 70% ahead, so even a 40% faster CPU leaves Intel a big advantage), the weaker single-threaded performance may not be a big penalty.
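A back-of-envelope Amdahl's-law model makes this trade-off visible. All the numbers below are hypothetical (a "1.7x per core, 4 cores" CPU versus a "1.0x per core, 8 cores" CPU): the many-core part only catches up once the workload is very heavily threaded, which is exactly what better DX12 multicore scaling would mean:

```python
# Amdahl's-law sketch: relative throughput for a workload where
# `parallel_fraction` of the work scales across cores and the rest
# is serial. All CPU figures here are hypothetical.

def throughput(per_core_perf, cores, parallel_fraction):
    serial = 1.0 - parallel_fraction
    return per_core_perf / (serial + parallel_fraction / cores)

# Hypothetical CPU A: 1.7x faster per core, 4 cores.
# Hypothetical CPU B: weaker cores, but 8 of them.
for pf in (0.3, 0.7, 0.95):
    a = throughput(1.7, 4, pf)
    b = throughput(1.0, 8, pf)
    print(f"parallel fraction {pf:.2f}: A={a:.2f}  B={b:.2f}")
```

With a mostly serial game (0.3) the fast-core CPU wins easily; at 0.95 the many-core CPU finally pulls ahead. That is why the payoff for AMD depends entirely on how well engines actually thread under DX12.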

Again, it all comes down to DX12 implementation, developers, and how well AMD does with Zen (fingers crossed on this one).


----------



## PostalTwinkie

Quote:


> Originally Posted by *HalGameGuru*
> 
> According to AMD FreeSync will work anywhere AdaptiveSync is available and to spec. Certification not withstanding.
> 
> All the sources I have read claim parity. Aside from certification, which is not necessary for the VRR tech to work. FreeSync uses what is in AdaptiveSync, if its standards compliant FreeSync will work with it. And anyone else can do the same.


Yes, and when you use just AdaptiveSync as a basis and ignore the rest...

You get overshoot, ghosting, and minimal VRR windows like 45 Hz to 75 Hz. To actually get the playable, working product that AMD heavily marketed as _"Free is better than G"_, great amounts of time and money must be spent. Again, look at the posts by Nixeus about being the first to market with a true 30 Hz to 144 Hz FreeSync display.

Also, AMD just recently added new validation requirements for their product that they are keeping locked away. As Nixeus said, they couldn't comment on what the new requirements were when asked by @Mand12.
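Why a narrow VRR window is such a problem can be sketched with frame multiplication, the idea behind AMD's Low Framerate Compensation (the actual driver heuristics are not public; this is only the basic principle): below the panel's minimum refresh, each frame is shown an integer number of times so the effective refresh rate stays inside the window - which only works if the window is wide enough.

```python
# Sketch of frame multiplication for VRR, illustrating why a narrow
# window (e.g. 45-75 Hz) is limiting. Not AMD's actual LFC algorithm.

def effective_refresh(fps, vrr_min, vrr_max):
    """Refresh rate actually used, or None if no multiple fits the window."""
    if vrr_min <= fps <= vrr_max:
        return fps                      # native VRR, no trickery needed
    if fps < vrr_min:
        mult = 2
        while fps * mult < vrr_min:     # find the smallest usable multiple
            mult += 1
        if fps * mult <= vrr_max:
            return fps * mult           # each frame shown `mult` times
    return None                         # falls back to v-sync or tearing

print(effective_refresh(30, 45, 75))    # wide enough: 30 fps shown at 60 Hz
print(effective_refresh(30, 45, 48))    # too narrow: no multiple fits
print(effective_refresh(30, 30, 144))   # wide window covers 30 fps natively
```

A window whose maximum is at least double its minimum always has a usable multiple for any framerate below the floor, which is why the ratio between the window's ends matters as much as the numbers themselves.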


----------



## HalGameGuru

That's an issue for those ranges in adaptive sync in general. Their new panel is unique as well; I've personally never seen a panel set up like that. That is something stemming from manufacturing. Those panels are odd.

Adaptive sync, in and of itself, can offer and support the whole range; the growing pains of new tech are not an AMD/manufacturer impasse. When monitors comfortably support adaptive sync down to 30 Hz or lower, and up to 144 Hz or higher, it will benefit FreeSync users and other adaptive sync adopters like Intel. The spec dictates the commands and processes, but it does not dictate the hardware; panels, scalers, firmware, etc. all have an effect, and that's not an issue with adaptive sync or FreeSync itself. Some people are willing to accept TN, some demand IPS; some want 1440p or better, some want 1080p. The manufacturers will decide how far to push individual SKUs in AS support.

Issues stemming from Adaptive Sync affect all AS users, not just FreeSync. As Adaptive Sync matures, both generic AS use and FS use will improve.


----------



## PostalTwinkie

Quote:


> Originally Posted by *HalGameGuru*
> 
> That's an issue for those ranges in adaptive sync in general. Their new panel is unique as well; I've personally never seen a panel set up like that. That is something stemming from manufacturing. Those panels are odd.
> 
> Adaptive sync, in and of itself, can offer and support the whole range; the growing pains of new tech are not an AMD/manufacturer impasse. When monitors comfortably support adaptive sync down to 30 Hz or lower, and up to 144 Hz or higher, it will benefit FreeSync users and other adaptive sync adopters like Intel. The spec dictates the commands and processes, but it does not dictate the hardware; panels, scalers, firmware, etc. all have an effect, and that's not an issue with adaptive sync or FreeSync itself. Some people are willing to accept TN, some demand IPS; some want 1440p or better, some want 1080p. The manufacturers will decide how far to push individual SKUs in AS support.
> 
> Issues stemming from Adaptive Sync affect all AS users, not just FreeSync. As Adaptive Sync matures, both generic AS use and FS use will improve.


I am not talking about resolution or panel technology. I am talking about the defects inherent in using only the base AdaptiveSync standard: ghosting, overshoot, and narrow VRR windows. These are all well documented and currently an issue on panels, the exceptions being the latest releases and those that have had their firmware updated.

In other words, out of the box AdaptiveSync won't properly work as VRR as marketed.


----------



## HalGameGuru

And that's not an issue for AS as a tech going into the future; those are early iteration issues. They are already being ameliorated, and they all can and will be worked out. VR took iterations, stereoscopic 3D took iterations, and VRR will take iterations. Even nVidia had to iteratively improve G-Sync (to the point of redesigning the chip from the ground up). You are looking at young tech and discounting whether it can mature.


----------



## Slaughterem

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I am not talking about resolution or panel technology. I am talking about the defects inherent in just using the base AdaptiveSync standard only, Ghosting, Overshoot, and narrow VRR windows. Which are all well documented, and *currently* an issue on panels. The exception are the *latest releases and those that had their firmware updated*.
> 
> *In other words, out of the box AdaptiveSync won't properly work as VRR.*


How can it be currently an issue if it has been firmware updated and latest releases have resolved the issues?


----------



## Mahigan

Quote:


> Originally Posted by *Kollock*
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might see that we are working closer with one hardware vendor then the other, but the numbers don't really bare that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer then AMD was, If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely other topic
> 
> 
> ).
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is that of the min frame times having the 290X beat out the 980 Ti (as reported on Ars Techinica). Unlike DX11, minimum frame times are mostly an application controlled feature so I was expecting it to be close to identical. This would appear to be GPU side variance, rather then software variance. We'll have to dig into this one.
> 
> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
> 
> Our use of Async Compute, however, pales with comparisons to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD alot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway, we would have still done it,
> 
> )
> 
> --
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally.


@Kollock Thank you for taking the time to share your thoughts with us. It is quite rare to see a developer be so open and transparent with the gaming community, and it is definitely well received by the community as a whole.

I was wondering if you could help us with something. Would you be able to run the game through a profiler and post the results? I understand this might be asking a lot, but it would shed light on several fronts and could help us all make the right choices when buying a graphics card going forward. A few screenshots should be enough, with the pertinent shaders making use of the async code. It would be nice to see it on both an AMD and an nV test system, but either one is enough for now - preferably nV (if possible, a GTX 980 Ti).

Once again, kudos for all of the information you have shared with us thus far.

Thank You


----------



## gamervivek

AMD GPUs also have delta compression, and the Fury X does better than the 980 Ti here with fewer ROPs and a lower clock speed. Theoretical specifications don't mean much.
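For what it's worth, the commonly cited peak specs do show the split being discussed in this thread. A quick sketch (base clocks and paper peaks only; these numbers say nothing by themselves about real-world behavior) puts the Fury X far ahead on raw shader throughput but behind on pixel fill:

```python
# Paper peak rates from commonly cited base specs.
# name: (shader ALUs, ROPs, base clock in GHz)
specs = {
    "Fury X": (4096, 64, 1.05),
    "980 Ti": (2816, 96, 1.00),
}

for name, (alus, rops, ghz) in specs.items():
    tflops = alus * 2 * ghz / 1000.0  # 2 FLOPs per ALU per clock (FMA)
    gpix = rops * ghz                 # peak pixel fill rate, Gpixels/s
    print(f"{name}: {tflops:.1f} TFLOPs shader, {gpix:.1f} Gpix/s fill")
```

Shader-heavy DX12 workloads (especially with async compute) lean on the first column, where the Fury X dominates on paper, while raster-bound work leans on the second, where the 980 Ti leads - one plausible reading of why results swing so much between tests.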


----------



## PostalTwinkie

Quote:


> Originally Posted by *Slaughterem*
> 
> How can it be currently an issue if it has been firmware updated and latest releases have resolved the issues?


Because there are still people with the issue (currently), while the latest products have it resolved. If you are one of the many who purchased a FreeSync display at initial release and haven't had a firmware update, you have the issue. Most firmware updates require the display to be sent back in.

So not only does it still currently affect people, it has also been resolved on the newer displays.


----------



## p4inkill3r

Many thanks to @Kollock for stopping by.
I've updated the OP to include your response. 90k views on this thread mean that many people are interested in your work!


----------



## jologskyblues

Buy AMD Radeon

/thread


----------



## Mahigan

Quote:


> Originally Posted by *jologskyblues*
> 
> Buy AMD Radeon
> 
> /thread


LOL

Not that simple. GameWorks titles will continue to favor nVIDIA.


----------



## diggiddi

Quote:


> Originally Posted by *jologskyblues*
> 
> Buy AMD Radeon
> 
> /thread


LOL


----------



## Sin0822

Quote:


> Originally Posted by *Kollock*
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might see that we are working closer with one hardware vendor then the other, but the numbers don't really bare that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer then AMD was, If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD
> 
> As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely other topic
> 
> ).
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is that of the min frame times having the 290X beat out the 980 Ti (as reported on Ars Techinica). Unlike DX11, minimum frame times are mostly an application controlled feature so I was expecting it to be close to identical. This would appear to be GPU side variance, rather then software variance. We'll have to dig into this one.
> 
> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
> 
> Our use of Async Compute, however, pales with comparisons to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway; we would have still done it.)
> 
> --
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark; when we refused, I think they took it a little too personally.
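The async compute gains Kollock describes come from overlapping compute with graphics work on a separate queue rather than running them back to back. A toy model of why that helps (all timings below are invented for illustration, not Oxide's numbers):

```python
# Toy model of async compute: overlappable compute work hides behind
# graphics work instead of adding to the frame time.
# All millisecond figures here are illustrative placeholders.

def frame_time_ms(gfx_ms, compute_ms, async_compute):
    """Serial: graphics then compute. Async: idealized full overlap."""
    if async_compute:
        return max(gfx_ms, compute_ms)
    return gfx_ms + compute_ms

serial = frame_time_ms(10.0, 3.0, async_compute=False)      # 13.0 ms
overlapped = frame_time_ms(10.0, 3.0, async_compute=True)   # 10.0 ms
gain = (serial - overlapped) / serial
print(f"serial {serial} ms, async {overlapped} ms, {gain:.0%} faster")
```

The more compute work an engine can move onto the second queue, the closer it gets to the ~30% figures quoted for console engines.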


Where is multi-adapter? We were given early access to test out the benchmark, and I forwarded it to our GPU editor. I couldn't write about the benchmark unless it had multi-adapter technology (I am the CPU/chipset/motherboard/OC editor, not GPU), and I thought that most if not all DX12 titles would have it. It would make for a great article on integrated CPU graphics. I have been asking about it for a while (from AMD, Intel, and NV); I think it's referred to as explicit multi-GPU rendering. Do you have any idea when your demo will have this feature? The reason I ask specifically about AOS is that AMD uses AOS as their demo for multi-adapter in their presentation decks.

Were you guys asked by the big GPU companies not to integrate it? I have a feeling that there is a push toward killing the technology before it's released, if only to ensure dGPU sales don't decline. However, I would think that if the iGPU and dGPU could work together in a mutually beneficial way (not like Virtu MVP, where the iGPU was only used as an output buffer), it would revolutionize the use and sale of integrated GPUs as a better gaming product.


----------



## Digidi

Quote:


> Originally Posted by *CrazyElf*
> 
> +Rep to you for your thoughts. Neither side really has a dominant advantage right now.
> 
> I'll repost this from earlier:
> 
> AMD Advantages
> 
> - More parallelism
> - Their cards do seem to age better
> - Crossfire also scales better
> - They are ahead on HBM and I suspect that next generation, they will probably have a better memory controller than Nvidia
> 
> Nvidia Advantages
> 
> - Tessellation and complex geometry
> - Better optimization of memory bandwidth for color compression
> - Rasterization
> - Net it works out to more power efficiency on DX11 and more OC headroom, although I will note that Nvidia cards don't scale linearly with overclocks (AMD ones generally do)
> 
> Overall I'd give the upper hand to Nvidia. They've got money, market share, mind share, and the influence in the gaming industry. Even though AMD now has the console market, and DX12, they still have huge problems.
> 
> *AMD Weakpoints*
> Let me explain in depth.
> 
> 1. The Rasterization (from Techreport):
> 
> 
> The AMD Fury X has worse throughput than a 780 Ti! (I will note that the Fury X can support more draw calls, but this is still a serious problem.) It didn't improve between the 290X and the Fury X.
> 
> The two architectures compared (290X vs Fury X):
> 
> 


Maybe they didn't change the rasterizer because it already has more than enough throughput?

If you look at the DX12 draw-call test in 3DMark, it's also a polygon-output test, because each draw call consists of 112-127 polygons. That means AMD can push more polygons through its rasterizer than Nvidia, so the rasterizer is not a bottleneck for AMD under DX12.

Having a lot of rasterizers is one thing; the other is keeping them fed from the command processor. At high draw-call counts Nvidia is very weak here, and it's a hardware limit; no big changes can be made by a driver update.
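The argument above can be sanity-checked with simple arithmetic: if each draw call in the 3DMark test carries 112-127 polygons, a given draw-call rate implies a polygon rate. A quick sketch (the 10M calls/s figure is a placeholder, not a measured result):

```python
# Rough polygon throughput implied by a draw-call rate, using the
# 112-127 polygons-per-call range cited above. The draw-call rate
# here is a made-up placeholder, not a benchmark result.

def polys_per_second(draw_calls_per_s, polys_per_call):
    return draw_calls_per_s * polys_per_call

low = polys_per_second(10_000_000, 112)    # 1.12e9 polygons/s
high = polys_per_second(10_000_000, 127)   # 1.27e9 polygons/s
print(f"{low:.2e} to {high:.2e} polygons/s at 10M draw calls/s")
```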


----------



## GorillaSceptre

If the 980 Ti isn't using Async then why isn't the Fury X slaughtering it? Where is Fury X's bottleneck?

Hopefully Oxide respond to Mahigan.


----------



## Noufel

Quote:


> Originally Posted by *GorillaSceptre*
> 
> If the 980 Ti isn't using Async then why isn't the Fury X slaughtering it? Where is Fury X's bottleneck?
> 
> Hopefully Oxide respond to Mahigan.


That's the question that bothers me: why would AMD have launched a DX12 GPU (Fiji) with the same number of ACEs as the Hawaii GPUs, knowing that DX12 benefits from ACEs?
I think Oxide programmed their game especially for the GCN 1.1 architecture and haven't had time to optimize it for Fiji GPUs. Those are my thoughts, not facts.


----------



## garwynn

@Kollock,

AFAIK Ark DX12 was delayed slightly.
There *is* one other way to test this right now but you won't see public results until it gets out of closed beta - and that's Fable Legends.
I'm actually wishing I had a NV card so I could gather data on it.


----------



## garwynn

Quote:


> Originally Posted by *Noufel*
> 
> That's the question that bothers me: why would AMD have launched a DX12 GPU (Fiji) with the same number of ACEs as Hawaii?


Best guess? The combination of HBM and the new GPU design was enough of a risk; I would honestly expect them to kick it into high gear next year.
A comparable example: when Samsung had issues with its Exynos Octa design, they went to Qualcomm while they fixed it, then switched back to their own SoC.

Appreciate everyone's input on this, between this and much more research it really does seem like the competition between the two is about to get tight again.
Buckle your seat belts, this could be a good ride.


----------



## Noufel

Quote:


> Originally Posted by *garwynn*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Noufel*
> 
> That's the question that bothers me: why would AMD have launched a DX12 GPU (Fiji) with the same number of ACEs as Hawaii?
> 
> 
> 
> Best guess? The combination of HBM and the new GPU design was enough of a risk, would honestly expect them to kick it into high gear next year.
> A relative example was when Samsung had issues with its Exynos Octa design - they went to Qualcomm while they fixed it, then switched back to their own SoC.
> 
> Appreciate everyone's input on this, between this and much more research it really does seem like the competition between the two is about to get tight again.
> Buckle your seat belts, this could be a good ride.
Click to expand...

AMD being back in the competition will be excellent for consumers.

I hope that will bring prices down again and we won't see mid-range GPUs for $500 anymore.


----------



## HalGameGuru

I still think the Fiji architecture is meant more for LiquidVR than straight gaming: all the shaders, HBM, the small form factor, etc. I think it may be a way to jumpstart the VR market and give AMD the majority market share there. There have been estimates of as many as 14 million VR devices being sold in 2016, many in the professional, commercial, or industrial spheres. I'm sure AMD would love to have a product on deck capable of feeding those VR sets.


----------



## Noufel

Quote:


> Originally Posted by *HalGameGuru*
> 
> I still think the Fiji architecture is meant more for LiquidVR than straight gaming: all the shaders, HBM, the small form factor, etc. I think it may be a way to jumpstart the VR market and give AMD the majority market share there. There have been estimates of as many as 14 million VR devices being sold in 2016, many in the professional, commercial, or industrial spheres. I'm sure AMD would love to have a product on deck capable of feeding those VR sets.


That could be a good reason, but can AMD feed all these markets with enough Fiji GPUs, given the slow production of the Fury/Fury X?


----------



## HalGameGuru

That's slow RIGHT NOW; the VR sales boom is supposedly hitting us next year. You have to prove your product early, then ramp up production once it's stable and cheap. Prove now, ramp up next half or quarter.

Would that work right NOW? No. But I think that's why they pushed it out when they did, with HBM1, incrementally over current-gen GCN, etc. It gives them a cushion of time to work out the kinks and prove the product before the high-margin and super-conservative professional and industrial markets dip their toes into VR.


----------



## Xuper

Yeah, AMD's LiquidVR is far better than Nvidia's: AMD latency ~11 ms, Nvidia latency ~25-27 ms.


----------



## Cyro999

Quote:


> Originally Posted by *Xuper*
> 
> Yeah, AMD's LiquidVR is far better than Nvidia's: AMD latency ~11 ms, Nvidia latency ~25-27 ms.


Source on that? 25-27ms sounds too high, yet 11ms impossibly low.


----------



## criminala

Some more tests of the Ashes benchmark for you guys to compare with.

My hardware:
AMD HD 7970 GHz Edition, PCIe 2.0 x16
4790K @ 4600 MHz, uncore 4400 MHz
16GB DDR3 2200 MHz CAS 9
Windows 10
Latest AMD drivers 15.20.1062.1004-150803a-184226E

DX11 2560x1440 (Full screen)


DX12 2560x1440 (Full screen)


DX11 1904x1041 (windowed)


DX12 1904x1041 (windowed)


Note:
When playing the DX12 single-player game, I notice my video card often drops core frequency way below the standard 1000 MHz (1050 MHz boost), like 600-700 MHz. This happens a lot more in the single-player game, and only occasionally during the benchmark.
In DX11 the core never throttles and stays at a constant 1000 MHz.
Probably due to the heavy strain on the GPU in DX12 (even though I have the power limit set to +15%).
Could also be because it's an alpha, of course.

Another interesting thing is that I was running the GPU on PCIe 2.0. I enabled PCIe 3.0 in the BIOS and the benchmarks gained quite a bit from this. Also, the GPU now runs at its 1050 MHz boost through the test, whereas before it would run at 1000 and throttle sometimes.

See :

DX12 2560x1440 (full screen) @ PCIe 3.0 :


----------



## Cyro999

Quote:


> When playing the single-player game, I notice my video card often drops core frequency way below the standard 1000 MHz (1050 MHz boost), like 600-700 MHz.
> Probably due to the heavy strain on the GPU


That usually happens because of a lack of strain on the GPU, not heavy strain. In a lot of games (especially RTS/MMO) it's easy to be CPU-bound to the point where you don't have enough FPS to challenge your GPU.


----------



## Xuper

Quote:


> Originally Posted by *Cyro999*
> 
> Source on that? 25-27ms sounds too high, yet 11ms impossibly low.


http://forums.anandtech.com/showpost.php?p=37649656&postcount=246


https://www.reddit.com/r/3i6dks/maxwell_cant_do_vr_well_an_issue_of_latency/


----------



## criminala

Quote:


> Originally Posted by *Cyro999*
> 
> That usually happens because of lack of strain on the GPU, not heavy strain. It's easy in a lot of games (especially rts/mmo) to be CPU bound to the point where you don't have enough FPS to challenge your GPU


-The GPU load is 100% during the DX12 benchmark.
-The charts say I'm 100% GPU bound (see the DX12 2560x1440 bench).
-In DX11 the load is not 100% but only 70-80%, and the core speed is then a constant 1000 MHz.
--> I believe the drop in GPU core speed really is not because of a lack of strain in this case.


----------



## mtcn77

Quote:


> Originally Posted by *criminala*
> 
> -When the GPU load is 100% during the DX12 benchmark .
> -When the charts say I'm 100% GPU bound (see the DX12 2560x1440 bench)
> -When in DX11 the load is not 100% but only 70-80% , core speed then is constant 1000 Mhz
> --> I believe the drop in GPU core speed really is not because of lack of strain in this case


You could undervolt for roughly double the power-efficiency benefit that underclocking would bring, as dynamic power scales with switched capacitance times voltage _squared_ times frequency.
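The "double the benefit" claim follows from the dynamic power relation P ∝ C·V²·f: cutting voltage pays off quadratically, cutting frequency only linearly. A quick sketch of the arithmetic (the 10% step is an arbitrary example):

```python
# Dynamic power scales as P ∝ C * V^2 * f. Compare a 10% underclock
# with a 10% undervolt, each relative to a baseline of 1.0.
# The 10% step is an arbitrary illustrative value.

def relative_power(v_scale, f_scale):
    return v_scale ** 2 * f_scale

underclock = relative_power(1.0, 0.9)   # 0.90 -> ~10% saving
undervolt = relative_power(0.9, 1.0)    # 0.81 -> ~19% saving
print(f"underclock saves {1 - underclock:.0%}, undervolt saves {1 - undervolt:.0%}")
```

In practice the attainable undervolt is bounded by stability at a given clock, so the two knobs are not fully independent.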


----------



## criminala

I'm not looking to "solve" the downclocking in any way; just wanted to put the information out there so other people know it happens.
BTW: my core is already undervolted, and that is indeed a good way to keep power consumption down.


----------



## mtcn77

Quote:


> Originally Posted by *criminala*
> 
> I'm not looking to "solve" the downclocking in any way . Just wanted to put the information out there for other people so they know it happens .
> Btw : My core is undervolted already and is indeed a good way to keep power consumption down .


I meant that it would help to verify whether it keeps doing that at a lower voltage level. That would probably rule out a power-budget constraint, as those types of observations have been reported in previous generations (including GCN).


----------



## Cyro999

Quote:


> Originally Posted by *Xuper*
> 
> http://forums.anandtech.com/showpost.php?p=37649656&postcount=246
> 
> 
> https://www.reddit.com/r/3i6dks/maxwell_cant_do_vr_well_an_issue_of_latency/


We've already measured latency of less than 25 ms on the Nvidia side without the OS tweaks (they say those cut 10 ms) or async warp being used at all, though that's during 144 Hz gameplay. You'd lose 3-4 ms going to 90 Hz.

And 11 ms is extremely low. It's possible, but not on a 90 Hz Oculus: the screen refresh ALONE is 90 Hz, which means an 11.11 ms interval between refreshes even with an infinitely fast CPU, GPU, and OS/driver.

I'd expect both parties to be in the 15-20 ms range on Oculus; less seems unlikely, but more is possible (and would be disappointing: going to a head-tracking display and GAINING latency vs. a 144 Hz desktop screen).
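The refresh-interval floor described above is simple arithmetic: one refresh interval is 1000 / Hz milliseconds, and no pipeline can deliver total latency below it.

```python
# Minimum latency contribution of the display refresh alone:
# one refresh interval is 1000 / refresh_rate_hz milliseconds.

def refresh_interval_ms(hz):
    return 1000.0 / hz

for hz in (144, 90):
    print(f"{hz} Hz -> {refresh_interval_ms(hz):.2f} ms per refresh")
# 144 Hz gives ~6.94 ms and 90 Hz gives ~11.11 ms, so a quoted 11 ms
# total motion-to-photon latency at 90 Hz would leave no time at all
# for the CPU, GPU, or driver.
```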

This is irrelevant to the Ashes of the Singularity benchmark. To be honest, at this point I think there should be another thread for all the GCN async compute stuff, but VR is a new level of off-topic.


----------



## Txeptsyip

Quote:


> Originally Posted by *criminala*
> 
> -When the GPU load is 100% during the DX12 benchmark .
> -When the charts say I'm 100% GPU bound (see the DX12 2560x1440 bench)
> -When in DX11 the load is not 100% but only 70-80% , core speed then is constant 1000 Mhz
> --> I believe the drop in GPU core speed really is not because of lack of strain in this case


AMD cards tend to downclock depending on load; this keeps temps and power usage down. They do it more if you have vsync on.

Anything you use to monitor GPU load will say it's at 100% regardless of the current clock speed, because all of the GPU is in use, just not at top speed, since top speed is not needed.

As for being GPU bound: I do not think the 7970 has as many async compute engines (ACEs) as newer cards. It might be that it cannot get the throughput needed, and more clock speed will not fix that.


----------



## umeng2002

Quote:


> Originally Posted by *jologskyblues*
> 
> Buy AMD Radeon
> 
> /thread


Boom, done it's over. nVidia is finished. It's official.


----------



## umeng2002

Quote:


> Originally Posted by *Kollock*
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.
> 
> ...
> 
> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
> 
> Our use of Async Compute, however, pales with comparisons to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?


This is what I've been thinking too. Although I wouldn't think the difference between Tier 2 and 3 would cause that big of an issue.

If Maxwell really can't handle asynchronous compute, and console devs are really using it for the GCN-based PS4 and XO, AMD might have a large leg up in the next year when/if these titles are ported to the PC... assuming Pascal doesn't fix it. Or, on the other hand, we might just see a shift where the nVidia cards are worse on CPU performance instead of AMD.
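For what it's worth, the capability-driven branching Kollock describes (reacting to what the driver reports rather than to the Vendor ID) might look roughly like the sketch below. All names and structures here are hypothetical, not Oxide's code; the tier names only echo D3D12's resource-binding tiers.

```python
# Sketch of capability-driven renderer setup: branch on what the
# driver *reports*, not on who made the GPU. Hypothetical names.

def configure_renderer(caps):
    """caps: dict with 'binding_tier' (1-3) and 'async_compute' (bool)."""
    config = {}
    # Tier 3 binding allows large persistent descriptor tables;
    # lower tiers need more per-frame CPU work to rebind resources.
    config["persistent_descriptors"] = caps["binding_tier"] >= 3
    # Use a separate compute queue only if the driver exposes it
    # (Oxide additionally disabled it where it regressed performance).
    config["compute_queue"] = caps["async_compute"]
    return config

amd_like = configure_renderer({"binding_tier": 3, "async_compute": True})
nv_like = configure_renderer({"binding_tier": 2, "async_compute": False})
```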


----------



## gamervivek

Quote:


> Originally Posted by *GorillaSceptre*
> 
> If the 980 Ti isn't using Async then why isn't the Fury X slaughtering it? Where is Fury X's bottleneck?
> 
> Hopefully Oxide respond to Mahigan.


Because AMD don't cripple nvidia's performance. Read his post again.


----------



## Kuivamaa

Quote:


> Originally Posted by *Noufel*
> 
> That's the question that bothers me: why would AMD have launched a DX12 GPU (Fiji) with the same number of ACEs as the Hawaii GPUs, knowing that DX12 benefits from ACEs?
> I think Oxide programmed their game especially for the GCN 1.1 architecture and haven't had time to optimize it for Fiji GPUs. Those are my thoughts, not facts.


The question is what is going on in terms of IQ. Will devs opt to ship their DX12 games without the related post-processing when the client detects Nvidia hardware? Will they include it, with Nvidia taking a performance hit? Will Nvidia find a workaround? Will they simply take the hit and, in GameWorks titles, create codepaths that hit Radeons elsewhere (e.g. in geometry) so there is no difference? Will devs of console games that use async shaders simply opt to omit that feature from their PC ports (this one would cause some serious trouble in the "PCMR" crowd)?


----------



## Xuper

I don't know; owners of the 980 Ti will be going mad if it's true that Maxwell doesn't work well under DX12.

Edit: and also the Titan X.


----------



## mtcn77

Quote:


> Originally Posted by *Xuper*
> 
> I don't Know , Owners of 980TI will be going mad if it's true that Maxwell doesn't work well under DX12?
> 
> Edit : and Also Titan X


Well, it doesn't have to. People love justifying their effort.


----------



## Casey Ryback

Quote:


> Originally Posted by *Xuper*
> 
> I don't Know , Owners of 980TI will be going mad if it's true that Maxwell doesn't work well under DX12?
> 
> Edit : and Also Titan X


No they won't, many of them were in the fury threads stating that you buy cards for performance now, not for the possibility of increased performance in the future.

The same people will just sell their cards anyway and upgrade to the DX12 benchmark winners.


----------



## zealord

Quote:


> Originally Posted by *Xuper*
> 
> I don't Know , Owners of 980TI will be going mad if it's true that Maxwell doesn't work well under DX12?
> 
> Edit : and Also Titan X


Nvidia wants them to buy new cards next year. If this is true, then Nvidia did it intentionally. Nvidia has mastered the craft of giving people what they need now instead of what they will need next year. AMD gives people things they don't need yet, but by the time they do need them (like HBM), the card they came with is already outdated.
Also, Nvidia owners buy Nvidia cards anyway. They'll be mad for two days that DX12 isn't supported like they thought it would be, and then they'll buy the new $1000 Pascal Titan anyway.


----------



## mtcn77

Quote:


> Originally Posted by *Casey Ryback*
> 
> No they won't, many of them were in the fury threads stating that you buy cards for performance now, not for the possibility of increased performance in the future.
> 
> The same people will just sell their cards anyway and upgrade to the DX12 benchmark winners.


Let's face it: they won't just give up on their purchase, and Nvidia will not give them any reason to, except to consider upgrading to Pascal.


----------



## Xuper

Fable Legends is a GameWorks title? I thought it was Microsoft-only.


----------



## Redeemer

Not sure if this has been posted

http://wccftech.com/oxide-games-dev-replies-ashes-singularity-controversy/

Oxide Games Dev Replies On Ashes of the Singularity Controversy
Quote:


> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole.
> 
> Certainly I could see how one might see that we are working closer with one hardware vendor than the other, but the numbers don't really bear that out. Since we've started, I think we've had about 3 site visits from NVidia, 3 from AMD, and 2 from Intel ( and 0 from Microsoft, but they never come visit anyone ;(). Nvidia was actually a far more active collaborator over the summer than AMD was. If you judged from email traffic and code-checkins, you'd draw the conclusion we were working closer with Nvidia rather than AMD. As you've pointed out, there does exist a marketing agreement between Stardock (our publisher) for Ashes with AMD. But this is typical of almost every major PC game I've ever worked on (Civ 5 had a marketing agreement with NVidia, for example). Without getting into the specifics, I believe the primary goal of AMD is to promote D3D12 titles as they have also lined up a few other D3D12 games.
> 
> If you use this metric, however, given Nvidia's promotions with Unreal (and integration with Gameworks) you'd have to say that every Unreal game is biased, not to mention virtually every game that's commonly used as a benchmark since most of them have a promotion agreement with someone. Certainly, one might argue that Unreal being an engine with many titles should give it particular weight, and I wouldn't disagree. However, Ashes is not the only game being developed with Nitrous. It is also being used in several additional titles right now, the only announced one being the Star Control reboot. (Which I am super excited about! But that's a completely other topic.)
> 
> Personally, I think one could just as easily make the claim that we were biased toward Nvidia as the only 'vendor' specific code is for Nvidia where we had to shut down async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.
> 
> From our perspective, one of the surprising things about the results is just how good Nvidia's DX11 perf is. But that's a very recent development, with huge CPU perf improvements over the last month. Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11. The other surprise is that of the min frame times having the 290X beat out the 980 Ti (as reported on Ars Technica). Unlike DX11, minimum frame times are mostly an application controlled feature so I was expecting it to be close to identical. This would appear to be GPU side variance, rather than software variance. We'll have to dig into this one.
> 
> I suspect that one thing that is helping AMD on GPU performance is D3D12 exposes Async Compute, which D3D11 did not. Ashes uses a modest amount of it, which gave us a noticeable perf improvement. It was mostly opportunistic where we just took a few compute tasks we were already doing and made them asynchronous, Ashes really isn't a poster-child for advanced GCN features.
> 
> Our use of Async Compute, however, pales in comparison to some of the things which the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much though, so likely we'll have to wait to see. Has anyone profiled Ark yet?
> 
> In the end, I think everyone has to give AMD a lot of credit for not objecting to our collaborative effort with Nvidia even though the game had a marketing deal with them. They never once complained about it, and it certainly would have been within their right to do so. (Complain, anyway; we would have still done it.)
> 
> -
> P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally.
> 
> AFAIK, Maxwell doesn't support Async Compute, at least not natively. We disabled it at the request of Nvidia, as it was much slower to try to use it than to not.
> 
> Whether or not Async Compute is better is subjective, but it definitely does buy some performance on AMD's hardware. Whether it is the right architectural decision for Maxwell, or is even relevant to its scheduler, is hard to say.


----------



## anonjoe

This has been posted in this very thread on page 121 by the Oxide dev; no need to quote wccftech.
Quote:


> Originally Posted by *Redeemer*
> 
> Note sure if this has been posted
> 
> http://wccftech.com/oxide-games-dev-replies-ashes-singularity-controversy/
> 
> Oxide Games Dev Replies On Ashes of the Singularity Controversy
> Quote:
> 
> 
> 
> Wow, there are lots of posts here, so I'll only respond to the last one. The interest in this subject is higher then we thought. The primary evolution of the benchmark is for our own internal testing, so it's pretty important that it be representative of the gameplay. To keep things clean, I'm not going to make very many comments on the concept of bias and fairness, as it can completely go down a rat hole... etc.
Click to expand...


----------



## Mahigan

I emailed Oxide asking for the CPU optimizations. I shared their response here. Then I emailed asking if they could share their thoughts on the GPU side, they came here to post.

They're probably the most open and transparent devs I have ever encountered. The information they have shared, in both instances, is light years away from what you normally encounter.


----------



## Mahigan

I hope it's not a problem that my email signature contained....

[name]
Overclock.net


----------



## Mahigan

Going forward I'll pay for an overclock.net email. That way when I obtain info... It will be in a more professional manner.


----------



## ku4eto

Quote:


> Originally Posted by *Mahigan*
> 
> I hope it's not a problem that my email signature contained....
> 
> [name]
> Overclock.net


Mahigan, you da real MVP!

Although I didn't see anything regarding FX CPU performance from the Oxide dev. Did I miss it, or did they not make any statements about it?


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> I emailed Oxide asking for the CPU optimizations. I shared their response here. Then I emailed asking if they could share their thoughts on the GPU side, they came here to post.
> 
> They're probably the most open and transparent devs I have ever encountered. The information they have shared, in both instances, is light years away from what you normally encounter.


Thank you for sharing all this information, Mahigan.

If you can and have the time, could you email them about the difference in performance gain between the Fury X and the 390X, and whether they know what is bottlenecking the former?
Thanks in advance, Mahigan.


----------



## semitope

Quote:


> Originally Posted by *Noufel*
> 
> That's the question that bothers me: why would AMD have launched a DX12 GPU (Fiji) with the same number of ACEs as the Hawaii GPUs, knowing that DX12 benefits from ACEs?
> I think Oxide programmed their game especially for the GCN 1.1 architecture and haven't had time to optimize it for Fiji GPUs. Those are my thoughts, not facts.


Quote:


> Originally Posted by *GorillaSceptre*
> 
> If the 980 Ti isn't using Async then why isn't the Fury X slaughtering it? Where is Fury X's bottleneck?
> 
> Hopefully Oxide respond to Mahigan.


Quote:


> Originally Posted by *garwynn*
> 
> Best guess? The combination of HBM and the new GPU design was enough of a risk, would honestly expect them to kick it into high gear next year.
> A relative example was when Samsung had issues with its Exynos Octa design - they went to Qualcomm while they fixed it, then switched back to their own SoC.
> 
> Appreciate everyone's input on this, between this and much more research it really does seem like the competition between the two is about to get tight again.
> Buckle your seat belts, this could be a good ride.


you guys didn't read the dev post. The reason is that they only kinda-sorta used async compute; it's not a major thing in their game. They didn't set out to fully take advantage of it, but used it opportunistically, from what I gather.

Most of AMD's gains are probably in other DX12 areas: getting rid of that DX11 weight so they just get close.

Other games that use it more will show better results.


----------



## garwynn

Quote:


> Originally Posted by *semitope*
> 
> you guys didn't read the dev post. The reason is because they just kinda sorta used async compute. Its not a major thing in their game. They didn't go out to fully take advantage of it but instead used it opportunistically from what I gather.
> 
> Most of AMDs gains are probably in other dx12 areas. Getting rid of that dx11 weight so they just get close.
> 
> Other games that use it more will show better results.


I did read the post - so much that I'm referring to it in another piece.
I was replying to the question of why Fiji didn't go higher on the ACE count compared to Hawaii.
This too was a guess, but I will be following up with AMD to see if they can shed light on that. If they do, will share the response.


----------



## semitope

Quote:


> Originally Posted by *garwynn*
> 
> I did read the post - so much that I'm referring to it in another piece.
> I was replying to the question of why Fiji didn't go higher on the ACE count compared to Hawaii.
> This too was a guess, but I will be following up with AMD to see if they can shed light on that. If they do, will share the response.


Oh, cool. It's likely that more would have been overkill and wasted silicon: more ACEs versus more shaders, and so on. Eight ACEs with eight queues each. Someone suggested that going up to 8 ACEs was due to Sony wanting it for the PS4, but I doubt it. Another source claims there are only 4 ACEs and 4 HWS(?), with the HWS able to act as ACEs while also packing additional functionality.


----------



## epic1337

Quote:


> Originally Posted by *Casey Ryback*
> 
> No they won't, many of them were in the fury threads stating that you buy cards for performance now, not for the possibility of increased performance in the future.
> 
> The same people will just sell their cards anyway and upgrade to the DX12 benchmark winners.


Yup, I don't see DX12 taking over in the next 2-3 years. Maybe big titles get DX12 support, but that doesn't mean DX11 will suddenly become worthless.
In any case, the Maxwell cards are still fine as they are; people who find them lacking can just upgrade like they usually do.

Look at DX9, still alive and kicking.


----------



## ku4eto

Quote:


> Originally Posted by *epic1337*
> 
> Yup, I don't see DX12 taking over in the next 2-3 years. Maybe big titles get DX12 support, but that doesn't mean DX11 will suddenly become worthless.
> In any case, the Maxwell cards are still fine as they are; people who find them lacking can just upgrade like they usually do.
> 
> Look at DX9, still alive and kicking.


Because indie 2D games don't really need DX11, and you can run them on a five-year-old configuration at max details (well, probably not at 4K, but who plays indies like that?!).
If we compare the number of games released on DX9, and the number of players on those DX9 games, versus DX10/DX11 titles, there are major differences. Yes, DX9 games outnumber DX11 games, but a larger share of players are on DX10/11.
In a year or two at most, you will have more players on DX12 titles than on DX10 and DX9, just as happened with DX11, only a bit faster this time, due to changes in the game industry as well as in marketing, hardware, and so on.


----------



## epic1337

Quote:


> Originally Posted by *ku4eto*
> 
> Because indie 2D games don't really need DX11, and you can run them on a five-year-old configuration at max details (well, probably not at 4K, but who plays indies like that?!).
> If we compare the number of games released on DX9, and the number of players on those DX9 games, versus DX10/DX11 titles, there are major differences. Yes, DX9 games outnumber DX11 games, but a larger share of players are on DX10/11.
> In a year or two at most, you will have more players on DX12 titles than on DX10 and DX9, just as happened with DX11, only a bit faster this time, due to changes in the game industry as well as in marketing, hardware, and so on.


That wasn't my point?


----------



## OneB1t

On the FX-8xxx there is no real gain from a PCIe overclock (tested 100 MHz vs 150 MHz) or an HT link overclock (2600 MHz vs 3200 MHz), or both at the same time.

But there are definitely gains from a memory overclock: I just switched from 4x2 GB 1333 MHz 9-9-9-24-CR1 to 2x8 GB 1866 MHz 10-10-10-24-CR1 and gained about a 5-10% FPS boost.
CPU-NB frequency also has a 5-10% impact on the result.

==Shot high vista ==================================
Total Time: 4.996590
Avg Framerate : 33.760746 ms (29.620199 FPS)
Weighted Framerate : 33.909782 ms (29.490015 FPS)
CPU frame rate (estimated framerate if not GPU bound): 26.268589 ms (38.068279 FPS)
Percent GPU Bound: 98.643028%
Driver throughput (Batches per ms): 6423.558105
Average Batches per frame: 50627.914063
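As a rough sanity check on the numbers above, here is a small sketch (nominal peak figures only; real-world throughput on an FX platform is far below these peaks, and the DIMM specs are just the two configurations mentioned in this post):

```python
def ddr3_peak_gbps(mt_per_s, channels=2, bus_bytes=8):
    # peak bandwidth: transfers/s * 8 bytes per 64-bit channel * channel count
    return mt_per_s * bus_bytes * channels / 1000.0

def ms_to_fps(frame_ms):
    # the benchmark reports frame times in ms; FPS is just the reciprocal
    return 1000.0 / frame_ms

print(f"DDR3-1333 dual channel: {ddr3_peak_gbps(1333):.1f} GB/s peak")
print(f"DDR3-1866 dual channel: {ddr3_peak_gbps(1866):.1f} GB/s peak")
print(f"33.760746 ms/frame -> {ms_to_fps(33.760746):.2f} FPS")
```

The roughly 40% jump in nominal peak (about 21.3 to 29.9 GB/s) lines up loosely with the 5-10% FPS gain reported, since a game is only partly bandwidth-bound.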


----------



## semitope

The more demanding games will be DX12. The API makes it possible to do more on the same hardware, so those staying on DX11 will suffer either lower performance or lower quality; we still have to see. Nvidia's savior might be Unreal Engine, but I rarely like a game on that engine anyway. It was pretty ugly until Unreal 4, and even on that some games don't look good.

Definitely not missing my 970.


----------



## Noufel

Quote:


> Originally Posted by *semitope*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Noufel*
> 
> That's the question that bothers me: why would AMD have launched a DX12 GPU (Fiji) with the same number of ACEs as the Hawaii GPUs, knowing that DX12 will benefit from ACEs?
> I think Oxide programmed their game especially for the GCN 1.1 arch and didn't have time to optimize it for the Fiji GPUs. Those are my thoughts, not facts.
> 
> Quote:
> 
> 
> 
> Originally Posted by *GorillaSceptre*
> 
> If the 980 Ti isn't using Async then why isn't the Fury X slaughtering it? Where is Fury X's bottleneck?
> 
> Hopefully Oxide respond to Mahigan.
> 
> 
> Quote:
> 
> 
> 
> Originally Posted by *garwynn*
> 
> Best guess? The combination of HBM and the new GPU design was enough of a risk, would honestly expect them to kick it into high gear next year.
> A relative example was when Samsung had issues with its Exynos Octa design - they went to Qualcomm while they fixed it, then switched back to their own SoC.
> 
> Appreciate everyone's input on this, between this and much more research it really does seem like the competition between the two is about to get tight again.
> Buckle your seat belts, this could be a good ride.
> 
> 
> You guys didn't read the dev post. The reason is that they only made light use of async compute; it's not a major part of their game. They didn't set out to take full advantage of it, but used it opportunistically, from what I gather.
> 
> Most of AMD's gains are probably in other DX12 areas. Shedding that DX11 weight is what gets them close.
> 
> Other games that use it more will show better results.

Why quote me? I asked Mahigan about the gains on the 390X and the Fury X, not the 980 Ti.


----------



## epic1337

Quote:


> Originally Posted by *OneB1t*
> 
> On the FX-8xxx there is no real gain from a PCIe overclock (tested 100 MHz vs 150 MHz) or an HT link overclock (2600 MHz vs 3200 MHz), or both at the same time.
> 
> But there are definitely gains from a memory overclock: I just switched from 4x2 GB 1333 MHz 9-9-9-24-CR1 to 2x8 GB 1866 MHz 10-10-10-24-CR1 and gained about a 5-10% FPS boost.
> CPU-NB frequency also has a 5-10% impact on the result.


That's probably because of AMD's slow IMC and high-latency caches.


----------



## GorillaSceptre

Quote:


> Originally Posted by *gamervivek*
> 
> Because AMD don't cripple nvidia's performance. Read his post again.


I did read his post...
Quote:


> Originally Posted by *semitope*
> 
> you guys didn't read the dev post. The reason is because they just kinda sorta used async compute. Its not a major thing in their game. They didn't go out to fully take advantage of it but instead used it opportunistically from what I gather.
> 
> Most of AMDs gains are probably in other dx12 areas. Getting rid of that dx11 weight so they just get close.
> 
> Other games that use it more will show better results.


You're misunderstanding what I asked.

Something is holding Fiji back compared to Hawaii; in principle it should be quite a bit ahead of the 980 Ti, but it's not. Could Ashes be presenting a CPU bottleneck? Maybe that's why we aren't seeing a bigger difference.

That's what I was asking, but I'm sure Mahigan is on the task as we speak.


----------



## Mahigan

I think that with the consoles pushing DX12 (Xbox One) and Vulkan-like implementations (PS4), we won't see a slow DX12 rollout on the PC. I think that in less than a year's time, the majority of new titles will either be coded for DX12 or add a DX12 path through a patch.

I'm very adamant about this. We are, and have been, in a period of stagnation. We're entering a period of enormous changes. These changes will create a lot of controversy, but I think that, in the end, the entire industry will be better for them.


----------



## OneB1t

Quote:


> Originally Posted by *epic1337*
> 
> That's probably because of AMD's slow IMC and high-latency caches.


Maybe, but I still don't understand why there is a doubling of performance from an i3 to an i7,
while the increase from the FX-4xxx to the FX-8xxx is only cosmetic (20-25%).


----------



## Mahigan

It is no longer a matter of two separate markets, a console market and a PC market; the two are converging. Cross-platform gaming is part of the Microsoft strategy. What we are moving toward is a single market: a gaming market, one single entity.


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> I think that with the consoles pushing DX12 (Xbox One) and Vulkan-like implementations (PS4), we won't see a slow DX12 rollout on the PC. I think that in less than a year's time, the majority of new titles will either be coded for DX12 or add a DX12 path through a patch.
> 
> I'm very adamant about this. We are, and have been, in a period of stagnation. We're entering a period of enormous changes. These changes will create a lot of controversy, but I think that, in the end, the entire industry will be better for them.


Agreed. And by the way - thanks for the input on the article. I'll post the link when it goes live tomorrow.


----------



## epic1337

Quote:


> Originally Posted by *OneB1t*
> 
> Maybe, but I still don't understand why there is a doubling of performance from an i3 to an i7,
> while the increase from the FX-4xxx to the FX-8xxx is only cosmetic (20-25%).


You mean why Intel gains a larger performance boost from faster DIMMs?

I'm not sure either, but efficient caches (lower latency, better cache management) and a fast IMC could account for it.
I suppose the high latencies and inefficient caches are inhibiting the contribution of faster DIMMs in AMD's case.


----------



## santerino

Thank you Mahigan & Kollock for revealing the truth behind the DX12 benches.
I hope that once the DX12 games come, the AMD FX series will unleash its 6-8 core power.

I don't know if I'm right, but this thread has become famous on tech sites, including WCCF: http://wccftech.com/oxide-games-dev-replies-ashes-singularity-controversy/

and Reddit:

https://www.reddit.com/r/3iwt2j/oxide_nv_gpus_do_not_support_dx12_asynchronous/

Let the truth prevail, guys.


----------



## semitope

Quote:


> Originally Posted by *Mahigan*
> 
> It is no longer a matter of two separate markets, a console market and a PC market; the two are converging. Cross-platform gaming is part of the Microsoft strategy. What we are moving toward is a single market: a gaming market, one single entity.


Till the end of the current console cycle, anyway. Then fingers crossed things don't get complicated.


----------



## semitope

Quote:


> Originally Posted by *Noufel*
> 
> Why quote me? I asked Mahigan about the gains on the 390X and the Fury X, not the 980 Ti.


Quote:


> Originally Posted by *GorillaSceptre*
> 
> I did read his post...
> You're misunderstanding what I asked.
> 
> Something is holding Fiji back compared to Hawaii; in principle it should be quite a bit ahead of the 980 Ti, but it's not. Could Ashes be presenting a CPU bottleneck? Maybe that's why we aren't seeing a bigger difference.
> 
> That's what I was asking, but I'm sure Mahigan is on the task as we speak.


I was assuming you thought the difference in shader count should mean the Fury does better in compute and should pull ahead. I guess what you are really asking is where the bottleneck is, which was discussed already.

On compute, the game does not exploit the ACEs that much, so the greater power of the Fury X might not come out to make the difference. A small road and a wide road carry the same amount of traffic if there's only one car. The 8 ACEs are not the bottleneck, IMO; they just weren't used that extensively.
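The "one car" road analogy above can be put in toy numbers (a hypothetical model, not actual GCN scheduler behavior): extra ACEs only shorten the queue-drain time once there are enough concurrent async jobs to occupy them.

```python
import math

def ticks_to_drain(jobs, aces):
    # toy model: each ACE retires one async job per scheduling "tick",
    # so draining a queue of N jobs takes ceil(N / aces) ticks
    return math.ceil(jobs / aces)

# lightly loaded: doubling the ACE count changes nothing (one car, wide road)
print(ticks_to_drain(3, 4))   # 1
print(ticks_to_drain(3, 8))   # 1
# heavily loaded: now the extra units actually pay off
print(ticks_to_drain(64, 4))  # 16
print(ticks_to_drain(64, 8))  # 8
```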


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> It is no longer a matter of two separate markets. A console market and a PC market. The two are converging. Cross platform gaming is part of the Microsoft strategy. What we are moving to is a single market. A Gaming market. One single entity.


I hope you are right, Mahigan. DX12 will eventually pave the way to cross-platform gaming, and that's what we need.


----------



## PontiacGTX

Quote:


> Originally Posted by *CrazyElf*
> 
> Overall I'd give the upper hand to Nvidia. They've got money, market share, mind share, and the influence in the gaming industry. Even though AMD now has the console market, and DX12, they still have huge problems.


The only problem is how biased the devs are towards Nvidia...
Quote:


> 2. Nvidia is currently more efficient at managing what VRAM,is currently available


No, the Nvidia equivalent will use more VRAM than AMD, given the larger memory buffer. And bandwidth? Do you forget about the GTX 970...?
Quote:


> This may be one of the reasons why the massive bandwidth of HBM did not prove to be the advantage


Wrong; most games don't require that much bandwidth unless you raise the resolution and AA a lot. Still, AMD manages to get higher performance than a GTX 980 Ti even though it has fewer ROPs.
Quote:


> Drivers (again AMD's drivers were not well optimized at all)


Optimized for what? You mean the many games that require a lot of work, including those that are already biased due to the engine itself? UE4...
Quote:


> Nvidia has historically relied on this more heavily than AMD.


Not really, because AMD had to work on driver-level optimization just to bring the Fury X up to the performance of a GTX 980 Ti.
Quote:


> It did come at the expense of CPU, but of course, most games are simply not CPU bound.


That's because of lazy developers doing console ports; and really, the trend lately has been to support more cores.


----------



## Mahigan

Given the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than third-party public relations firms rather than journalists.

If I don't see any comment by tomorrow, I will conclude that none of them can be trusted as sources of information.

This confirms my thoughts on the need for a new publication.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> Given the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than third-party public relations firms rather than journalists.
> 
> If I don't see any comment by tomorrow, I will conclude that none of them can be trusted as sources of information.
> 
> This confirms my thoughts on the need for a new publication.


There could be some articles mentioning the recent comments we got from Oxide, but beyond that, I doubt there will be any kind of deep coverage. After all, it is just one DX12-based game, and we definitely need benches from more games to do a deep analysis of how DX12 works on GPUs from both vendors. I don't think anyone is interested in building a huge skyscraper on a fine sand beach.


----------



## ToTheSun!

Quote:


> Originally Posted by *Mahigan*
> 
> This confirms my thought on the need for a new publication.


Link us when it goes live!


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Given the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than third-party public relations firms rather than journalists.
> 
> If I don't see any comment by tomorrow, I will conclude that none of them can be trusted as sources of information.
> 
> This confirms my thoughts on the need for a new publication.


Comment on what? A single benchmark and a bunch of conjecture? I would expect any competent journalist (as opposed to a rumor mill like WCCF) to want to get comments from the developers and AMD/Nvidia before writing about it.


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> Given the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than third-party public relations firms rather than journalists.
> 
> If I don't see any comment by tomorrow, I will conclude that none of them can be trusted as sources of information.
> 
> This confirms my thoughts on the need for a new publication.


Mahigan has read a draft of the piece I've got coming up tomorrow. I've also passed it around to a few people to see if they could help me get it in front of a larger audience, because I was concerned the site I run (Fatal Hero) would not have enough readership. A friend has stepped in and offered to host it on his site; it won't get the readership of a major site, but it's still better than Fatal Hero.

Mahigan is right, though, and that's why the site was started. The problem has been finding people interested in writing about these things. If you want to help in that area, PM me and I'll work with you to make it happen.


----------



## garwynn

Quote:


> Originally Posted by *Forceman*
> 
> Comment on what? A single benchmark and a bunch of conjecture?


Actually, after much reading, it's more than that: it's AMD potentially seeing the payoff of a very long game.
Tell you what... here's an older draft of what's going up tomorrow. Again, I'll post a link when the analysis goes live.
https://drive.google.com/file/d/0B3WjyoP4lBPaRE9LOHp0dE5JS00/view?usp=sharing

The problem is that even folks who work in semiconductors have told me it's a tough read.
I think that's why it's struggling to be understood and explained to people who don't know the technical details.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Given the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than third-party public relations firms rather than journalists.
> 
> If I don't see any comment by tomorrow, I will conclude that none of them can be trusted as sources of information.
> 
> This confirms my thoughts on the need for a new publication.


It would be appreciated if you could write up a conclusion to all of this; some of the points and arguments have gotten a bit convoluted.

It does amaze me that the so-called "professionals" didn't bring this up. But then again, how long has the false information about GCN been up on AnandTech?

I also expect these sites to link you as the original source; a lot of these click-bait ad simulators are acting as if they came up with the info themselves. Either way, well done. It must have taken a lot of research to understand how these architectures work.


----------



## Mahigan

@garwynn

In your article, it's best to use my description of the issue from my last post here, written in a clearer and more concise manner:

http://forums.anandtech.com/showthread.php?p=37666352&posted=1#post37666352


----------



## mav451

Quote:


> Originally Posted by *garwynn*
> 
> Actually, after much reading, it's more than that: it's AMD potentially seeing the payoff of a very long game.
> Tell you what... here's an older draft of what's going up tomorrow. Again, I'll post a link when the analysis goes live.
> https://drive.google.com/file/d/0B3WjyoP4lBPaRE9LOHp0dE5JS00/view?usp=sharing
> 
> The problem is that even folks who work in semiconductors have told me it's a tough read.
> I think that's why it's struggling to be understood and explained to people who don't know the technical details.


What exactly do they mean by "tough read"? Do they think it still needs to be revised for a lay audience?

I'm gonna take a look at your draft - thanks for posting it!


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> @garwynn
> 
> In your article, it's best to use my description of the issue from my last post here, written in a clearer and more concise manner:
> 
> http://forums.anandtech.com/showthread.php?p=37666352&posted=1#post37666352


Thanks for the update - I'll adjust the quote and link!


----------



## garwynn

Quote:


> Originally Posted by *mav451*
> 
> What exactly do they mean by "tough read"? Do they think it still needs to be revised for a lay audience?
> 
> I'm gonna take a look at your draft - thanks for posting it!


What they meant is that most of the technical terms were _so_ technical that they were difficult to follow, especially in the linked whitepapers and materials (since, after all, you have to cite sources...).

I'll be noting that my hope is that, while everyone may not catch all the details, they'll get enough out of this to see the point of the article: that AMD seems to have made quite a long-term investment to get here. I also had to add some notes, because PS4 performance seems to be based on much of the same features released in DX12. It's hard to confirm they _all_ are making their way in, since we don't know all the proprietary changes made for each console, but one party certainly knew about them and was probably frustrated by it: AMD.


----------



## Mahigan

@garwynn

Here:

A GTX 980 Ti can handle both compute and graphics commands in parallel. What it cannot handle is asynchronous compute; that is to say, the ability of independent units (ACEs in GCN and AWSs in Maxwell/2) to function out of order while handling error correction.

It's quite simple if you look at the block diagrams of the two architectures. The ACEs reside outside the Shader Engines. They have access to the global data share cache, the L2 R/W cache pools in front of each quad of CUs, and the HBM/GDDR5 memory, in order to fetch commands, send commands, perform error checking, and synchronize for dependencies.

The AWSs in Maxwell/2 reside within their respective SMMs. They may have the ability to issue commands to the CUDA cores within their respective SMMs, but communicating or issuing commands outside their respective SMMs would require sharing a single L2 cache pool. That cache pool has neither the size nor the bandwidth to function in this manner.

Enabling async shading therefore results in a drop in performance so noticeable that Oxide disabled the feature and worked with NVIDIA to get the most out of Maxwell/2 through shader optimizations.

It's architectural. Maxwell/2 will NEVER have this capability.
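The performance claim above can be illustrated with a toy timing model (hypothetical millisecond figures, not measurements from either vendor): if compute submissions must serialize behind graphics, the frame cost is their sum; if independent queues can overlap them, it approaches the longer of the two.

```python
# hypothetical per-frame workloads in milliseconds
graphics_ms = [4.0, 3.5, 4.2]
compute_ms = [1.5, 2.0, 1.8]

# serialized: each frame's compute work is slotted after its graphics work
serialized = sum(g + c for g, c in zip(graphics_ms, compute_ms))

# overlapped: independent queues hide the shorter job under the longer one
overlapped = sum(max(g, c) for g, c in zip(graphics_ms, compute_ms))

print(f"serialized: {serialized:.1f} ms, overlapped: {overlapped:.1f} ms")
```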


----------



## garwynn

Thanks, @Mahigan. I'll need to reorder a few things to use that over the old quote, since you're referring to the replies by Oxide in this thread.


----------



## Mahigan

@garwynn

Thank you


----------



## Mahigan

My view was theoretical, theirs is demonstrable. Together we have a scientific theory.


----------



## Themisseble

Quote:


> Originally Posted by *OneB1t*
> 
> On the FX-8xxx there is no real gain from a PCIe overclock (tested 100 MHz vs 150 MHz) or an HT link overclock (2600 MHz vs 3200 MHz), or both at the same time.
> 
> But there are definitely gains from a memory overclock: I just switched from 4x2 GB 1333 MHz 9-9-9-24-CR1 to 2x8 GB 1866 MHz 10-10-10-24-CR1 and gained about a 5-10% FPS boost.
> CPU-NB frequency also has a 5-10% impact on the result.
> 
> ==Shot high vista ==================================
> Total Time: 4.996590
> Avg Framerate : 33.760746 ms (29.620199 FPS)
> Weighted Framerate : 33.909782 ms (29.490015 FPS)
> CPU frame rate (estimated framerate if not GPU bound): 26.268589 ms (38.068279 FPS)
> Percent GPU Bound: 98.643028%
> Driver throughput (Batches per ms): 6423.558105
> Average Batches per frame: 50627.914063


This is near i5 performance?


----------



## Mahigan

So it is what Oxide mentioned, and one of the theories I raised: 19 GB/s of memory bandwidth might choke the AMD FX processors.

God, I love theory.

As for the requests to look into Fiji, I will. I'm just not home, and I'm replying on my smartphone. When I have the time, I will. Promise.


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> So it is what Oxide mentioned, and one of the theories I raised: 19 GB/s of memory bandwidth might choke the AMD FX processors.
> 
> God, I love theory.
> 
> As for the requests to look into Fiji, I will. I'm just not home, and I'm replying on my smartphone. When I have the time, I will. Promise.


If you get stuck, let me know. Robert Hallock from AMD has said to contact him with any questions, so I'm planning on taking him up on that as a possible follow-up piece.
I just didn't want to hold this up any longer; this piece has been in the works for nearly a week. Good thing I didn't post it sooner, though, as many more pieces have fallen into place.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> So it is what Oxide mentioned, and one of the theories I raised: 19 GB/s of memory bandwidth might choke the AMD FX processors.
> 
> God, I love theory.
> 
> As for the requests to look into Fiji, I will. I'm just not home, and I'm replying on my smartphone. When I have the time, I will. Promise.


AotS might be very memory-bandwidth sensitive... I haven't noticed any difference in BF4's Mantle mode.


----------



## spacin9

I always knew I would get this game, and now I have it. As you might guess from my name and avatar, I've been playing Sins from the beginning and SupCom for a long time. And now they are married. It's like chocolate and peanut butter, the Dallas Cowboys vs Manchester United, Joanie and Chachi. You guys can argue the technical merits; I'm not going to read every post in this thread. But...

OH

MY

GOD.

The trouble I see in the visuals (from what I've seen so far, from *only one benchmark run* in Windows 10 DX12 mode @ 4K) is that when the camera zooms in for up-close fighting, the framerate dips pretty badly. When viewed from a wide shot... it's enough to make you want to cry. Stunning.

And butter smooth.

I'm guessing the heavy smoke effects have a big impact when zoomed in. A totally uneducated guess; I haven't read one iota about this game or the benchmark. I don't follow the news about a game, I just play it.

This leads me to believe that the visuals are GPU bound, at least in the benchmark. And that's a good thing: 10,000 units and no loss of game speed? I think I can turn down AA and shadows for that.

*edit*

On my second benchmark @ 4K with G-Sync, I turned shadows down to medium and turned AA off (no need for it @ 4K), and all benchmark tests went from 40-50 FPS maxed out to 55-60 FPS. And it still looks amazing. I'm hoping there will be an SLI profile for this game soon; if it works decently, maxing this out @ 4K should be no problem, at least according to the benchmark I've seen.


----------



## umeng2002

Everyone says "it's just one benchmark," but they forget that the 3.5 GB issue with the GTX 970 was also discovered with one "benchmark."


----------



## semitope

Quote:


> Originally Posted by *Mahigan*
> 
> Given the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than third-party public relations firms rather than journalists.
> 
> If I don't see any comment by tomorrow, I will conclude that none of them can be trusted as sources of information.
> 
> This confirms my thoughts on the need for a new publication.


I would normally have to wait until the weekdays to make that assessment; weekends are typically slow.

I had suggested that the underlying hardware might be a cause of the AMD CPU issues. It could also present an artificial limit on the performance of higher-end CPU/GPU combinations, just at a higher FPS.


----------



## Mahigan

With the comments from Oxide, we can't say "it's only one benchmark" anymore. We can form definitive conclusions... But we can't conclude that either.


----------



## Mahigan

We can't form definitive conclusions*

Sorry can't edit... Windows phone LOL


----------



## Forceman

Quote:


> Originally Posted by *umeng2002*
> 
> Everyone says "it's just one benchmark," but they forget that the 3.5 GB issue with the GTX 970 was also discovered with one "benchmark."


Pretty sure it was people noticing only 3.5 GB allocated in Afterburner that first brought attention to it. The dedicated test program came later.

This benchmark also shows pretty abysmal DX11 performance for AMD, but I don't see anyone extrapolating that to "AMD can't compete in DX11".


----------



## Themisseble

Quote:


> Originally Posted by *Forceman*
> 
> Pretty sure it was people noticing only 3.5 GB allocated in Afterburner that first brought attention to it. The dedicated test program came later.


Please do that with the GTX 950.

The GTX 950 looks really inefficient (it should be about 1.5x better) against much older GCN parts like the R7 370.


----------



## sir cuddles

Quote:


> Originally Posted by *Forceman*
> 
> Pretty sure it was people noticing only 3.5 GB allocated in Afterburner that first brought attention to it. The dedicated test program came later.
> 
> This benchmark also shows pretty abysmal DX11 performance for AMD, but I don't see anyone extrapolating that to "AMD can't compete in DX11".


That's because it was already very well known that AMD's DX11 drivers were extremely inefficient compared to Nvidia's.


----------



## Mahigan

There's a difference...

The Fury X is sometimes behind the GTX 980 Ti, sometimes a little ahead. That is acceptable, though if I were recommending where to spend $650 on a DX11 GPU, it would be the GTX 980 Ti.

What you don't expect is a GPU from 2013 competing with a GTX 980 Ti.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> There's a difference...
> 
> The Fury X is sometimes behind the GTX 980 Ti, sometimes a little ahead. That is acceptable, though if I were recommending where to spend $650 on a DX11 GPU, it would be the GTX 980 Ti.
> 
> What you don't expect is a GPU from 2013 competing with a GTX 980 Ti.


Just like now, in some benchmarks the GTX 780 Ti is competing with the 7970.


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> There's a difference...
> 
> The Fury X is sometimes behind the GTX 980 Ti, sometimes a little ahead. That is acceptable, though if I were recommending where to spend $650 on a DX11 GPU, it would be the GTX 980 Ti.
> 
> What you don't expect is a GPU from 2013 competing with a GTX 980 Ti.


And that's what bothers me: it also competes with the Fury X just like it competes with the 980 Ti. I hope Oxide will soon give you more information about that matter.

PS: Thanks again for your time.


----------



## Remij

The only thing I know for a fact is that Nvidia has been powering 99% of Microsoft's DX12 demos, and AMD has Ashes of the Singularity. That's gotta mean something.

History is going to repeat itself.


----------



## Themisseble

Quote:


> Originally Posted by *Remij*
> 
> The only thing I know for a fact is that Nvidia has been powering 99% of Microsoft's DX12 demos, and AMD has Ashes of the Singularity. That's gotta mean something.
> 
> History is going to repeat itself.


So you are saying that AotS cripples NVIDIA GPUs with async shaders? Because if you run those settings with async shaders off and use more tessellation, NVIDIA is faster in DX12 mode.


----------



## Remij

Quote:


> Originally Posted by *Themisseble*
> 
> So you are saying that AotS cripples NVIDIA GPUs with async shaders? Because if you run those settings with async shaders off and use more tessellation, NVIDIA is faster in DX12 mode.


I'm saying general performance isn't conclusive based on one alpha benchmark, which has conflicting results across a broad range of websites.

Nvidia has been powering DX12 demos and has been actively involved in DX12's development. They have some of the smartest people in the world working for them, and tons of resources at their disposal.

They seem confident in their ability to push forward with DX12... and the fact that in AotS, DX12 performs even worse than DX11 leads me to believe that all is not well in that benchmark. Until I see an obvious trend, I'm not jumping on the AMD wagon just yet.


----------



## semitope

Quote:


> Originally Posted by *Remij*
> 
> The only thing I know for a fact is that Nvidia has been powering 99% of Microsoft's DX12 demos, and AMD has Ashes of the Singularity. That's gotta mean something.
> 
> History is going to repeat itself.


That can come down to a simple arrangement. Even AMD's involvement with Ashes looks like just marketing. As for the Nvidia GPUs in Microsoft's demos, that might be an arrangement for Microsoft to be provided with GPUs by Nvidia.


----------



## ku4eto

Quote:


> Originally Posted by *Remij*
> 
> I'm saying general performance isn't conclusive based on one alpha benchmark, which has conflicting results across a broad range of websites.


These conflicting results appear exactly when settings such as TAA are pushed to ridiculous levels, where they run better on nVidia. If MSAA is turned on, nVidia performance takes a sharp dive. You can't take one factor that applies to both sides and use it to bash only one of them.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> there's a difference...
> 
> The Fury-X is sometimes behind the GTX 980 ti, sometimes a little ahead. This is acceptable. Though if I were to recommend someone where to spend $650 for a DX11 GPU, it would be the GTX 980 Ti.
> 
> What you don't expect is a GPU from 2013 competing with a GTX 980 Ti.


GPUs from 2013 are hardly that far behind the top-end GPUs we have today. We are still stuck on 28nm lol. Hardly an "oh wow" factor. It would have been far more shocking if 2013 GPUs were competing while the Fury X and 980 Ti were on something like 14/16nm FinFET lol.


----------



## Mahigan

@remi

If you read through this thread, I've discredited the "conflicting results" theory.

We've established that the pcgameshardware results were either manufactured, by purposefully toying with the benchmark settings to obtain a desired result, or conducted by someone who didn't know what they were doing.

They disabled glare and boosted TAA to a factor of 12. The result is a blurry mess with near unplayable framerates on every GPU.

Thus the benchmark is invalid.

All other benchmarks conducted by tech review websites output the sane, expected results.

Therefore we have disregarded the results from pcgameshardware.


----------



## Remij

Alright link me to the most trustworthy benchmark of AoTS that you know of. I would like to take a look and then run my own test.


----------



## Mahigan

@Remij

http://arstechnica.com/gaming/2015/08/directx-12-tested-an-early-win-for-amd-and-disappointment-for-nvidia/?comments=1

http://www.pcper.com/reviews/Graphics-Cards/DX12-GPU-and-CPU-Performance-Tested-Ashes-Singularity-Benchmark

http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head

http://www.computerbase.de/2015-08/directx-12-benchmarks-ashes-of-the-singularity-unterschiede-amd-nvidia/2/


----------



## spacin9

The argument you guys seem to be bickering about is irrelevant for this game. It seems the argument is about implications for future games. I have a Titan X and I can make it look gorgeous @ 4K, not maxed of course. What is important with this game is that it is loading on all CPU cores, which is paramount to RTS gamers.

Sins of a Solar Empire looks okay, but visually it suffers when the units pile up because we're CPU limited. It goes to 10-15 fps no matter what video card you have.

With this game it looks like you can have both visuals and a high unit count without suffering anywhere near as much as Sup Com or Sins do when units pile up. I don't know how this bodes for AMD (CPU and/or GPU) users.

But with my rig and a single Titan X, I pull a 105.4 CPU score at 4K with AA off and terrain shading samples on MED, everything else high: 42.4 average FPS in DX 12 versus 44.5 fps in DX 11, a slight drop, hardly noticeable. In DX 11, I am seeing *4150 MB of VRAM* being used. I can't measure it in DX 12 mode; Afterburner won't post a display. If The Witcher 3's VRAM usage is any indication, *I am betting my VRAM usage is higher in DX 12*, meaning possible issues with any video card under 6GB of video RAM @ 4K, and perhaps at lesser resolutions depending on the settings used.

I also noticed in DX 11, *1500 MB* of system ram being used. In DX 12, it's *2500* MB.

DX12 is clearly using more resources and I think that will provide an overall better RTS experience as this game is further refined. If I can turn some settings to medium and stay @ 58 fps with G-Sync... that's a monstrous win for the RTS genre. At least for those games that involve thousands of units.

If it's true async compute makes a huge difference for AMD video card users, and it is now currently off in the alpha, the Fury or 390X may still be limited by VRAM usage in terms of overall smoothness of gameplay @ higher resolutions. But if it can be enabled and give AMD a big advantage, I still won't fret. RTS is about game time and smoothness of play, visuals are secondary, though I believe I will be able to achieve great visuals anyway.

BTW, I tried forcing AFR for SLI just for the heck of it. The SLI indicator works but SLI is not being used at all... it just flickers. That's typical of RTS games. It would be a major coup to get a decent SLI profile for this game.


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> By the lack of coverage from the big tech websites, I think it is safe to say that they've become nothing more than 3rd party Public Relations firms rather than journalists.
> 
> If I don't see any comment by tomorrow, I will conclude that none of them can be trusted sources for information.
> 
> This confirms my thought on the need for a new publication.


I am not surprised. For the big tech sites, "it's not really a big deal" when it comes to Nvidia, AMD, or Intel, depending on the site: just as 3.5GB on the 970 wasn't a big deal until their viewers and subscribers compelled them to at least acknowledge it or risk looking complicit in the cover-up; just as Kepler optimization was not an issue they wanted to bother with until Nvidia acknowledged it by releasing a hotfix; and just as the lack of SLI frame-time testing continues to be a non-issue, as there have been no benchmarks testing SLI frame times with new drivers. The list goes on.
So, yes, the major tech sites have become nothing more than promoters of new products for NV, AMD, Intel, or whatever tech company happens to be buttering their bread at the time, lacking real investigative journalism when it comes to reviewing supplier-provided products. Yet these same tech sites had no trouble finding their journalistic integrity when it came to exposing issues with AMD drivers a few years back, and mind you, it was well coordinated (I am just using it as an example; the same can be said for pro-site bias toward AMD, Intel, etc.). I commend them for doing that, but it is rather curious when they fail to apply the same standard to their biggest sponsor, whether it be the Green Co, the Red Co, or the Blue Co. ... Lol


----------



## PontiacGTX

Quote:


> Originally Posted by *Remij*
> 
> The only thing I know for a fact is that Nvidia's been powering 99% of Microsoft's DX12 demos, and AMD has Ashes of the Singularity. That's gotta mean something.
> 
> History is going to repeat itself.


AMD has the first available game on DirectX 12, and they were the first to provide a low-level API to every hardware/software vendor that wanted to use it. They also have better compatibility on GCN 1.1 than the Nvidia equivalent.

By the way, Nvidia was the one who pushed the development of an API (DX11) on which just-released games have been broken, over one that might deliver better hardware optimization. And one of their demos was from a developer that has a partnership with AMD.


----------



## spacin9

And just a revision to my last post...

I used GPU-Z instead of afterburner to test video RAM usage in DX 12. Same settings, significantly less usage in DX 12. About 3700 MB... still pushing 4 GB tho.

In DX 11, I am seeing as high as 4400 MB VRAM usage.

Also, I noticed system RAM go as high as 3 GB in DX 12, and 2200 MB in DX 11.


----------



## wiak

The 980 Ti has exactly the same TFLOPS as the 290X/390X, at 5.63 TFLOPS, and under DX12 the ACE engines in the 290X/390X seem to run a hell of a lot more efficiently.
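That 5.63 TFLOPS figure checks out with the usual peak-FLOPS formula: both chips happen to have 2816 shaders at roughly a 1 GHz reference clock. A rough sketch (base clocks only; boost clocks are ignored):

```python
# Back-of-the-envelope peak FP32 throughput: shaders x 2 ops/clock (one FMA
# counts as two floating-point operations) x clock in GHz, giving TFLOPS.
def tflops(shaders: int, clock_ghz: float) -> float:
    """Peak single-precision TFLOPS estimate."""
    return shaders * 2 * clock_ghz / 1000.0

gtx_980_ti = tflops(2816, 1.0)  # 2816 CUDA cores at ~1.0 GHz base clock
r9_290x = tflops(2816, 1.0)     # 2816 stream processors at up to 1.0 GHz

print(round(gtx_980_ti, 2), round(r9_290x, 2))  # 5.63 5.63
```

So the raw arithmetic throughput really is a wash; any gap under DX12 has to come from how well each architecture keeps those shaders fed.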


----------



## garwynn

It should be noted that Fable Legends started a closed beta under NDA, with a 5-year agreement not to discuss what is seen or heard, lest you get booted and possibly face further action. They also hinted at an open beta that, again just my impression, would be much more open to discussion.


----------



## spacin9

Quote:


> Originally Posted by *garwynn*
> 
> It should be noted that Fable Legends started a closed beta under NDA, with a 5-year agreement not to discuss what is seen or heard, lest you get booted and possibly face further action. They also hinted at an open beta that, again just my impression, would be much more open to discussion.


No screens or videos posted... we can talk about it all we like.


----------



## Mahigan

"Asynchronous DMA will allow data uploads without pausing the whole pipeline. It needs two active DMA engines in the GPU, so this feature is supported by all GCN-based Radeons and Maxwell v2 (GM206/GM204)-based GeForces. Most modern NVIDIA GPUs have two DMA engines, but one of them is disabled on the GeForce product line, so in the past this was a professional feature. On the GM206/GM204 GPUs the two DMA engines are not just present in the hardware but activated as well.
- Asynchronous compute allows overlapping of compute and graphics workloads. Most GPUs can use this feature, but not all hardware can execute the workloads efficiently."
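To paraphrase the second point: with a single queue the GPU runs graphics and compute back to back, while async compute lets the two overlap, so the total frame time approaches the longer of the two. A toy model of the idea (idealized perfect overlap, not real D3D12 code; the per-frame timings are made up):

```python
# Idealized model of serial vs. overlapped execution of one frame's graphics
# and compute workloads. Real overlap is never perfect, since both workloads
# contend for the same execution units.
def serial_time_ms(graphics_ms: float, compute_ms: float) -> float:
    return graphics_ms + compute_ms      # one queue: run back to back

def async_time_ms(graphics_ms: float, compute_ms: float) -> float:
    return max(graphics_ms, compute_ms)  # two queues: perfect overlap

graphics, compute = 12.0, 4.0            # hypothetical per-frame costs
print(serial_time_ms(graphics, compute))  # 16.0
print(async_time_ms(graphics, compute))   # 12.0
```

The "not all hardware can execute the workloads efficiently" caveat is exactly the gap between the `max()` ideal and what a given architecture actually achieves.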


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> "Asynchronous DMA will allow data uploads without pausing the whole pipeline. It needs two active DMA engines in the GPU, so this feature is supported by all GCN-based Radeons and Maxwell v2 (GM206/GM204)-based GeForces. Most modern NVIDIA GPUs have two DMA engines, but one of them is disabled on the GeForce product line, so in the past this was a professional feature. On the GM206/GM204 GPUs the two DMA engines are not just present in the hardware but activated as well.
> - Asynchronous compute allows overlapping of compute and graphics workloads. Most GPUs can use this feature, but not all hardware can execute the workloads efficiently."


Alright man, can you please do this tech illiterate a favor and paraphrase the above in English? Lol
I don't have a problem grasping the most complex abstract financial concepts (following the money is easy), but this is over my head.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> "Asynchronous DMA will allow data uploads without pausing the whole pipeline. It needs two active DMA engines in the GPU, so this feature is supported by all GCN-based Radeons and Maxwell v2 (GM206/GM204)-based GeForces. Most modern NVIDIA GPUs have two DMA engines, but one of them is disabled on the GeForce product line, so in the past this was a professional feature. On the GM206/GM204 GPUs the two DMA engines are not just present in the hardware but activated as well.
> - Asynchronous compute allows overlapping of compute and graphics workloads. Most GPUs can use this feature, but not all hardware can execute the workloads efficiently."


I honestly don't see AotS as a great example of showing asynchronous compute advantages just yet. The only thing I can see clearly is that the asynchronous path certainly works better on AMD GPUs, since the GCN architecture is designed to work that way, giving far better utilization. But the current benchmarks don't really reflect the compute advantages; they seem bottlenecked by other factors, judging by how close the 980 Ti and Fury X are. I need to see a DX12 game that heavily stresses the GPU in the more traditional sense, rather than one that is heavy on unit counts and AI, as that would put a different stress on the GPUs and give a better picture of the whole thing.


----------



## Mahigan

More info here:

http://hardforum.com/showthread.php?p=1041825924&posted=1#post1041825924

Bindless or overlapping means "out of order". What I've explained already.

In theory, Maxwell 2 was supposed to support it, up to tier 2.

Maxwell doesn't support it. Tier 1.

GCN does all the way to tier 3.

Problem is Oxide confirmed it doesn't work for Maxwell/2.

Which means something doesn't work in NVIDIA's implementation.

NVIDIA asked Oxide to "shut it down" and worked with Oxide on further optimizations. Then NVIDIA went silent.

The goal is to get NVIDIA to speak up on the issue. Why doesn't it work?


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Bindless or overlapping means "out of order". What I've explained already.
> 
> In theory Maxwell2 was supposed to support it. Up to tier 2
> 
> Maxwell doesn't support it. Tier 1.
> 
> GCN does all the way to tier 3.
> 
> Problem is Oxide confirmed it doesn't work for Maxwell/2.
> 
> Which means something doesn't work in NVIDIA's implementation.


The Oxide guy said Nvidia does have Tier 2.
Quote:


> The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD, which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant.


----------



## Mahigan

NVIDIA claimed they supported it. In practice they don't appear to. That's what Oxide was saying.

So the question is why???

And since all the big tech sites are silent. I'm making noise about it. We need to know what is happening.

Oxide, like us, doesn't understand why NVIDIA was getting such poor performance with it turned on. The only difference is the tier level. But it should work on Maxwell 2. It ended up dropping performance once turned on. So for all intents and purposes, Maxwell 2 is not tier 2. It is tier 1. Now why is that?

I think NVIDIA is hoping to bury this. If that's the case... It ain't fair to GTX 980 Ti owners.

So make some noise will ya? Stop arguing with me and demand an answer.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> So the question is why???
> 
> And since all the big tech sites are silent. I'm making noise about it. We need to know what is happening.


Um, I would like to know too, but I don't think they are obligated to open up everything about what is under the hood. It is probably unwise for them to make a big fuss about a game still in an alpha state, and they have no advantage in doing so.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Oxide, like us, doesn't understand why NVIDIA was getting such poor performance with it turned on. The only difference is the tier level. But it should work on Maxwell 2. It ended up dropping performance once turned on. So for all intents and purposes, Maxwell 2 is not tier 2. It is tier 1. Now why is that?


But resource binding level isn't tied to async computing directly, so tier level shouldn't cause them to have to disable async, right? And not supporting async shouldn't mean it is automatically tier 1.


----------



## Mahigan

Resource binding tiers are the async tiers.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> I think NVIDIA is hoping to bury this. If that's the case... It ain't fair to GTX 980 Ti owners.
> 
> So make some noise will ya? Stop arguing with me and demand an answer.


What if nvidia is trying to avoid games that require asynchronous shaders in DX12, and they haven't gotten in touch with Oxide because they are trying to keep silent about the performance? After all, they did the same with Kepler and driver support when they stopped updating the drivers (as you can see in the latest gamegpu/guru3d tests).


----------



## Mahigan

Maxwell/2 aren't executing the workloads efficiently. That's why Oxide was getting low performance.

This is understandable for Maxwell but not for Maxwell 2.

Maxwell 2 should have worked. It didn't. It wasn't executing the loads efficiently. Hence the low performance Oxide mentioned. NVIDIA asked Oxide to just shut it down. Instead they worked with Oxide to program a vendor ID specific path. This took a LONG time to program as mentioned by the Oxide dev.

If it was a driver issue... NVIDIA could have fixed it.

I think NVIDIA borked the Maxwell 2 GPUs. It works in their professional cards but not on the consumer GeForce level.

If true, this would require a hardware fix... Taping out a new GPU and re-releasing Maxwell 2.

Maybe NVIDIA doesn't want the bad PR associated with that.

If true, this pretty much means Pascal will support Tier 2 or 3.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> Maxwell/2 aren't executing the workloads efficiently. That's why Oxide was getting low performance.


with Maxwell you mean GM107(750/ti)?


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> Maxwell/2 aren't executing the workloads efficiently. That's why Oxide was getting low performance.
> 
> This is understandable for Maxwell but not for Maxwell 2.
> 
> Maxwell 2 should have worked. It didn't. It wasn't executing the loads efficiently. Hence the low performance Oxide mentioned. NVIDIA asked Oxide to just shut it down. Instead they worked with Oxide to program a vendor ID specific path. This took a LONG time to program as mentioned by the Oxide dev.
> 
> If it was a driver issue... NVIDIA could have fixed it.
> 
> I think NVIDIA borked the Maxwell 2 GPUs. It works in their professional cards but not on the consumer GeForce level.


There are quite a number of Nvidia GPUs out there that aren't Maxwell 2 lol.


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> So the question is why???
> 
> And since all the big tech sites are silent. I'm making noise about it. We need to know what is happening.


Because it's the weekend. Do you expect anyone to be working on optimizations of an unreleased game during a weekend?


----------



## Mahigan

The benchmarks were released on August the 17th. NVIDIA have been silent since then.


----------



## umeng2002

Quote:


> Originally Posted by *Mahigan*
> 
> Maxwell/2 aren't executing the workloads efficiently. That's why Oxide was getting low performance.
> 
> This is understandable for Maxwell but not for Maxwell 2.
> 
> Maxwell 2 should have worked. It didn't. It wasn't executing the loads efficiently. Hence the low performance Oxide mentioned. NVIDIA asked Oxide to just shut it down. Instead they worked with Oxide to program a vendor ID specific path. This took a LONG time to program as mentioned by the Oxide dev.
> 
> If it was a driver issue... NVIDIA could have fixed it.
> 
> I think NVIDIA borked the Maxwell 2 GPUs. It works in their professional cards but not on the consumer GeForce level.


Well, I would hope, if this is true, that nVidia is simply taking longer to implement it in their drivers, instead of covering up the issue and brushing it off like the 3.5 GB issue (I still laugh every time I see a site list the GTX 970's specs as having 4 GB of VRAM).

The industry needs games that push the API spec, not one vendor paying a developer not to use a feature because it makes their GPU perform badly.


----------



## Mahigan

@PontiacGTX yep.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> The benchmarks were released on August the 17th. NVIDIA have been silent since then.


Not entirely; they made statements about MSAA not working right.


----------



## Mahigan

It already works on their professional cards. It shouldn't have been hard to implement in Maxwell 2. They also have been working with Oxide for quite some time. Rather than asking Oxide to shut it down... you'd think they would have had ample time to implement it in their driver, no?


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> The benchmarks were released on August the 17th. NVIDIA have been silent since then.


I don't expect them to make a big noise unless the final benchmarks show the same results we are seeing atm. I don't see a big problem with Nvidia providing a path that works better for their GPUs, since it has no effect on how it performs on AMD GPUs.


----------



## Mahigan

@Forceman ok August 18 or so.

@Kpjoslee I do see a problem. They invested a lot of time with Oxide. Over a year.


----------



## SimBy

Quote:


> Originally Posted by *Forceman*
> 
> Not entirely; they made statements about MSAA not working right.


Which is kinda weird when you think about it. MSAA is a non-issue. It has nothing to do with NV performance, or the lack of it.


----------



## Mahigan

"NV Fermi: Max UAV is limited to 8 -> TIER 1
NV Kepler: Max UAV is limited to 8 -> TIER 1
NV Maxwell v1: Max UAV is limited to 8 -> TIER 1
NV Maxwell v2: SRVs/stage limited to 2^20 -> TIER 2
Intel Gen7.5/Gen8: the universal hardware binding table is limited to 255 slots -> TIER 1
AMD GCN v1/v2/v3...: GCN is designed around a simplified resource model, so this architecture works more like a CPU than a GPU. This allows unlimited resource binding -> TIER 3"

Seems to me the MSAA issue was raised just to cast doubt on Oxide. It was a non-issue; it worked well. Even Joel Hruska over at ExtremeTech was perplexed.
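The quoted list condenses into a simple lookup. This is illustrative only; real code would query the D3D12 `ResourceBindingTier` capability at runtime rather than hard-coding architectures:

```python
# Resource-binding tiers per GPU architecture, as summarized in the quoted
# post above (a lookup sketch, not a substitute for a driver capability query).
BINDING_TIER = {
    "Fermi": 1,
    "Kepler": 1,
    "Maxwell v1": 1,
    "Maxwell v2": 2,
    "Intel Gen7.5/Gen8": 1,
    "GCN": 3,
}

def unlimited_binding(arch: str) -> bool:
    """Only tier 3 hardware allows effectively unlimited resource binding."""
    return BINDING_TIER[arch] == 3

print(unlimited_binding("GCN"))         # True
print(unlimited_binding("Maxwell v2"))  # False
```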


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> @Kpjoslee I do see a problem. They invested a lot of time with Oxide. Over a year.


Surely they did, as you can see from how well it performs with their DX11 driver. Perhaps they didn't expect less-than-stellar results on their GPUs under DX12 lol. But still, they don't have to create more publicity that is disadvantageous to them. AMD can open up as much as they want because the results are certainly favoring them. I think AMD would have been in the same stance if the results showed the complete opposite of what we are seeing right now.


----------



## Mahigan

I think that this may be part of the reason the Oxide dev came here. NVIDIA is casting doubts on their work, NVIDIA fans attacked Oxide. I think it's a diversion.

I think NVIDIA is hoping to release Pascal before any of this becomes public knowledge.

I really think something is wrong with Maxwell 2.

Everyone is busy attacking Oxide. Claiming they're biased. Oxide came here to give us a better idea of what happened. They made extra sure to let us all know they're not biased. Their code is good. They've worked extensively with NVIDIA. Etc.

All this leads me to think something is broken in Maxwell 2.

And the only way we'll know for sure is to push for a response.


----------



## Kand

We can keep speculating but one thing is sure.

Nobody will play this game.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> I think that this may be part of the reason the Oxide dev came here. NVIDIA is casting doubts on their work, NVIDIA fans attacked Oxide. I think it's a diversion.
> 
> I think NVIDIA is hoping to release Pascal before any of this becomes public knowledge.
> 
> I really think something is wrong with Maxwell 2.
> 
> Everyone is busy attacking Oxide. Claiming they're biased. Oxide came here to give us a better idea of what happened. They made extra sure to let us all know they're not biased. Their code is good. They've worked extensively with NVIDIA. Etc.
> 
> All this leads me to think something is broken in Maxwell 2.
> 
> And the only way we'll know for sure is to push for a response.


I really don't think something is wrong with Maxwell 2, seeing how close the Fury X and 980 Ti are under DX12. Nvidia's DX11 results are fine; the only issue we are seeing right now is how it performs under DX12. Since the DX12 version is pretty much doing the same thing as the DX11 version, I don't think the hardware is the issue, but rather either Oxide's DX12 code for Nvidia GPUs or the Nvidia DX12 driver itself. Oxide and Nvidia are playing the blame game right now, but no one knows at the moment. Oxide also confirmed that they are looking into it.


----------



## Forceman

Quote:


> Originally Posted by *Kpjoslee*
> 
> I really don't think something is wrong with Maxwell 2, seeing how close the Fury X and 980 Ti are under DX12. Nvidia's DX11 results are fine; the only issue we are seeing right now is how it performs under DX12. Since the DX12 version is pretty much doing the same thing as the DX11 version, I don't think the hardware is the issue, but rather either Oxide's DX12 code for Nvidia GPUs or the Nvidia DX12 driver itself. Oxide and Nvidia are playing the blame game right now, but no one knows at the moment. Oxide also confirmed that they are looking into it.


Well, the question would be whether Nvidia's performance would be better if they could have left the async path on, instead of disabling it. And we'll probably never know.


----------



## Mahigan

@Kpjoslee the oxide dev told us the reason behind the poor NVIDIA performance.

They turned off async compute because it performed even worse than the result we see now.

The results we see now are from a vendor-ID-specific path. This means that once Ashes of the Singularity detects NVIDIA hardware, it runs a non-async path.

Since Ashes of the Singularity makes relatively little use of post-processing effects, the boost we see for AMD hardware due to async is minor. That being said, it is enough to push a 290X into parity with a GTX 980 Ti. The GTX 980 Ti is just running a traditional path; that's why its performance is similar to DX11.
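A sketch of what such a vendor-ID fallback could look like. The PCI vendor IDs are real (0x10DE is NVIDIA, 0x1002 is AMD), but the function and path names are hypothetical, not Oxide's actual code:

```python
# Hypothetical render-path selection keyed off the PCI vendor ID, mirroring
# the behavior described above: NVIDIA hardware gets the non-async path
# regardless of what the driver claims to support.
VENDOR_NVIDIA = 0x10DE  # real PCI vendor ID for NVIDIA
VENDOR_AMD = 0x1002     # real PCI vendor ID for AMD

def choose_render_path(vendor_id: int, async_capable: bool) -> str:
    if vendor_id == VENDOR_NVIDIA:
        return "serial"  # async path disabled for this vendor
    return "async" if async_capable else "serial"

print(choose_render_path(VENDOR_NVIDIA, True))  # serial
print(choose_render_path(VENDOR_AMD, True))     # async
```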

DX11 is, at times, a bit faster because of driver interventions.

People find this weird, as did I. But this pretty much is what we're going to expect from DX12 gaming if Async is used.

Pretty much... There won't be asynchronous compute tier 2 on Maxwell 2. The hardware has a defect. A defect which drives performance down when Async is turned on.

I've looked into it. As you've all read. And I think it is either a defect with the second DMA engine or with the L2 Cache.

For all those who bought Maxwell 2 cards... It seems you won't benefit from Async Compute.

I think you deserve a response from NVIDIA. Maybe you disagree. To me it seems that with the level of Async compute titles on the horizon, you deserve an explanation. Especially when you consider that review sites, quoting NVIDIA, now have their credibility in question.

That's not honest imo.


----------



## Majin SSJ Eric

I doubt this is really that big of a deal tbh. I think it's just pretty embarrassing for Nvidia to have these horrid DX12 numbers being reported in the first real DX12 bench, and it appears to vindicate what AMD have been saying all along: that once DX12 comes out, their cards are going to vastly outperform Nvidia's. Do I buy that for one second? Nope, but this is the only test out now and it's getting all this press, so Nvidia (doing what they always do when under pressure) is on the attack, which ironically makes them look even worse. The best thing to do here is let AMD have their little moment in the sun and sit back on their massive market-share domination. I mean, by the time more relevant DX12 games start coming out, I'm sure Nvidia will have straightened things out (and most importantly they will have ditched Maxwell and moved on to Pascal), so these early results will be long forgotten...


----------



## Forceman

Quote:


> Originally Posted by *Majin SSJ Eric*
> 
> I think it's just pretty embarrassing for Nvidia to have these horrid DX12 numbers being reported in the first real DX12 bench, and it appears to vindicate what AMD have been saying all along.


Are they really so horrid though? The 980 Ti still matches the Fury X, and the 980 matches the 390X.

Edit: I guess the 980 trails a little more than I remembered, but still not really horrid. DX12 being slower than DX11 is weird though.


----------



## Majin SSJ Eric

Quote:


> Originally Posted by *Forceman*
> 
> Are they really so horrid though? The 980 Ti still matches the Fury X, and the 980 matches the 390X.
> 
> Edit: I guess the 980 trails a little more than I remembered, but still not really horrid. DX12 being slower than DX11 is weird though.


I meant horrid DX12 improvement over DX11. Sure, with DX12 the cards are pretty much even, but had the 980 received just a small bump from DX12, it would still maintain a substantial advantage over the 390X rather than appearing rather underwhelming (considering current pricing). Sure, Nvidia can fall back on their superior DX11 performance, but eventually they will have to show competence with DX12 (though I bet that won't happen till Pascal, after they've long since abandoned Maxwell)...


----------



## provost

Quote:


> Originally Posted by *Kpjoslee*
> 
> I really don't think something is wrong with Maxwell 2, by seeing how close Fury X and 980Ti are close under DX12. Nvidia's DX11 results are fine, but only issues we are seeing right now is how it performs under DX12. SInce their DX12 version is pretty much doing identical stuff that is on DX11 version, I don't think hardware is the issue, but either the Oxide's code under DX12 for Nvidia GPU or the Nvidia DX12 driver itself. Oxide and Nvidia are playing blame game right now but no one knows at the moment. Oxide also confirmed that they are looking into that.


I don't know how much of this is going to matter to Maxwell owners, depending on their sensitivity to the issue, but Nvidia doesn't usually play the blame game. If Nvidia had a legitimate upper hand in this argument, based on facts, it would have just buried Oxide; that clearly hasn't happened, which makes Oxide's argument more plausible... Lol


----------



## Mahigan

The Fury-X performance is another matter entirely. Look at the 290X and the GTX 980 Ti: that's what DX12 games with a small amount of async shading will look like. Now imagine the 30% boost when a lot of async shading is utilized. Do you see what I'm getting at? A 290X faster than a GTX 980 Ti.

Do you think that is just?

Fury-X has odd behavior. It should be faster than both a GTX 980 Ti and a 290x by a fair amount. I think it will be. Once you add more Async compute heavy tasks. Or at least, it won't drop in performance the way a GTX 980 Ti and 290x will.

Fury-X to me, is bottlenecked elsewhere. But that's because Ashes of the Singularity is not an Async benchmark, as Oxide told us.

In other words, the GTX 980 Ti is bottlenecked by compute (the lack of async requires it to run its compute and graphics tasks serially, like DX11 on Maxwell). Since Maxwell 2 is better at compute than Kepler, it still has an edge. Both run their graphics and compute tasks synchronously, just as they do in DX11.

So don't expect better performance from Maxwell 2 under DX12.

In game titles ported over from consoles with Async, Fury-X will fly by them all. Assuming it doesn't hit whatever Oxide is heavy on which is bottlenecking it right now.

R9 290x, will fly by a GTX 980 Ti. Probably sharing the same or near same bottleneck as the Fury-X.

And given that console titles are light on fillrate, geometry etc (using an APU), both the 290x and Fury-X won't likely be bottlenecked the same way they are on Ashes of the Singularity.

So what's left to figure out now, is what is holding back the Fury-X. This will give us an idea as to what to expect, going forward.

Of course this also means we won't see asynchronous shading on GameWorks titles, at least not until Pascal hits. Which will result in another Kepler-style cripple effect for current Maxwell 2 owners.

That's what I think. Putting all the pieces together.


----------



## Vesku

Any word from Nvidia on whether Maxwell's async compute support can be improved? As said earlier, if not, it will probably lead to some even more 'interesting' Nvidia GameWorks implementations in the future. They'll have to make sure AMD cards are even more drastically crippled in comparison to Nvidia's on their sponsored titles, to offset not having a usable async compute path in all the non-GameWorks titles.


----------



## Forceman

Quote:


> Originally Posted by *Majin SSJ Eric*
> 
> I meant horrid DX12 improvement over DX11. Sure with DX12 the cards are pretty much even but had the 980 received just a small bump from DX12 then they would still maintain a substantial advantage over the 390X rather than appearing to be rather underwhelming (considering current pricing). Sure, sure Nvidia can fall back on their superior DX11 performance but eventually they will have to show competence with DX12 (though I bet that won't happen til Pascal after they've long since abandoned Maxwell)...


Yeah, hard to tell how much of that is legit DX12 improvement for AMD though, and how much is just really bad DX11 coding/performance. A 390X shouldn't be getting half the frame rate of a 980 in DX11.

Quote:


> Originally Posted by *Mahigan*
> 
> The Fury-X performance is another matter entirely. Look at the 290x and the GTX 980 Ti. That's what DX12 games, with a little amount of async shading, will look like. Now imagine the 30% boost when a lot of async shading will be utilized. Do you see what I'm getting at? A 290x faster than a GTX 980 Ti.


If the 290X matches the 980Ti, why does the 390x only match the 980? Maybe that 290X/980ti test was flawed.


----------



## Mahigan

I see a lot of people still using a GTX 780 Ti. So I assume a lot of people will still be using their GTX 980 Ti once Pascal is released.

I feel bad for these folks. They dropped $650 for what? A year's worth of gaming?

This is a bad trend I see from NVIDIA. Sure, it helps their bottom line... selling new GPUs by building obsolescence into their previous ones. But in a year's time, the GTX 980 Ti will be losing to Hawaii. That just doesn't seem right to me.

If you all don't care, then more power to you.

One thing is for sure: Greenland and Pascal aren't even out yet, and for the first time in my life, I've already made my choice on which one I'm going to purchase.


----------



## garwynn

Quote:


> Originally Posted by *Forceman*
> 
> But resource binding level isn't tied to async computing directly, so tier level shouldn't cause them to have to disable async, right? And not supporting async shouldn't mean it is automatically tier 1.


Definition of tiers is set by MS - https://msdn.microsoft.com/en-us/library/windows/desktop/Dn899127(v=VS.85).aspx
(This will also be in tomorrow's analysis as a link.)
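As a rough summary of the per-stage limits from that MSDN page (a sketch of the table as I read it; check the link for the authoritative values):

```python
# Resource binding tier limits, summarized from the MSDN table linked above.
# FULL_HEAP means "limited only by the descriptor heap size".
FULL_HEAP = "full heap"

BINDING_TIERS = {
    # tier: CBVs/stage, SRVs/stage, UAVs (all stages), samplers/stage
    # (Tier 1 UAV count is 8 on feature level 11.0 hardware, 64 on 11.1+.)
    1: {"cbv": 14,        "srv": 128,       "uav": 64,        "sampler": 16},
    2: {"cbv": 14,        "srv": FULL_HEAP, "uav": 64,        "sampler": FULL_HEAP},
    3: {"cbv": FULL_HEAP, "srv": FULL_HEAP, "uav": FULL_HEAP, "sampler": FULL_HEAP},
}

def srv_limit(tier):
    """Look up how many SRVs a shader stage may see at a given tier."""
    return BINDING_TIERS[tier]["srv"]

print("Tier 1 SRVs:", srv_limit(1), "| Tier 2 SRVs:", srv_limit(2))
```

Note that none of these limits say anything about async compute, which is Forceman's point: binding tiers and async support are orthogonal.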


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> I see a lot of people still using a GTX 780 Ti. So I assume a lot of people will still be using their GTX 980 Ti once Pascal is released.
> 
> I feel bad for these folks. They dropped $650 for what? A year's worth of gaming?
> 
> This is a bad trend I see from NVIDIA. Sure, it helps their bottom line... selling new GPUs by building obsolescence into their previous ones. But in a year's time, the GTX 980 Ti will be losing to Hawaii. That just doesn't seem right to me.
> 
> If you all don't care, then more power to you.
> 
> One thing is for sure: Greenland and Pascal aren't even out yet, and for the first time in my life, I've already made my choice on which one I'm going to purchase.


Based on one game still in alpha? Nothing against your analysis, but you're jumping to conclusions too soon, I'm afraid. By the time DX12-only titles hit, this argument might be irrelevant lol.


----------



## umeng2002

The only hope for Maxwell 2, with console ports that use heavy async compute coming, is that desktop GPUs are so much more powerful than a console that console games' async compute still won't stress Maxwell 2 as much as ground-up PC titles like AotS, which might hit it harder since devs expect more power to be there on the desktop.


----------



## Remij

Quote:


> Originally Posted by *Mahigan*
> 
> I see a lot of people still using a GTX 780 Ti. So I assume a lot of people will still be using their GTX 980 Ti once Pascal is released.
> 
> I feel bad for these folks. They dropped $650 for what? A year's worth of gaming?
> 
> This is a bad trend I see from NVIDIA. Sure, it helps their bottom line... selling new GPUs by building obsolescence into their previous ones. But in a year's time, the GTX 980 Ti will be losing to Hawaii. That just doesn't seem right to me.
> 
> If you all don't care, then more power to you.
> 
> One thing is for sure: Greenland and Pascal aren't even out yet, and for the first time in my life, I've already made my choice on which one I'm going to purchase.


Man you are so sensationalist.

Can't wait till you're proven wrong


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> I see a lot of people still using a GTX 780 Ti. So I assume a lot of people will still be using their GTX 980 Ti once Pascal is released.
> 
> I feel bad for these folks. They dropped $650 for what? A year's worth of gaming?
> 
> This is a bad trend I see from NVIDIA. Sure, it helps their bottom line... selling new GPUs by building obsolescence into their previous ones. But in a year's time, the GTX 980 Ti will be losing to Hawaii. That just doesn't seem right to me.
> 
> If you all don't care, then more power to you.
> 
> One thing is for sure: Greenland and Pascal aren't even out yet, and for the first time in my life, I've already made my choice on which one I'm going to purchase.


It's posts like this that make people question your objectivity on this subject.


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> I see a lot of people still using a GTX 780 Ti. So I assume a lot of people will still be using their GTX 980 Ti once Pascal is released.
> 
> I feel bad for these folks. They dropped $650 for what? A year's worth of gaming?
> 
> This is a bad trend I see from NVIDIA. Sure, it helps their bottom line... selling new GPUs by building obsolescence into their previous ones. But in a year's time, the GTX 980 Ti will be losing to Hawaii. That just doesn't seem right to me.
> 
> If you all don't care, then more power to you.
> 
> One thing is for sure: Greenland and Pascal aren't even out yet, and for the first time in my life, I've already made my choice on which one I'm going to purchase.


I wouldn't say no one cares, as this thread's views suggest otherwise. I also believe that Maxwell owners are watching this debate very closely, as they have "skin in the game".
However, Maxwell owners are likely to be the most loyal of Nvidia customers, given the short upgrade cycles Nvidia has been pushing out due to being stuck on the same 28nm node for so long. A lot of existing Maxwell owners have simply been upgrading from one generation of 28nm video card to another, so they are repeat NV customers.
I think they are going to give Nvidia benefit of the doubt, for now, to respond to the points raised by you and Oxide. If Nvidia doesn't respond in a relatively short period of time, then the Maxwell owners (being the nervous bunch all cutting edge enthusiasts tend to be... Lol) will quietly begin unloading their cards before the propagated theory becomes widely known fact. It doesn't help them to raise hell about it before then, as that would just lower the resale value for these consummate upgraders... Lol

That's how I see it anyway, assuming what you have proposed turns out to be how it is, i.e., Maxwell being DX12-gimped.


----------



## Casey Ryback

Quote:


> Originally Posted by *Kpjoslee*
> 
> Based on one game still in alpha? Nothing against your analysis, but you're jumping to conclusions too soon, I'm afraid.


Spot on.
Quote:


> Originally Posted by *Remij*
> 
> Man you are so sensationalist.


Welcome to tech forums


----------



## Mahigan

@Forceman

The 290x was a tad slower than the GTX 980 Ti.

The 390x was a tad faster than the GTX 980. Up to 13.4% faster.

You're looking at
6.1 Tflops GTX 980 Ti
5.8 Tflops R9 290x

5.6 Tflops GTX 980
5.9 Tflops R9 390x

NVIDIA is compute bottlenecked by its lack of async support.

AMD wasn't getting good compute efficiency under DX11; now they are. They're not compute bottlenecked.

I think the only oddball result is the Fury-X. I think that can be explained by the level of Z/stencil work for all that shadowing being done (look at the Ashes of the Singularity settings).

Fury-X and the 290/390x cards have the same Z/stencil rate. So they're not compute bottlenecked. They're z/stencil rate bottlenecked.
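For reference, those figures follow from FP32 TFLOPS = 2 FLOPs per FMA × shader cores × clock, and whether base or boost clock is used explains why different sources quote different numbers for the same card. A quick sketch using published core counts and clocks:

```python
# FP32 throughput: each core retires one FMA (2 FLOPs) per clock.
# Core counts and clocks below are spec-sheet values; real boost varies.

def tflops(cores, clock_ghz):
    # cores x GHz gives G-instructions/s; x2 FLOPs -> GFLOPS; /1000 -> TFLOPS
    return 2 * cores * clock_ghz / 1000.0

cards = {
    "GTX 980 Ti @ 1000 MHz base":  (2816, 1.000),  # ~5.6, TPU's figure
    "GTX 980 Ti @ 1075 MHz boost": (2816, 1.075),  # ~6.1, the boost figure
    "GTX 980   @ 1126 MHz base":   (2048, 1.126),  # ~4.6
    "R9 290X   @ 1000 MHz":        (2816, 1.000),
    "R9 390X   @ 1050 MHz":        (2816, 1.050),
}
for name, (cores, ghz) in cards.items():
    print(f"{name}: {tflops(cores, ghz):.2f} TFLOPS")
```

Run with spec clocks, the formula lands a little under some of the numbers quoted above, which is exactly the base-versus-boost ambiguity raised later in the thread.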


----------



## FlankerWang

Quote:


> Originally Posted by *Mahigan*
> 
> @Forceman
> 
> The 290x was a tad slower than the GTX 980 Ti.
> 
> The 390x was a tad faster than the GTX 980. Up to 13.4% faster.
> 
> You're looking at
> 6.1 Tflops GTX 980 Ti
> 5.8 Tflops R9 290x
> 
> 5.6 Tflops GTX 980
> 5.9 Tflops R9 390x
> 
> NVIDIA are compute bottlenecked from a lack of async support.
> 
> AMD wasn't getting good compute efficiency under DX11, now they do. They're not compute bottlenecked.
> 
> I think the only oddball result is the Fury-X. I think that can be explained by the level of Z/stencil work for all that shadowing being done (look at the Ashes of the Singularity settings).
> 
> Fury-X and the 290/390x cards have the same Z/stencil rate. So they're not compute bottlenecked. They're z/stencil rate bottlenecked.


From TPU's database, I found 4.6 TFLOPS for the GTX 980 and 5.6 TFLOPS for the GTX 980 Ti.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> @Forceman
> 
> The 290x was a tad slower than the GTX 980 Ti.
> 
> The 390x was a tad faster than the GTX 980. Up to 13.4% faster.
> 
> You're looking at
> 6.1 Tflops GTX 980 Ti
> 5.8 Tflops R9 290x
> 
> 5.6 Tflops GTX 980
> 5.9 Tflops R9 390x
> 
> NVIDIA are compute bottlenecked from a lack of async support.
> 
> AMD wasn't getting good compute efficiency under DX11, now they do. They're not compute bottlenecked.
> 
> I think the only oddball result is the Fury-X. I think that can be explained by the level of Z/stencil work for all that shadowing being done (look at the Ashes of the Singularity settings).
> 
> Fury-X and the 290/390x cards have the same Z/stencil rate. So they're not compute bottlenecked. They're z/stencil rate bottlenecked.


The problem is that you are using different tests to make those comparisons, and there is no transitive property for benchmarks (settings play too big a part). Until someone runs a single test with all the cards, we aren't going to know the relative rankings.


----------



## mav451

I agree that Mahigan may have revealed his hand ultimately, but nonetheless it's a great question to ask.
Mind you my first cards were Matrox and ATi - I have no issues changing camps if it warrants it haha.

If the winds are shifting - one should be paying full attention. There is no need to get emotionally attached haha.


----------



## Mahigan

Not exactly the same z/stencil rate, give or take depending on the operating clock speeds.

And I don't see what is wrong with me saying I'm going to buy Greenland. I'm quite certain that what Oxide is saying, and what I inferred from observing the results, point to the lack of tier 2 async support in Maxwell 2.

NVIDIA told everyone they could do it. Even Anandtech. Why would I chance buying a $650 graphics card from NVIDIA in the future, seeing what they've done with Kepler and now potentially Maxwell 2?

I can overlook them doing it once, with Kepler. Fine. And I have.

If Maxwell 2 doesn't end up supporting Async tier 2 then you can be sure GameWorks titles won't until Pascal is released. Pascal will undoubtedly be tier 2 if not 3. Once Pascal releases GameWorks titles will support Async. That means hitting the GTX 980 Ti hard.

That would be twice in a row they do this. Why would I buy a GPU from them if that's the case?

That's making a rational buying decision based on an objective perspective.

Typo: 5.4, not 5.6. I obtained it from here: http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/4

And it depends on the boost clock. It varies up and down depending on load.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> @Forceman
> 
> The 290x was a tad slower than the GTX 980 Ti.
> 
> The 390x was a tad faster than the GTX 980. Up to 13.4% faster.
> 
> You're looking at
> 6.1 Tflops GTX 980 Ti
> 5.8 Tflops R9 290x
> 
> 5.6 Tflops GTX 980
> 5.9 Tflops R9 390x
> 
> NVIDIA are compute bottlenecked from a lack of async support.
> 
> AMD wasn't getting good compute efficiency under DX11, now they do. They're not compute bottlenecked.
> 
> I think the only oddball result is the Fury-X. I think that can be explained by the level of Z/stencil work for all that shadowing being done (look at the Ashes of the Singularity settings).
> 
> Fury-X and the 290/390x cards have the same Z/stencil rate. So they're not compute bottlenecked. They're z/stencil rate bottlenecked.


Hmm, if they were truly compute bottlenecked, then Nvidia's DX11 results should have been hammered as well. If Nvidia's DX11 path, running serialized instead of in parallel via async, can perform so well rendering identical content to DX12, how can it be so compute bottlenecked?
Basically, comparing Maxwell to Hawaii is like comparing an 8-core, 8-way hyper-threaded CPU to a 32-core CPU. It is impossible to make an apples-to-apples comparison. It will be heavily dependent on how the game is coded with respect to each architecture. I don't think you can simply declare that NVIDIA is compute bottlenecked.


----------



## Mahigan

It's very simple. Read what the Oxide dev stated: async was turned off on a vendor-ID-specific path. Therefore you have GCN doing its compute tasks asynchronously, while NVIDIA optimized shaders with Oxide instead.

Therefore GCN won't be compute bottlenecked in Ashes of the Singularity; NVIDIA will. NVIDIA is doing great on efficiency, but not as well as GCN. That's only logical.

Now we try to find out what's limiting the Fury-X. Well... 16 million or 8 million terrain shadow samples, depending on the setting chosen, with shadow samples set to high.

Evidently, Ashes of the Singularity is high on shadowing.

Shadowing requires Z/stencil work. Lo and behold... that's a point of commonality between Fiji and Hawaii.

Seems like a rather logical place to look for a bottleneck.

NVIDIA has a much higher Z/stencil rate (which scales with pixel-shading capacity, which is bound to the ROPs).

Seems entirely logical.

DX11 allows room for driver interventions. That's one factor, and DX11 performance is sometimes lower and sometimes higher than DX12 performance in Ashes of the Singularity.
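The "same Z/stencil rate" observation can be sanity-checked with back-of-envelope arithmetic: Fiji and Hawaii both carry 64 ROPs, while GM200 has 96. The per-ROP Z-ops-per-clock value below is an assumed placeholder (real parts differ), so only the relative numbers matter:

```python
# Rough Z/stencil throughput comparison. The per-ROP rate is an
# assumption held constant across parts, so only ROP count x clock
# drives the relative ranking shown here.

def z_rate(rops, clock_mhz, z_per_rop_clock=4):  # 4 is a placeholder
    return rops * z_per_rop_clock * clock_mhz / 1000.0  # G Z-ops/s

print("Fury X  (64 ROPs @ 1050):", z_rate(64, 1050))
print("R9 290X (64 ROPs @ 1000):", z_rate(64, 1000))
print("980 Ti  (96 ROPs @ 1000):", z_rate(96, 1000))
```

Under that assumption Fiji and Hawaii land within a few percent of each other (clock difference only), while GM200's extra ROPs put it well ahead, which is consistent with the bottleneck theory above.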


----------



## PirateZ

So who's winning?


----------



## Dudewitbow

Quote:


> Originally Posted by *PirateZ*
> 
> So who's winning?


It's not about who's winning; it's more about the true potential of current-gen hardware, and its future within the next few years for those who bought into it and aren't looking to upgrade.


----------



## Forceman

Quote:


> Originally Posted by *PirateZ*
> 
> So who's winning?


No one knows, because we only have a single data point and it's hard to get a trend line from a single point.


----------



## Kollock

Quote:


> Originally Posted by *Mahigan*
> 
> @Kpjoslee the oxide dev told us the reason behind the poor NVIDIA performance.
> 
> They turned off async compute because it performed even worse than the result we see now.
> 
> The results we see now are from a Vendor ID specific path. This means that once the Ashes of the Singularity detects NVIDIA hardware, it runs a non-async path.
> 
> Since Ashes of the Singularity makes relatively little use of Post Processing effects the boost we see for AMD hardware, due to Async, is minor. That being said it is enough to push a 290x into a parity position with a GTX 980 Ti. The GTX 980 Ti is just running a traditional path. That's why the performance is similar to DX11.
> 
> DX11 is, at times, a bit faster because of driver interventions.
> 
> People find this weird, as did I. But this pretty much is what we're going to expect from DX12 gaming if Async is used.
> 
> Pretty much... There won't be asynchronous compute tier 2 on Maxwell 2. The hardware has a defect. A defect which drives performance down when Async is turned on.
> 
> I've looked into it. As you've all read. And I think it is either a defect with the second DMA engine or with the L2 Cache.
> 
> For all those who bought Maxwell 2 cards... It seems you won't benefit from Async Compute.
> 
> I think you deserve a response from NVIDIA. Maybe you disagree. To me it seems that with the level of Async compute titles on the horizon, you deserve an explanation. Especially when you consider that review sites, quoting NVIDIA, now have their credibility in question.
> 
> That's not honest imo.


I think you are confusing a few issues. Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute. It has to do with the number of root-level descriptors we can pass. In tier 3, it turns out we basically never have to update a descriptor during a frame, but in tier 2 we sometimes have to build a few. I don't think it's a significant performance issue though, just a technical detail.

In regards to the purpose of Async compute, there are really 2 main reasons for it:

1) It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper-threading. Like hyper-threading, how important this is really depends on the workload and GPU architecture. In this case, it is used for performance. I can't divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell's scheduler has no analogous feature, just as a non-hyper-threaded CPU has no analog to hyper-threading.

2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case, since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I'm not sure of the background of Async Compute, but it's quite possible that it is intended for use on a console as a sort of replacement for the Cell processors on a PS3. In a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks.

It didn't look like there was a hardware defect to me on Maxwell, just some unfortunate complex interaction with the software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that Async Compute be supported for D3D12, but perhaps I misread the spec.

Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle limited, which is why you see greater performance at 4K, where the average triangle size is 4x the triangle size at 1080p.

I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post-process. Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% in the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was the main motivating factor behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.
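Point #1 above can be illustrated with a toy timeline: async compute slips jobs into the GPU's idle bubbles the way hyper-threading fills stalled CPU cycles. The timeline and slot counts here are made up purely for illustration:

```python
# Toy model of async compute as "GPU hyper-threading".
# Timeline slots: 'G' = graphics busy, '.' = idle bubble, 'C' = compute job.

def run(timeline, compute_slots, can_overlap):
    """Return the executed timeline for `compute_slots` compute jobs."""
    slots = list(timeline)
    remaining = compute_slots
    if can_overlap:                       # ACE-style: slip work into bubbles
        for i, s in enumerate(slots):
            if s == '.' and remaining:
                slots[i] = 'C'
                remaining -= 1
    slots += ['C'] * remaining            # leftover (or all) compute runs after
    return ''.join(slots)

frame = "GG.GG.GG."                       # assumed: one idle slot in three
print("async :", run(frame, 3, True))    # bubbles absorbed, no extra slots
print("serial:", run(frame, 3, False))   # compute appended, frame grows
```

The overlapping path finishes in the original slot count while the serial path grows by the full compute cost, which is the whole value proposition of case #1; case #2 (out-of-band work) is about latency rather than throughput and isn't captured by this sketch.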


----------



## Noufel

Quote:


> Originally Posted by *Kollock*
> 
> I think you are confusing a few issues. Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute. [...]

I hope all game devs can behave like you and come to tech forums to share their information and respond to people's questions.

Bravo


----------



## Xuper

From Silverforce11 at Anandtech!

Wow, I can't believe it!

https://www.khronos.org/assets/uploads/developers/library/2015-gdc/Valve-Vulkan-Session-GDC_Mar15.pdf


----------



## spacin9

Quote:


> Originally Posted by *Kpjoslee*
> 
> I need to see the DX12 game that heavily stresses the GPU in more traditional sense rather than the game that is heavy in number of units and AI going on, as it would put different stress on GPUs to get a better picture of the whole thing.


I've been playing it. 90 percent or better most of the time on the GPU; the GPU is being stressed. It sits at about 50 percent on all cores on the CPU. I've never seen my Titan X temp so high under a gaming load. And that's just with two players. Smooth as silk most of the time @ 4K. This is going to be a great game... when they get full strategic zoom going.

Whereas with Sins or Sup Com, you'll be lucky to get out of 2D clocks. The GPU has nothing to do in those games... not like this game.


----------



## Kpjoslee

Quote:


> Originally Posted by *spacin9*
> 
> I've been playing it. 90 percent or better most of the time on the GPU. The GPU is being stressed. It sits about 50 percent on all cores on the CPU. I've never seen my Titan X temp so high under gaming load. And that's just with two players. Smooth as silk most of the time @ 4K. This is going to be a great game... when they get full strategic zoom going.
> 
> Whereas with Sins or Sup Com, you'll be lucky to get out of 2d clocks. GPU has nothing to do with those games... not like this game.


Ah, good to know. I meant games that focus heavily on graphics in a traditional sense, like adventure, RPG, or FPS titles. +Rep for you









Quote:


> Originally Posted by *Kollock*
> 
> I think you are confusing a few issues. Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute. [...]


Thank you for some clarifications.


----------



## spacin9

Quote:


> Originally Posted by *Kpjoslee*
> 
> Ah, good to know. I meant games that focus heavily on graphics in a traditional sense, like adventure, RPG, or FPS titles. +Rep for you
> 
> Thank you for some clarifications.


I understand what you're saying. I guess you're looking for post-process and ambient performance... chromatic whatever and such.

But it's almost unheard of for a huge-scale RTS to be so GPU dependent.

If the devs are reading this, can't wait for the tech upgrades to be unlocked. Hope there are some neat super-weapons and experimentals...

I'm coming into it completely cold of course. I saw and I bought. I could use a play on words with the popular Roman equivalent of my last sentence, but I'd be banned [again] for it.









I have no idea what this game is going to be. But looks great so far...


----------



## Kpjoslee

Quote:


> Originally Posted by *spacin9*
> 
> I understand what you're saying. I guess you're looking for post-process and ambient performance... chromatic whatever and such.
> 
> But it is completely out of bounds for a huge scale RTS to be so GPU dependent.
> 
> If the devs are reading this, can't wait for the tech upgrades to be unlocked. Hope there are some neat super-weapons and experimentals...
> 
> I'm coming into it completely cold of course. I saw and I bought. I could use a play on words with the popular Roman equivalent of my last sentence, but I'd be banned [again] for it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I have no idea what this game is going to be. But looks great so far...


After all this long talk about technical stuff, I almost forgot this is a game.








Game certainly looks promising, definitely on the list of future purchases.


----------



## fellix

Here's a simple command-line async compute benchmark for DX12:

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-8#post-1868993

Still, it's a very synthetic test, and for now it raises more questions than answers.
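For context, that class of micro-benchmark typically times a graphics workload alone, a compute workload alone, and then both submitted together, and compares. A rough sketch of the interpretation logic, with hypothetical numbers (this is not the linked tool's code):

```python
def classify_async(t_graphics, t_compute, t_combined, tol=0.15):
    """Classify queue behavior from three timings (same units).
    If the GPU truly overlaps the queues, combined time approaches
    max(graphics, compute); if it serializes them, it approaches the sum."""
    overlap_est = max(t_graphics, t_compute)
    serial_est = t_graphics + t_compute
    if t_combined <= overlap_est * (1 + tol):
        return "concurrent"
    if t_combined >= serial_est * (1 - tol):
        return "serialized"
    return "partial overlap"

# Hypothetical readings, in milliseconds:
print(classify_async(10.0, 8.0, 10.5))  # near max(10, 8)  -> concurrent
print(classify_async(10.0, 8.0, 17.8))  # near 10 + 8      -> serialized
```

The ambiguity in the real results comes from cases that land between the two estimates, which is exactly why the test raises more questions than it answers.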


----------



## PirateZ

Quote:


> Originally Posted by *Dudewitbow*
> 
> It's not about who's winning; it's more about the true potential of current-gen hardware, and its future over the next few years for those who bought into it and aren't looking to upgrade.


Quote:


> Originally Posted by *Forceman*
> 
> No one knows, because we only have a single data point and it's hard to get a trend line from a single point.


For the past 1,400 posts it seems there's been a war trying to prove one side is better than the other. When I asked a real question, you just answered in a diplomatic way; I hadn't seen any diplomacy until now, just... you know...


----------



## Dudewitbow

Quote:


> Originally Posted by *PirateZ*
> 
> For the past 1400 posts it seems that there's been a war trying to prove one is better than the other. When i asked a real question you just answer in a diplomatic way, i don't see any diplomacy until now just you know...


Because you can't conclude anything from just one test. Conclusions will come relative to the next set of DX12 games; only after that will a real picture emerge of how developers approach DX12. The upcoming games to watch are Fable Legends and the coming DX12 patch for ARK: Survival Evolved, both for the most part GPU-neutral, since one is by Microsoft (Lionhead) and the other by an indie dev. Though it's likely both will run fine on Nvidia cards, given that both run on Unreal 4.

Basically, if both games give AMD a massive performance boost and Nvidia a minor one, then one can conclude that GCN, at least as of that moment, is better for DX12 use. But pretty much everyone already agrees that Nvidia is better for DX11 use.


----------



## Themisseble

http://www.pcper.com/image/view/60376?return=node%2F63602

i7 4930K bottlenecking Fury X.

- Also, PCPer's settings: glare quality off, shadows medium, TAA = 6.


----------



## Lantian

Quote:


> Originally Posted by *PhantomTaco*
> 
> By all means correct me if I'm wrong but there's a few things I don't understand. For starters are these theories based on the single Ashes of the Singularity benchmark? IIRC the game was developed with AMD helping the dev out. Would it be crazy to assume there were choices made that specifically improved performance for AMD? I'm not saying they necessarily actively made choices that hampered NVIDIA intentionally or even directly, but if true I'd assume some choices made would specifically benefit AMD while not helping, or potentially hurting NVIDIA hardware. Assuming this is all still based on Ashes alone, that's a single engine. There's at least half a dozen other engines out there that either have dx12 support or have it coming that are not necessarily going to behave the same way, so doesn't it seem a bit too early to draw any conclusions based on a sample size of 1?


Yes, and that is exactly why I said this whole thread can be taken as speculation at best; yet most seem to think this one alpha engine is a perfect example of DX12...


----------



## GorillaSceptre

I won't lie and say i don't enjoy a bit of drama










https://www.reddit.com/r/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/cul9auq

AMD honestly needs to grow a pair and come out swinging. If all of this is true, then GCN has a major advantage.

Finally some competition, about time something interesting happened in the PC space









I especially liked this part: "common-sense workloads". Tessellation, anyone?


----------



## charlievoviii

All of this is worthless. We'll wait until DX12 games start to roll out and see who is faster, because for a couple of years now you get AMD hype, but the end result is that AMD gets beaten over and over. Actions speak louder than words. So until I see a real game with real results, I couldn't care less what Nvidia does or how it behaves. Product performance is the key decision for me, and so far Nvidia has been winning. Only when I see AMD really doing better will I get an AMD video card. Example: the Fury X, with its super hype and cherry-picked in-house benchmarks. When it finally got released and benchmarked by unbiased people, was it faster? LMAO

I love how AMD and AMD fans always make excuses and point fingers. When something is better, they're quick to claim the fame; when it's not, they start calling Nvidia cheaters or making excuses about how it's not fair.


----------



## provost

Quote:


> Originally Posted by *Kollock*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> I think you are confusing a few issues. Tier 2 vs. Tier 3 binding is a completely separate issue from Async Compute. It has to do with the number of root-level descriptors we can pass. In tier 3, it turns out we basically never have to update a descriptor during a frame, but in tier 2 we sometimes have to build a few. I don't think it's a significant performance issue, though, just a technical detail.
> 
> Regarding the purpose of Async Compute, there are really two main reasons for it:
> 
> 1) It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper-threading. Like hyper-threading, how important this is really depends on the workload and GPU architecture. In this case, it is used for performance. I can't divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell's scheduler has no analog, just as a non-hyper-threaded CPU has no analog feature to a hyper-threaded one.
> 
> 2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case, since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I'm not sure of the background of Async Compute, but it's quite possible that it is intended for use on a console as a sort of replacement for the Cell processors on a PS3. In a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks.
> 
> It didn't look like a hardware defect on Maxwell to me, just some unfortunate, complex interaction with software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that Async Compute be supported for D3D12, but perhaps I misread the spec.
> 
> Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle-limited, which is why you see greater performance at 4K, where the average triangle size is 4x that at 1080p.
> 
> I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post-process. Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was one of the main motivating factors behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.


Thanks for posting this. I don't understand 90% of the technical stuff that you, Mahigan, and others have been saying here... lol. But the 10% I think I do understand leads me to ask the following:

Mahigan's theory, as far as I understand it, is that asynchronous compute is the most efficient and effective way for the developers to code for dx12 games because:

1. It is the best way to take advantage of dx12's lower cpu overhead ; and more importantly
2. It greatly reduces the need for the GPU designer/manufacturer's middleware software to mess with the game's performance as designed by the developer

I may be oversimplifying, but this is how I understand what is being theorized in this thread.

Lastly, in your opinion, are the promised dx12 benefits meaningfully real for both the pc gamers and developers, or is it just a bunch of smoke and mirrors ( a gimmick, if you will) to persuade the pc gamers to upgrade into yet another illusory benefit, once the next wave of GPUs are released championing the virtues of dx12 compliance?
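Kollock's purpose #1 above, async compute as a rough GPU analog of hyper-threading, can be pictured with a toy timing model. The numbers and the `frame_time` helper are illustrative only, not measurements of any real GPU:

```python
def frame_time(graphics_ms, idle_gaps_ms, compute_jobs_ms, can_overlap):
    """Toy model of async compute: a frame has busy graphics time plus
    idle gaps (stalls, barriers).  With async compute, independent
    compute jobs are cycled into the gaps; without it, they run
    serially after the graphics work."""
    total_compute = sum(compute_jobs_ms)
    base = graphics_ms + idle_gaps_ms
    if not can_overlap:
        return base + total_compute
    absorbed = min(total_compute, idle_gaps_ms)  # work hidden inside the gaps
    return base + (total_compute - absorbed)

jobs = [0.5, 0.75, 0.25]  # illustrative compute jobs, in ms
serial = frame_time(10.0, 2.0, jobs, can_overlap=False)     # 13.5 ms
overlapped = frame_time(10.0, 2.0, jobs, can_overlap=True)  # 12.0 ms
print(serial, overlapped)
```

The point of the analogy: the benefit depends entirely on how much idle time the workload leaves and how cheaply the scheduler can fill it, which is why the same feature can be a big win on one architecture and a wash (or a cost) on another.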


----------



## charlievoviii

Quote:


> Originally Posted by *Themisseble*
> 
> http://www.pcper.com/image/view/60376?return=node%2F63602
> 
> i7 4930K bottlenecking Fury X.
> 
> - Also PCper -glare Q off, shgadows medium, TAA = 6


The 980 Ti is also bottlenecking at 1080p. If you want to see what bottlenecking really is, put it on an AMD CPU.


----------



## mtcn77

Quote:


> Originally Posted by *charlievoviii*
> 
> 980Ti is also bottle necking at 1080P. If you want to see what bottle necking really is put it on a AMD CPU.


The HD 7970 is bottlenecking on an Intel i7-4790K too? Please stop the circular logic.
Post #1257.


----------



## Mahigan

@Kollock

Thank you for the clarifications









It's not about winning. It's about gathering as much information as possible. I hope NVIDIA will provide clarifications now.

The only way to get information is to create a big deal out of something. If you don't, everyone tends to remain silent.

In this case we have AMD and oxide stepping up to provide answers. Only NVIDIA are left.

Ps. Out of context is out of order in my dialect.
Quote:


> For content that does not contribute to any discussion.
> 
> [-] AMD_Robert (AMD Employee), 11 points, 14 hours ago:
> 
> Oxide effectively summarized my thoughts on the matter. NVIDIA claims "full support" for DX12, but conveniently ignores that Maxwell is utterly incapable of performing asynchronous compute without heavy reliance on slow context switching.
> 
> GCN has supported async shading since its inception, and it did so because we hoped and expected that gaming would lean into these workloads heavily. Mantle, Vulkan and DX12 all do. The consoles do (with gusto). PC games are chock full of compute-driven effects.
> 
> If memory serves, GCN has higher FLOPS/mm2 than any other architecture, and GCN is once again showing its prowess when utilized with common-sense workloads that are appropriate for the design of the architecture


Therefore, yes... Maxwell 2 has a hard time handling too much context switching.

It's like the GeForce FX all over again, where NVIDIA used a driver-side shader compiler to perform shader swaps because the FX performed poorly at that task.

I suspect that now, with DX12, it will depend on what developers are doing.


----------



## CrazyElf

Quote:


> Originally Posted by *Mahigan*
> 
> there's a difference...
> 
> The Fury-X is sometimes behind the GTX 980 ti, sometimes a little ahead. This is acceptable. Though if I were to recommend someone where to spend $650 for a DX11 GPU, it would be the GTX 980 Ti.
> 
> What you don't expect is a GPU from 2013 competing with a GTX 980 Ti.


There is one thing I'm not convinced about.

They gimped the FP64 compute performance on Maxwell in favor of more gaming performance. It's a Single Precision monster, but Double Precision is what is wanted for Compute.

To be honest, it doesn't matter either way. Pascal, I think, will leave this card in the dust performance-wise.

We are getting:

A die shrink's worth of additional transistors
HBM2 and the bandwidth it brings
Potentially a more parallel architecture (we don't know yet)
Potentially gains comparable to the SMX-to-SMM jump, and perhaps other improvements (they may improve their compression again, for example)
Judging by history, too, driver optimizations may stop when the Pascal GPUs come out. And in the case of DX12, driver optimizations are simply not possible, so perhaps much sooner than that.

I'm not sure the Fury X is a good buy either, for the reasons discussed. Something is bottlenecking it.

Quote:


> Originally Posted by *Kollock*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> I think you are confusing a few issues. Tier 2 vs. Tier 3 binding is a completely separate issue from Async Compute. It has to do with the number of root-level descriptors we can pass. In tier 3, it turns out we basically never have to update a descriptor during a frame, but in tier 2 we sometimes have to build a few. I don't think it's a significant performance issue, though, just a technical detail.
> 
> Regarding the purpose of Async Compute, there are really two main reasons for it:
> 
> 1) It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper-threading. Like hyper-threading, how important this is really depends on the workload and GPU architecture. In this case, it is used for performance. I can't divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell's scheduler has no analog, just as a non-hyper-threaded CPU has no analog feature to a hyper-threaded one.
> 
> 2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case, since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I'm not sure of the background of Async Compute, but it's quite possible that it is intended for use on a console as a sort of replacement for the Cell processors on a PS3. In a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks.
> 
> It didn't look like a hardware defect on Maxwell to me, just some unfortunate, complex interaction with software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that Async Compute be supported for D3D12, but perhaps I misread the spec.
> 
> 
> Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle-limited, which is why you see greater performance at 4K, where the average triangle size is 4x that at 1080p.
> 
> I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post-process. *Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine.* In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was one of the main motivating factors behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.


If this is the case, do you think the Fury X might see a huge leap in performance next generation compared to the 290X? There are about 45% more shaders in the Fury X than in the 290X, but we don't see that scaling right now. Is that because shaders are simply such a small part of the problem overall? In situations where the main bottleneck is shader effects, texture effects, or bandwidth, the Fury X should run circles around every other GPU. Could AMD be playing the long game here too?

But elsewhere the gains are less impressive. I'll note that the pixel fill rates did not change at all from the 290X to the Fury X; they remain at 67 Gpixels/s. Nvidia's jumped from 37 on the 780 Ti to 95 on the 980 Ti (and it's 103 on the Titan X). In other words, the 290X was vastly better than the 780 Ti, but Nvidia has managed to leapfrog in a generation. This, and perhaps more rasterizers per clock, is a big area AMD needs to improve. Where the Fury X is fill-rate or rasterization bottlenecked, it's going to, well, frankly, suck compared to the 980 Ti and not be a big improvement over the 290X.

Another matter is - why are we not seeing gains on the memory bandwidth? In theory this should be a substantial advantage.
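The fill-rate and shader figures above can be reproduced, roughly, from public spec sheets: peak pixel fill is ROPs times core clock, and the Fury X's 4096 stream processors versus the 290X's 2816 give the ~45% figure. A quick sanity check (the clocks are approximate reference specs, and small differences from the quoted numbers come down to which clock you assume):

```python
def pixel_fill_gps(rops, clock_mhz):
    # Peak pixel fill rate = ROPs x core clock; (count * MHz) / 1000 -> Gpixels/s.
    return rops * clock_mhz / 1000.0

# Approximate reference specs: (ROPs, core clock in MHz)
cards = {
    "R9 290X":    (64, 1000),
    "Fury X":     (64, 1050),
    "GTX 780 Ti": (48, 875),
    "GTX 980 Ti": (96, 1000),
}
for name, (rops, mhz) in cards.items():
    print(f"{name}: {pixel_fill_gps(rops, mhz):.1f} Gpix/s")

# Shader-count scaling, Fury X (4096 SPs) vs. 290X (2816 SPs):
print(f"{4096 / 2816 - 1:.0%} more shaders")  # ~45%
```

The takeaway matches the argument above: the Fury X's extra resources are almost entirely shaders and bandwidth, while the ROP count (and therefore peak fill) stood still.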

Quote:


> Originally Posted by *Digidi*
> 
> Maybe they didn't change the rasterizer because it already has more than enough power?
> 
> If you look at the DX12 draw call test in 3DMark, it's also a polygon-output test, because each draw call consists of 112-127 polygons. That means AMD can put out more polygons through its rasterizers than Nvidia, so the rasterizer is no bottleneck under DX12 for AMD.
> 
> Having a lot of rasterizers is one thing; the other thing is keeping them fed from the command processor. At this point Nvidia is very weak at high draw call counts, and it's a hardware limit. No big changes can be made by a driver update.


Out of curiosity, what do you think is the weak point of the Fury X, then? There's gotta be something, as we're seeing poor scaling.

You make a good point - the Fury X is better at high draw calls and where those are a bottleneck, the Fury X should outperform the 980Ti.
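Digidi's observation can be turned into a back-of-the-envelope estimate: front-end polygon throughput is the draw call rate multiplied by the 112-127 polygons per call he cites. The draw call rates below are placeholders for illustration, not 3DMark results:

```python
def polygon_throughput(draw_calls_per_s, polys_per_call):
    # Polygons per second the front end can feed at a given draw call rate.
    return draw_calls_per_s * polys_per_call

# Illustrative draw call rates (calls/s) -- placeholders, not measured data:
for label, rate in [("GPU A", 15e6), ("GPU B", 9e6)]:
    low = polygon_throughput(rate, 112)
    high = polygon_throughput(rate, 127)
    print(f"{label}: {low / 1e9:.2f}-{high / 1e9:.2f} Gpolys/s")
```

So a card that sustains a higher draw call rate also implicitly demonstrates higher polygon feed rate in that test, which is exactly why the two are hard to separate.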


----------



## pengs

It would be interesting to see a revisit of the performance per watt between GCN and Maxwell with a fully fledged DX12 title like Ashes.


----------



## Mahigan

Good job everyone









You did the work tech websites used to do.

From this point on I'm out of this conversation. Time to spend more time with the wife.


----------



## provost

Quote:


> Originally Posted by *CrazyElf*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> There is one thing I'm not convinced about.
> 
> They gimped the FP64 compute performance on Maxwell in favor of more gaming performance. It's a Single Precision monster, but Double Precision is what is wanted for Compute.
> 
> To be honest, it doesn't matter either way. Pascal I think will, performance wise, leave this card in the dust.
> 
> We are getting:
> 
> A die shrinks worth of more transistors
> HBM2 and the bandwidth it brings
> Potentially a more parallel architecture (we don't know yet)
> Potentially gains comparable from SMX to SMM, and perhaps other gains (they may improve their compression again for example)
> Judging by history too, the driver optimizations may stop when the Pascal GPUs come out. And in the case of DX12, driver optimizations are simply not possible, so perhaps much sooner than that.
> 
> The Fury X I'm not sure is a good buy either for reasons discussed. Something is bottlenecking it.
> If this is the case, do you think that the Fury X might see a huge leap next generation in performance compared to the 290X? There are about 45% more shaders here in the Fury X than the 290X, but we don't see scaling right now. Is that because shaders are simply such a small part of the problem overall? If there are situations where the main bottleneck is in shader effects, texture effects, or bandwidth, the Fury X should fly circles around every other GPU. Could AMD be playing the long game here too?
> 
> But elsewhere the gains are less impressive. I'll note that the pixel fill rates did not change at all from the 290X to Fury X. They remain at 67Gpixels/s. Nvidia's jumped between the 780Ti to 980 Ti from 37 to 95 (and it's 103 on the Titan X). In other words, on the 290X, it was vastly better than the 780Ti, but now Nvidia's has managed to leapfrog in a generation. This and perhaps more rasterizers per clock are a huge matter that AMD needs to improve upon. Where the Fury X is fill rate or rasterization bottlenecked, it's going to, well, frankly, suck compared to the 980Ti and not be a big improvement over the 290X.
> 
> Another matter is - why are we not seeing gains on the memory bandwidth? In theory this should be a substantial advantage.
> Out of curiosity, what do you think is the weak point of the Fury X then? There's gotta be something as we are seeing poor scaling.
> 
> You make a good point - the Fury X is better at high draw calls and where those are a bottleneck, the Fury X should outperform the 980Ti.


Great points.
Quote:


> Originally Posted by *Mahigan*
> 
> Good job everyone
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You did the work tech websites used to do.
> 
> From this point on I'm out of this conversation. Time to spend more time with the wife.


Do what you love, and love what you do.

Why not become an independent columnist for overclock.net? We can use some dissenting editorials here to keep things interesting....


----------



## RushTheBus

I want to preface by saying that I appreciate the discussion and everyone's input.

I'm going to admit that I understand very little about GPU architecture or the intricacies of the DirectX API. That said, after reading through this thread, as well as similar threads across other tech forums, I still remain confused. I'm really trying to understand what's going on here and not go to DEFCON 1. I apologize, but I need it broken down a bit further so I can get a better understanding, as I'm still trying to determine whether there is an actual bombshell issue here or not. I appreciate any clarification ahead of time, as this will not only help me better understand what's going on but also give me additional tools to make an informed purchasing decision.

So here are my questions:

1. Is asynchronous compute an actual hard requirement for doing things in DX12 (i.e., it must be utilized), or is it simply something DX12 supports over DX11?
2. Is asynchronous compute simply a technology or architecture decision that AMD has made based on their philosophies, or a hard requirement based on research into low-level APIs?
3. Is it possible that the Oxide developers chose to utilize asynchronous compute for various implementations instead of some other method, or is async compute the only method of doing things?

Kind regards,

Rush


----------



## mav451

Quote:


> Originally Posted by *provost*
> 
> Do what you love, and love what you do.
> 
> Why not become an independent columnist for overclock.net? We can use some *dissenting* editorials here to keep things interesting....


I would change that word to *disruptive*, not necessarily dissenting.
Dissenting implies there is a _majority_ consensus on the topic. There isn't one just yet


----------



## Forceman

Quote:


> Originally Posted by *CrazyElf*
> 
> There is one thing I'm not convinced about.
> 
> They gimped the FP64 compute performance on Maxwell in favor of more gaming performance. It's a Single Precision monster, but Double Precision is what is wanted for Compute.


There is both single and double precision compute, and gaming uses single not double.


----------



## Joe-Gamer

As long as I can near-max games at 96fps (monitor OC'd to 96Hz) at 1440p, I won't care that much about the competition. But damn, if my 980 Ti becomes weaker than a standard Fury or a 390X, I'll be annoyed as heck. First flagship card; should've stuck with 290 CF. DX12 has ruined me, haha.


----------



## KenjiS

Quote:


> Originally Posted by *RushTheBus*
> 
> I want to preface by saying that i appreciate the discussion and everyone's input.
> 
> So here are my questions:
> 
> Is asynchronous compute an actual hard requirement for doing things in DX12 (ie: it must be utilized) or is it simply something DX12 supports over DX11?
> Is asynchronous compute simply a technology or architecture decision that AMD has made based on their philosophies or a hard requirement based off of research into low level APIs?
> Is it possible that the Oxide developers chose to utilize asynchronous compute for various implementations instead of some other method or is asynch compute the only method of doing things?
> Kind Regards
> 
> Rush


1. I do not believe async compute is a hard requirement for DX12, or Nvidia would be in deep poo for saying their cards are DX12 compatible; actually, I'm fairly certain they would have that claim yanked. It's likely an optional feature that CAN be used.

2. From what I've seen, support for async compute was baked into GCN (the architecture AMD is using). I'm guessing it's a side effect of Mantle/Vulkan and AMD's development of the PS4 and Xbox One silicon...

3. It's not the only method, but it can, in some situations, improve performance; it's somewhat like hyper-threading, from the looks of it.

On a side note, here's a news post from 5 months ago where AMD talks about async compute... so this isn't really a new thing; everyone just ignored it until now:

http://wccftech.com/amd-improves-dx12-performance-45-gpu-asynchronous-compute-engines/

In fact, it's not even really surprising... AMD built the chips for both new consoles, after all, so AMD may have known a lot more than Nvidia about what was going to happen with DX12...

Now then, I just ordered a 980 Ti. Do I regret it? Well... no, not really. DX12 titles, besides Ashes of the Singularity and ARK: Survival Evolved, are not quite here yet (in fact, Wikipedia currently lists 6 games confirmed to use DX12; the next one, as of right now, appears to be Deus Ex: Mankind Divided, which has a date of February 16th. I'll be curious to see how that turns out, as AMD and Square Enix generally have close ties... and admittedly I'm very much looking forward to that game...)

Ashes is one benchmark, one very important benchmark, but we're still a year away from seeing proper DX12 titles on the market; heck, we're 6 months from the NEXT DX12 title being on the market. By the time DX12 is more common, Pascal will be out, as will the Fury MAXX or whatever AMD will call it. It's possible Nvidia can claw back some of the gap through better driver optimization or other trickery, albeit unlikely. And it's not as if the 980 Ti goes HURK and keels over dead in Ashes; it's just not beating the Fury X to a pulp...

In a way, I'm actually excited for a return to the old days, when ATi and nVidia beat each other to a pulp and we watched with excited grins for the next big thing... it's nice to see AMD get some ground back. I want to see the next Titan and Fury cards fighting it out, because at the end of the day all of us win.

So no, I'm not regretting my 980 Ti; it's the best card at the moment, and when something better drops I will sell it and get that. I did seriously consider the Fury X, but 1) they're marked up a lot right now, 2) installing it would be a moderate inconvenience, and 3) their drivers give me unpleasant flashbacks.


----------



## semitope

The hard requirement for DX12 might be graphics + copy queues rather than graphics + copy + compute. Maxwell and Kepler support the former.


----------



## caswow

Wasn't there talk about MS and Nvidia developing DX12 for years and years, even way before Mantle?


----------



## RushTheBus

Thanks for the direct reply, KenjiS; I do appreciate the breakdown/simplification. If I'm understanding everything correctly (I still may not be), this whole discussion seems to hinge on one developer's choice to employ one particular method, and many seem to think it's either the only way of doing things or the de facto most efficient way?


----------



## gamervivek

Quote:


> Originally Posted by *CrazyElf*
> 
> But elsewhere the gains are less impressive. I'll note that the pixel fill rates did not change at all from the 290X to Fury X. They remain at 67Gpixels/s. Nvidia's jumped between the 780Ti to 980 Ti from 37 to 95 (and it's 103 on the Titan X). In other words, on the 290X, it was vastly better than the 780Ti, but now Nvidia's has managed to leapfrog in a generation. This and perhaps more rasterizers per clock are a huge matter that AMD needs to improve upon. Where the Fury X is fill rate or rasterization bottlenecked, it's going to, well, frankly, suck compared to the 980Ti and not be a big improvement over the 290X.
> 
> Another matter is - why are we not seeing gains on the memory bandwidth? In theory this should be a substantial advantage.
> Out of curiosity, what do you think is the weak point of the Fury X then? There's gotta be something as we are seeing poor scaling.
> 
> You make a good point - the Fury X is better at high draw calls and where those are a bottleneck, the Fury X should outperform the 980Ti.


Theoretical specifications mean next to nothing.



If theoretical specifications mattered, you'd see the Fury fall way behind at 4K.


----------



## KenjiS

Quote:


> Originally Posted by *RushTheBus*
> 
> Thanks for the direct reply KenjiS, i do appreciate the breakdown / simplification of it for me. If i am understanding everything correctly (i still may not be), this whole discussion seems to hinged around one developers choice to employ one particular method of doing things and many seem to think that its either the only way of doing things or the defacto most efficient way?


It's more that it works for some games and engines and not for others...

In _theory_ Async Compute is better; in _practice_ it may offer little to no benefit to a specific title versus the cost of implementing it.

For Ashes, Async Compute obviously leads to a good performance improvement. Will this be the case in every title? We don't know yet, because that's basically the only DX12 benchmark right now, so for now it stands by itself without comparison. It's something to watch with interest, yes, and mildly concerning for those of us with nVidia hardware, but at the end of the day it's not crippling (and again, it's not as if the 980 Ti is suddenly 1/3rd the performance of the Fury X), especially if we include titles like Thief, where AMD says Async Compute was already employed in Mantle and... you see little difference at 2560x1600.

It's just one benchmark, and too early to call whether this is a real problem for nVidia or not. As I said, something to watch, but I'll be enjoying my 980 Ti all the same.


----------



## Noufel

Quote:


> Originally Posted by *KenjiS*
> 
> Quote:
> 
> 
> 
> Originally Posted by *RushTheBus*
> 
> Thanks for the direct reply KenjiS, I do appreciate the breakdown / simplification of it for me. If I am understanding everything correctly (I still may not be), this whole discussion seems to hinge on one developer's choice to employ one particular method of doing things, and many seem to think that it's either the only way of doing things or the de facto most efficient way?
> 
> 
> 
> It's more "it works for some games and engines and not for others"...
> 
> In _theory_ Async Compute is better; in _practice_ it may offer little to no benefit to a specific title versus the cost of implementing it.
> 
> For Ashes, Async Compute obviously leads to a good performance improvement. Will this be the case in every title? We don't know yet, because that's basically the only DX12 benchmark right now, so for now it stands by itself without comparison. Something to watch with interest, yes, and mildly concerning to those of us with nVidia hardware, but at the end of the day it's not crippling (and again, it's not like the 980 Ti is now suddenly 1/3rd the performance of the Fury X) if we include titles like Thief, where AMD says Async Compute was already employed in Mantle and... you see little difference at 2560x1600.
> 
> It's just one benchmark, and too early to call whether this is a real problem for nVidia or not. As I said, something to watch, but I'll be enjoying my 980 Ti all the same.


----------



## garwynn

Here it is folks, my piece is finally up. I have heard unofficially who Kollock may be but I'm waiting to see if I can get a confirmation on that.
http://thegametechnician.com/2015/08/31/analysis-amds-long-game-realization/

It's two pages, please look for the page counter on the bottom.


----------



## orlfman

So Nvidia never said they don't support async, have they?

If their drivers list support for it, but when enabled it doesn't fully work correctly, couldn't this just be a driver issue and not a hardware issue like the 3.5GB gate was?


----------



## Wishmaker

Please tell me again how good AMD hardware is when every Mantle title I have crashes and I have to run DX11 for it to work!


----------



## GorillaSceptre

DICE (and Frostbite) is looking pretty cosy with AMD.

If Battlefront is using Async then I'm definitely going with AMD, that game is going to be my jam.

As far as DX12-supported titles go, there are way more than I originally thought.

A lot of the already released AAA games have expressed interest in updating. Besides the ones already mentioned, games like The Witcher 3 are possibly getting updated to it, there are also talks for Arkham Knight, and Battlefront has been strongly hinted at getting it.

I hope more devs like Oxide give their opinions on this topic.


----------



## p4inkill3r

Quote:


> Originally Posted by *Wishmaker*
> 
> Please tell me again how good AMD hardware is when every Mantle title I have crashes and I have to run DX11 for it to work!


Sounds to me like you have some other issues going on with your PC, but sure, blame AMD.


----------



## p4inkill3r

Quote:


> Originally Posted by *garwynn*
> 
> Here it is folks, my piece is finally up. I have heard unofficially who Kollock may be but I'm waiting to see if I can get a confirmation on that.
> http://thegametechnician.com/2015/08/31/analysis-amds-long-game-realization/
> 
> It's two pages, please look for the page counter on the bottom.


Well written and informative.


----------



## ZealotKi11er

Not sure if this has been posted: http://www.overclock3d.net/articles/gpu_displays/oxide_developer_says_nvidia_was_pressuring_them_to_change_their_dx12_benchmark/1


----------



## garwynn

Quote:


> Originally Posted by *p4inkill3r*
> 
> Well written and informative.


While I started out doing this on my own - before I knew about this thread - finding out about this discussion certainly helped shape the revisions. That's a big credit to those who have helped drive this conversation (like @Mahigan). Depending on how well this does I'll probably follow it up with some more.


----------



## GorillaSceptre

Quote:


> Originally Posted by *garwynn*
> 
> Here it is folks, my piece is finally up. I have heard unofficially who Kollock may be but I'm waiting to see if I can get a confirmation on that.
> http://thegametechnician.com/2015/08/31/analysis-amds-long-game-realization/
> 
> It's two pages, please look for the page counter on the bottom.


Appreciate the effort, it must have been a PITA to condense all the fragmented info in there.









Quote:


> Originally Posted by *ZealotKi11er*
> 
> Not sure if this has been posted: http://www.overclock3d.net/articles/gpu_displays/oxide_developer_says_nvidia_was_pressuring_them_to_change_their_dx12_benchmark/1


I think that's what Oxide's response implied when they said "tread lightly".


----------



## PontiacGTX

Quote:


> Originally Posted by *Wishmaker*
> 
> Please tell me again how good AMD hardware is when every Mantle title I have crashes and I have to run DX11 for it to work!


It's probably something with your PC..


----------



## Devnant

Quote:


> Originally Posted by *orlfman*
> 
> So nvidia never said they don't support async have they?
> 
> if they drivers list support for it, but when enabled, it doesn't fully work correctly, couldn't this just be a driver issue and not a hardware issue like the 3.5gb gate was?


Nvidia should support async compute. Look here:

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9#post-1869058


----------



## mav451

I'm not seeing _any_ new information in the OC3D piece. Just a rehash of what we've been reading the past couple weeks now.


----------



## Mahigan

@Devnant

They do but...
Quote:


> AMD_Robert (AMD employee), on reddit:
> 
> Oxide effectively summarized my thoughts on the matter. NVIDIA claims "full support" for DX12, but conveniently ignores that Maxwell is utterly incapable of performing asynchronous compute without heavy reliance on slow context switching.
> 
> GCN has supported async shading since its inception, and it did so because we hoped and expected that gaming would lean into these workloads heavily. Mantle, Vulkan and DX12 all do. The consoles do (with gusto). PC games are chock full of compute-driven effects.
> 
> If memory serves, GCN has higher FLOPS/mm2 than any other architecture, and GCN is once again showing its prowess when utilized with common-sense workloads that are appropriate for the design of the architecture


----------



## orlfman

Quote:


> Originally Posted by *Devnant*
> 
> Nvidia should support async compute. Look here:
> 
> https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9#post-1869058


So really, could all of this just be an issue with drivers then? I know Nvidia has been working on DX12 for a while now, but it's still new, and from all the drivers released lately, Nvidia seems to have been focusing more on continuing its DX11 improvements than on DX12 optimizations.

Edit:
I was reading this from AnandTech:
http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading
Quote:


> On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once - early implementations of HyperQ cannot be used in conjunction with graphics. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage..


Maxwell 2 is supposed to have 1 Graphics + 31 Compute, so it does support async - at least according to Nvidia and this AnandTech article. Pre-Maxwell 2, including Maxwell 1, does not.
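The reason the queue split matters is overlap, not the queue count itself. A toy frame-time model makes the difference visible (all workloads and the flat per-switch penalty below are invented numbers for illustration, not measurements of any real GPU):

```python
# Toy frame-time model of serial vs. concurrent graphics + compute.
# Purely illustrative; the millisecond figures are made up.

def frame_time_serial(t_gfx_ms, compute_ms, switch_cost_ms=0.0):
    """One timeline: graphics, then each compute batch, paying a
    context-switch penalty into and out of every compute batch."""
    return t_gfx_ms + sum(compute_ms) + 2 * switch_cost_ms * len(compute_ms)

def frame_time_async(t_gfx_ms, compute_ms):
    """Compute batches overlap graphics on separate queues; the frame
    is bound by whichever timeline is longer."""
    return max(t_gfx_ms, sum(compute_ms))

gfx = 12.0                      # ms of graphics work per frame
compute = [1.5, 2.0, 1.0, 1.5]  # ms of compute-driven effects

print(frame_time_serial(gfx, compute, switch_cost_ms=0.1))  # ~18.8 ms
print(frame_time_async(gfx, compute))                       # 12.0 ms
```

On hardware that can only run one kind of queue at a time, every compute batch extends the frame; with concurrent queues, the same compute work can hide under the graphics timeline entirely.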


----------



## garwynn

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Appreciate the effort, it must have been a PITA to condense all the fragmented info in there.
> 
> 
> 
> 
> 
> 
> 
> 
> I think that's what Oxide's response implied when they said "tread lightly".


It's one thing to say what AMD's Robert Hallock did just a bit ago:

https://twitter.com/i/web/status/638392528593682437

Quote:


> Originally Posted by *Robert Hallock (AMD)*
> tl;dr version: DX12 "async shading" = compute+gfx at once. GCN can do it. Maxwell can't.
> And being able to do gfx+compute at the same time, rather than serially, is big +perf.


It's a completely different thing to want to understand it and start digging in.
And if I was going to do it I had the mindset of a research paper - show my references so others can validate it.
And yes, it was frustrating at times to condense it... but I'm glad I waited until this morning to put it up. Amazing what clarity a few days can bring.

Again, thanks to everyone for the input and feedback!
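Hallock's "big +perf" point has a simple idealized upper bound: if a serial frame costs T_gfx + T_compute and a perfectly overlapped one costs max(T_gfx, T_compute), the best-case speedup depends only on the compute fraction of the frame. A sketch (assuming perfect overlap and zero contention for shared units, so real gains will be smaller):

```python
# Idealized upper bound on the win from async shading.
# serial time = gfx + compute; overlapped time = max(gfx, compute).

def best_case_async_speedup(compute_fraction):
    gfx_fraction = 1.0 - compute_fraction
    return 1.0 / max(gfx_fraction, compute_fraction)

for f in (0.1, 0.2, 0.3, 0.5):
    print(f"{int(f * 100)}% compute -> {best_case_async_speedup(f):.2f}x")
```

So a frame that is ~30% compute caps out around a 1.4x gain even in the ideal case; the benefit scales with how compute-heavy the renderer is, which is why a title like Ashes shows it more than others would.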


----------



## Devnant

Quote:


> Originally Posted by *orlfman*
> 
> So really, could all of this just be an issue with drivers then? I know Nvidia has been working on DX12 for a while now, but it's still new, and from all the drivers released lately, Nvidia seems to have been focusing more on continuing its DX11 improvements than on DX12 optimizations.


Probably because there are no DX12 games out there yet. But by the end of the year, with more DX12 titles available - or at least more alpha and beta DX12 titles - I'm sure we will get a better picture.


----------



## Dudewitbow

Quote:


> Originally Posted by *garwynn*
> 
> It's one thing to say like what AMD's Robert Hallock did just a bit ago:
> 
> https://twitter.com/i/web/status/638392528593682437
> 
> It's a completely different thing to want to understand it and start digging in.
> And if I was going to do it I had the mindset of a research paper - show my references so others can validate it.
> And yes, it was frustrating at times to condense it... but glad I waited until this morning to put it up. Amazing what clarity reveals in a few days.
> 
> Again, thanks to everyone for the input and feedback!


I find that a good way to visualize what is happening, for those who don't readily absorb computer lingo, is this:

https://www.reddit.com/r/3j1916/get_your_popcorn_ready_nv_gpus_do_not_support/cullj3d


----------



## sugarhell

New amd drivers with aots optimizations. Anyone wanna try?


----------



## delboy67

Quote:


> Originally Posted by *sugarhell*
> 
> New amd drivers with aots optimizations. Anyone wanna try?


Hopefully a fury owner could pop by?


----------



## xxdarkreap3rxx

Quote:


> Originally Posted by *sugarhell*
> 
> New amd drivers with aots optimizations. Anyone wanna try?


Yes, agreed. I just asked in that thread if numbers would change and came here to ask if someone could run them with the new driver.


----------



## mav451

Haha, speaking of contacting AMD:

About a month ago I contacted Hallock to confirm whether the Nano was actually full Fiji (back when nobody knew for sure).
I never actually got a response, lol.


----------



## garwynn

Quote:


> Originally Posted by *delboy67*
> 
> Hopefully a fury owner could pop by?


I can give it a go later tonight.
(I don't have AotS but I'll see if I can get it. If I can't I'll ask someone I know that has it if he can.)


----------



## spacin9

Wow... I'm really being schooled by this thread.

So basically, in DX12, I have two glorified FX 5900 Ultras and a bunch of VRAM I'll probably never use. Cool.

I want strategic zoom, nukes and Galactic Colossus NOW! haha... j/k.

No really I want them now.


----------



## garwynn

Quote:


> Originally Posted by *spacin9*
> 
> Wow... I'm really being schooled by this thread.
> 
> So basically, In DX 12, I have two glorified FX 5900 Ultras and a bunch of VRAM I'll probably never use. Cool.
> 
> I want strategic zoom, nukes and Galactic Colossus NOW! haha... j/k.
> 
> No really I want them now.


Not really, GTX 970s will still be awesome in DX11 - it's DX12 where you may see a difference.

I'm sincerely curious as to what NVIDIA may have in the works to address async shading.
One thing is for certain, they're not going to let it go if they're truly behind. Not after years of fighting to get on top.
For the two companies, the battle ensues. For us? It's a great time to be a gamer or PC enthusiast.


----------



## Xuper

Oh god, people went mad: (1,450 comments)

https://www.reddit.com/r/3j1916/get_your_popcorn_ready_nv_gpus_do_not_support/cullj3d

Will it be like the GTX 970 fiasco? I don't know.


----------



## infranoia

All right, now we need to see what the brand-new 15.8 beta drivers do for the Fury X in Ashes.
Quote:


> * Ashes of the Singularity - Performance optimizations for DirectX® 12


It could also be for GCN across the board-- anyone have Win 10 + Ashes + Fury X around here?


----------



## garwynn

Quote:


> Originally Posted by *infranoia*
> 
> All right, now we need to see what the brand-new 15.8 beta drivers do for the Fury X in Ashes.
> It could be also for GCN across the board-- anyone have Win 10 + Ashes + Fury X around here?


The only people that have Ashes AFAIK are the press who got the benchmark copy.
I am not aware of any larger scale release.


----------



## FastEddieNYC

Quote:


> Originally Posted by *orlfman*
> 
> So really, could all of this just be an issue with drivers then? I know Nvidia has been working on DX12 for a while now, but it's still new, and from all the drivers released lately, Nvidia seems to have been focusing more on continuing its DX11 improvements than on DX12 optimizations.
> 
> edit:
> i was reading this from anandtech:
> http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading
> Maxwell 2 is supposed to have 1 Graphics + 31 Compute, so it does support async - at least according to Nvidia and this AnandTech article. Pre-Maxwell 2, including Maxwell 1, does not.


Nvidia has been quiet about async compute capability across their cards for a reason, which we now know: they do not have native support and have to rely on context switching. Async compute is one of the major benefits of DX12, and Ashes of the Singularity uses a decent amount of it. I'm sure Nvidia will try to optimize their drivers to make up for the lack of ACEs in their GPUs, but that is only a band-aid. I guarantee their next-gen cards will have it.

Update: I just read the Guru3D article. If it's true that Nvidia tried to have async disabled in the alpha benchmark to hide the fact that their hardware really isn't 100% DX12-capable, it shows that ethical behavior at Nvidia is seriously lacking.
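The context-switching concern can be made concrete with a toy cost model: grant the hardware the ideal overlap saving, then charge a flat penalty for every switch between graphics and compute. Every number here is hypothetical, purely to show how per-switch costs can erode or reverse the gain:

```python
# Toy model of "async via context switching".
# ideal_saving assumes the compute work could, at best, fully hide
# under graphics; each context switch then charges a fixed cost.

def net_async_gain_ms(t_gfx_ms, t_compute_ms, switches, switch_cost_ms):
    """Positive = net frame time saved vs. pure serial execution;
    negative = the switching overhead ate the entire benefit."""
    ideal_saving = min(t_gfx_ms, t_compute_ms)  # best possible overlap
    overhead = switches * switch_cost_ms
    return ideal_saving - overhead

# Hypothetical: 4 ms of overlappable compute, 0.25 ms per switch.
print(net_async_gain_ms(12.0, 4.0, switches=8, switch_cost_ms=0.25))   # 2.0
print(net_async_gain_ms(12.0, 4.0, switches=32, switch_cost_ms=0.25))  # -4.0
```

With a few coarse switches the net is still positive; with many fine-grained ones the same frame ends up slower than running everything serially, which would explain why a vendor might prefer async simply be disabled.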


----------



## Noufel

Nvidia PR: "We're glad to see DirectX 12 titles showing up. There are many titles with DirectX 12 coming before the end of the year and we are excited to see them." Like a boss - not even scared.








http://wccftech.com/amd-nvidias-maxwell-is-utterly-incapable-of-performing-async-compute/


----------



## umeng2002

I think the larger takeaway is that since the consoles are now GCN, console ports will use Async Compute sooner rather than later. So nVidia should have this fixed in Pascal. Maxwell 2 is a DX12 write-off, imho.


----------



## infranoia

Quote:


> Originally Posted by *garwynn*
> 
> The only people that have Ashes AFAIK are the press who got the benchmark copy.
> I am not aware of any larger scale release.


Nah, just buy it. It's in my Steam account right now.

http://www.ashesofthesingularity.com/store



Now, it's a tricky request, because we're not supposed to share screens or discuss it, touch it or even really look at it. No eye contact with Ashes of the Singularity please. But maybe hints could drop.


----------



## anubis44

Quote:


> Originally Posted by *Mahigan*
> 
> Not exactly the same z/stencil rate, give or take depending on the operating clock speeds.
> 
> And I don't see what is wrong with me saying I'm going to buy Greenland. I'm quite certain that what Oxide is saying, and what I stated by observing the results in theory, point to the lack of async support tier 2 in Maxwell 2.
> 
> NVIDIA told everyone they could do it. Even Anandtech. Why would I chance buying a $650 graphics card from NVIDIA, in the future, seeing what they've done with Kepler and now potentially Maxwell/2?
> 
> I can overlook them doing it once, with Kepler. Fine. And I have.
> 
> If Maxwell 2 doesn't end up supporting Async tier 2 then you can be sure GameWorks titles won't until Pascal is released. Pascal will undoubtedly be tier 2 if not 3. Once Pascal releases GameWorks titles will support Async. That means hitting the GTX 980 Ti hard.
> 
> That would be twice in a row they do this. Why would I buy a GPU from them if that's the case?
> 
> That's making a rational buying decision based on an objective perspective.
> 
> Typo. 5.4 not 5.6 and I obtained it from here: http://techreport.com/review/28513/amd-radeon-r9-fury-x-graphics-card-reviewed/4
> 
> And it depends on the boost clock. It varies up and down depending on load.


Quote:


> Originally Posted by *Mahigan*
> 
> Good job everyone
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You did the work tech websites used to do.
> 
> From this point on I'm out of this conversation. Time to spend more time with the wife.


Thank you very much, Mahigan. You've brought some real legitimacy to the discussion of DX12 suitability/features in AMD and nVidia GPU architectures, and I'm very grateful for the clear, concise way you have described some quite technical concepts. Your posts are golden nuggets in a teeming, massive dungheap of partisan bickering, mindless rants and ad hominem attacks. Please come back periodically and continue to contribute to our understanding of this important aspect of computing.


----------



## garwynn

Quote:


> Originally Posted by *umeng2002*
> 
> I think the larger takeaway is that since the consoles are now GCN, console ports will use Async Compute sooner rather than later. So nVidia should have this fixed in Pascal. Maxwell 2 is a DX12 write-off, imho.


This is the odd thing - at least part of the async shading (if not all) has been in use on consoles since the XBO/PS4 launch. That's why I mentioned it in the article.
AMD just couldn't get MS to apply the custom DX11.x changes back to the PC mainline - and honestly that might never have happened without all the support behind Mantle.


----------



## provost

Quote:


> Originally Posted by *Devnant*
> 
> Probably because there are no DX12 games out there, yet. But by the end of the year, with more DX12 titles available, or at least more alpha and beta DX12 titles, I'm sure we will get a better picture.


I think you know better than that... lol

New cards need a selling point based on the performance jump over the previous gen. "Optimization" and "performance" are interchangeable terms for the same phenomenon in the GPU market, right?


----------



## iLeakStuff

If this is true for all DX12 games, it will be quite sad for current Kepler/Maxwell owners who planned to use their GPUs through 2016 and 2017, when DX12 will take over - unless Nvidia has something in the DX12 drivers to fix it.
I expect Nvidia to be forward-thinking enough to implement support for asynchronous compute in Pascal, but that's not exactly promising for current Nvidia owners on current tech.

It will for sure be interesting to see Nvidia's reply to this, since AMD's own employees have now been pretty forward about how great GCN is by comparison:

https://www.reddit.com/r/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/cul9auq

There is quite the buzz all over the internet now about this.


----------



## ImJJames

http://www.overclock3d.net/articles/cpu_mainboard/dice_wants_win_10_plus_dx12_as_minimum_specs_for_holiday_2016_frostbite_games/1

How quickly the tide turns - from reading almost every PC subreddit, it looks like Nvidia has dug its own grave.


----------



## p4inkill3r

Quote:


> Originally Posted by *ImJJames*
> 
> http://www.overclock3d.net/articles/cpu_mainboard/dice_wants_win_10_plus_dx12_as_minimum_specs_for_holiday_2016_frostbite_games/1
> 
> How quickly the tide turns - from reading almost every PC subreddit, it looks like Nvidia has dug its own grave.


They can screw up indefinitely and still hold 70% market share.

AMD needs more than moral victories at this point.


----------



## ImJJames

Quote:


> Originally Posted by *p4inkill3r*
> 
> They can screw up indefinitely and still hold 70% market share.
> 
> AMD needs more than moral victories at this point.


I completely agree. Nvidia has lied and manipulated its way to the top. Welcome to corporate America.


----------



## Themisseble

I think AMD is gaining reputation with users, and Nvidia is losing it... I still do not like that Nvidia's hardware does not support async shaders natively. So we'll get NVIDIA titles on DX12, but with async shaders only on the Xbox One, not on PC... that's what I am expecting - just like with PhysX and other stuff.
I don't know why so many people are defending NVIDIA... AMD is clearly the better choice, just like Intel was with Sandy Bridge.

Finally some gaming on Linux and SteamOS... I like having privacy. Thanks AMD.


----------



## Forceman

So this is all based on that one comment from the guy posted here, right? Looking at the articles, everything seems to circularly source back to that single post. It would be nice to see some confirmation, considering how many developers are already working on DX12.


----------



## Xuper

Is hardforum full of Nvidia shills? They're attacking Mahigan.

http://hardforum.com/showthread.php?t=1874085&page=4


----------



## Kand

Quote:


> Originally Posted by *Forceman*
> 
> So this is all based on that one comment from the guy posted here, right? Looking at the articles it looks like everything is circular sourcing back to that single post. Be nice to see some confirmation, considering how many developers are working on DX12 already.


It's a straw AMD fans can grasp at. They will spin it and twist it in their favor to make their purchase that much less regretful.

Let's be real: has anyone confirmed that the Oxide person is really who he claims to be? I could start posting as Lisa Su here if I wanted to.


----------



## epic1337

Quote:


> Originally Posted by *p4inkill3r*
> 
> They can screw up indefinitely and still hold 70% market share.
> 
> AMD needs more than moral victories at this point.


That's exactly why not competing aggressively leaves AMD with a negative impression.
Sure, the HBM shortage meant the Fury cards had little stock, but they still shouldn't have run perf/$ up such a steep slope.
And that's ignoring the fact that betting on HBM, with its current manufacturing deficiencies, wasn't a good idea to begin with.

They have inferior perf/$ at resolutions of 1440p and lower, which doesn't help them win buyers; DX12, in contrast, did somewhat help.
But DX12 is still premature - by the time it is widely adopted, there will be better cards available from both AMD and Nvidia.
At the moment AMD is relying on this DX12 fiasco to keep its cards' prospects high; a smart move, but not an ideal one.

As a side note, their decisions - a 4096-SP chip, discarding the GDDR5 IMC for lack of TDP headroom, and relying on HBM for the benefits - all came up quite short in results.
They had 2-3 years to develop a successor to GCN 1.1; GCN 1.2 is that successor, but it wasn't better than GCN 1.1 in terms of efficiency or overall performance.
To the point: if they still don't have a better successor to the current GCN lineage, what are they going to put out after Fiji XT - a 5120-SP chip with 400W power consumption?


----------



## Themisseble

Quote:


> Originally Posted by *Forceman*
> 
> So this is all based on that one comment from the guy posted here, right? Looking at the articles it looks like everything is circular sourcing back to that single post. Be nice to see some confirmation, considering how many developers are working on DX12 already.


Actually, it's quite clear.
If you buy a $300-650 card, what do you expect from it? That the seller is lying to you?

Nvidia was showing how well DX11 runs on their cards, and PCPer is showing the same - and it's all misleading!
http://www.pcper.com/reviews/Graphics-Cards/DX12-GPU-and-CPU-Performance-Tested-Ashes-Singularity-Benchmark/Results-Heavy
Look at the i7 6700K under DX11 in that CPU benchmark!

On the heavy scene it gets 54/48 FPS... the second figure is basically GPU-bottlenecked. There is no way that CPU can do that on DX11 - this is a huge lie for everyone who thinks NVIDIA's DX11 scales so well. It just does not!

Digital Foundry - check these screenshots, and look at the i7 and i3 on DX11 and the i3 on DX12 (screenshot 2 of 5):
http://www.eurogamer.net/articles/2015-08-17-ashes-of-the-singularity-gtx-970-dx11-vs-dx12-performance

What happens with DX11? DX11 starts to show a huge bottleneck, because the game ends up using only 1-2 threads... and yet PCPer's numbers would imply Skylake has 2x the IPC?

And then you see this:
http://www.pcper.com/image/view/60376?return=node%2F63602

An average "CPU frametime" of 52 FPS?! On a Fury X and an i7 4960X! What gives, PCPer?

You can clearly see that the CPU was bottlenecking the GPU.

Please, if anyone has an i7, AotS and an NVIDIA GPU, can you just run the heavy scene in DX11 mode?
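A rough sanity check on the CPU-bottleneck argument: when draw-call submission dominates, frame rate is capped by call count times per-call cost on however many threads can actually submit. The call count and per-call cost below are invented, purely to show the arithmetic:

```python
# Back-of-envelope cap on FPS for a submission-bound renderer.
# DX11-style drivers largely funnel submission through one thread;
# DX12 lets several threads build and submit command lists.

def cpu_bound_fps(draw_calls, us_per_call, submit_threads=1):
    frame_ms = draw_calls * us_per_call / 1000.0 / submit_threads
    return 1000.0 / frame_ms

# ~20,000 draw calls at ~2 µs each on one submission thread:
print(cpu_bound_fps(20_000, 2.0))                    # 25.0 FPS cap
# The same frame with submission spread over 4 threads:
print(cpu_bound_fps(20_000, 2.0, submit_threads=4))  # 100.0 FPS cap
```

That is the shape of the effect the i3/i7 DX11-vs-DX12 screenshots show: the GPU never changes, only the ceiling imposed by the submitting CPU threads does.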


----------



## provost

Quote:


> Originally Posted by *p4inkill3r*
> 
> They can screw up indefinitely and still hold 70% market share.
> 
> *AMD needs more than moral victories* at this point.


Agreed.

Quote:


> Originally Posted by *Forceman*
> 
> So this is all based on that one comment from the guy posted here, right? Looking at the articles it looks like everything is circular sourcing back to that single post. Be nice to see some confirmation, considering how many developers are working on DX12 already.


How about an unequivocal confirmation of native async compute from Nvidia? lol


----------



## Kand

Quote:


> Originally Posted by *Xuper*
> 
> Is hardforum full of Nv shill ? they're attacking Mahigan.
> 
> http://hardforum.com/showthread.php?t=1874085&page=4


You really can't take his speculation as truth just because it sounds like it makes sense. I would advise you to wait for more benchmarks, and for developers to actually make use of DX12 - in games you would actually play, not games you run just to benchmark.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Forceman*
> 
> So this is all based on that one comment from the guy posted here, right? Looking at the articles it looks like everything is circular sourcing back to that single post. Be nice to see some confirmation, considering how many developers are working on DX12 already.


I've been skeptical too, but at this point it's all but Nvidia-confirmed.

We've had Oxide and AMD comment on the matter; all that's left is a proper response from Nvidia, and instead they said something like "We're excited for DX12 games".

There's overwhelming evidence now: AMD have confirmed it, and Nvidia haven't denied it.

GCN is a superior parallel architecture.


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> Is hardforum full of Nv shill ? they're attacking Mahigan.
> 
> http://hardforum.com/showthread.php?t=1874085&page=4


Don't worry about it









As long as the truth comes out... I can take the hits


----------



## Kane2207

Quote:


> Originally Posted by *ImJJames*
> 
> http://www.overclock3d.net/articles/cpu_mainboard/dice_wants_win_10_plus_dx12_as_minimum_specs_for_holiday_2016_frostbite_games/1
> 
> How quickly the tide turns - from reading almost every PC subreddit, it looks like Nvidia has dug its own grave.


Kind of a dumb move by DICE considering Win 10's current market share. Can't be bothered to Google more... but the first result, from The Register, has it at 0.375% on 02/08/2015. Obviously that's increased since, but by enough to justify their position?

It's incredibly foolish to limit your potential market to what could be single-digit numbers by the end of the year.


----------



## Kand

Quote:


> Originally Posted by *GorillaSceptre*
> 
> I've been skeptical too, but at this point it's all but Nvidia-confirmed.
> 
> We've had Oxide and AMD comment on the matter; all that's left is a proper response from Nvidia, and instead they said something like "We're excited for DX12 games".
> 
> There's overwhelming evidence now: AMD have confirmed it, and Nvidia haven't denied it.
> 
> GCN is a superior parallel architecture.


They've also said that AotS is not reflective of DX12 performance:
http://wccftech.com/nvidia-we-dont-believe-aots-benchmark-a-good-indicator-of-dx12-performance/


----------



## ImJJames

Quote:


> Originally Posted by *Kand*
> 
> They've also said that AotS is not reflective of DX12 performance:
> http://wccftech.com/nvidia-we-dont-believe-aots-benchmark-a-good-indicator-of-dx12-performance/


That's because Nvidia got salty after Oxide refused to let Nvidia manipulate the benchmark: http://www.overclock3d.net/articles/gpu_displays/oxide_developer_says_nvidia_was_pressuring_them_to_change_their_dx12_benchmark/1


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kand*
> 
> They've also said that AotS is not reflective of DX12 performance:
> http://wccftech.com/nvidia-we-dont-believe-aots-benchmark-a-good-indicator-of-dx12-performance/


That was before all of this. Ironically, I think that's what's made this an even bigger deal.









Then one of the leads at Oxide responded "tread lightly"; I think that was related to all the new info that Mahigan found.

An Ashes dev said in this very thread that Nvidia PR will not argue with their points.


----------



## Kand

Quote:


> Originally Posted by *ImJJames*
> 
> That's because Nvidia got salty after Oxide refused to let Nvidia manipulate the benchmark: http://www.overclock3d.net/articles/gpu_displays/oxide_developer_says_nvidia_was_pressuring_them_to_change_their_dx12_benchmark/1


You don't need to post a link that just quotes a post from this very thread.

If DX11 performance as-is was 100%, and DX12 with async was bugged and gave 50% (see the AMD FX scaling), I can see why Nvidia would not want async fully enabled until they're able to isolate the issue.


----------



## Kand

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That was before all of this. Ironically, I think that's what's made this an even bigger deal.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Then one of the leads at Oxide responded "tread lightly"; I think that was related to all the new info that Mahigan found.
> 
> An Ashes dev said in this very thread that Nvidia PR will not argue with their points.


Has anyone confirmed the validity of said ashes dev?


----------



## sugarhell

It wasn't bugged. Nvidia can't do async - that's why they asked to drop the feature.


----------



## epic1337

Quote:


> Originally Posted by *Kand*
> 
> Has anyone confirmed the validity of said ashes dev?


Nvidia would've sued Oxide for slander otherwise, wouldn't they?


----------



## Mahigan

Quote:


> Originally Posted by *Kand*
> 
> Has anyone confirmed the validity of said ashes dev?


Yes... but I'm not at liberty to discuss just who he is. Let's just say... you've been speaking to a VIP.


----------



## sugarhell

I think this is enough?


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> Yes... but I'm not at liberty to discuss just who he is. Let's just say... you've been speaking to a VIP.


Interesting, but I'll take it with a full shaker of salt.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kand*
> 
> Has anyone confirmed the validity of said ashes dev?


I'm not sure; I don't think that would be fair or needed at this point either.

An AMD rep on Reddit has agreed with what has been said. And I think we can confirm his validity.


----------



## HalGameGuru

Quote:


> Originally Posted by *Kane2207*
> 
> Kind of a dumb move by Dice considering Win 10's current market share. Can't be bothered to Google more...but the first result on The Register has it at 0.375% on 02/08/2015. Obviously that's increased, but increased by enough to justify their position?
> 
> Incredibly foolish to limit your potential market to what could be single digit numbers by the end of the year.


Windows 10 market share is EXPLODING compared to earlier versions: over 50 million machines less than a month after launch. Almost all new machines are going to be Windows 10. I wouldn't be surprised if Microsoft curtails continued support for older versions of Windows to OEMs; I doubt you will be seeing many Windows 7 or 8.1 machines coming from the big-name OEMs long term.

Pushing for AAA titles to support DX12 (not exclusively, it would seem: DX11 will still be supported, I'm sure, along with Vulkan) will push Windows 10 and DX12 saturation, which will make developing and SUPPORTING much easier on the devs and producers, AND make porting back and forth between PC and console much easier and higher quality.


----------



## epic1337

Quote:


> Originally Posted by *GorillaSceptre*
> 
> An AMD rep on reddit has agreed with what has been said. And i think we can confirm his validity


As an opposing faction? I'm not sure that should be taken as fact.
I mean, who wouldn't want to try pissing off their rival?

Plus, AMD reps (or all reps, for that matter) are known to deliver their information with a dose of exaggeration.


----------



## Kand

Quote:


> Originally Posted by *HalGameGuru*
> 
> Windows 10 market share is EXPLODING compared to earlier versions: over 50 million machines less than a month after launch. Almost all new machines are going to be Windows 10. I wouldn't be surprised if Microsoft curtails continued support for older versions of Windows to OEMs; I doubt you will be seeing many Windows 7 or 8.1 machines coming from the big-name OEMs long term.
> 
> Pushing for AAA titles to support DX12 (not exclusively, it would seem: DX11 will still be supported, I'm sure, along with Vulkan) will push Windows 10 and DX12 saturation, which will make developing and SUPPORTING much easier on the devs and producers, AND make porting back and forth between PC and console much easier and higher quality.


Does that actually count the number of people who installed it in VMs, though?

I know a guy who used one Windows 7 key on 10 VMs and "upgraded" them all to 10.


----------



## Mahigan

Guys,

I emailed Oxide and asked if someone could come and clarify what was happening. I emailed them as a member of Overclock.net. I explained that we have many people discussing their benchmark.

The developer came here to answer your questions and explain what was happening. There's no conspiracy theory here. Everyone has, so far, been honest and forthright.

What we now need is a response from nVIDIA. If none is forthcoming then I think you know what that means. Everything is confirmed.


----------



## p4inkill3r

Quote:


> Originally Posted by *Mahigan*
> 
> There's no conspiracy theory here.


This very statement indicates something conspiratorial is afoot!


----------



## GorillaSceptre

Quote:


> Originally Posted by *epic1337*
> 
> as an opposing faction? i'm not sure that should be taken as a fact.
> i mean, who wouldn't wanna try pissing off their enemy?
> 
> plus AMD reps (or all reps for that matter) are known to run off their information with exaggeration.


Follow what he's been saying on Reddit: he actually said that no current GPU fully supports DX12. Pretty honest, if you ask me; AMD usually are, for a corporation anyway.

I meant it wouldn't be fair to the dev who commented here, because he may be sued or fired personally, or it may cause problems for Oxide.

If Nvidia has got their backs against a wall because of this, I know I wouldn't want to be the one stoking the fire...


----------



## mcg75

*Another thread going down the tubes because of name calling and personal attacks.

Please stick to the topic people. Leave the personal nonsense out of it.*


----------



## SpeedyVT

I would love to see a 15.8 benchmark of Ashes.


----------



## ImJJames

Quote:


> Originally Posted by *SpeedyVT*
> 
> I would love to see a 15.8 benchmark of Ashes.


Crazy how AMD were able to optimize it even more... at this rate the driver will probably show an AMD 280 equalling a 980 Ti.


----------



## Forceman

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Follow what he's been saying on Reddit: he actually said that no current GPU fully supports DX12. Pretty honest, if you ask me; AMD usually are, for a corporation anyway.
> 
> I meant it wouldn't be fair to the dev who commented here, because he may be sued or fired personally, or it may cause problems for Oxide.
> 
> If Nvidia has got their backs against a wall because of this, I know I wouldn't want to be the one stoking the fire...


Seems like there should be plenty of other developers that know whether this is true or not - at the very least Microsoft and Epic, and probably Dice. Surprised none of them have said anything yet.


----------



## NuclearPeace

Other developers and studios won't gain anything from chiming in.


----------



## Remij

All these AMD gloaters here and none of them own the game/benchmark? lol

DX12
1080p
Crazy setting (everything high)
Fullscreen
no msaa
no taa
no vsync
no freesync

Someone with a 290X, 390X, or Fury X, run the test like that. I'll run it and we can compare.

I would like to see if there's even more improvement on the AMD side.


----------



## Noufel

Anyone able to test the AotS benchmark with the new 355.82 driver? I read in several forums that it increased performance on Maxwell 2 GPUs; I need confirmation.


----------



## Mahigan

Quote:


> Originally Posted by *Remij*
> 
> All these AMD *gloaters* here...


...is not conducive to the goal of an open conversation.

You could just say "AMD users" and you'd likely get a lot more people replying to your request.


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> Guys,
> 
> I emailed Oxide and asked if someone could come and clarify what was happening. I emailed them as a member of Overclock.net. I explained that we have many people discussing their benchmark.
> 
> The developer came here to answer your questions and explain what was happening. There's no conspiracy theory here. Everyone has, so far, been honest and forthright.
> 
> What we now need is a response from nVIDIA. If none is forthcoming then I think you know what that means. Everything is confirmed.


Guilty until proven innocent. Quite a good sense of justice you've got there.


https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

This one is a pretty good read.


----------



## SpeedyVT

Quote:


> Originally Posted by *Kand*
> 
> Guilty until proven innocent. Quite a good sense of justice you've got there.
> 
> 
> https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/
> 
> This one is a pretty good read.


It really is the case that Nvidia hasn't stated their position on the matter, even after the recent discussion over the benchmark where Nvidia requested the benchmarks be run with different settings.


----------



## Remij

355.80 DX12


355.82 DX12


355.80 DX11


355.82 DX11


----------



## Mahigan

Quote:


> Originally Posted by *Kand*
> 
> Guilty until proven innocent. Quite a good sense of justice you've got there.
> 
> 
> https://www.reddit.com/r/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/


That test you just linked to relies on slow context switching. The Xbox One and PS4 titles rely on fast context switching.

When Oxide stated that nVIDIA Maxwell doesn't do Async Compute, he also stated it could perform the task, but slowly, with a large performance hit (what he meant was that it relies on slow context switching).

I mentioned an analogy: it's like the GeForce FX, which could perform Pixel Shader 2.0 tasks, but slowly.

There's so much back and forth on this and so much confusion, but don't worry. Once nVIDIA responds it will be confirmed.

Edit: I don't think Maxwell/2 runs the commands asynchronously. I think it runs them synchronously, but more testing will be needed to determine this.
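The slow-vs-fast context switching argument above reduces to a simple cost model: if a GPU cannot keep graphics and compute contexts resident simultaneously, every hand-off between the two pays a fixed penalty. A toy sketch of that model; the function names and all the numbers here are illustrative assumptions, not measured values for any real GPU:

```python
# Toy cost model for the "slow vs. fast context switching" argument.
# If graphics and compute cannot run concurrently, the GPU interleaves
# them and pays a penalty on every switch between the two contexts.

def interleaved_time_ms(graphics_ms, compute_ms, switches, switch_cost_ms):
    """Total frame time when compute is interleaved via context switches."""
    return graphics_ms + compute_ms + switches * switch_cost_ms

def concurrent_time_ms(graphics_ms, compute_ms):
    """Ideal fully concurrent execution: bounded by the longer workload."""
    return max(graphics_ms, compute_ms)

# With cheap switches the interleaved result stays close to g + c;
# with expensive switches the penalty can erase any gain from using
# compute queues at all, which is one reading of the AotS results.
slow = interleaved_time_ms(10.0, 3.0, switches=8, switch_cost_ms=0.5)  # 17.0 ms
ideal = concurrent_time_ms(10.0, 3.0)                                  # 10.0 ms
```

The GeForce FX analogy fits the same shape: the feature "works", but the fixed per-use overhead dominates the useful work.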


----------



## Anna Torrent

Quote:


> Originally Posted by *Kollock*
> 
> I think you are confusing a few issues. Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute. It has to do with the number of root-level descriptors we can pass. In tier 3, it turns out we can basically never update a descriptor during a frame, but in tier 2 we sometimes have to build a few. I don't think it's a significant performance issue though, just a technical detail.
> 
> In regards to the purpose of Async Compute, there are really 2 main reasons for it:
> 
> 1) It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper-threading. Like hyper-threading, it really depends on the workload and GPU architecture as to how important this is. In this case, it is used for performance. I can't divulge too many details, but GCN can cycle in work from an ACE incredibly efficiently. Maxwell's scheduler has no analog, just as a non-hyper-threaded CPU has no analog feature to a hyper-threaded one.
> 
> 2) It allows jobs to be cycled in completely out of band with the rendering loop. This is potentially the more interesting case, since it can allow gameplay to offload work onto the GPU as the latency of work is greatly reduced. I'm not sure of the background of Async Compute, but it's quite possible that it is intended for use on a console as sort of a replacement for the Cell processors on a PS3. In a console environment, you really can use them in a very similar way. This could mean that jobs could even span frames, which is useful for longer, optional computational tasks.
> 
> It didn't look like there was a hardware defect to me on Maxwell, just some unfortunate complex interaction with software scheduling trying to emulate it, which appeared to incur some heavy CPU costs. Since we were trying to use it for #1, not #2, it made little sense to bother. I don't believe there is any specific requirement that Async Compute be required for D3D12, but perhaps I misread the spec.
> 
> Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle-limited, which is why you see greater performance at 4K, where the average triangle size is 4x the triangle size at 1080p.
> 
> I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post-process. Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was one of the main motivating factors behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough for a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.
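Kollock's point 1 above (async compute as a rough GPU analog of hyper-threading) can be illustrated with a toy timing model. This is purely a sketch; the function names, the idle fraction, and all the numbers are invented for illustration, not measurements of any real GPU:

```python
# Toy model of async compute as "GPU hyper-threading": a frame's graphics
# work leaves some idle shader-core time, and an async compute queue can
# fill those gaps with extra work. All numbers are illustrative only.

def frame_time_serial(graphics_ms, compute_ms):
    """Compute runs after graphics: the times simply add."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms, compute_ms, idle_fraction):
    """Compute overlaps with idle gaps in the graphics workload.
    idle_fraction is the share of the graphics timeline the shader
    cores spend waiting (fixed-function work, barriers, etc.)."""
    absorbed = min(compute_ms, graphics_ms * idle_fraction)
    return graphics_ms + (compute_ms - absorbed)

# Example: 10 ms of graphics with 30% idle time, plus 3 ms of compute.
serial = frame_time_serial(10.0, 3.0)          # 13.0 ms
overlapped = frame_time_async(10.0, 3.0, 0.3)  # 10.0 ms: all 3 ms absorbed
```

The model captures why the benefit is workload- and architecture-dependent, as the quote says: compute that fits into the idle gaps is effectively free, while only the overflow adds to frame time.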


Hi Kollock - really glad to see you here (and I also think that it does great service to Oxide)

I read your posts and many others, and I still feel a lot is missing, for me and I guess for others too. I'd be glad to understand more:

1. Before, with DX11.x, many games were GPU-bound anyway, according to monitoring software like HWiNFO and to CPU-clock-scaling benchmarks. Should we take from DX12 that there was really a lot more juice in the AMD GPUs (for example), because DX12 allows better utilization?
Looking at this Tonga architecture chart:
If I get it correctly, in DX11 you could easily be in a state where not all compute units were utilized at a given time, right?

2. Maxwell 2 is considered to have some kind of ACEs. They should have 2 "queues" (graphics + compute) according to slides I saw on Anandtech. So you're actually saying they are not really as operational as the AMD part? Is it not clear how to utilize them?

2b. Do we know how well Maxwell 2 GPUs are utilized in terms of occupied CUs?

3. So, really, what do GPU load/usage rates mean? How do we know what it means to be GPU-bound if we don't know what's going on underneath?

4. So, you are saying that there's nothing the graphics engine does that the compute engine can't do? Sounds like the GE is really there for historical reasons.

5. Haven't consoles used ACEs before? They were not limited to DX11 and had their own APIs.

Many thanks in advance!


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> On the first point, if a response is not forthcoming, if nVIDIA ignore requests for comment (give it a few days) then what was stated is confirmed.
> 
> Now what was stated?
> 
> This is also what an AMD rep mentioned here (I posted it in this thread but it was lost in all of the comments which were posted in haste by many folks):
> Available here:
> 
> https://www.reddit.com/r/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/
> 
> That test you just linked to relies on slow context switching. The Xbox One and PS4 titles rely on fast context switching.
> 
> When Oxide stated that nVIDIA Maxwell doesn't do Async Compute, he also stated it could perform the task, but slowly, with a large performance hit (what he meant was that it relies on slow context switching).
> 
> I mentioned an analogy: it's like the GeForce FX, which could perform Pixel Shader 2.0 tasks, but slowly.
> 
> There's so much back and forth on this and so much confusion, but don't worry. Once nVIDIA responds it will be confirmed.


When Oxide stated that Maxwell doesn't do Async Compute, that is all the AMD pushers saw, and they're grasping onto it.


----------



## RushTheBus

I really can't help but feel there is an awful lot of conjecture throughout this thread (from a variety of parties) being championed as indisputable facts. I also can't help but feel that the entire alarmist approach to this discussion is generally irresponsible and counterproductive to the reported intent (even more so for people who don't have a solid grasp on finer details of these technologies).


----------



## Mahigan

Quote:


> Originally Posted by *Kand*
> 
> When Oxide stated that Maxwell doesn't do Async Compute, this is all that the AMD pushers saw and are grasping onto it.


Yes,

Because partisanship goes both ways. This is why I reject partisanship. It is not conducive to an open and transparent conversation. It leads to insults being thrown in every direction.


----------



## Mahigan

Quote:


> Originally Posted by *RushTheBus*
> 
> I really can't help but feel there is an awful lot of conjecture throughout this thread (from a variety of parties) being championed as indisputable facts. I also can't help but feel that the entire alarmist approach to this discussion is generally irresponsible and counterproductive to the reported intent (even more so for people who don't have a solid grasp on finer details of these technologies).


Alarmism is a beneficial tool at times. It can be used to elicit a response from an organization when it is refusing to answer questions or divulge more information. We're consumers. We're at a great power disadvantage compared to the large tech organizations. By being disruptive, by stirring up a controversy, we can compel organizations to spill the beans.

Why do this?

Well, when you see an R9 290X going toe to toe with a GTX 980 Ti, it raises an eyebrow (at least it did for me). I wanted to know why. Nobody was providing the answers. So what do you do?
You start to post in various threads in order to get as many people as you can discussing the topic. You crowdsource. Based on this information you create a theory. The theory draws attention and views. You then use the views and attention to bring the organizations involved into the discussion.

Piece by piece, your questions are answered.

In the end, we will all benefit.

For now... some people are dancing around in circles doing a victory dance while others are sobbing. This will change... give it time. By the end of this we will have a clearer picture of what is going on and a better idea on how to invest our hard earned money going forward.

You may not like this tactic thus far, you may feel personally targeted, but when big tech websites don't do their duty and ask the tough questions... what are we supposed to do? Make blind purchases and hope for the best?

This is how journalism used to be done. This is how Glenn Greenwald operates. Look at how much we now know about NSA activities. It works.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> Alarmism is a beneficial tool at times. It can be used to elicit a response from an organization when it is refusing to answer questions or divulge more information. We're consumers. We're at a great power disadvantage compared to the large tech organizations. By being disruptive, by stirring up a controversy, we can compel organizations to spill the beans.
> 
> Why do this?
> 
> Well when you see an R9 290x going toe to toe against a GTX 980 Ti, it raises an eyebrow (at least it did for me). I wanted to know why? Nobody was providing the answers. Therefore what do you do?
> You start to post in various threads in order to get as many people as you can discussing the topic. You crowd source. Based on this information you create a theory. This theory draws attention and views. You then use the views and attention in order to bring the organizations involved into the discussion.
> 
> Piece by piece, your questions are answered.
> 
> In the end, we will all benefit.
> 
> For now... some people are dancing around in circles doing a victory dance while others are sobbing. This will change... give it time. By the end of this we will have a clearer picture of what is going on and a better idea on how to invest our hard earned money going forward.
> 
> You may not like this tactic thus far, you may feel personally targeted, but when big tech websites don't do their duty and ask the tough questions... what are we supposed to do? Make blind purchases and hope for the best?
> 
> This is how journalism used to be done. This is how Glenn Greenwald operates. Look at how much we now know about NSA activities? It works.


Yea, these wars.. like there is some ideology in it. Consumption society..


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> Yea, these wars.. like there is some ideology in it. Consumption society..


We are living, in a material world, and I am a material gir... scratch that last part


----------



## RushTheBus

Quote:


> Originally Posted by *Mahigan*
> 
> Alarmism is a beneficial tool at times. It can be used to elicit a response from an organization when it is refusing to answer questions or divulge more information. We're consumers. We're at a great power disadvantage compared to the large tech organizations. By being disruptive, by stirring up a controversy, we can compel organizations to spill the beans.
> 
> Why do this?
> 
> Well when you see an R9 290x going toe to toe against a GTX 980 Ti, it raises an eyebrow (at least it did for me). I wanted to know why? Nobody was providing the answers. Therefore what do you do?
> You start to post in various threads in order to get as many people as you can discussing the topic. You crowd source. Based on this information you create a theory. This theory draws attention and views. You then use the views and attention in order to bring the organizations involved into the discussion.
> 
> Piece by piece, your questions are answered.
> 
> In the end, we will all benefit.
> 
> For now... some people are dancing around in circles doing a victory dance while others are sobbing. This will change... give it time. By the end of this we will have a clearer picture of what is going on and a better idea on how to invest our hard earned money going forward.
> 
> You may not like this tactic thus far, you may feel personally targeted, but when big tech websites don't do their duty and ask the tough questions... what are we supposed to do? Make blind purchases and hope for the best?
> 
> This is how journalism used to be done. This is how Glenn Greenwald operates. Look at how much we now know about NSA activities? It works.


I appreciate you taking the time to respond. Unfortunately, I would compare this whole discussion (at least in its current form) to screaming "fire" in a crowded movie theater.

I am fully supportive of detailed technical discussion, asking difficult questions, and holding corporations accountable (which is why I would really love a technical response from Nvidia). Unfortunately, I just don't feel that is what's happening. There is a lot of confusion and uncertainty about what seems to be going on and why. I don't feel we have anything definitive; lots of potentially good ideas, but nothing definitive. Yet those hypotheses have somehow transformed into apparent facts.

Besides (and maybe I'm incorrect), for this to be as massive a deal as it is being made out to be, wouldn't every engine developer need to employ asynchronous compute the same way Oxide has? Even beyond that, isn't async compute just one method of doing things (albeit an efficient one), not a de facto method, and not something that is necessarily going to be universally accepted immediately?

There just seem to be too many unknowns for me right now. Thanks again for the response.


----------



## garwynn

Quote:


> Originally Posted by *Mahigan*
> 
> Yes... but I'm not at liberty to discuss just who he is. Let's just say... you've been speaking to a VIP.


I've been told who it likely is - and that tweet reinforced that.
But AFAIK no one from Oxide has straight out said - "You're talking to XXXXX"
(Dan essentially validates though that it's from Oxide based on that same tweet.)

I've tried to reach out to him and Oxide to confirm but have not received a response.


----------



## tpi2007

How many people here are actually interested in isometric RTS games and, furthermore, expect this to translate to other types of games? It is well known that DX12 will only help greatly in some types of games, and the isometric RTS is the prime example of a genre where CPU overhead was known to be a problem. How does this translate to other games?

DICE, who could have come forward with their own benchmark, still haven't. They have had a Mantle renderer working for more than a year in a commercially available game and can't beat Oxide to the headlines with a benchmark? Nvidia showed that, at least in the early stages of implementation and even with the Kepler architecture, their DX11 mode could beat what it took Mantle for AMD to regain some lost ground.

So, the really decisive benchmarks would be not only games with wide appeal but also ones coded to take advantage of it. Right now we have the second but not the first. Can an FPS be made to take advantage of it in a way that shows clear benefits on current CPUs and GPUs? Where is DICE?

If we had at least a handful selection of different games in different genres to compare with, then I'd find this discussion interesting, but as it is, it's too early.

I want to see games like GTA V, The Witcher 3, Battlefield, and the latest Batman make use of it. And I want to see these types of games in the future do something different to show the advantage of DX12. Then let's see if those elements actually fit the game or are just filler to justify the gap. A game with more characters on-screen is not necessarily better; it has to make sense given the context. The game has to be made from the ground up with a different set of expectations that make sense. And then we're going to see whether DX12 shows anything relevant or not, and which GPU vendor does better in each type of game that relies more on this or that feature.

And then we'll see how many games like that we actually get. You see, the lowest common denominator, the Xbox One, will still have 768 cores at 853 Mhz and 16 ROPs at the end of the day. Even if in the best case scenario it gets a 30% boost, it will still trail the PS4 because Sony can implement the same thing with Vulkan or some sort of adapted version of it. The PS4 also has more compute relative to the Xbox One, it's not even proportional.

And if a 30% boost may have some people excited, let's not forget that not many people are excited about the GTX 950, yet it is more than 30% faster than the 750 Ti - which incidentally has around the same performance as the PS4's GPU. So, take that into consideration when thinking how great the PC will be but then a lot of the potential will be buried by the industry's practice to not hurt their bottom line and thus console optimized games will prevail. As an example, if the Xbox One can't handle a Witcher 3 village full of people that interact with you, you won't get it on the PC either.

Until now the difference between consoles and the PC has been cosmetic (and with a lot of drama and downgrades even at that), with the underlying game mechanics being programmed to work well on the lowest common denominator. Now that we can start having some fundamental core mechanics change, does anybody really think that the PC will suddenly break free? Or will it rather still have to comply with the lowest common denominator?

Yes, I'm highly sceptical. We will of course get our handful of halo games to justify the upgrade, like we got Crysis back in 2007 to justify Vista, and then it will be a desert again. Rinse and repeat.


----------



## umeng2002

Also, I don't think AMD would release slides about their ACE technology with references to "others" not being able to do it and NOT be talking about Nvidia.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> We are living, in a material world, and I am a material gir... scratch that last part


Every decent man wants to be a girl from time to time, really


----------



## garwynn

@tpi2007,

Agreed. NVIDIA's side of the equation is still largely undecided. With the right development they may find a way to restructure processing via low-level software to solve the potential problem. Stranger things have happened, where a solution was found in one area that maybe was not possible elsewhere. We also have ZERO results from DX12 Unreal Engine 4 at the moment, which, given the GameWorks involvement, should be fine-tuned and give the best representation for Team Green.

Should someone run out and buy another card right away? Not unless you're me and want one of each for comparison.


Good time to get popcorn, a comfy chair and keep reading? Definitely a good idea, and enjoy some games during the wait!


----------



## p4inkill3r

Quote:


> Originally Posted by *tpi2007*
> 
> Yes, I'm highly sceptical. We will of course get our handful of halo games to justify the upgrade, like we got Crysis back in 2007 to justify Vista, and then it will be a desert again. Rinse and repeat.


I'm a bit more optimistic; I believe we will see DX12 implemented in many already-released AAA games relatively soon and going forward, its features and badging will be utilized to signify 'next gen' games, much like Glide was back in the day.

I think something that can potentially open the floodgates for adoption and expansion of the API is how Battlefront fares: it has extreme crossover appeal with Episode VII and may be a game people buy new PCs to play. "Plays best on Windows 10 & DirectX 12" is something we could see quite a bit of in the near future.


----------



## spacin9

Quote:


> Originally Posted by *Remij*
> 
> 355.80 DX12
> 
> 
> 355.82 DX12
> 
> 
> 355.80 DX11
> 
> 
> 355.82 DX11


What are your settings, so I can verify what you got here?

I've not seen a tangible improvement in 355.82. The CPU score seems to be bumped up 5 FPS... whatever that means.

This is a 4K game. I can turn off MSAA and temporal AA at 4K, and it looks ten times better than 1080p maxed with 4x MSAA, with a higher average framerate. You can't appreciate the detail and the scale unless you see it in 4K.


----------



## Kand

http://linustechtips.com/main/topic/429461-unreal-engine-4-directx12-vs-directx11-performance-comparison/

Why aren't we benching Unreal 4?


----------



## KenjiS

Quote:


> Originally Posted by *p4inkill3r*
> 
> I'm a bit more optimistic; I believe we will see DX12 implemented in many already-released AAA games relatively soon and going forward, its features and badging will be utilized to signify 'next gen' games, much like Glide was back in the day.
> 
> I think something that can potentially open the floodgates for adoption and expansion of the API is how Battlefront fares: it has extreme crossover appeal with Episode VII and may be a game people buy new PCs to play. "Plays best on Windows 10 & DirectX 12" is something we could see quite a bit of in the near future.


I think DX12 is going to spread rapidly due to the rapid adoption of Windows 10 plus its commonality with the Xbox One. DX11 may actually disappear quickly if that happens, which means we're going to be forced onto Win 10 whether we like it or not...


----------



## p4inkill3r

Quote:


> Originally Posted by *Kand*
> 
> http://linustechtips.com/main/topic/429461-unreal-engine-4-directx12-vs-directx11-performance-comparison/
> 
> Why aren't we benching Unreal 4?


What's stopping you from creating a new thread so we can?


----------



## ZealotKi11er

Quote:


> Originally Posted by *KenjiS*
> 
> I think DX12 is going to spread -rapidly- due to the rapid adoption of Windows 10 plus its commonality with the Xbox One.... DX11 may actually disappear quickly if that happens.. which means we're going to be forced to Win 10 whether we like it or not...


We are in great need of DX12, at least on the AMD side, but not so much on the Nvidia side. I could totally see Nvidia right now preferring to keep DX11, where it has the advantage. I just hope Nvidia does not cripple DX12 to fit their architecture. I know they will, and it's one reason I don't support companies like that. I am fine with an IHV improving game code to run better on their architecture, as long as it does not negatively affect the other side.


----------



## Kand

Quote:


> Originally Posted by *p4inkill3r*
> 
> What's stopping you from creating a new thread so we can?


I do not want to install Windows 10 yet.


----------



## garwynn

Quote:


> Originally Posted by *Kand*
> 
> http://linustechtips.com/main/topic/429461-unreal-engine-4-directx12-vs-directx11-performance-comparison/
> 
> Why aren't we benching Unreal 4?


I read over this and the quality of the port seems to be uncertain. At this point Ark DX12 should be here soon enough to help give a clearer picture.


----------



## Kand

Quote:


> Originally Posted by *garwynn*
> 
> I read over this and the quality of the port seems to be uncertain. At this point Ark DX12 should be here soon enough to help give a clearer picture.


It still would serve as a pretty neat synthetic benchmark.


----------



## Mahigan

Well,

It seems that the Beyond3D tests have had the opposite effect they intended. The test appears to prove that Maxwell/2 cannot perform Compute and Graphic tasks in Parallel:


The thread where the discussion is taking place can be found here:

https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/


----------



## Noufel

Good reading









https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/


----------



## gamervivek

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Follow what he's been saying on reddit, he actually said that no current GPU fully supports DX12, pretty honest if you ask me. AMD usually are. For a corporation anyway
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I meant it wouldn't be fair to the dev who commented here because he may be sued/fired personally, or cause problems for Oxide.
> 
> If Nvidia has got their backs against a wall because of this, i know i wouldn't want to be the one to stoke the fire..


Quote:


> Originally Posted by *tpi2007*
> 
> How many people here are actually interested in isometric RTS games and, furthermore, expect this to translate to other types of games? It is a well known matter that DX 12 will only help greatly in some types of games and isometric RTS is the prime example where CPU overhead was known to be a problem. How does this translate to other games?
> 
> DICE, that could have come forward with their own benchmark, still haven't. They have had a Mantle renderer working for more than a year in a commercially available game and can't beat Oxide to the headlines with a benchmark? Nvidia showed that at least on the early stages of implementation and even with the Kepler architecture, they could beat in DX 11 mode what took AMD Mantle to regain some lost ground.
> 
> So, the really decisive benchmarks would be not only games with wide appeal but also ones coded to take advantage of it. Right now we have the second but not the first. Can an fps be made to take advantage of it in a way that shows clear benefits on current CPUs and GPUs? Where is DICE?
> 
> If we had at least a handful selection of different games in different genres to compare with, then I'd find this discussion interesting, but as it is, it's too early.
> 
> I want to see games like GTA V, The Witcher 3, Battlefield and the latest Batman make use of it. And I want to see these types of games in the future to do something different from now to show the advantage of DX 12. And then let's see if those elements actually fit in the game or if it's just filler to justify the gap. Say, a game with more characters on-screen is not necessarily better. It has to make sense given the context. More characters on-screen does not _per se_ make a game better. The game has to be made from the ground up with a different set of expectations that make sense. And then we're going to see if DX 12 shows anything relevant or not, and which GPU vendor does better in each type of game that relies more on this or that feature.
> 
> And then we'll see how many games like that we actually get. You see, the lowest common denominator, the Xbox One, will still have 768 cores at 853 Mhz and 16 ROPs at the end of the day. Even if in the best case scenario it gets a 30% boost, it will still trail the PS4 because Sony can implement the same thing with Vulkan or some sort of adapted version of it. The PS4 also has more compute relative to the Xbox One, it's not even proportional.
> 
> And if a 30% boost may have some people excited, let's not forget that not many people are excited about the GTX 950, yet it is more than 30% faster than the 750 Ti - which incidentally has around the same performance as the PS4's GPU. So, take that into consideration when thinking how great the PC will be but then a lot of the potential will be buried by the industry's practice to not hurt their bottom line and thus console optimized games will prevail. As an example, if the Xbox One can't handle a Witcher 3 village full of people that interact with you, you won't get it on the PC either.
> 
> Until now the difference between consoles and the PC has been cosmetic (and with a lot of drama and downgrades even at that), with the underlying game mechanics being programmed to work well on the lowest common denominator. Now that we can start having some fundamental core mechanics change, does anybody really think that the PC will suddenly break free? Or will it rather still have to comply with the lowest common denominator?
> 
> Yes, I'm highly sceptical. We will of course get our handful of halo games to justify the upgrade, like we got Crysis back in 2007 to justify Vista, and then it will be a desert again. Rinse and repeat.


It's not 'up to 30%'; that's what they have already accomplished in these early days. sebbbi over at B3D thinks it can already give better results than that.

https://forum.beyond3d.com/posts/1868894/

And you're conflating more characters with async compute. Nothing of the sort. And GTA V with thousands of NPCs would be a blast.


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> Well,
> 
> It seems that the Beyond3D tests have had the opposite effect they intended. The test appears to prove that Maxwell/2 cannot perform Compute and Graphic tasks in Parallel:
> 
> 
> The thread where the discussion is taking place can be found here:
> 
> https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/
> 
> This appears to point to more evidence that Maxwell/2 cannot perform Compute and Graphics tasks in parallel.


That person you quoted is contradicting himself with every post he makes.

In fact, the original poster pointed out that async has been part of CUDA since Fermi.


----------



## Mahigan

Quote:


> Originally Posted by *Kand*
> 
> That person you quoted is contradicting himself with every post he makes.
> 
> In fact, the original poster pointed out that async has been part of CUDA since Fermi.


You can read the Beyond3D thread here: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9

And here's the graph:



Notice how the GTX 980 Ti has all of that white space? Notice how the white space disappears with each step? That's because the compute tasks are being added to the graphics tasks. This indicates that Maxwell 2 is running in serial. But it indicates a second thing: from initial testing, it does perform better at 32, 64, 128 batches, but that's because Maxwell 2 is great at compute and graphics loads.

The latency numbers indicate a lack of parallelism. So Maxwell 2 probably does REALLY well in serial.

The original poster thought the test functioned as a benchmark. It doesn't. He jumped to a conclusion once the Maxwell 2 GPU was able to complete the tests.

Of course Maxwell 2 can complete the tests; its driver exposes async compute, as Oxide confirmed, but it runs both the compute and graphics tasks in serial (the reason Oxide was running into performance issues).


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> You can read the Beyond3D thread here: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9
> 
> And here's the graph:
> 
> 
> 
> Notice how the GTX 980 Ti has all of that white space? That indicates, from initial testing, that it cannot perform both compute and graphic tasks in parallel.
> 
> The latency numbers also corroborate with this assertion.
> 
> The original poster thought the test functioned as a benchmark. It doesn't. He jumped to a conclusion once the Maxwell 2 GPU was able to complete the tests.
> 
> Of course Maxwell 2 can complete the tests; its driver exposes async compute, as Oxide confirmed, but it runs both the compute and graphics tasks in serial (the reason Oxide was running into performance issues).




https://www.reddit.com/r/3j5r8s/psa_before_we_all_jump_to_conclusions_and_crucify/cumlmwv

This is interesting.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Well,
> 
> It seems that the Beyond3D tests have had the opposite effect they intended. The test appears to prove that Maxwell/2 cannot perform Compute and Graphic tasks in Parallel:
> 
> 
> The thread where the discussion is taking place can be found here:
> 
> https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/


That seems to show that the 980 Ti is massively faster at 32 and 64, and then roughly even up to 128. Be nice to see someone extend that limit and see where GCN starts to fall off. Wonder why it is grouping them in batches of 32 - because of the 1+31 configuration?


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Kand*
> 
> That person you quoted is contradicting himself with every post he makes.
> 
> In fact, the original poster pointed out that async has been part of CUDA since Fermi.
> 
> 
> 
> You can read the Beyond3D thread here: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9
> 
> And here's the graph:
> 
> 
> 
> Notice how the GTX 980 Ti has all of that white space? That indicates, from initial testing, that it cannot perform both compute and graphic tasks in parallel.
> 
> The latency numbers also corroborate with this assertion.
> 
> The original poster thought the test functioned as a benchmark. It doesn't. He jumped to a conclusion once the Maxwell 2 GPU was able to complete the tests.
> 
> Of course Maxwell 2 can complete the tests; its driver exposes async compute, as Oxide confirmed, but it runs both the compute and graphics tasks in serial (the reason Oxide was running into performance issues).

So Maxwell is faster up to 128 batches... hmm, interesting.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> That seems to show that the 980 Ti is massively faster at 32 and 64, and then roughly even up to 128. Be nice to see someone extend that limit and see where GCN starts to fall off. Wonder why it is grouping them in batches of 32 - because of the 1+31 configuration?


No,

I read that wrong. The graph shows the compute and graphics latency numbers (separately) at the top.

The bottom shows the two being done asynchronously.

Notice how GCN doesn't change? That's because GCN is doing both asynchronously in the bottom graph. Maxwell 2 is performing both tasks in serial: if you add the top compute graph to the top graphics graph, you get the result that is claimed to be async at the bottom. You see the graph move up on Maxwell 2 because it is not performing the tasks asynchronously.

The latency, however, appears to show that Maxwell 2 is performing in serial. That's why I used the analogy of a GeForceFX vs Radeon 9700 Pro when performing PS2.0 tasks.

I think they want to conduct more tests.

Still more to come I guess.

Quote:


> Originally Posted by *Noufel*
> 
> So Maxwell is faster up to 128 batches... hmm, interesting.


See above.

Latency numbers here (notice how the latency jumps between Compute only and Compute + Graphics? Not doing both Asynchronously on Maxwell 2):
Quote:


> You guys are missing the point.
> The point wasn't that Maxwell was bad at doing compute. Maxwell does compute very well and very fast.
> The point was that Maxwell is not capable of doing compute and graphics asynchronously at the same time.
> 
> For example, look at the GTX680 test run by MDolenc:
> Compute only:
> 1. 17.91ms
> 2. 18.03ms
> 3. 17.90ms
> Graphics only: 50.75ms (33.06G pixels/s)
> Graphics + compute:
> 1. 68.12ms (24.63G pixels/s)
> 2. 68.20ms (24.60G pixels/s)
> 3. 68.23ms (24.59G pixels/s)
> You see how the Graphics+compute runs took almost exactly the compute time plus the graphics time? 18ms + 50ms = 68ms~
> 
> This is true for all of the tests run by NVidia GTX owners in that thread, like this GTX960:
> Compute only:
> 1. 11.21ms
> Graphics only: 41.80ms (40.14G pixels/s)
> Graphics + compute:
> 1. 50.54ms (33.19G pixels/s)
> 
> 50.54ms is 95.34% of 11.21 + 41.8
> GTX970:
> Compute only:
> 1. 9.77ms
> Graphics only: 32.13ms (52.22G pixels/s)
> Graphics + compute:
> 1. 41.63ms (40.30G pixels/s)
> 
> 41.63 is 99.36% of 9.77 + 32.13
> 
> GTX980Ti:
> Compute only:
> 1. 11.63ms
> Graphics only: 17.88ms (93.82G pixels/s)
> Graphics + compute:
> 1. 27.69ms (60.59G pixels/s)
> 
> 27.69 is 93.83% of 11.63 + 17.88
> 
> But then if you start looking at the GCN cards:
> 
> Radeon 290:
> Compute only:
> 1. 52.71ms
> Graphics only: 26.25ms (63.90G pixels/s)
> Graphics + compute:
> 1. 53.32ms (31.47G pixels/s)
> 
> 53.32 is 67.53% of 52.71 + 26.25
> 
> 390X:
> Compute only:
> 1. 52.28ms
> Graphics only: 27.55ms (60.89G pixels/s)
> Graphics + compute:
> 1. 53.07ms (31.62G pixels/s)
> 
> 53.07 is 66.48% of 52.28 + 27.55
> 
> Fury X:
> Compute only:
> 1. 49.65ms
> Graphics only: 25.18ms (66.62G pixels/s)
> Graphics + compute:
> 1. 55.93ms (30.00G pixels/s)
> 
> 55.93 is 74.74% of 49.65 + 25.18
> 
> Laptop 8970M:
> Compute only:
> 1. 61.52ms
> Graphics only: 59.03ms (28.42G pixels/s)
> Graphics + compute:
> 1. 62.97ms (26.64G pixels/s)
> 
> 62.97 is 52.24% of 61.52 + 59.03
> 
> A lower percentage is better. If it's at or near 100% it means it's doing it pretty much serially, no benefit from asynchronously running them together.
> tldr: OP missed the point. Maxwell is good at compute, that wasn't the point. Maxwell just cannot benefit from doing compute + graphics asynchronously. GCN can.




__
https://www.reddit.com/r/3j5r8s/psa_before_we_all_jump_to_conclusions_and_crucify/cumlmwv
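The arithmetic behind those percentages is easy to check yourself. Here is a small Python sketch (using the timing figures quoted above; the serial-vs-overlap reading of the ratio is my own summary of the thread, not anything official) that computes the combined-run ratio for the two flagship cards:

```python
# Combined-run ratio: (graphics+compute time) / (compute-only + graphics-only).
# Near 100% suggests serial execution (the combined run took as long as both
# workloads back to back); well below 100% suggests the work overlapped.
def overlap_ratio(compute_ms, graphics_ms, combined_ms):
    return combined_ms / (compute_ms + graphics_ms)

# (compute-only ms, graphics-only ms, graphics+compute ms), as quoted above.
cards = {
    "GTX 980 Ti": (11.63, 17.88, 27.69),
    "Fury X":     (49.65, 25.18, 55.93),
}

for name, (c, g, both) in cards.items():
    print(f"{name}: {overlap_ratio(c, g, both):.2%}")
```

This reproduces the quoted figures: roughly 93.83% for the GTX 980 Ti versus 74.74% for the Fury X.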


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> Yes,
> 
> They wanted to conduct a test that could take advantage of both Task Queue limits. What they've found is what I mentioned to Razor1. Even with only 64 queues, GCN operates like a hyperthreaded CPU. Therefore performance doesn't fall off at any point even pushing 128. They'll need to push even further. Who knows how far.
> 
> The latency, however, appears to show that Maxwell 2 is performing in serial. This is also why the performance drops off rather quickly after 32 threads. That's why I used the analogy of a GeForceFX vs Radeon 9700 Pro when performing PS2.0 tasks.
> 
> I think they want to conduct more tests.
> 
> Still more to come I guess.
> 32 threads... then performance falls off. Sure it performs faster at 32 and 64 and 128 but with each step performance falls off.
> 
> This indicates that Maxwell 2 is capable of processing 32 threads per cycle... or in other words.. 32 - 32 - 32 - 32 - in a serial sequence ( "-" indicating a cycle). GCN can go way past its serial limit of 64 because it is performing Asynchronously. GCN appears to not be taxed at all. Performing the same. It probably will fall off at some point.
> 
> That's what I'm reading in their findings so far.
> 
> Latency numbers here (notice how the latency jumps between Compute only and Compute + Graphics? Not doing both Asynchronously on Maxwell 2):


https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-10#post-1869204

Thread over here states that

"The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to CPU or wait for the GPU on CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

To see realistic results, you should benchmark async compute in rendering tasks. For example render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to display instead of waiting for compute to finish on CPU. For result timing, use GPU timestamps, do not use a CPU timer. CPU side timing of GPU results in lots of noise and even false results because of driver related buffering."

So basically, using the raw completion time as a basis isn't indicative of async performance.


----------



## Mahigan

Quote:


> Originally Posted by *Kand*
> 
> https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-10#post-1869204
> 
> Thread over here states that
> 
> "The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to CPU or wait for the GPU on CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.
> 
> To see realistic results, you should benchmark async compute in rendering tasks. For example render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to display instead of waiting for compute to finish on CPU. For result timing, use GPU timestamps, do not use a CPU timer. CPU side timing of GPU results in lots of noise and even false results because of driver related buffering."
> 
> So basically, using the raw completion time as a basis isn't indicative of async performance.


He means the latency figures on their own. Meaning looking at Graphics Latency being lower and concluding that lower means better. Looking at the Compute Latency being lower and concluding that lower means better. But if you look at the combined latency, you can deduce that GCN doesn't budge when doing both graphics and compute. Latency remains the same. So you get the same latency figure when doing compute that you do when doing graphics and compute.

That's what he means. And that's what the test was supposed to indicate. To see if Maxwell 2 was performing Asynchronously.

ex:

Fury X:
Compute only:
1. *49.65ms*
Graphics only: *25.18ms* (66.62G pixels/s)
Graphics + compute:
1. *55.93ms* (30.00G pixels/s)

55.93 is 74.74% of 49.65 + 25.18

vs.

GTX980Ti:
Compute only:
1. *11.63ms*
Graphics only: *17.88ms* (93.82G pixels/s)
Graphics + compute:
1. *27.69ms* (60.59G pixels/s)

27.69 is 93.83% of 11.63 + 17.88

Notice that for the GTX 980 Ti you can add up the compute-only latency and the graphics-only latency and it equals the graphics + compute figure? This indicates that the GTX 980 Ti is running in serial.

The lower latency figures themselves don't mean anything. That's what Sebbi was saying.

Don't worry I also interpreted the results wrong at first.

PS. I updated my post above to reflect this.


----------



## Forceman

Quote:


> Originally Posted by *Kand*
> 
> So basically, the whole using the time it takes as a basis isn't indicative of Async performance.


The time itself may not be important, but the thing they are trying to highlight is that it doesn't take any longer to do graphics and compute on the GCN cards than it does to do compute-only, while on Maxwell it does.

That leaves unanswered the question of whether this simple test is doing what they think it is doing, or if there may be other issues (maybe driver related) that are messing with the results. For example, why does the compute-only time get longer as you add more batches? What would it look like if it was only doing compute - a straight line?

@Mahigan - is your understanding that the benchmark is running three passes - a compute-only, a graphics-only, and then a compute+graphics, or is it running one pass and just recording the times for each portion?

Edit: Just downloaded and ran it. It is running three separate passes. So why does Maxwell get slower even when running compute only? Async computing should have no impact there, since it is compute only.


----------



## provost

Quote:


> Originally Posted by *cowie*
> 
> you cant code it is the reason because I hear its soooo far in the red it may not even make the light of day???
> fess up
> 1 it has NOT 1 visual aspect
> 2 if not coded properly it will cause performance issues
> 
> So how is the long time business dealings with ati/amd???I hear you are more broke then them
> so come back lie
> I have a surprise for you


Are the character-assassination personal attacks necessary to make a point here?


----------



## KenjiS

Quote:


> Originally Posted by *gamervivek*
> 
> It's not 'upto 30%' but what they have already accomplished in these early days. sebbbi over at b3d thinks it can already give better results than that.
> 
> https://forum.beyond3d.com/posts/1868894/
> 
> And you're conflating more characters with async compute. Nothing of the sort. And GTA V with thousands of NPCs would be a blast.


I can think of one BIG one: Assassin's Creed Unity. A lot of its issues and problems were caused by the insane number of draw calls it made, overloading the DX11 API... the insane number of NPCs and so on taxed DX to its limits. DX12 would handle this easily.

Imagine Skyrim or another game with some massive, epic, chaotic battle. Imagine Helm's Deep rendered in real time in a 3D engine, or the insanity you could pull off with an updated Normandy beach landing.


----------



## spacin9

Well, I can confirm SLI does indeed work in DX11 AotS. Very, very badly. The workload is split via the AFR GPU option in the menu, but the two cards in tandem don't go over 50-55 percent utilization per card, so it pretty much sucks compared to using one card, which will get full (100%) GPU utilization. The SLI indicator bar will max out most of the time, but it's choppy and low-framerate.

Doesn't seem to work in DX 12 yet. Looks like they're working on it anyway.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> The time itself may not be important, but the thing they are trying to highlight is that it doesn't take any longer to do graphics and compute on the GCN cards than it does to do compute-only, while on Maxwell it does.
> 
> That leaves unanswered the question of whether this simple test is doing what they think it is doing, or if there may be other issues (maybe driver related) that are messing with the results. For example, why does the compute-only time get longer as you add more batches? What would it look like if it was only doing compute - a straight line?
> 
> @Mahigan - is your understanding that the benchmark is running three passes - a compute-only, a graphics-only, and then a compute+graphics, or is it running one pass and just recording the times for each portion?
> 
> Edit: Just downloaded and ran it. It is running three separate passes. So why does Maxwell get slower even when running compute only? Async computing should have no impact there, since it is compute only.


I'm not sure about Maxwell/2 but I do remember that Kepler was using a CPU driven Software scheduler and GCN was using a Hardware Scheduler. This could have that sort of impact.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> I'm not sure about Maxwell/2 but I do remember that Kepler was using a CPU driven Software scheduler and GCN was using a Hardware Scheduler. This could have that sort of impact.


To my thinking, if the issue was async compute, you should see a flat line for compute, a flat line for graphics, and then a not-flat line (or greater than the sum of the two individually line) for compute+graphics (indicating the penalty for running both at the same time). Instead what we are seeing is that increasing the count causes problems with compute itself.


----------



## Mahigan

Quote:


> Originally Posted by *provost*
> 
> Are the character assassination personal attacks necessary to make a point here?


Yeah I reported him.

The Oxide dev takes the time to come and talk to us and this guy just decides to flame him. Completely unprofessional conduct.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> To my thinking, if the issue was async compute, you should see a flat line for compute, a flat line for graphics, and then a not-flat line (or greater than the sum of the two individually line) for compute+graphics (indicating the penalty for running both at the same time). Instead what we are seeing is that increasing the count causes problems with compute itself.


The program is probably not multi-threaded (or Maxwell 2 uses more CPU than GCN). Have you checked your task manager to see how the load on the CPU changes with each step?

Because I seem to recall the Oxide dev mentioning a higher CPU overhead.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> The program is probably not multi-threaded. Have you checked your task manager to see how the load on the CPU changes with each step?


No, but if that were the reason, wouldn't that pretty much discredit this benchmark? That would be driver or program related, not a Maxwell issue.

Edit: CPU load doesn't change at all with the GCN card, let me check the Maxwell.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> No, but if that were the reason, wouldn't that pretty much discredit this benchmark? That would be driver or program related, not a Maxwell issue.


Well if the scheduler is software driven, it would increase the CPU usage per step. This would result in added latency I would think. Oxide dev did mention Maxwell having a higher CPU overhead.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Well if the scheduler is software driven, it would increase the CPU usage per step. This would result in added latency I would think. Oxide dev did mention Maxwell having a higher CPU overhead.


Didn't seem to, usage stayed the same all the way through on the 960 also. Why the huge disparity in compute times with Maxwell and GCN? Maxwell isn't that much better at compute.

Anyone tried this bench on a 6970? Wonder what that looks like.

I'll attach it here in case anyone wants to try it.

AsyncCompute.zip 17k .zip file


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> Didn't seem to, usage stayed the same all the way through on the 960 also. Why the huge disparity in compute times with Maxwell and GCN? Maxwell isn't that much better at compute.
> 
> Anyone tried this bench on a 6970? Wonder what that looks like.


LOL!

VLIW4, now that would be an interesting test.


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> I'm not sure about Maxwell/2 but I do remember that Kepler was using a CPU driven Software scheduler and GCN was using a Hardware Scheduler. This could have that sort of impact.


If I'm not mistaken, Nvidia could fix this issue if they multi-threaded the CPU-driven software scheduler. However, it would have to know exactly how many threads it has available at the time of the operations, or it would BSOD.


----------



## EightDee8D

Quote:


> Originally Posted by *Forceman*
> 
> Didn't seem to, usage stayed the same all the way through on the 960 also. Why the huge disparity in compute times with Maxwell and GCN? Maxwell isn't that much better at compute.
> 
> Anyone tried this bench on a 6970? Wonder what that looks like.
> 
> I'll attach it here in case anyone wants to try it.
> 
> AsyncCompute.zip 17k .zip file


It requires a D3D12 card; the 6970 isn't D3D12-capable, AFAIK.


----------



## JohnLai

Quote:


> Originally Posted by *EightDee8D*
> 
> it requires D3D12 card , 6970 isn't D3D12 capable afaik.


Hmm... can't bench with Fermi, since Nvidia hasn't released a DX12 path for Fermi yet.


----------



## Forceman

Quote:


> Originally Posted by *EightDee8D*
> 
> it requires D3D12 card , 6970 isn't D3D12 capable afaik.


Good point.

Interesting that the 750Ti shows the same behavior as the 9 series, considering it doesn't even support the 31+1 of Maxwell 2. Seems like there is some other factor at play here.


----------



## speedyeggtart

So from what I am getting from reading this thread.

AMD GCN cards that were refreshed or "rebranded" this year can do async compute, an important core feature of DX12, and the new Maxwell 1 & 2 can't?

I guess when DX12 games start coming out next year, Nvidia will have some fresh new customers for their Pascal GPUs: people who just bought Maxwell GPUs thinking they were getting DX12 async compute, when it turns out they're not.


----------



## Forceman

Quote:


> Originally Posted by *speedyeggtart*
> 
> async compute, an important core feature of DX12


According to some people it is an optional feature of DX12, and not required for full DX12 functionality.

Quote:


> It's also worth noting, as Kollock does, that since asynchronous compute isn't part of the DX12 specification, its presence or absence on any GPU has no bearing on DX12 compatibility.


http://www.extremetech.com/gaming/213202-ashes-dev-dishes-on-dx12-amd-vs-nvidia-and-asynchronous-compute


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> Good point.
> 
> Interesting that the 750Ti shows the same behavior as the 9 series, considering it doesn't even support the 31+1 of Maxwell 2. Seems like there is some other factor at play here.


Well a 750 Ti is a Maxwell part:


If the Beyond3D test is valid then the 750 Ti performs just as it was intended to perform in this image. The Maxwell 2 cards, however, don't correspond to the image and instead perform just like the 750 Ti. Doing their Graphics and Compute loads in Serial.

I would have to extend the big X's over the Maxwell 2 info as well if this test is correct.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Well a 750 Ti is a Maxwell part:
> 
> 
> If the Beyond3D test is valid then the 750 Ti performs just as it was intended to perform in this image. The Maxwell 2 cards, however, don't correspond to the image and instead perform just like the 750 Ti. Doing their Graphics and Compute loads in Serial.
> 
> I would have to extend the big X's over the Maxwell 2 info as well if this test is correct.


Actually, the 750Ti steps in groups of 16, not 32 like the 9 series cards. So it isn't behaving quite the same.

Edit: Actually, the 900 cards do 31 and then step, so that actually makes sense with 31+1. Then when it gets past that point it does another 32 compute (the time of which gets added on), then after another 32 it runs another pass (which, in essence, adds another increment of time). So actually, that pretty much matches the slide - it's the 750 Ti that seems off, since it only does 16 before it increments.

So it seems like Maxwell 2 is doing 1 graphics task and 31 compute tasks simultaneously, and then if there are more compute tasks remaining it has to go back and process another whole batch, so if it is 1 more or 32 more it takes the same time. Then if it exceeds 63 (1 graphic, 31 compute, and another 32 compute) it has to iterate another pass and that time gets added on. So actually, it seems like it is working like you'd expect, except that you would expect to see that first set to be 32 instead of 31 when it is running compute only.
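That stepping behavior can be sketched as a toy queue model. Everything below is illustrative only (the batch size, per-batch time, and graphics time are made-up numbers, not measurements from the Beyond3D tool): a serial device pays for the graphics pass plus one increment per batch of compute kernels, while a device that overlaps the two only pays for whichever side dominates.

```python
import math

# Toy model (not measured data): serial execution processes compute kernels
# in fixed-size batches after the graphics work, so combined time steps up
# with every extra batch; async execution hides compute behind graphics
# until the compute side starts to dominate.
def serial_time(n_kernels, batch_size, batch_ms, graphics_ms):
    passes = math.ceil(n_kernels / batch_size)
    return graphics_ms + passes * batch_ms

def async_time(n_kernels, batch_size, batch_ms, graphics_ms):
    compute_ms = math.ceil(n_kernels / batch_size) * batch_ms
    return max(graphics_ms, compute_ms)

for n in (31, 32, 64, 128):
    s = serial_time(n, 31, 10.0, 20.0)
    a = async_time(n, 31, 10.0, 20.0)
    print(f"{n:3d} kernels: serial {s:.0f} ms, async {a:.0f} ms")
```

With these made-up numbers, serial time jumps from 30 ms at 31 kernels to 40 ms at 32 (a new pass starts), while the overlapped time stays flat at 20 ms until compute exceeds the graphics time, which matches the staircase-vs-flat shape people are describing in the thread.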


----------



## JunkoXan

Quote:


> Originally Posted by *Forceman*
> 
> Quote:
> 
> 
> 
> Originally Posted by *speedyeggtart*
> 
> A-sync Compute, an important core feature of DX12
> 
> 
> 
> *According to some people it is an optional feature of DX12, and not required for full DX12 functionality.
> *
> Quote:
> 
> 
> 
> It's also worth noting, as Kollock does, that since asynchronous compute isn't part of the DX12 specification, its presence or absence on any GPU has no bearing on DX12 compatibility.
> 
> Click to expand...
> 
> http://www.extremetech.com/gaming/213202-ashes-dev-dishes-on-dx12-amd-vs-nvidia-and-asynchronous-compute
Click to expand...

I think right now it's optional, but once the ball rolls full speed it'll be required as devs can make full use of the feature for the game/software to run at full potential when they learn how to properly utilize it.


----------



## Forceman

Quote:


> Originally Posted by *JunkoXan*
> 
> I think right now it's optional, but once the ball rolls full speed it'll be required as devs can make full use of the feature for the game/software to run at full potential when they learn how to properly utilize it.


I don't think they can retroactively make it required, that would need it to be DX12.1 or something.


----------



## Mahigan

If this is all true, this is why Asynchronous Compute matters:

Mirror's Edge Catalyst will be released on *February 23, 2016* for Xbox One, PS4, and PC.
Quote:


> In order to fit a slew of heavy processing tasks into its tight budgets, DICE has employed new techniques specific to modern graphics processors and the new console generations. *By taking advantage of Asynchronous Compute*, the developer is now capable of reaching new levels of in-depth optimizations, with which it has been able to squeeze more work out of the graphics pipeline.


Read more: http://www.vcpost.com/articles/87174/20150826/mirrors-edge-catalyst-boasts-advanced-rendering-techniques-reflection-technologies-glass-city.htm#ixzz3kSkxnueB

Rise of the Tomb Raider Q1 2016
Quote:


> Of all the rendering techniques used in the game, the most fascinating is its *use of asynchronous compute* for the generation of advanced volumetric lights. For this purpose, the developer has employed a resolution-agnostic voxel method, which allows volumetric lights to be rendered using asynchronous compute after the rendering of shadows, with correctly handled transparency composition.


Read more: http://gearnuke.com/rise-of-the-tomb-raider-uses-async-compute-to-render-breathtaking-volumetric-lighting-on-xbox-one/

Deus Ex: Mankind Divided Q1 2016
Quote:


> Deus Ex: Mankind Divided to use *async compute to enhance Pure Hair simulation*. During its SIGGRAPH 2015 presentation, Eidos Montreal revealed that Deus Ex: Mankind Divided will be the first title to make use of Pure Hair technology for hair simulation. For the uninitiated, Pure Hair is the successor to AMD's TressFX technology, which was first seen in Tomb Raider. The new hair solution has been created in collaboration between AMD and Eidos Montreal's research and development lab.


Read more: http://gearnuke.com/deus-ex-mankind-divided-use-async-compute-enhance-pure-hair-simulation/

Just three titles on the way. That's without mentioning Fable Legends and others...

That's why this is a very big deal: Pascal won't arrive before Q2 2016, by early estimates.


----------



## Forceman

Guess it depends how integrated async is into the engine. If it's just hair and lights Nvidia users can just turn those features off, like AMD users do now with Gameworks.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> Guess it depends how integrated async is into the engine. If it's just hair and lights Nvidia users can just turn those features off, like AMD users do now with Gameworks.


Most probably. Devs will have to let users deactivate it, considering how many people run nVIDIA graphics on the PC.


----------



## mav451

Quote:


> Originally Posted by *JunkoXan*
> 
> I think right now it's optional, but once the ball rolls full speed it'll be required as devs can make full use of the feature for the game/software to run at full potential when they learn how to properly utilize it.


ExtremeTech says they're "investigating" this:
Quote:


> Note: Nvidia has represented to ExtremeTech and other hardware sites that Maxwell 2 (the GTX 900 family) is capable of asynchronous compute, with one graphics queue and 31 compute queues. We are investigating this situation. It is not clear how these compute queues are accessed or what the performance penalty is for using them; GCN, according to AMD, has eight ACEs with eight queues each, for a total of 64 queues + a graphics queue.


Why do I have a feeling that _once_ it's revealed in a forum/reddit thread/Mahigan post it's going to show up on ExtremeTech's front page haha


----------



## dogen1

Quote:


> Originally Posted by *Forceman*
> 
> I don't think they can retroactively make it required, that would need it to be DX12.1 or something.


I'm pretty sure multi-engine (async compute, in D3D12 terms) support is already required. Every D3D12-compliant GPU needs to handle it one way or another. Intel and Nvidia will probably just serialize it, for lack of a better word, through the driver.
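As a rough sketch of what "serialize it through the driver" would mean for frame time - a first-order model of my own that ignores contention for shared hardware units, not anything from the spec:

```python
def frame_time_ms(graphics_ms, compute_ms, overlapped):
    """First-order model: a driver that serializes the compute queue
    behind the graphics queue pays the sum of both workloads, while
    hardware that truly overlaps them pays roughly the longer of the
    two (ignoring contention for shared execution units)."""
    if overlapped:
        return max(graphics_ms, compute_ms)
    return graphics_ms + compute_ms

# e.g. 10 ms of graphics alongside 4 ms of compute:
# serialized -> 14 ms per frame, overlapped -> 10 ms per frame
```

Both paths would satisfy the API, which is why this reads as a performance question rather than a compliance one.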


----------



## Mahigan

Quote:


> Originally Posted by *mav451*
> 
> ExtremeTech says they're "investigating" this:
> Why do I have feeling that _once_ it's revealed on a forum/reddit thread/Mahigan post that it's going to show up on ExtremeTech's front page haha


I hope they respond. I've been waiting for a while now for a response.


----------



## Kpjoslee

Hmm, pretty interesting stuff going on at Beyond3D. It definitely looks like GCN handles compute concurrently, in parallel, while Maxwell does it in serial order.


----------



## Rabit

Has anyone tried this test on lower-end GPUs like the R7 260X (GCN 1.1) or R9 285/380 (GCN 1.2)?
How does the DX11 vs DX12 difference look on those GPUs?


----------



## MadRabbit

Quote:


> Originally Posted by *Rabit*
> 
> Has anyone tried this test on lower-end GPUs like the R7 260X (GCN 1.1) or R9 285/380 (GCN 1.2)?
> How does the DX11 vs DX12 difference look on those GPUs?


I can try it on a 280X, but a bit later - just getting ready for work.


----------



## charlievoviii

Quote:


> Originally Posted by *MadRabbit*
> 
> You missed this part from the news huh?


back at you








Quote:


> I want to add one thing here: yesterday Hallock was all over this, downplaying Nvidia and evangelizing how good their GPUs are, and now he is taking a step back with these answers on Reddit.
> 
> It's all marketing mud-fighting and attacking each other these days between Nvidia and AMD. Fun fact: at the AMD GPU tech day for Fury, I myself literally confronted Hallock and asked about the supported DX12 feature levels, and Hallock himself absolutely refused to give a valid answer at the time, as he knew very well that AMD would not fully support DX12 either.


----------



## MadRabbit

Quote:


> Originally Posted by *charlievoviii*
> 
> back at you


Since the whole "article" is about DX12 (full support or not), he was kinda right. Even without full DX12 support, AMD's cards are doing better than Nvidia's at this point (or at least are a bit better supported on the DX12 front).

This whole mud flinging is just pure and utter nonsense. Only one way to describe it: AMD's marketing.


----------



## Lantian

How has Nvidia lied about DX12 now? Async shaders are not part of the DX12 spec; they're something that can simply be done now with the access DX12 gives you. It was never part of the spec, nor is it required for any tier of DX12 support - where did the idea even come from? Also, I'm pretty sure Microsoft tiers the cards, not Nvidia or AMD.
And finally, if async shaders are truly responsible for the performance gains you see between DX11 and DX12, care to explain why Thief saw none of that in Mantle mode?


----------



## Rabit

Who cares? Only Maxwell 2, in theory, fully supports feature level 12_1.










As you can see, all older Nvidia GPUs barely catch the DX12 train, with only feature level 11_0 support








Older GCN-based AMD GPUs look better









Source: http://www.overclock.net/t/1567968/directx-12-direct3d-12-feature-levels-and-resources


----------



## GorillaSceptre

Quote:


> Originally Posted by *charlievoviii*
> 
> back at you
> 
> 
> 
> 
> 
> 
> 
> 
> so to sums it up both companies are Full of CRAP.


That is a ridiculous article. Sounds like it was written by a fanboi. Instead of reporting on the big story, which is Async, they instead post that rubbish? Seems legit









What he said was the truth, there is no architecture that supports every feature of Directx 12. I don't see the issue.

It's foolish to blindly trust any corporation, but in this case there's only one company that's full of crap.


----------



## charlievoviii

No need to jump to conclusion. There are many ways to gain performance and such in a game. We can't judge AMD or Nvidia until more DX12 games start to come out. So enjoy your AMD or Nvidia in the mean time. See you guys later back to MGSV now.


----------



## charlievoviii

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That is a ridiculous article. Sounds like it was written by a fanboi. Instead of reporting on the big story, which is Async, they instead post that rubbish? Seems legit
> 
> 
> 
> 
> 
> 
> 
> 
> 
> What he said was the truth, there is no architecture that supports every feature of Directx 12. I don't see the issue.
> 
> It's foolish to blindly trust any corporation, but in this case there's only one company that's full of crap.


Here come the typical AMD excuses. If a report is good for AMD, it's the truth and they start acting all big. If a report is bad, they start making excuses.







Have you seen any other DX12 games besides this one where AMD is actually doing well? So yeah, keep dreaming about the future. In the meantime I'll enjoy my FPS with Nvidia.


----------



## GorillaSceptre

Quote:


> Originally Posted by *charlievoviii*
> 
> Here come the typical AMD excuses. Have you seen any other DX12 games besides this one where AMD is actually doing well? So yeah, keep dreaming about the future. In the meantime I'll enjoy my FPS with Nvidia.


AMD excuses? I'm stating facts.. I don't have a dog in this fight, but looking at your sig makes me understand why you're getting worked up









This is what you said: "lol the true came out. FuryX , X = Lies"

Well done, pretending that AMD have been caught doing something, when the info actually came from them in the first place..


----------



## Rabit

Quote:


> Originally Posted by *charlievoviii*
> 
> Here come the typical AMD excuses. Have you seen any other DX12 games besides this one where AMD is actually doing well? So yeah, keep dreaming about the future. In the meantime I'll enjoy my FPS with Nvidia.


My R7 260X gives me more FPS in Witcher 3 than a 750 Ti ;P


----------



## MadRabbit

Quote:


> Originally Posted by *charlievoviii*
> 
> Here come the typical AMD excuses. If a report is good for AMD, it's the truth and they start acting all big. If a report is bad, they start making excuses.
> 
> 
> 
> 
> 
> 
> 
> Have you seen any other DX12 games besides this one where AMD is actually doing well? So yeah, keep dreaming about the future. In the meantime I'll enjoy my FPS with Nvidia.


How many DX12 games are there to draw any conclusion for either side whatsoever?









You know, next time you call someone a "fan girl" you might want to change your attitude towards people, so you don't sound like one yourself.


----------



## charlievoviii

Quote:


> Originally Posted by *MadRabbit*
> 
> How many DX12 games are there to draw any conclusion for either side whatsoever?


You have pretty much repeated what I have been saying and proved my point. Good job.


----------



## Olivon

Quote:


> Like all GCN chips (1.1 and 1.2), Fiji supports the mid-range 12_0 hardware feature level of Direct3D, but not the 12_1 level supported by Nvidia's second-generation Maxwell GPUs. On the other hand, for another major Direct3D 12 capability, Resource Binding (the number of resources that can be exposed to developers), Fiji is at the maximum level, Tier 3, where Nvidia's GPUs are limited to Tier 2. *So both AMD and Nvidia have the opportunity to offer developers graphics effects that are incompatible with the other's hardware.*


http://www.hardware.fr/news/14263/amd-radeon-r9-fury-x-gpu-fiji-details.html

Oxide is not the only studio doing DX12 (fortunately - it's just a small studio), and we can expect DX12 functionality that works better for nVidia too.
This debate is just here to make AMD fans wet their panties and crap on nVidia (I heard they kill children too), like always.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Olivon*
> 
> http://www.hardware.fr/news/14263/amd-radeon-r9-fury-x-gpu-fiji-details.html
> 
> Oxide is not the only studio doing DX12 (fortunately - it's just a small studio), and we can expect DX12 functionality that works better for nVidia too.
> This debate is just here to make *AMD fans wet their panties and crap on nVidia* (I heard they kill children too), like always.


Wow..

All the evidence so far points to GCN being superior in DX12.

Spin it how you want, act immature and attack people. Still doesn't change the facts


----------



## Dudewitbow

Quote:


> Originally Posted by *charlievoviii*
> 
> back at you


It's already in official documents that GCN was never 100% compliant with DX12 (no GPU is), as it's already listed that all GCN GPUs fall between feature levels 11_1 and 12_0 (for reference, Maxwell gen 1 is feature level 12_0, gen 2 is 12_1). What Maxwell 2.0 has specifically is the ability to use conservative rasterization. The problem is that conservative rasterization is ONLY found in Maxwell 2.0 (roughly 6.2% of hardware survey takers on Steam) and is generally only used for lighting and shadow effects, whereas the topic in question, async compute, is found on ALL GCN hardware, including the current-generation consoles, which already use it in practice on occasion.

The bigger question, which is basically what this thread is trying to address, is why Maxwell 2.0 is tagged as 12_1 compliant but missing a very key performance feature that DX12 offers. It's not about making AMD look better (though some people might perceive it that way, and use it to their advantage) but about asking why Maxwell in particular is not doing well here. If no one decides to address this, it would be as if no one had addressed the GTX 970 VRAM issue. Directly addressing a problem should be the job of the hardware vendors. As consumers, it's our job to raise these questions so that they get answered.

To use an old example, a situation where this happened was Crysis 2's water tessellation. AMD knew its hardware was not up to spec on tessellation at Crysis 2's inception, so it answered that problem with a tessellation slider, which was brought up again recently due to GameWorks' implementation of HairWorks. Although AMD users could not enjoy the same visual quality as Nvidia builds, they could still use HairWorks relatively well, because AMD users can drop the ludicrous 64x tessellation factor down to a more modest 8/16/32x while retaining 90%+ of the visual quality. Older Kepler cards, for instance, didn't have this luxury.
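
To put rough numbers on that slider - a back-of-envelope approximation of my own, assuming a patch's triangle count grows with roughly the square of the tessellation factor:

```python
def relative_geometry_load(factor, baseline=64):
    """Hedged approximation: triangle count per patch grows roughly
    with the square of the tessellation factor, so capping a 64x
    factor at 16x cuts the geometry load to about 1/16th."""
    return (factor / baseline) ** 2

# Dropping 64x to 16x leaves ~6% of the geometry work, which is why
# the slider can recover so much performance at little visual cost.
```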

I don't know why people are being so defensive about their GPUs. On the larger scale, it would actually be BENEFICIAL to support this push, as it FORCES Nvidia to eventually address this problem one way or another. Best case, Nvidia finds a way to push out better performance and everyone wins. This is how GPU races SHOULD work: the performance of the competitor's card pushes the competition to work harder. Something similar happened with Star Swarm, when Nvidia responded to its extreme CPU overhead in DX11; Star Swarm probably pushed Nvidia's driver scheduling work ahead of schedule, and the end result was massive DX11 gains for Nvidia. Worst case, Nvidia doesn't find a workaround and announces that the capability is only found in Pascal; if that happens, it's up to you as the user to decide what you want to do. But regardless, defending your GPU by attacking the opposing side isn't the way to go. The way to go is addressing what's wrong and demanding a fix.


----------



## SpeedyVT

Quote:


> Originally Posted by *charlievoviii*
> 
> Here come the typical AMD excuses. If a report is good for AMD, it's the truth and they start acting all big. If a report is bad, they start making excuses.
> 
> 
> 
> 
> 
> 
> 
> Have you seen any other DX12 games besides this one where AMD is actually doing well? So yeah, keep dreaming about the future. In the meantime I'll enjoy my FPS with Nvidia.


Quote:


> Originally Posted by *GorillaSceptre*
> 
> That is a ridiculous article. Sounds like it was written by a fanboi. Instead of reporting on the big story, which is Async, they instead post that rubbish? Seems legit
> 
> 
> 
> 
> 
> 
> 
> 
> 
> What he said was the truth, there is no architecture that supports every feature of Directx 12. I don't see the issue.
> 
> It's foolish to blindly trust any corporation, but in this case there's only one company that's full of crap.


I think anyone who says NVidia > AMD or AMD > NVidia has some confused logic. The hardware works in completely different ways; the two architectures are designed to operate entirely differently from each other. Whether or not NVidia is capable of asynchronous compute is not in question - the facts have been laid out.

There will always be games that utilize NVidia's hardware better than AMD's, and vice-versa.

NVidia has tremendous non-asynchronous compute power and AMD has tremendous asynchronous compute power.

I would assume this is why NVidia's GPUs benefited more from DX11 than DX12. This is also why tessellation performance is significantly better on NVidia than on AMD. Asynchronous power, however, grants AMD the ability to run simultaneous tessellation workloads at a broad scale but under less of a load. Technically equal.

If NVidia traded for asynchronous performance, there is a significant possibility it would lose a massive percentage of serial performance. You cannot have one without the other.

This is my understanding of the situation, although my explanation is flawed. It is more flawed to assume that AMD or NVidia is weaker and doesn't know their stuff. The one thing I can side with is that AMD was a great choice for the console platforms because of this asynchronous power.

I'm sleepy, off to bed.


----------



## uplink

I'm an nvidia fan, but I'm looking forward to the first AAA titles that utilize async compute in their effects and shatter nvidia. I'm so pissed off with nvidia's policies of the past few years.

Instead of messing around, they should give you 50-70% off the next graphics card they sell you if you own one of theirs (depending on the class you have, etc.), whether for SLI or the next gen, instead of forbidding companies to use certain features and tech.


----------



## Serios

Quote:


> Originally Posted by *charlievoviii*
> 
> Here come the typical AMD excuses. If a report is good for AMD, it's the truth and they start acting all big. If a report is bad, they start making excuses.
> 
> 
> 
> 
> 
> 
> 
> Have you seen any other DX12 games besides this one where AMD is actually doing well? So yeah, keep dreaming about the future. In the meantime I'll enjoy my FPS with Nvidia.


What are you talking about??
You seem very confused.

AMD never claimed to have full support for DX12, so your attempts to bash AMD are irrelevant.


----------



## Clocknut

Quote:


> Originally Posted by *Dudewitbow*
> 
> It's already in official documents that GCN was never 100% compliant with DX12 (no GPU is), as it's already listed that all GCN GPUs fall between feature levels 11_1 and 12_0 (for reference, Maxwell gen 1 is feature level 12_0, gen 2 is 12_1). What Maxwell 2.0 has specifically is the ability to use conservative rasterization. The problem is that conservative rasterization is ONLY found in Maxwell 2.0 (roughly 6.2% of hardware survey takers on Steam) and is generally only used for lighting and shadow effects, whereas the topic in question, async compute, is found on ALL GCN hardware, including the current-generation consoles, which already use it in practice on occasion.
> 
> The bigger question, which is basically what this thread is trying to address, is why Maxwell 2.0 is tagged as 12_1 compliant but missing a very key performance feature that DX12 offers. It's not about making AMD look better (though some people might perceive it that way, and use it to their advantage) but about asking why Maxwell in particular is not doing well here. If no one decides to address this, it would be as if no one had addressed the GTX 970 VRAM issue. Directly addressing a problem should be the job of the hardware vendors. As consumers, it's our job to raise these questions so that they get answered.


Basically AMD has "nearly" full control of what the important features inside DirectX 12 are, since they control the two biggest console GPUs.

Perhaps *MAYBE* this asynchronous compute was their plan all along, and it wasn't added to the DirectX 12 specification until a very late revision. Nvidia had no idea asynchronous compute would be so important until Microsoft decided to add it at the last minute; Maxwell had already taped out by the time the feature went into DirectX 12. DirectX 12 is basically DirectX with Mantle features included.

Remember, Microsoft is fighting hard against Sony's PS4; they need every ounce of GPU performance they can squeeze out of that AMD GPU in the Xbox One so it isn't too far behind the PS4. Sony has its very own API, which may very well support asynchronous compute too. It would be silly/stupid for Microsoft to deliberately leave asynchronous compute out of DirectX 12. So Microsoft is basically playing right into AMD's favor this time, and Nvidia is the one that gets screwed over at the end of this.

The only advantage Nvidia retains now is its DX9-11 CPU overhead. If AMD somehow decides to fix that in future driver updates, there may be absolutely no reason to buy current Nvidia anymore.


----------



## cowie

Quote:


> Originally Posted by *Mahigan*
> 
> Yeah I reported him.
> 
> Oxide dev takes the time to come and talk to us and this guy here just decides flame him. Completely unprofessional conduct.


Sorry, I am no paid troll.
The unprofessional conduct by Oxide is the true flame here. Please post some screenshots with this feature on and off..... thanks


----------



## dogen1

Quote:


> Originally Posted by *Lantian*
> 
> How has Nvidia lied about DX12 now? Async shaders are not part of the DX12 spec; they're something that can simply be done now with the access DX12 gives you. It was never part of the spec, nor is it required for any tier of DX12 support - where did the idea even come from? Also, I'm pretty sure Microsoft tiers the cards, not Nvidia or AMD.
> And finally, if async shaders are truly responsible for the performance gains you see between DX11 and DX12, care to explain why Thief saw none of that in Mantle mode?


It is part of the spec, but it's called "multi-engine". The reason Thief didn't benefit much is that it's not an on/off switch. There's a difference between using it and really using it.


----------



## GorillaSceptre

Quote:


> Originally Posted by *cowie*
> 
> sorry I am no paid troll
> the unprofessional conduct by Oxide is the true flame here please post up some screenshots of this feature on and off.....thanks


He answered some of the questions and concerns people were having.

Wheres the flame and unprofessional conduct?


----------



## Lantian

Quote:


> Originally Posted by *dogen1*
> 
> It is part of the spec, but it's called "muli-engine". The reason why thief didn't benefit much is because it's not an on off switch. There's a difference between using it, and really using it.


Source? Especially if it's part of the spec, show the source. And what is this "multi-engine" nonsense?


----------



## therealgiblet

I'd just like to temper this discussion with a 'LOL'.

Just feel it needs it.

All the best.


----------



## provost

Quote:


> Originally Posted by *GorillaSceptre*
> 
> AMD excuses? I'm stating facts.. I don't have a dog in this fight, but looking at your sig makes me understand why you're getting worked up
> 
> 
> 
> 
> 
> 
> 
> 
> 
> This is what you said: "lol the true came out. FuryX , X = Lies"
> 
> Well done, pretending that AMD have been caught doing something, when the info actually came from them in the first place..


Well said. I don't believe this unintelligent drivel is indigenous to OC.net; I read some of the comments in response to the article this poster referenced, and if there was ever a good example of groupthink, I believe I found it.... lol
Since this DX12 topic piqued my interest, I have been reading some forums, and it would be fair to say this forum (and one other, maybe two at most, but the breadth of OC.net is unmatched) is the only place where one can find an intelligent exchange of ideas for PC gamers and enthusiasts.

I enjoy reading both sides of an argument, especially arguments contrary to my personal preference, as they alert me to my own blind spots.

Anyway, this thread has been very informative and a service to the community here, imho.


----------



## dogen1

Quote:


> Originally Posted by *Lantian*
> 
> source? especially if it's part of the spec show the source and what is that "muli-engine" nonsense


I was going by what I've read from devs.

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-7#post-1868915

http://forums.anandtech.com/showpost.php?p=37618642&postcount=10

But you can also find it in the D3D12 specification.

https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx


----------



## cowie

Quote:


> Originally Posted by *GorillaSceptre*
> 
> He answered some of the questions and concerns people were having.
> 
> Wheres the flame and unprofessional conduct?


Have you not seen the childish behavior on both parts, from NV and Oxide?
Any screenshots or no? Turning off async in the drivers is what NV should do if and when they can't get their act together for this added DX12 feature; it seems to work for AMD's tessellation.
Just please, can you get Oxide to show us all, in public, the gorgeous visual splendor this feature adds?
I want us all to see what we're missing if it's turned off


----------



## Paul17041993

I believe I said something about this somewhere last year. Maxwell 2 lacks a dedicated graphics queue, right? Because that's where the problem would be...


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Clocknut*
> 
> Remember, Microsoft is fighting hard against Sony's PS4; they need every ounce of GPU performance they can squeeze out of that AMD GPU in the Xbox One so it isn't too far behind the PS4. Sony has its very own API, which may very well support asynchronous compute too. It would be silly/stupid for Microsoft to deliberately leave asynchronous compute out of DirectX 12. So Microsoft is basically playing right into AMD's favor this time, and Nvidia is the one that gets screwed over at the end of this.
> 
> The only advantage Nvidia retains now is its DX9-11 CPU overhead. If AMD somehow decides to fix that in future driver updates, there may be absolutely no reason to buy current Nvidia anymore.


Pretty spot on.

Having said that, if AMD *FINALLY* gets off their butts and fixes DirectX 9, 10, and 11 ... I honestly will be hard pressed to buy AMD, because that would basically be rewarding AMD for INTENTIONALLY ignoring 100% of the games that have been released in the past 7 years. I mean just look at how long it took them to even ADMIT they had a problem with Crossfire, then to do the Frame Pacing patch for DirectX 11 and 10 ... and STILL HAVE NOT fixed it for DirectX 9.

Basically, AMD ignores a problem until it becomes a massive issue. They never do a "stitch in time" solution. But hey, I would be pleasantly surprised if they do. My AMD builds would welcome it.

Now, having said all that, I will remind everyone here not to forget that BOTH AMD and nVidia see FPS increases in the Ashes benchmark ... it's just that AMD's gains from DirectX 11 to 12 are much greater. But make no mistake, nVidia is still gaining FPS. It's just that AMD is actually STARTING to be competitive now. I'm sure nVidia will focus on things and make improvements by the time a game is ACTUALLY released (or even 5 games ... which is what, about the total number of Mantle games made?).

DirectX 12 hasn't killed nVidia, it just CURRENTLY breathed a little life into AMD ... for now.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Paul17041993*
> 
> I believe I talked something about this somewhere last year, maxwell 2 lacks a dedicated graphics queue right? because that's where the problem would be...


And AMD cards lack several features in DirectX 12 feature sets that nVidia has. It'll be interesting to see how AMD performs when those features are turned on and how much of an advantage nVidia has for DirectX feature set 12_1.


----------



## Xuper

Quote:


> Originally Posted by *47 Knucklehead*
> 
> And AMD cards lack several features in DirectX 12 feature sets that nVidia has. It'll be interesting to see how AMD performs when those features are turned on and how much of an advantage nVidia has for DirectX feature set 12_1.


Dream on! Nvidia will not be able to exploit those 12_1 features.


----------



## Klocek001

Quote:


> Originally Posted by *Xuper*
> 
> Dream on! , Nvidia will not able to exploit those feature 12_1.


Why is that?
I thought conservative rasterization and raster order views were two things to gain even more control via GameWorks. If Maxwell 2 cannot utilize async compute to its advantage, it's basically the same caliber of lie as the 3.5GB thing on the 970. However, I still think Nvidia's up to something with those 12_1 features; paradoxically we may even see AMD falling behind even more than in DX11 GameWorks titles. Not that I'm complaining about this particular one - I use a lot of GW settings in my games, and it'd be great to see HW take a smaller performance hit in DX12 games.


----------



## GorillaSceptre

Quote:


> Originally Posted by *cowie*
> 
> Have you not seen the childish behavior on both parts, from NV and Oxide?
> Any screenshots or no? Turning off async in the drivers is what NV should do if and when they can't get their act together for this added DX12 feature; it seems to work for AMD's tessellation.
> Just please, can you get Oxide to show us all, in public, the gorgeous visual splendor this feature adds?
> I want us all to see what we're missing if it's turned off


No, I haven't seen childish behavior from both of them.

What I have seen is Nvidia trying to bully them and saying their game isn't a good indication of DX12 performance, basically questioning their competence (something I actually bought at first). When in actuality, Oxide did everything they could to improve performance for Nvidia. As the Oxide dev said in this thread, "you'd draw the conclusion we were working closer with Nvidia rather than AMD".

Async is off on Nvidia hardware, as it gave worse performance with it on compared to what we see now - for reasons already addressed in depth.


----------



## TrevBlu19

Not to be off topic but i found this..

https://msdn.microsoft.com/en-us/library/windows/desktop/Mt426648(v=VS.85).aspx

Direct3D 12.1 feature set.. What does this mean?


----------



## CasualCat

Hypothetically speaking, is there some level of asynchronous compute usage at which the Fury X/390X could regularly surpass the 980 Ti (by a significant margin) without being held up by their own bottlenecks? Is that a realistic scenario, or would it have to be artificially created, like we saw with excessive GW tessellation (tessellation which, btw, is a DX11 feature)? Even tessellation, while it apparently can be (and has been) abused, also appears to make a tangible visual difference in some circumstances (see Unigine Heaven, for example).

Remember it appears even with FuryX using this feature, 980Ti is still trading blows with it (which is what we've seen in DX11 benchmarks as well when they both have good drivers).


----------



## GorillaSceptre

Quote:


> Originally Posted by *CasualCat*
> 
> Hypothetically speaking is there some level of asynchronous compute usage in which FuryX/390X can regularly surpass (by a significant difference) the 980Ti and also isn't held up by their own bottlenecks? Is this a realistic scenario or would this be artificially created like we saw with excessive GW tessellation (tessellation which btw is a DX11 feature)? Even tessellation while it apparently can be (and has been) abused also appears to make a tangible visual difference in some circumstances (see Unigine Heaven for example)
> 
> Remember it appears even with FuryX using this feature, 980Ti is still trading blows with it (which is what we've seen in DX11 benchmarks as well when they both have good drivers).


Ashes is hardly using async, and the nature of the game is probably a worst case for AMD.

If future releases rely heavily on async then GCN will likely slaughter Kepler/Maxwell. Whether games will use it or not is another question. _IF_ the evidence we have is accurate.


----------



## Clocknut

Quote:


> Originally Posted by *Klocek001*
> 
> why is that ?
> I thought conservative raster and raster order were two things to gain even more control via GameWorks. If Maxwell 2 cannot utilize async compute to its advantage, it's basically the same caliber of lie as the 3.5GB thing on the 970. However, I still think Nvidia's up to something with those 12_1 features; paradoxically we may even see AMD falling behind even more than in DX11 GameWorks titles. Not that I'm complaining about this particular one, I use a lot of GW settings in my games, and it'd be great to see HW take a smaller performance hit in DX12 games.


Nvidia's features won't matter here. It is the consoles that set the gaming standard.

With Microsoft being the underdog in consoles, Microsoft will try to do everything to maximize DirectX 12 compatibility with GCN so the Xbox One runs at its peak performance. Weren't there news claims that DirectX 12 improves the Xbox One by a significant amount?

This is Microsoft doing AMD's bidding here. Screw whatever 81% share Nvidia has; it won't matter, Microsoft doesn't owe Nvidia anything. It is Xbox One performance that matters most to Microsoft now, not keeping 81% of PC gamers happy. That 81% is insignificant to Microsoft compared to the Xbox One user base.

I hope Nvidia's Pascal has at least the same level of asynchronous compute that GCN 1.0 has; if not, Nvidia had better be prepared to get screwed for another generation. Accept it: DirectX 12 is a GCN API, and Nvidia has to adapt now or fail.


----------



## vloeibaarglas

Quote:


> Originally Posted by *Clocknut*
> 
> Nvidia's features won't matter here. It is the consoles that set the gaming standard.
> 
> With Microsoft being the underdog in consoles, Microsoft will try to do everything to maximize DirectX 12 compatibility with GCN so the Xbox One runs at its peak performance. Weren't there news claims that DirectX 12 improves the Xbox One by a significant amount?
> 
> This is Microsoft doing AMD's bidding here. Screw whatever 81% share Nvidia has; it won't matter, Microsoft doesn't owe Nvidia anything. It is Xbox One performance that matters most to Microsoft now, not keeping 81% of PC gamers happy. That 81% is insignificant to Microsoft compared to the Xbox One user base.
> 
> I hope Nvidia's Pascal has at least the same level of asynchronous compute that GCN 1.0 has; if not, Nvidia had better be prepared to get screwed for another generation. Accept it: DirectX 12 is a GCN API, and Nvidia has to adapt now or fail.


Yup, we are "stuck" on GCN 1.0 for at least the next half a decade. Devs managed to make GTA 5 run on the Xbox 360/PS3; end-of-generation Xbone and PS4 games are going to be amazing due to the optimization, and any optimization will at least partially carry over to GCN cards. Pascal development probably started 3-5 years ago. Let's see if Nvidia had the foresight to bake parallelism into their hardware design. Otherwise, they are going to be like AMD on the CPU side, with nothing to show until they rush out their new architecture, Zen.


----------



## Xuper

A guy from Reddit:

https://www.reddit.com/r/3j6kdu/ashes_of_the_singularitycatalyst_1571_vs_158_beta/

DX12: there is very little room for driver optimization. Only DX11.


----------



## uplink

Test:



Me: [the benchmark settings - high]



So are we nvidia users good, or bad?


----------



## cowie

Quote:


> Originally Posted by *GorillaSceptre*
> 
> No, I haven't seen childish behavior from both of them.
> 
> What I have seen is Nvidia trying to bully them and saying that their game isn't a good indication of DX12 performance, basically questioning their competence (something I actually bought at first). When in actuality, Oxide did everything they could to improve performance for Nvidia. As the Oxide dev said in this thread, "you'd draw the conclusion we were working closer with Nvidia rather than AMD".
> 
> Async is off on Nvidia hardware, as it gave worse performance with it on compared to what we see now. For reasons already addressed in depth.


Making this drivel public is kids' play. You know if they worked with NV more on this it would be a GameWorks title. It's great press for a "game" and a company on its payroll.
This is far from the be-all end-all DX12 feature set or any indication of what is to come of DX12.
AMD's admission that it doesn't have all of DX12's features in its hardware is just telling of maybe what NV has in store in "its" DX12 benchmark, most probably to be seen soon.
But it's not like I will put much into that either.
But like I said, I want to see screenshots of this DX feature in all its glory. If, when enabled, it does worse for a brand and looks the same without it, that's a little different than what happens when you turn, say, GW on and off.


----------



## CasualCat

Quote:


> Originally Posted by *uplink*
> 
> Test:
> 
> 
> 
> 
> Me: [the benchmark settings - high]
> 
> 
> 
> 
> So are we nvidia users good, or bad?


I'd say for now we're fine. Could there be a scenario in which there is runaway performance on existing AMD hardware using asynchronous compute that existing Nvidia hardware can't keep up with? Maybe (the potential appears to be there), but we haven't seen it yet.


----------



## Forceman

Quote:


> Originally Posted by *CasualCat*
> 
> I'd say for now we're fine. Could there be a scenario in which there is runaway performance on existing AMD hardware using asynchronous compute? Maybe (the potential appears to be there), but we haven't seen it yet.


And given the disparity in feature sets, there's probably an equal potential for some Nvidia-supported title to run away too.

Which could lead to some dystopian future where everyone is sorted into gaming factions based on their hardware. Oh, wait, that's already happening. Never mind.


----------



## cowie

That brings up another point... why would anyone spend upward of $500+ on a single piece of hardware for these console ports? And let's face it, they don't make games like they used to.
You can get an Xbox or PS4 and just be done with it. The money is in consoles and that's where it is all going now.
But that's a rant that I have been well over for the past decade.


----------



## ku4eto

Quote:


> Originally Posted by *Mahigan*
> 
> LOL!
> 
> VLIW4, now that would be an interesting test.


Quote:


> Originally Posted by *Forceman*
> 
> Didn't seem to, usage stayed the same all the way through on the 960 also. Why the huge disparity in compute times with Maxwell and GCN? Maxwell isn't that much better at compute.
> 
> Anyone tried this bench on a 6970? Wonder what that looks like.
> 
> I'll attach it here in case anyone wants to try it.
> 
> AsyncCompute.zip 17k .zip file


I got a PowerColor 6950 1 GB; if you show me where I can download the benchmark, I can run it.


----------



## gamervivek

Quote:


> Originally Posted by *KenjiS*
> 
> I can think of one BIG one: Assassin's Creed Unity. A lot of its issues and problems were caused by the insane number of draw calls it made, overloading the DX11 API... the insane number of NPCs etc. taxed DX to its limits. DX12 would handle this easily.
> 
> Imagine Skyrim or another game with some massive epic chaotic battle. Imagine Helm's Deep rendered realtime in a 3D engine, or the insanity you could pull off with an updated Normandy beach landing.


Indeed, it'd be bloody awesome.

I was expecting async compute to be concurrent kernel execution repackaged for DirectX, and HSA-lite to be the defining feature of consoles, leading to the death of discrete GPUs. And I thought that Mahigan was talking nonsense, considering his first post referred to the similar number of ROPs on Fury and Hawaii and then he posted frontend benchmarks from TechReport. Totally caught me off guard. It's looking quite promising.
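The draw-call point KenjiS raises can be sketched numerically. Below is a toy Python model, not engine code: it only assumes that per-draw-call CPU submission cost and GPU time bound the frame, and all the microsecond figures are made-up illustrative assumptions, not measured DX11/DX12 numbers.

```python
# Toy model of a draw-call-bound frame (illustrative assumptions only).

def frame_time_ms(draw_calls, cpu_us_per_call, gpu_ms):
    """Assume CPU submission and GPU execution overlap, so the frame
    takes as long as the slower of the two sides."""
    cpu_ms = draw_calls * cpu_us_per_call / 1000.0
    return max(cpu_ms, gpu_ms)

def fps(draw_calls, cpu_us_per_call, gpu_ms=16.0):
    return 1000.0 / frame_time_ms(draw_calls, cpu_us_per_call, gpu_ms)

# 20,000 draw calls (a crowded, AC Unity-style scene), with hypothetical
# costs: ~25 us/call on a single DX11 submission thread vs ~2 us/call
# with DX12-style multithreaded command lists.
dx11_fps = fps(20_000, 25.0)  # 500 ms of CPU submission -> 2 fps
dx12_fps = fps(20_000, 2.0)   # 40 ms of CPU submission -> 25 fps
```

Under these assumed numbers the GPU never gets slower, yet the frame rate collapses purely from submission overhead, which is why cutting per-call CPU cost was one of the headline goals of DX12.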


----------



## Mahigan

So much anger. So much hate. Ruining what is probably the best thread on the web these days.

Cowie, do you work for NVIDIA? No. Let them handle their own PR.

We're set to receive more information from NVIDIA any day now. The privilege we've been granted in this thread is enormous.

We've been the talk of the internet for the past week. Every journalist comes here for their scoop. This is an amazing opportunity for us.

We need to keep our heads screwed on straight. That Oxide developer you attacked isn't just a programmer; he's also busy working on a title, and he took the time to come and answer our questions. Did you see him doing that in any other forum?

AMD then chimed in, again, referring to what was said in our forum.

When NVIDIA respond, it will be based on what was said in our forum. We've seen an influx of new users registering here at overclock.net. We've seen the view count for this thread explode.

I, for one, am extremely grateful for the fact that we've been able to keep our collective cool. I look at all the comment sections about this topic, on other websites, and they're filled with petty bickering. It is quite sad to see really.

As for the asynchronous compute path in Ashes of the Singularity, we have Oxide telling us NVIDIA demanded they shut it down on their hardware. We have NVIDIA not disputing this fact. If you contact their PR, they won't dispute it.

Therefore that Oxide dev you attacked was being honest about that.

Keep your cool bro.


----------



## Klocek001

Still it's much too early to ditch all the advantages nvidia cards offer in DX11 for what AMD's GCN architecture offers in DX12. But boy will I be angry when it turns out async compute is subpar on 980Ti. Of all DX12 features that I was considering before switching to nvidia THAT was the most important one that I was looking forward to having on my card. I wanted the best version of that on a $700 GPU, not "it's there on paper" one.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> So much anger. So much hate. Ruining what is probably the best thread on the web these days.
> 
> Cowie, do you work for NVIDIA? No. Let them handle their own PR.
> 
> We're set to receive more information from NVIDIA any day now. The privilege we've been granted in this thread is enormous.
> 
> We've been the talk of the internet for the past week. Every journalist comes here for their scoop. This is an amazing opportunity for us.
> 
> We need to keep our heads screwed on straight. That Oxide developer you attacked isn't just a programmer; he's also busy working on a title, and he took the time to come and answer our questions. Did you see him doing that in any other forum?
> 
> AMD then chimed in, again, referring to what was said in our forum.
> 
> When NVIDIA respond, it will be based on what was said in our forum. We've seen an influx of new users registering here at overclock.net. We've seen the view count for this thread explode.
> 
> I, for one, am extremely grateful for the fact that we've been able to keep our collective cool. I look at all the comment sections about this topic, on other websites, and they're filled with petty bickering. It is quite sad to see really.
> 
> As for the asynchronous compute path in Ashes of the Singularity, we have Oxide telling us NVIDIA demanded they shut it down on their hardware. We have NVIDIA not disputing this fact. If you contact their PR, they won't dispute it.
> 
> Therefore that Oxide dev you attacked was being honest about that.
> 
> Keep your cool bro.


True

One problem with such threads, though, is that a lot of the information gets lost or confused. People won't read 2000 posts of various thoughts and facts just to be confused in the end. I think there should be some aggregation of the more informative posts, or at least some kind of mark, for anyone who wants to read the different technical angles.


----------



## Mahigan

Quote:


> Originally Posted by *47 Knucklehead*
> 
> And AMD cards lack several features in DirectX 12 feature sets that nVidia has. It'll be interesting to see how AMD performs when those features are turned on and how much of an advantage nVidia has for DirectX feature set 12_1.


This is true. But those feature sets are not performance-oriented features, so we won't see extra performance out of using Conservative Rasterization or ROVs. But it is entirely true that nVIDIA's current architectures support far more DX12 features than AMD's current GPUs.

Quote:


> Originally Posted by *cowie*
> 
> hey listen you work for them jokers?
> you seem to take so much offence to my comments.
> my cool is kept those clowns at oxcide are the lowest kind(nv intel amd ms sony all included)
> whats next???where does this go????
> and really stop you flame baiting me wake up and smell the coffee yo I don't run and cry to mods.
> I have been around to long to believe this farce
> did he tell you this project is so far in the red that he would probably do anything to right the ship?
> I might be brash but I am no dummy
> if you muxst know I am a x gamer and 3d benchmark addict I have been around a while and I have used all sorts of hardware
> like I said before show me screen shots on and off....if you can point out any difference besides framerate its 1 up for you ok?


I don't work for any of the parties involved or any 3rd parties which have business dealings with any of the parties involved. I am completely independent. I don't receive a dime from anyone involved here.

Many people have leveled that accusation at me because of how hard I've pushed to get to the bottom of this issue in various forums. While it is entirely understandable that some people mistrust a person who recently created accounts in various forums, I've thoroughly explained my position on several occasions. Do you want the story?

3 years ago, I left Canada, while working for Bell Canada, to work on an IT-related project in Morocco. My function was to teach engineering practices (fiber-optic related) to Moroccans working for their largest telecom service provider. Over the course of this project I met a woman. We exchanged information before I left and that was that. Back in Canada, we talked over Skype and Facebook for a few months and I fell in love.
I left Canada 2 years ago and gave away all of my possessions (consumerism is a mental disorder, believe you me), except the three PCs below (I had around 20 PCs and pieces to build 50 more, but I gave them all away for free). The 3 PCs below I stashed at a family member's house; I packed a backpack and a suitcase worth of clothing and headed to Morocco. Why would I do this? Love. I fell in love with a woman. This woman is now my wife. Of course, in the process I had my Sony Xperia Z1 smartphone stolen (containing all my Bitcoin, which was quite a loss). And since I had two-step verification enabled on it, I lost access to my email as well. Locked out, until I return to Canada. So when the Ashes of the Singularity numbers popped up, I was baffled by what I was seeing. I wanted to comment in various tech forums, but without my passwords and without the ability to reset them, I had to create new accounts. TL;DR, I know.

Point being, I wanted to know what was happening. That's all I wanted and I saw that many other people wanted the same thing. The only problem was that information wasn't forthcoming. Any forum I visited had people attacking one another... not really productive. I started to discuss with what seemed to be the most knowledgeable members of those forums. I began reading up on all of the White Papers from both AMD and nVIDIA. I formulated a theory...

So I created an account here, by pure chance. I googled Ashes of the Singularity forum and came upon this thread. I posted my findings here at Overclock.net.

I don't work for anyone involved. I teach courses on IT on a per contract basis. I'm a teacher. I'm also a really curious person with a background in IT (several different IT fields actually).

I hope that, once and for all, this accusation will stop being leveled against me.

As for where this goes... towards the truth. To the bottom of the rabbit hole. That seems to be the whole point, no? To know the truth? Not to resort to petty name-calling, belittling, insults and yes, even death threats (all a product of consumerism and one's sense of self-worth and ego being invested in a brand). I reject that line of thinking. Imagine if I taught my students based on that line of thought?


----------



## ku4eto

Quote:


> Originally Posted by *Xuper*
> 
> a Guy from :
> 
> __
> https://www.reddit.com/r/3j6kdu/ashes_of_the_singularitycatalyst_1571_vs_158_beta/%5B/URL
> 
> DX12
> 
> 
> There is very little room for Optimization driver.Only DX11.


The results are good. AMD had around 2 weeks to release beta drivers that are giving 5 extra FPS in the Heavy and Medium batches! From 17 to 22 FPS, that is around a 30% increase just from a beta driver update for an alpha game. Although it seems like they have no room left for improvement in DX12, where it's now only up to the engine how well it runs.
The new nVidia drivers are giving a slight boost in DX12 (1 FPS is actually a marginal increase that almost falls within the error margin), but 0 in DX11. So much for their improvement.
Also, those results above are probably CPU-bound, as the 3820 doesn't seem to be doing really well compared to a 5820K or even a 4790K. With those, the DX11 benefits would be far better, and we would be able to truly observe whether there is an above-marginal (read: 0) FPS improvement. A second in-depth benchmark is needed, as the GPU may be less loaded while doing the same amount of work.

Quote:


> Originally Posted by *Mahigan*
> 
> This is true. But those feature sets are not performance-oriented features, so we won't see extra performance out of using Conservative Rasterization or ROVs. But it is entirely true that nVIDIA's current architectures support far more DX12 features than AMD's current GPUs.


Those DirectX 12 support levels are eye-candy based, which matters mainly to enthusiasts, but we low-to-mid-range peasants care more about the performance-impacting support levels, which AMD supports better. In a few months I will be buying a new GPU (read: a 2nd-hand GPU), most probably a 290X: 50% for economical reasons, 25% from being an AMD fan, and the rest due to the knowledge I gained from this thread.


Quote:


> Originally Posted by *cowie*
> 
> hey listen you work for them jokers?
> you seem to take so much offence to my comments.
> my cool is kept those clowns at oxcide are the lowest kind(nv intel amd ms sony all included)
> whats next???where does this go????
> and really stop you flame baiting me wake up and smell the coffee yo I don't run and cry to mods.
> I have been around to long to believe this farce
> did he tell you this project is so far in the red that he would probably do anything to right the ship?
> I might be brash but I am no dummy
> if you muxst know I am a x gamer and 3d benchmark addict I have been around a while and I have used all sorts of hardware
> like I said before show me screen shots on and off....if you can point out any difference besides framerate its 1 up for you ok?


Man, you are not doing any good for this thread; it will be locked again by comments like yours.

Please, could ALL of you stop trying to start a flame war? This is not a sandbox to throw stuff in other people's faces and scream at them; this thread is, as Mahigan said, one of the most watched topics on the internet.


----------



## semitope

Quote:


> Originally Posted by *Olivon*
> 
> http://www.hardware.fr/news/14263/amd-radeon-r9-fury-x-gpu-fiji-details.html
> 
> Oxide is not the only studio doing DX12 (hopefully, it's just a mini studio) and we can expect DX12 functionalities that work better for nVidia too.
> This debate is just to make AMD fans wet their panties and crap on nVidia (I heard they kill child too) like always.


The only way these games will run better on Nvidia will be if they are using 12_1 features. And does Nvidia even do those well? They seem to be only Tier 1 in conservative rasterization. It might end up better to use other methods to accomplish the same thing, and it probably will be done that way, since the majority of Nvidia GPUs do not support 12_1.

In the end, the 12_1 features do not stand to have much impact. The areas where Nvidia lacks are more likely to.

If Nvidia messes around and pushes such forced changes, their Kepler users will rage harder.


----------



## Devnant

Quote:


> Originally Posted by *ZWingerRyRy*
> 
> Well that was quick. From a Titan X owner on another forum with access to the benchmark.
> 
> Ashes - 1080P - High Settings - DX 12
> 
> 355.60 Drivers
> Normal - 72.2 FPS
> Medium - 59.5 FPS
> Heavy - 49.6 FPS
> 
> 355.82 Drivers
> Normal - 81.1 FPS
> Medium - 70.7 FPS
> Heavy - 59.6 FPS
> 
> To what Oxide posted before 355.82 or possibly even before 355.60 update driver.


I don't have access to the benchmark. Can anyone confirm these massive gains from 355.60 to 355.82? 10 FPS gain on heavy is pretty significant.


----------



## cowie

Quote:


> Originally Posted by *ku4eto*
> 
> The results are good. AMD had around 2 weeks to release beta drivers that are giving 5 extra FPS in the Heavy and Medium batches! From 17 to 22 FPS, that is around a 30% increase just from a beta driver update for an alpha game. Although it seems like they have no room left for improvement in DX12, where it's now only up to the engine how well it runs.
> The new nVidia drivers are giving a slight boost in DX12 (1 FPS is actually a marginal increase that almost falls within the error margin), but 0 in DX11. So much for their improvement.
> Also, those results above are probably CPU-bound, as the 3820 doesn't seem to be doing really well compared to a 5820K or even a 4790K. With those, the DX11 benefits would be far better, and we would be able to truly observe whether there is an above-marginal (read: 0) FPS improvement. A second in-depth benchmark is needed, as the GPU may be less loaded while doing the same amount of work.
> Man, you are not doing any good for this thread; it will be locked again by comments like yours.
> 
> Please, could ALL of you stop trying to start a flame war? This is not a sandbox to throw stuff in other people's faces and scream at them; this thread is, as Mahigan said, one of the most watched topics on the internet.


Pretty sure every thread would be locked if we were silenced of our opinions of any company. You guys look for fame in forum threads?
I did not tell off any "non-company" user here.
I would really like to tell off that lady from AMD and all her lies, and the clowns with the 3.5 970, where they at?
Don't be sheep for these companies; always question, and don't believe too much of what they say.
Even doctors once told us cigarette smoke was good for us.


----------



## Klocek001

Quote:


> Originally Posted by *ku4eto*
> 
> but us the low-mid-range peasants care more about the performance impact support levels.


back to your turnips boy


Quote:


> Originally Posted by *cowie*
> 
> and really stop you flame baiting me wake up and smell the coffee yo I don't run and cry to mods.


that cracked me up


----------



## ku4eto

Quote:


> Originally Posted by *cowie*
> 
> pretty sure every thread would be locked if we are silenced of are opinions of any company you guys look for fame in forum threads?
> I did not tell off any "non-company" user here


No one is flame-baiting you; it's up to you if you get angered by information that is true, or by information that doesn't line up with what you think/believe.

I am pretty sure that game developers who come here to answer our questions and enlighten us with their knowledge getting called liars and other bad names is not wanted here. Yes, there is nothing wrong with having an opinion, but this is pure bashing with 0 logic and 0 gains, other than blind venting and a desperate attempt to prove that you are right, when in reality there is no right or wrong (well, nVidia is at fault for saying that they support async compute at 100%). The Oxide crew chose the best way for their game to run; it's their own game, and they have been provided codes by both AMD and nVidia. As the Oxide dev team responded to nVidia's false accusation about the MSAA "bug", the game was tested with async compute disabled for nVidia cards, which actually benefitted them, as with async compute on they were getting destroyed by AMD, since their cards are not made for this. In the provided benchmarks (well, not all), you can clearly see that MSAA is off, the one thing that nVidia complained about. You can clearly guess who is in the wrong here.


----------



## Forceman

Quote:


> Originally Posted by *ku4eto*
> 
> I got a Power Color 6950 1 GB, if you show me from where i can download the benchmark, i can run it.


It's attached to the post you quoted.
Just download and extract it, and then run the executable. It'll generate a perf.log with the data.


----------



## cowie

Quote:


> Originally Posted by *ku4eto*
> 
> No one is flame-baiting you; it's up to you if you get angered by information that is true, or by information that doesn't line up with what you think/believe.
> 
> I am pretty sure that game developers who come here to answer our questions and enlighten us with their knowledge getting called liars and other bad names is not wanted here. Yes, there is nothing wrong with having an opinion, but this is pure bashing with 0 logic and 0 gains, other than blind venting and a desperate attempt to prove that you are right, when in reality there is no right or wrong (well, nVidia is at fault for saying that they support async compute at 100%). The Oxide crew chose the best way for their game to run; it's their own game, and they have been provided codes by both AMD and nVidia. As the Oxide dev team responded to nVidia's false accusation about the MSAA "bug", the game was tested with async compute disabled for nVidia cards, which actually benefitted them, as with async compute on they were getting destroyed by AMD, since their cards are not made for this. In the provided benchmarks (well, not all), you can clearly see that MSAA is off, the one thing that nVidia complained about. You can clearly guess who is in the wrong here.


Yeah, maybe you're right, but it has gone full circle, where the bandwagon takes everything said as fact, not half-truths.
They were both little babies about it from the start.
But do notice a few things before I nicely go and let you guys be famous:
AMD has done business with them for a long time; they are more in their pocket than the other guys. If Oxide had issues with AMD software/hardware (I am sure they did), it would never have been said quite like this in public... obviously there may have been things wrong in NV's drivers and in how it was handled, since the latest driver shows pretty good improvements. They were both throwing mud.
It was great press for both the dev and AMD... the only good press for both in a long time.


----------



## Mahigan

Quote:


> Originally Posted by *cowie*
> 
> Yeah, maybe you're right, but it has gone full circle, where the bandwagon takes everything said as fact, not half-truths.
> They were both little babies about it from the start.
> But do notice a few things before I nicely go and let you guys be famous:
> AMD has done business with them for a long time; they are more in their pocket than the other guys. If Oxide had issues with AMD software/hardware (I am sure they did), it would never have been said quite like this in public... obviously there may have been things wrong in NV's drivers and in how it was handled, since the latest driver shows pretty good improvements. They were both throwing mud.
> It was great press for both the dev and AMD... the only good press for both in a long time.


In the end, I don't think things will be as one-sided as they appear now. It just takes patience, my friend. I'm sure nVIDIA has a different way of performing those asynchronous compute commands. They may even perform them in serial. We will see exactly where this leads, but for now it's best to wait and see exactly what nVIDIA responds with.

ExtremeTech stated that nVIDIA has been in contact with them and other hardware websites. Let's see what comes out of that.


----------



## ku4eto

Quote:


> Originally Posted by *Forceman*
> 
> It's attached to the post you quoted.
> Just download and extract it, and then run the executable. It'll generate a perf.log with the data.


Oh, my bad. I will do the benchmark tomorrow, though.


----------



## icenks

OMG, it took me a long time to read the thread, and i even had to skip a few posts.

Cowie, why are you even so worked up about it? When you have to cherry pick evidence to talk about and start calling names, your points would become less and less persuasive. I presume that your objective is posting here is just the opposite of making yourself less persuasive, right? Just helping you out here.

I'm an computer engineer and currently working in the field. As for the actual topic, parallel processing is really not new at all.
Let's rewind 10 years and review what happened for CPU. That's when CPU gain extra performance by tuning up the frequency. it was sustainable becuase the power density is way too high. so they decided to increase performance with multicores and parallel processing. This poses a challenge because software liek C and C++ don't have an easy way to invoke parallel processing. doing it in compilation level is difficult too as the dependency of variables is difficult to predict especially when there's branching happening.
People are still researching betters ways of implementing parallel processing for CPU. As a result, CPU still focus more on single thread performance.

For GPU, it's another story. GPU itself has very simple ALU and compute units, but has many of them. it is designed to perform parallel processing. Traditionally, the key to GPU performance is the overall bandwidth, not latency. the reason is because it's a stream of data going into the CU. if there's enough bandwidth that you can finish processing the current frame before the next frame comes in, you get better FPS. Yes, people do talk about latency, but to me, that's just a byproduct on different bandwidth.
You can actually run a simple program that calculate input numbers using openCL or CUDA so you run them in the GPU. They would still work, but would be really slow. compare to the same program running in CPU. That showcase the simple/slow CU in GPU. that only reason why GPUs are so powerful is because of parallel processing.

Both NVIDIA and AMD GPUs (and all other GPUs) process in parallel; async compute is just a matter of how well that parallelism is implemented.
Optimization is everything in games and graphics. Games need to optimize for GPUs in order to deliver an acceptable player experience.
Async compute on AMD is just much better-optimized parallelism implemented in hardware.
Nvidia doesn't even have hardware that does it and most likely uses context switching from data A to data B (dump all the original data out of GPU memory, load new data into GPU memory to process another batch; after processing B, load data A back in and continue processing it). That can take a few hundred ms, and it's really slow.
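The difference described above can be sketched as a toy timing model. This is purely illustrative (not real GPU code, and all the millisecond figures are made up): a frame where graphics and compute work run serially with a context-switch penalty between them, versus a frame where the two overlap on otherwise-idle units.

```python
# Toy timing model of serial-with-context-switch vs overlapped (async) execution.
# Numbers are illustrative only; real GPU costs vary widely.

def serial_frame_ms(gfx_ms: float, compute_ms: float, switch_ms: float) -> float:
    # Graphics finishes, the GPU switches contexts, compute runs, then it
    # switches back before the next frame's graphics work.
    return gfx_ms + switch_ms + compute_ms + switch_ms

def async_frame_ms(gfx_ms: float, compute_ms: float) -> float:
    # Compute fills gaps in the graphics workload, so the frame takes
    # roughly as long as the larger of the two workloads.
    return max(gfx_ms, compute_ms)

gfx, comp, switch = 10.0, 4.0, 1.0  # milliseconds, invented for the example
print(serial_frame_ms(gfx, comp, switch))  # 16.0 ms per frame
print(async_frame_ms(gfx, comp))           # 10.0 ms per frame
```

Under these invented numbers the overlapped path is a 37% improvement, which is the shape of the async-compute gains being discussed in this thread, not a prediction of any real card's behavior.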

I know it's a long post, I apologize for that, but I really wanted to chime in here.
Async compute is the way to go, and game developers know it.
Unless Nvidia "forces" them to boycott async compute, forces them to degrade AMD graphics, or shows other anti-competitive behavior, I think most games will opt in to async compute and boost their performance by 30%, even on the older GCN cards.

Nvidia is probably very nervous about this, and there's really nothing they can say (hence the silence).


----------



## Mahigan

I like this...

AMD is taking a step back, from their comments yesterday, and admitting that their GCN architecture isn't fully DX12 compliant: http://www.guru3d.com/news-story/amd-there-is-no-such-thing-as-full-support-for-dx12-today.html
Quote:


> There have been many attempts to distract people from this truth through campaigns that deliberately conflate feature levels, individual untiered features and the definition of "support." This has been confusing, and caused so much unnecessary heartache and rumor-mongering.
> 
> Here is the unvarnished truth: Every graphics architecture has unique features, and no one architecture has them all. Some of those unique features are more powerful than others.
> 
> Yes, we're extremely pleased that people are finally beginning to see the game of chess we've been playing with the interrelationship of GCN, Mantle, DX12, Vulkan and LiquidVR.


A far better reaction from AMD than I've seen as of late. In fact, both AMD and nVIDIA appear to be unwilling to discuss the specifics of every DX12 feature. At first I thought this was because of the NDA on DX12, but DX12 is released now... so you'd think they'd spill the beans and let us know what each feature does and what lacking one will entail.

I also found this post by AMD's Robert Hallock interesting:


He's pretty much admitting to what we have all established here in this thread, quite a few pages back, with regard to Mantle, Vulkan, DX12 and LiquidVR. It was all a game of chess. Mantle was released to provoke MS into bringing DX12 to the PC platform. I think that's clear by now. MS claims DX12 has been in the works since 2011, and I think we know why: because the Xbox One started development in 2011. I don't think MS wanted to bring DX12 to the PC, but that's my opinion. IMO, Mantle provoked the release. Like a game of chess.


----------



## icenks

It's interesting and very admirable that AMD would admit the weakness as well. I like that far more than NV bluffing about features the chip doesn't support.
Clear and truthful advertising is one of the most important things I value in a company.
If you can't trust what a company says, how do you know you're getting what you're paying for?

Anyway, I'm really happy about this.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> I like this...
> 
> AMD is taking a step back, from their comments yesterday, and admitting that their GCN architecture isn't fully DX12 compliant: http://www.guru3d.com/news-story/amd-there-is-no-such-thing-as-full-support-for-dx12-today.html
> A far better reaction from AMD than I've seen as of late. In fact, both AMD and nVIDIA appear to be unwilling to discuss the specifics of every DX12 feature. At first I thought this was because of the NDA on DX12, but DX12 is released now... so you'd think they'd spill the beans and let us know what each feature does and what lacking one will entail.
> 
> I also found this post by AMD's Robert Hallock interesting:
> 
> 
> He's pretty much admitting to what we have all established here in this thread, quite a few pages back, with regard to Mantle, Vulkan, DX12 and LiquidVR. It was all a game of chess. Mantle was released to provoke MS into bringing DX12 to the PC platform. I think that's clear by now. MS claims DX12 has been in the works since 2011, and I think we know why: because the Xbox One started development in 2011. I don't think MS wanted to bring DX12 to the PC, but that's my opinion. IMO, Mantle provoked the release. Like a game of chess.


I just want to say that it's annoying that we had suboptimal GPU performance for so many years with no real info about it. Yes, people knew that the API overhead could be huge, but gamers/customers usually didn't. Moreover, if you already knew why you were building GCN/ACEs and that they could be very useful, why not tell people?

And really, that's not much of a chess game when the move is so obvious... a "new" way that lets you squeeze a lot more out of your GPU. Why is every necessary improvement marketed as something special? With all due respect...


----------



## MonarchX

How did the very latest AMD 15.8 drivers improve benchmark scores?


----------



## Ganf

Quote:


> Originally Posted by *Mahigan*
> 
> I like this...
> 
> AMD is taking a step back, from their comments yesterday, and admitting that their GCN architecture isn't fully DX12 compliant: http://www.guru3d.com/news-story/amd-there-is-no-such-thing-as-full-support-for-dx12-today.html
> A far better reaction from AMD than I've seen as of late. In fact, both AMD and nVIDIA appear to be unwilling to discuss the specifics of every DX12 feature. At first I thought this was because of the NDA on DX12, but DX12 is released now... so you'd think they'd spill the beans and let us know what each feature does and what lacking one will entail.


This is the same thing that both AMD and Nvidia had to say when DX11 came out, and DX10. They have to repeat themselves every time a new major API drops.









This has been the far more subtle side of trying to divide and conquer the market, where Nvidia has so far been winning by choosing the features that developers are more likely to use. AMD may have the advantage this round, since anyone wanting to port a console game to PC no longer has to add features for their cards. Instead, developers are going to be expected to add the extra rasterization features that Nvidia now supports over AMD in DX12.


----------



## uplink

Hi there *Anna Torrent.* I'd say there are three likely scenarios.

First, they had no idea whether it would work or not; it just turned out that it does.

Second, they didn't want to brag much about async compute power because maybe, just maybe, the competition hadn't noticed what they were doing.

Third, Nvidia and AMD belong to the same guy, who is just playing Game of Thrones with us customers. Which I find to be the most likely scenario. I mean the guy whom neither we nor the media can see.


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> I just want to say that it's annoying that we had suboptimal GPU performance for so many years with no real info about it. Yes, people knew that the API overhead could be huge, but gamers/customers usually didn't. Moreover, if you already knew why you were building GCN/ACEs and that they could be very useful, why not tell people?
> 
> And really, that's not much of a chess game when the move is so obvious... a "new" way that lets you squeeze a lot more out of your GPU. Why is every necessary improvement marketed as something special? With all due respect...


I couldn't agree more. DX11 has felt like sitting on enormous compute potential while stuck playing Pac-Man on the Atari. DX11 lasted far too long, and we have Microsoft to blame for that. They pretty much abandoned the PC market, for a time, and focused on the Xbox. While I thoroughly enjoyed Battlefield 4, probably my favorite title in a LONG time, I felt it was limited somewhere... now I've come to learn why.

All that being said, I'm very excited as to where PC Gaming is headed. I just hope that the big GPU vendors set aside their questionable ethical practices and focus on their hardware. I'd like to see everything done on Compute and the archaic Graphics pipeline pushed out of existence.


----------



## dogen1

Quote:


> Originally Posted by *ku4eto*
> 
> Oh my bad, i will do the benchmark tomorrow though.


You can't, it requires DX12, the 6950 doesn't support it.


----------



## ku4eto

Quote:


> Originally Posted by *dogen1*
> 
> You can't, it requires DX12, the 6950 doesn't support it.


Well, there would be nothing wrong with checking an older-generation GPU + CPU on this.


----------



## dogen1

Quote:


> Originally Posted by *ku4eto*
> 
> Well, there will be nothing wrong in checking older generation GPU + CPU on this.


It won't even run...


----------



## semitope

Quote:


> Originally Posted by *cowie*
> 
> making this drivel public is kids play, you know if they worked with nv more on this it would be a gameworks title.its great press for a "game"and a company on its payroll
> this is far from the be all end all dx12 feature set or any indication of what is to come of dx12.
> amd's admittance on not having all dx12's features in its hardware is just telling of maybe what nv has in store in "its" dx12 benchmark most probably to be seen soon.
> but not like I will put much into that either.
> but like I said I want to see screen shots of this dx feature all in its glory if enabled it does worse for a brand and looks the same without it that's a little different then what happens when you turn say gw's on and off


You aren't really understanding what this is. It's not like GameWorks features. In fact, async could be used to make GameWorks run better. It's a neutral, generally beneficial feature.

If Nvidia does not support it, they have to run the effects the normal way. That would result in more latency than AMD and thus lower framerates.
Quote:


> Originally Posted by *Klocek001*
> 
> Still it's much too early to ditch all the advantages nvidia cards offer in DX11 for what AMD's GCN architecture offers in DX12. But boy will I be angry when it turns out async compute is subpar on 980Ti. Of all DX12 features that I was considering before switching to nvidia THAT was the most important one that I was looking forward to having on my card. I wanted the best version of that on a $700 GPU, not "it's there on paper" one.


Such as? I get the impression that AMD's DX11 performance in this game has people leaning towards thinking they were terrible at DX11. Let's not forget the benchmarks don't say this for most games. The only other one was Project CARS, which might have been that way because of its physics.

Most of the time AMD GPUs are also competitive in DX11, trading blows; more recently, typically at higher resolutions.


----------



## ku4eto

Quote:


> Originally Posted by *dogen1*
> 
> It won't even run...


DX12 no, but DX11 will run, I guess.


----------



## dogen1

Quote:


> Originally Posted by *ku4eto*
> 
> DX12 no , but DX11 will run i guess.


It's a DX12 test.

Wait, are you talking about the game? I thought you were talking about the test from beyond3d.


----------



## Klocek001

Quote:


> Originally Posted by *semitope*
> 
> such as?


such as aftermarket 980ti crushing Fury X at 1440p.


----------



## chaosblade02

I'm expecting the R9 390 will perform very well in DX12 games. I don't plan on running 4k or anything, but I'm definitely going to upgrade beyond a 1080p monitor soon. I'm not looking to spend $500 on a monitor though.


----------



## ku4eto

Quote:


> Originally Posted by *dogen1*
> 
> It's a DX12 test.
> 
> Wait, are you talking about the game? I thought you were talking about the test from beyond3d.


I thought you were talking about the AotS benchmark







Quote:


> Originally Posted by *Klocek001*
> 
> such as aftermarket 980ti crushing Fury X at 1440p.


And the 290X getting really close to the 980, if not even with it. The Fury X is made for 4K; it sucks at peasant resolutions.


----------



## semitope

Quote:


> Originally Posted by *Klocek001*
> 
> such as aftermarket 980ti crushing Fury X at 1440p.


"All the advantages" sounded like more than that. Meh.


----------



## cowie

Quote:


> Originally Posted by *Mahigan*
> 
> In the end, I don't think things will be as one sided as they appear now. It just takes patience my friend. I'm sure nVIDIA has a different way of performing those Asynchronous Compute commands. They may even perform them in serial. We will see exactly where this leads but for now it's best to wait and see exactly what nVIDIA respond with.
> 
> ExtremeTech stated that nVIDIA has been in contact with them and other hardware websites. Let's see what comes out of that.


Well, I am sure, to stop the bleeding of course.
OK, I am just going to sit back and watch. You guys know the fun will start when Implicit Multiadapter and Explicit Multiadapter support comes... omg, it's going to be such a dumb fan war.








have a great day all


----------



## CasualCat

Quote:


> Originally Posted by *Mahigan*
> 
> I'd like to see everything done on Compute and the archaic Graphics pipeline pushed out of existence.


why?


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> I couldn't agree more. DX11 has felt like we were sitting on enormous compute potential but stuck playing Pacman on the Atari. DX11 lasted far too long and we have Microsoft to blame for that. They pretty much abandoned the PC market, for a time, and focused on the XBox. While I thoroughly enjoyed Battlefield 4, probably my favorite title in a LONG time, I felt it was limited somewhere... now I've come to learn why.
> 
> All that being said, I'm very excited as to where PC Gaming is headed. I just hope that the big GPU vendors set aside their questionable ethical practices and focus on their hardware. I'd like to see everything done on Compute and the archaic Graphics pipeline pushed out of existence.


For that..... we are still in an era where major titles are built around consoles. The improvements from it are still going to end up being evolutionary rather than revolutionary. For the first year or two, it seems like async compute is going to be limited to post-processing. Its effect won't be as disruptive as some people may initially think, since a fully async compute pipeline is years away.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Mahigan*
> 
> This is true. But those feature sets are not performance oriented features. Therefore we won't see extra performance out of using Conservative Rasterizers or ROV. But it is entirely true that nVIDIAs current architectures support far more DX12 features than AMDs current GPUs.


The heck they aren't performance-oriented features. Conservative Rasterization Tier 1 and Rasterizer Ordered Views are very much about performance.


----------



## Ganf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> The heck they aren't performance-oriented features. Conservative Rasterization Tier 1 and Rasterizer Ordered Views are very much about performance.


This is true. The enhanced rasterization greatly reduces or eliminates the need for AA. It'll be a big performance boost for Nvidia users, but from what I've read it sounds like it takes a lot more work to implement than AA.


----------



## Themisseble

What about the low-end budget segment?

I can see that the R7 260X has more COMPUTE power than the R7 370, 7850, or R7 265, but it's still more than 20% slower...
Might be a good fight: GTX 750 Ti vs R7 260X.

TressFX is compute, right? So it could run under async shaders?


----------



## ku4eto

Quote:


> Originally Posted by *47 Knucklehead*
> 
> The heck they aren't performance-oriented features. Conservative Rasterization Tier 1 and Rasterizer Ordered Views are very much about performance.


Rasterizer ordered views don't really sound like a performance factor...


----------



## Mahigan

Quote:


> Originally Posted by *cowie*
> 
> Well, I am sure, to stop the bleeding of course.
> OK, I am just going to sit back and watch. You guys know the fun will start when Implicit Multiadapter and Explicit Multiadapter support comes... omg, it's going to be such a dumb fan war.
> 
> 
> 
> 
> 
> 
> 
> 
> have a great day all


I am excited about multiadapter. Being able to use split-frame rendering without redundant textures filling both memory pools will finally let us double our usable frame buffer by adding a second GPU, triple it with three, and quadruple it with four.
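The memory claim above comes down to simple arithmetic, sketched here for illustration: under DX11-style alternate-frame rendering each card must hold a full copy of the assets, so pools don't add up, while DX12 explicit multiadapter split-frame rendering can partition assets across cards so the pools roughly sum. The function and its numbers are invented for the example.

```python
# Back-of-the-envelope model of usable VRAM in multi-GPU setups.
# Illustrative only; real games never partition assets perfectly.

def usable_vram_gb(per_card_gb: int, n_cards: int, pools_combine: bool) -> int:
    # pools_combine=False: every GPU duplicates all textures (AFR-style),
    # so usable memory stays at one card's worth.
    # pools_combine=True: assets are split across cards (explicit
    # multiadapter SFR), so the pools roughly add up.
    return per_card_gb * n_cards if pools_combine else per_card_gb

print(usable_vram_gb(4, 2, pools_combine=False))  # 4 (duplicated pools)
print(usable_vram_gb(4, 2, pools_combine=True))   # 8 (combined pools)
```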









Have a great day yourself


----------



## garwynn

More food for thought; I'm posting this on Reddit, so I might as well post it here too.

What is a CUDA kernel? (This is used in parallel processing)
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#kernels

Observations regarding kernel overhead in CUDA. It's older, and I'm curious what the overhead is now.
http://www.cs.virginia.edu/~mwb7w/cuda_support/kernel_overhead.html
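The linked page measures launch overhead by timing many empty kernel launches and dividing by the count. A sketch of the same method in plain Python, where a no-op function stands in for the kernel: the absolute number it prints is meaningless for GPUs, but the amortization technique is the same one used for real kernel-launch measurements.

```python
# Amortized per-call overhead measurement, in the style of the CUDA
# kernel-overhead page linked above (which uses empty kernel launches).
import time

def empty_kernel():
    # Stand-in for a kernel that does no work; only call cost remains.
    pass

def per_call_overhead_us(n: int = 100_000) -> float:
    # Time n back-to-back calls and divide to amortize away timer noise.
    start = time.perf_counter()
    for _ in range(n):
        empty_kernel()
    elapsed = time.perf_counter() - start
    return elapsed / n * 1e6  # microseconds per call

print(per_call_overhead_us())
```

The practical takeaway for the thread: if per-launch overhead is fixed, many small kernel submissions pay it repeatedly, which is one reason batching work (or overlapping submission, as async compute does) matters.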


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> I like this...
> 
> AMD is taking a step back, from their comments yesterday, and admitting that their GCN architecture isn't fully DX12 compliant: http://www.guru3d.com/news-story/amd-there-is-no-such-thing-as-full-support-for-dx12-today.html
> A far better reaction from AMD than I've seen as of late. In fact, both AMD and nVIDIA appear to be unwilling to discuss the specifics of every DX12 feature. At first I thought this was because of the NDA on DX12, but DX12 is released now... so you'd think they'd spill the beans and let us know what each feature does and what lacking one will entail.


Makes you wonder if AMD presenting a more nuanced picture of their DX12 support has anything to do with Ark getting ready to drop. Being a Gameworks title it may show AMD in a less flattering light than AotS.


----------



## Mahigan

Quote:


> Originally Posted by *CasualCat*
> 
> why?


For one, memory bandwidth savings. Secondly, more effects running simultaneously, and lastly, better performance. I mean, you can already, in theory, perform triangle setup using compute resources alone. You also wouldn't need AA and tessellation to remove jaggies; those features would disappear, as would the jaggies.

It would boost what can be done on the performance side while enabling higher levels of image quality.

We're not there yet, of course, but we're headed in that direction.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> Makes you wonder if AMD presenting a more nuanced picture of their DX12 support has anything to do with Ark getting ready to drop. *Being a Gameworks title it may show AMD in a less flattering light than AotS*.


I'm in agreement with that statement. I do believe that nVIDIAs biggest card to play is GameWorks.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *ku4eto*
> 
> raster order of view doesn't really sound performance factor...


Maybe you should read up on it instead of just thinking about the name.

Bottom line: you don't have to blindly draw things that are beyond the user's view from a particular viewpoint, saving you the time and effort of drawing something and then drawing something else over top of it.

It very MUCH will improve performance.


----------



## Themisseble

@Mahigan
Will Dx12 use more VRAM just like MANTLE does in BF4?


----------



## CasualCat

Quote:


> Originally Posted by *Mahigan*
> 
> For one, memory bandwidth savings. Secondly, more effects running simultaneously and lastly better performance. I mean you can already, in theory, perform a triangle setup using compute resources alone. You also won't need AA and Tessellation in order to remove jaggies. Those features would disappear as would jaggies.
> 
> It would boost what can be done on the performance side while enabling higher levels of image quality.
> 
> We're not there yet, of course, but we're headed in that direction.


Can you elaborate on AA and tessellation? How does generating triangles via compute eliminate jaggies or the need for AA? Is it that creating triangles via compute is more akin to 2D vector graphics as opposed to 2D raster graphics? Not sure I follow otherwise.

Also, I'm not familiar with tessellation in terms of removing jaggies. I think of it more in terms of displacement maps and the ability to add detail and depth.

I'm still not sure AA will go away until we get sufficiently high display pixel density (willing to be convinced otherwise, though). Ideally we'd have single-GPU cards powering through 4K easily and looking for the next higher-resolution monitor to feed.


----------



## SlackerITGuy

Quote:


> Originally Posted by *Themisseble*
> 
> @Mahigan
> Will Dx12 use more VRAM just like MANTLE does in BF4?


It depends on the optimization made by the developer. If they have a use for more VRAM, which is the case with BF4, then they will use more VRAM.

My guess is that games developed with low-level APIs will use more VRAM overall, which is great IMO.

Watch this video:


----------



## Forceman

Updated Fury benchmark from over at Beyond3d. Apparently the graph posted yesterday was really a 390X, not a Fury X. This new chart shows all three. Interesting that the Fury has so many spikes where it seems to be running serially.


----------



## xxdarkreap3rxx

Quote:


> Originally Posted by *MonarchX*
> 
> How did the very latest AMD 15.8 drivers improve benchmark scores?


Yes, I would like to know this as well. Does anyone have any benches?


----------



## Mahigan

Quote:


> Originally Posted by *CasualCat*
> 
> Can you elaborate on AA and Tessellation? How does generating triangles via compute eliminate jaggies or the need for AA? Is it that creating triangles via compute is more akin to 2D vector graphics as opposed to 2D raster graphics? Not sure I follow otherwise.
> 
> Also I've not familiar with tessellation in terms of removing jaggies. I think of it more in terms of displacement maps and the ability to add detail and depth.
> 
> I'm still not sure AA will go away until we get sufficiently high display pixel density. (Willing to be convinced otherwise though) Ideally we'd have single GPU cards powering through 4K easily and looking for the next higher resolution monitor to feed.


CAD-style rendering. Something like V-Ray, but using GPU compute resources and performed in real time. With HSA and the push for GPUs to act more like CPUs, you will, in time, be able to render scenes that don't rely on triangle setup. Right now you can perform triangle setup using compute resources; in time you won't need to.


----------



## Forceman

Quote:


> Originally Posted by *xxdarkreap3rxx*
> 
> Yes, I would like to know this as well. Does anyone have any benches?


Someone posted some earlier. Same performance in DX12, but some improvement in DX11.


----------



## Xuper

Quote:


> Originally Posted by *xxdarkreap3rxx*
> 
> Yes, I would like to know this as well. Does anyone have any benches?


Quote:


> Originally Posted by *MonarchX*
> 
> How did the very latest AMD 15.8 drivers improve benchmark scores?


http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1600_40#post_24366266

Only DX11. Looks like the game engine has improved enough for AMD GCN.


----------



## Themisseble

Quote:


> Originally Posted by *Xuper*
> 
> http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1600_40#post_24366266
> 
> Only DX11. Looks like the game engine has improved enough for AMD GCN.


GTX 950 vs R7 260X


----------



## semitope

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Maybe you should read up on it instead of just thinking about the name.
> 
> Bottom line, you don't have to blindly draw things that are beyond the view of the user from a particular view point. Thus saving you time and effort in drawing something and then drawing something over top it.
> 
> It very MUCH will improve performance.


"Raster Ordered Views and Conservative Raster. Thankfully, the techniques that these enable (like global illumination) can already be done in other ways at high framerates (see: DiRT Showdown)."

Can someone look into this? https://software.intel.com/en-us/gamedev/articles/rasterizer-order-views-101-a-primer

Here Intel keeps mentioning order-independent transparency, something AMD included in their 5000 series cards.

http://developer.amd.com/resources/documentation-articles/gpu-demos/ati-radeon-hd-5000-series-graphics-real-time-demos/

What's the deal?

Edit: I guess it has to do with transparencies only; just one aspect of what ROVs would help with.
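For anyone wondering why transparency needs ordering at all: the standard "over" blend is not commutative, so two translucent surfaces drawn in different orders produce different pixel colors, and that is exactly the hazard ROVs and other order-independent transparency techniques address. A toy single-channel sketch (the colors and alphas are made up for illustration):

```python
# Why draw order matters for transparency: the alpha "over" operator
# is not commutative. Single color channel, illustrative values only.

def over(src_color: float, src_alpha: float, dst_color: float) -> float:
    # Classic "over" compositing for one channel.
    return src_color * src_alpha + dst_color * (1.0 - src_alpha)

background = 0.0
surface_a = (1.0, 0.5)  # (color, alpha)
surface_b = (0.2, 0.5)

# Draw A first then B on top, vs B first then A on top.
a_then_b = over(*surface_b, over(*surface_a, background))  # 0.35
b_then_a = over(*surface_a, over(*surface_b, background))  # 0.55
print(a_then_b, b_then_a)  # different results: order matters
```

Without hardware-enforced ordering (or sorting on the CPU/GPU), overlapping translucent fragments can blend in whichever order they happen to be rasterized, which is the artifact OIT exists to prevent.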


----------



## provost

My take: Nvidia is not responding because it has nothing to gain from responding. There is no real threat of losing sales to AMD, given the limited availability of the Fury/Fury X and the fact that Nvidia has already cashed in on the majority of Maxwell sales. The response, in my opinion, will be in the form of a new series of cards, whether some derivation of the Maxwell arch or something else, and it will kick off around the holiday season. I don't think Nvidia feels obligated to do anything about the DX12 capabilities of existing cards, as people bought those cards for the games of today, not tomorrow (according to Nvidia). This is just my take on it, anyway.....


----------



## Anna Torrent

Quote:


> Originally Posted by *uplink*
> 
> Hi there *Anna Torrent.* I'd say there are three likely scenarios.
> 
> First, they had no idea whether it'll work, or not, it just came out it will.
> 
> Second, they didn't want to brag much about async compute power, because maybe, just maybe competition didn't notice what they're doing.
> 
> Third, Nvidia and AMD belongs to the same guy, and is only playing Game of Thrones with us - customers. Which I find to be the most likely scenario. I mean the guy, who we, nor media can't see.


Hi there. I see Commodore developers have joined the discussion (-:

1. Really, think about it: it's nothing too smart. You have GPU compute units you can't fully utilize and take advantage of because you're limited by one ACE and the pipeline; you can't populate the work units fast enough. Allowing async job preparation ahead of the old pipeline is quite an obvious move once you understand the inner workings (as they do).

Moreover, the API overhead and DX limitations were quite hard to get past. It's not like some kind of FPSoholics issue.

2. Just take a look at the AMD/NV/others' Facebook pages over the last year or whatever, then tell me how afraid they are of bragging.

3. I do think that the question is really not a technological one.


----------



## dogen1

Updated async compute test
https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-12#post-1869354


----------



## Mahigan

Quote:


> Originally Posted by *dogen1*
> 
> Updated async compute test
> https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-12#post-1869354


Any input on the results you've obtained? I realize you only have a GTX 680 (Kepler) but it would still be interesting to know what your results have been thus far.


----------



## icenks

NV has a reputation for being cheeky and making non-technical, insulting statements. If they can say something, they will.

I agree that NV will respond with a new card. It depends on how they currently achieve their performance:
if they already have something equivalent to async compute but didn't make the info public, then they're screwed, because when they adopt async compute they'll have to abandon whatever they had that did the same thing.

I'm really amazed at how marketing makes technology sound so amazing. It's nothing more than different ways of doing the same thing. New ways may be more efficient, faster, and better, but at the end of the day, it's just a method. I used to develop flash memory technology, and all I was working with was the spacing and dimensions between features. Process technology is essentially just the materials being used and how much and how hard to implant dopants into the silicon.

That being said, if you don't know the technology, it's hard to optimize for it. That's exactly what NV is doing with its proprietary "technology". All they have to do is add some malicious code to slow down performance that cannot be disabled unless you have a key. The key can be embedded in hardware, so all programmers using that code/library will unwittingly favor one kind of hardware.

I do think that AMD released these cards at the wrong time. It would be nice if they were released before the shopping season.

All speculation aside, only time will tell if NV goes for async compute.


----------



## Mahigan

Quote:


> Originally Posted by *icenks*
> 
> NV has a reputation for being cheeky and making non-technical, insulting statements. If they can say something, they will.
> 
> I agree that NV will respond with a new card. It depends on how they currently achieve their performance:
> if they already have something equivalent to async compute but didn't make the info public, then they're screwed, because when they adopt async compute they'll have to abandon whatever they had that did the same thing.
> 
> I'm really amazed at how marketing makes technology sound so amazing. It's nothing more than different ways of doing the same thing. New ways may be more efficient, faster, and better, but at the end of the day, it's just a method. I used to develop flash memory technology, and all I was working with was the spacing and dimensions between features. Process technology is essentially just the materials being used and how much and how hard to implant dopants into the silicon.
> 
> That being said, if you don't know the technology, it's hard to optimize for it. That's exactly what NV is doing with its proprietary "technology". All they have to do is add some malicious code to slow down performance that cannot be disabled unless you have a key. The key can be embedded in hardware, so all programmers using that code/library will unwittingly favor one kind of hardware.
> 
> I do think that AMD released these cards at the wrong time. It would be nice if they were released before the shopping season.
> 
> All speculation aside, only time will tell if NV goes for async compute.


@dogen1 just released his updated test program. We should know more about nVIDIAs Async Compute capabilities relatively soon.

Available here: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-12#post-1869354


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> @dogen1 just released his updated test program. We should know more about nVIDIAs Async Compute capabilities relatively soon.
> 
> Available here: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-12#post-1869354


I don't know what it shows yet, but it takes a lot longer to run.

Someone also made a visualizer for the log files - the overlapping areas are supposed to show where it is benefiting from async.

http://nubleh.github.io/async/


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> I don't know what it shows yet, but it takes a lot longer to run.
> 
> Someone also made a visualizer for the log files - the overlapping areas are supposed to show where it is benefiting from async.
> 
> http://nubleh.github.io/async/


The Nerd in me is totally digging all the work that is going into this mystery


----------



## dogen1

Quote:


> Originally Posted by *Mahigan*
> 
> @dogen1 just released his updated test program. We should know more about nVIDIAs Async Compute capabilities relatively soon.
> 
> Available here: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-12#post-1869354


Whoa, not my program. I'm just posting the link.


----------



## Mahigan

Quote:


> Originally Posted by *dogen1*
> 
> Whoa, not my program. I'm just posting the link.


And here I was thinking you were the programmer involved this whole time.

My bad.


----------



## garwynn

I've reached out to NVIDIA PR to try to do similar research on their side as I did on AMD's side, and to see if I'm allowed to share the results.
One thing I see: it looks like it's not CUDA (I'll eat my crow on that) but instead part of GameWorks.
Unfortunately I don't see much technical detail in the Dev Zone, so I had to reach out to see if I can get more.
They're running it up the flagpole, and I'll update with any info I get and am able to share.


----------



## Shatun-Bear

Quote:


> Originally Posted by *provost*
> 
> My take on Nvidia not responding is that it has nothing to gain from responding. There is no real threat of losing sales to AMD, given the limited availability of the Fury/Fury X, and the fact that Nvidia has already cashed in on the majority of the sales for Maxwell. *The response, in my opinion, will be in the form of a new series of cards, whether it's some derivation of the Maxwell arch or something else*. And it will kick off around the holiday season. I don't think Nvidia feels obligated to do anything about the DX12 capabilities of existing cards, as people bought those cards for the games of today, not tomorrow (according to Nvidia). This is just my take on it anyway.....


They might want to respond _now_ though, as this will snowball to affect their Pascal cards next year, since it is believed Pascal is basically Maxwell 2 with HBM(2). If that is true, those cards will be found wanting in some DX12 game scenarios too. Volta, the year after that, will be a completely new architecture, where they might want to match GCN's asynchronous compute architectural advantage.


----------



## Mahigan

Quote:


> Originally Posted by *garwynn*
> 
> I've reached out to NVIDIA PR to try to do similar research on their side as I did on AMD's side, and to see if I'm allowed to share the results.
> One thing I see: it looks like it's not CUDA (I'll eat my crow on that) but instead part of GameWorks.
> Unfortunately I don't see much technical detail in the Dev Zone, so I had to reach out to see if I can get more.
> They're running it up the flagpole, and I'll update with any info I get and am able to share.


Thank you.

Hopefully we'll know more soon.
Quote:


> FirePro W8100
> Compute only: 1. *35.47ms* ~ 512. *503.18ms*
> Graphics only: *34.07ms* (49.25G pixels/s)
> Graphics + compute: 1. *35.33ms* (47.49G pixels/s) ~ 512. *505.24ms* (3.32G pixels/s)


https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-13

And this confirms that GCN is quite powerful at Async Compute.

At least we know AMD is being honest, up to this point, about their hardware's capabilities as it pertains to Async Compute.

We'll have to wait a bit longer for the nVIDIA results because the nVIDIA systems appear to be crashing.


----------



## Xuper

Look at chart : http://nubleh.github.io/async/

If I'm not mistaken, Async compute has no effect on any NV cards compared to AMD cards?

Edit: I mean the difference between "Total" and "Async" is huge for AMD cards, and near zero (0 to max 10) for all NV cards.

Edit 2: The result on the Fury X is strange. It's the only card with an almost 100% effect from Async in some kernel runs. Why the difference between the 390X and the Fury X?


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> Look at chart : http://nubleh.github.io/async/
> 
> If I'm not mistaken, Async compute has no effect on any NV cards compared to AMD cards?
> 
> Edit: I mean the difference between "Total" and "Async" is huge for AMD cards, and near zero for NV.


What the results show is that NV cards don't do both compute and graphics tasks in parallel; they do them in serial. Maxwell/2 is great at compute, but if you push the loads... you end up bogging down performance. Yes, the Total, for AMD, is always higher than the Async, indicating that the tasks are being done in parallel when done asynchronously.

With the new version (NV results not in yet) we can see that the FirePro continues the trend on the GCN side.

So far, it looks like there is no change from the last test when it comes to GCN. Either under low loads or heavy loads...

Worth noting: there could be something the programmers are missing, but this is two tests over at Beyond3D as well as what the Oxide developer stated to us.


----------



## dogen1

Quote:


> Originally Posted by *Mahigan*
> 
> So far, it looks like there is no change. Either under low loads or heavy loads...
> 
> There could be something the programmers are missing but this is two tests over at Beyond3D as well as what the Oxide developer stated to us.


Those charts are the old version btw.


----------



## Xuper

Quote:


> Originally Posted by *Mahigan*
> 
> So far, it looks like there is no change. Either under low loads or heavy loads...
> 
> There could be something the programmers are missing but this is two tests over at Beyond3D as well as what the Oxide developer stated to us.


I edited my Post: here

The result on the Fury X is strange. It's the only card with an almost 100% effect from Async in some kernel runs. Why the difference between the 390X and the Fury X?


----------



## Themisseble

So the first game with async compute on PC will be Deus Ex: Mankind Divided?
Or will we see it in Star Wars Battlefront? ARK: Survival Evolved?


----------



## infranoia

The async pipeline efficiency on that 8970M is insane.

A DX11 engine versus an async shader-heavy DX12 engine would make a huge difference on that part, if all else was equal. <-- but there's the rub for a 2-year-old mobile part


----------



## dogen1

Quote:


> Originally Posted by *Themisseble*
> 
> So the first game with async compute on PC will be Deus Ex: Mankind Divided?
> Or will we see it in Star Wars Battlefront? ARK: Survival Evolved?


Well, Battlefield 4 used it on PS4 (probably only a little, though), so I'd say there's a good chance it'll be on PC if there's a DX12 or Mantle version.


----------



## Themisseble

Quote:


> Originally Posted by *infranoia*
> 
> The async pipeline efficiency on that 8970M is insane.
> 
> A DX11 engine versus an async shader-heavy DX12 engine would make a huge difference on that part, if all else was equal. <-- but there's the rub for a mobile part


That is desktop 7870
https://www.techpowerup.com/gpudb/1966/radeon-hd-8970m.html


----------



## Mahigan

Quote:


> Originally Posted by *dogen1*
> 
> Those charts are the old version btw.


Yep, I just realized he was talking about that link and not what I posted. That Link contains the old results.


----------



## Themisseble

Quote:


> Originally Posted by *dogen1*
> 
> Well, Battlefield 4 used it on PS4 (probably only a little, though), so I'd say there's a good chance it'll be on PC if there's a DX12 or Mantle version.


Yeah, I hope so... I really want to see it.
Also, TressFX 3.0 should be a compute task = async compute?
I'm also waiting for Rise of the Tomb Raider next year... I hope it gets a real boost from async compute. It looks very nice on Xbox One, though.


----------



## Mahigan

Quote:


> Originally Posted by *Themisseble*
> 
> So the first game with async compute on PC will be Deus Ex: Mankind Divided?
> Or will we see it in Star Wars Battlefront? ARK: Survival Evolved?


See here: http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1570#post_24365329

Mirror's Edge, I believe, will be the first.

I've linked a few big titles coming in Q1 2016 with the feature.

First Maxwell2 result is in:
Quote:


> 980TI, 355.82
> Compute only: 1. *5.67ms* ~ 512. 76.11ms
> Graphics only: *16.77ms* (100.06G pixels/s)
> Graphics + compute: 1. *21.15ms* (79.34G pixels/s) ~ 512. 97.38ms (17.23G pixels/s)


Still no Async.
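The quoted numbers themselves show why. As a rough sketch (figures copied from the 980 Ti quote above; this is back-of-envelope arithmetic, not a measurement): if graphics and compute run serially, the combined time is roughly the sum of the individual passes, whereas overlapped (async) execution approaches the longer of the two.

```python
# Figures from the quoted 980 Ti run (1-kernel case).
compute_ms = 5.67     # compute only
graphics_ms = 16.77   # graphics only
combined_ms = 21.15   # graphics + compute

serial_estimate = compute_ms + graphics_ms        # ~22.44 ms if run back to back
overlap_estimate = max(compute_ms, graphics_ms)   # ~16.77 ms if fully overlapped

# The measured combined time sits much closer to the serial estimate,
# which is why the result reads as "still no Async".
closer_to_serial = abs(combined_ms - serial_estimate) < abs(combined_ms - overlap_estimate)
print(closer_to_serial)  # True
```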


----------



## infranoia

Quote:


> Originally Posted by *Themisseble*
> 
> That is desktop 7870
> https://www.techpowerup.com/gpudb/1966/radeon-hd-8970m.html


Maybe we'll begin to see why AMD wanted to keep Pitcairn going into the DX12 era.


----------



## SlackerITGuy

Quote:


> Originally Posted by *Themisseble*
> 
> So the first game with async compute on PC will be Deus Ex: Mankind Divided?
> Or will we see it in Star Wars Battlefront? ARK: Survival Evolved?


We still don't know.

DICE's Battlefront would be my guess. With Johan Andersson being the front man of all this low-level API madness, he'll make sure his next game has all the bells and whistles of a low-level API game.


----------



## vloeibaarglas

Dark Red = Async

White between Light Red and Blue = Worse than serial, possibly due to this "context switching" penalty that Oxide claimed Nvidia cards suffered from.

GCN 1.1/1.2 seems a lot more jumpy than GCN 1.0. Tahiti, Pitcairn showing some silky constant results.

Battlefront or Fable Legends will probably be the first DX12 Async game I'm guessing.


----------



## spacin9

Quote:


> Originally Posted by *Devnant*
> 
> I don't have access to the benchmark. Can anyone confirm these massive gains from 355.60 to 355.82? 10 FPS gain on heavy is pretty significant.


_"Originally Posted by ZWingerRyRy View Post

Well that was quick. From a Titan X owner on another forum with access to the benchmark.

Ashes - 1080P - High Settings - DX 12

355.60 Drivers
Normal - 72.2 FPS
Medium - 59.5 FPS
Heavy - 49.6 FPS

355.82 Drivers
Normal - 81.1 FPS
Medium - 70.7 FPS
Heavy - 59.6 FPS

Compared to what Oxide posted before the 355.82, or possibly even the 355.60, driver update."_

*Edit for revised numbers:* Nope... no tangible increase. The benchmark fluctuates a few FPS from run to run, but no real increase that I see except the CPU score jump.

355.60

76.8- normal
63.7-medium
53.6-heavy

355.82

80.6-normal
65.2-medium
54.6-heavy

*110.0 CPU 355.60
124.3 CPU 355.82*

I don't even know if I can call it a "gain". I don't really know CPU score's full implication yet.

*Edit:* I just re-ran the benchmark on 355.60 and my numbers shot up with no adjustments at all. So there's really no increase that I see.

80.3-normal
64.9-medium
54.9-heavy

CPU 114.1


----------



## Mahigan

Quote:


> Originally Posted by *vloeibaarglas*
> 
> Dark Red = Async
> 
> White between Light Red and Blue = Worse than serial, possibly due to this "context switching" penalty that Oxide claimed Nvidia cards suffered from.
> 
> GCN 1.1/1.2 seems a lot more jumpy than GCN 1.0. Tahiti, Pitcairn showing some silky constant results.
> 
> Battlefront or Fable Legends will probably be the first DX12 Async game I'm guessing.


I'm not sure if Fable Legends will make heavy use of Async or not.


----------



## Paul17041993

Could anyone find me the program/function L1 cache size of Maxwell 2 per cluster? I'd like to do some paperwork on this.


----------



## Anna Torrent

BTW, what does all this say about monitors like HWiNFO regarding GPU load? If you had 100% GPU load, it wasn't really correct all this time?


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> BTW, what does all this say about monitors like HWiNFO regarding GPU load? If you had 100% GPU load, it wasn't really correct all this time?


I think they only show that the GPU is busy doing something, not that it is doing it efficiently for its architecture. It takes 64 threads to form a single wavefront on GCN. Each CU is most efficient when processing full wavefronts. If you're feeding your CUs 32 tasks per cycle, they will still show up as 100% in use, but they won't be doing all the work they can do.
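To sketch the occupancy point being made here (the 64-lane wavefront figure is from the post; the helper function is purely hypothetical, not from any monitoring tool): a dispatch using 32-thread groups leaves half of each wavefront's SIMD lanes idle even while a monitor reports the GPU as "busy".

```python
import math

# GCN executes work in 64-thread wavefronts (figure from the post above).
WAVEFRONT_SIZE = 64

def lane_utilization(threads_per_group: int) -> float:
    """Fraction of SIMD lanes doing useful work for one thread group."""
    wavefronts_needed = math.ceil(threads_per_group / WAVEFRONT_SIZE)
    return threads_per_group / (wavefronts_needed * WAVEFRONT_SIZE)

print(lane_utilization(64))  # 1.0 -- a full wavefront
print(lane_utilization(32))  # 0.5 -- "100% load" on a monitor, half the lanes idle
```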


----------



## Xuper

More Fire !!

Quote by AMD_Robert:


> Maxwell cards are now also crashing out of the benchmark as they spend >3000ms trying to compute one of the workloads.
> 
> The author is not interpreting the results correctly.
> 
> Look at the height of the graphics bars.
> 
> Look at the height of the compute bars.
> 
> Notice how NVIDIA's async results are the height of those bars combined? This means the workloads are running serially, otherwise compute wouldn't have to wait on graphics and the bars would not be additive.
> 
> Compare that to the GCN results. Compute and graphics together, async shading bars are no higher than any other workload, demonstrating that frame latencies are not affected when the workloads are running together.
> 
> //EDIT: Asynchronous shading isn't simply whether or not a workload can contain compute and graphics. It's whether or not that workload can overlay graphics and compute, processing them both simultaneously without the pipeline latency getting any longer than the longest job. This is what GCN shows, but Maxwell does not.
> 
> //15:45 Central Edit: This benchmark has now been updated. GPU utilization of Maxwell-based graphics cards is now dropping to 0% under async compute workloads. As the workloads get more aggressive, the application ultimately crashes as the architecture cannot complete the workload before Windows terminates the thread (>3000ms hang).




https://www.reddit.com/r/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/


----------



## Mahigan

Aside from the Async stuff...

*Here's what I think they did at Beyond3D*:

They set the amount of threads, per kernel, to 32 (they're CUDA programmers after-all).
They've bumped the Kernel count to up to 512 (16,384 Threads total).
They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2
*Here's why that's not how you code for GCN*






*Why?*:

Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
That's 2,560 Threads total PER CU.
An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.
If you load up GCN with kernels made up of 32 threads, you're wasting resources. If you're not pushing GCN, you're wasting compute potential. Slide number 4 stipulates that latency is hidden by executing overlapping wavefronts. This is why GCN appears to have a high degree of latency, but you can execute a ton of work on GCN without affecting the latency. With Maxwell/2, latency rises like a staircase the more work you throw at it. I'm not sure if the folks at Beyond3D are aware of this or not.
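The arithmetic above, reproduced as a quick sketch (all figures as stated in the post, not independently verified):

```python
# Per-CU capacity figures from the post.
WAVEFRONT_SIZE = 64
WAVEFRONTS_IN_FLIGHT_PER_CU = 40   # kernels/wavefronts in flight per compute unit
CUS_R9_290X = 44                   # compute units on an R9 290X

threads_per_cu = WAVEFRONTS_IN_FLIGHT_PER_CU * WAVEFRONT_SIZE
total_threads = threads_per_cu * CUS_R9_290X
print(threads_per_cu)   # 2560
print(total_threads)    # 112640

# The Beyond3D test as described: 512 kernels of 32 threads each.
test_threads = 512 * 32
print(test_threads)     # 16384 -- a small fraction of what the chip can keep in flight
```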

*Conclusion*:

I think they geared this test towards nVIDIA's CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single-kernel latency results don't matter. This test is only good if you're checking on Async functionality.

GCN was built for parallelism, not serial workloads like nVIDIA's architectures. This is why you don't see GCN taking a hit with 512 kernels.

What did Oxide do? They built two paths: one with shaders optimized for CUDA and the other with shaders optimized for GCN. On top of that, GCN has Async working. Therefore it is not hard to determine why GCN performs so well in Oxide's engine. It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIA's architectures will be superior.

This means the burden is on developers to ensure they're optimizing for both. In the past, this hasn't been the case. Going forward... I hope they do. As for GameWorks titles, don't count on them being optimized for GCN. That's a given. Oxide played fair; others... might not.


----------



## Forceman

Quote:


> Originally Posted by *Xuper*
> 
> More Fire !!
> 
> 
> https://www.reddit.com/r/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/
> 
> Aside from the Async stuff...
> 
> *Here's what I think they did at Beyond3D*:
> 
> 
> 
> 
> 
> 
> They set the amount of threads, per kernel, to 32 (they're CUDA programmers after-all).
> They've bumped the Kernel count to up to 512 (16,384 Threads total).
> They're scratching their heads wondering why the results don't make sense when comparing GCN to Maxwell 2
> *Here's why that's not how you code for GCN*
> 
> 
> 
> 
> *Why?*:
> 
> Each CU can have 40 Kernels in flight (each made up of 64 threads to form a single Wavefront).
> That's 2,560 Threads total PER CU.
> An R9 290x has 44 CUs or the capacity to handle 112,640 Threads total.
> If you load up GCN with kernels made up of 32 threads, you're wasting resources. If you're not pushing GCN, you're wasting compute potential.
> 
> *Conclusion*:
> 
> I think they geared this test towards nVIDIA's CUDA architectures and are wondering why their results don't make sense on GCN. If true... DERP! That's why I said the single-kernel latency results don't matter. This test is only good if you're checking on Async functionality.
> 
> GCN was built for parallelism, not serial workloads like nVIDIA's architectures. This is why you don't see GCN taking a hit with 512 kernels.


Exactly. I'm not convinced this little test is actually testing what they think it is, or that it is returning meaningful results.


----------



## Paul17041993

And all this is why the engine I'm designing will make use of hardware profiles, so that ideal low-level settings are the default for the particular hardware in use.


----------



## gamervivek

Quote:


> Originally Posted by *Themisseble*
> 
> That is desktop 7870
> https://www.techpowerup.com/gpudb/1966/radeon-hd-8970m.html


Not this again. It's a GCN 1.1 or GCN2 part, not Pitcairn.

http://www.overclock.net/t/1571391/tpu-amd-also-quietly-launches-the-radeon-r9-370x-sapphire-gives-it-vapor-x-treatment/20#post_24349701


----------



## spacin9

And a few things of note in my ongoing assessment:

In a DX11 Win 10 environment, all 12 threads of my hex-core CPU are being stressed and will go as high as 80-100%. So I'm seeing higher CPU usage in DX11; it sits around 50% most of the time in DX12. I do not see a real gain for DX11 in Win 10. It seems to be a bit worse, if anything.

In a DX11 Win 7 environment, 4 threads are not used at all... it seems to use only 7-8 threads of my hex-core CPU. I don't know if that's expected or not... I just thought it was interesting that there seems to be evidence of the promise of DX12.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> I'm not sure even the guy who wrote it knows what it is doing, so I'm not sure it's wise for AMD to be piping up quite yet. Maybe instead of calling Nvidia out over the TDR they should explain why their cards take 4 to 5 times as long to complete the compute only portion.
> Exactly. I'm not convinced this little test is actually testing what they think it is, or that it is returning meaningful results.


You're absolutely right. But it is fun watching them scratch their heads. I don't feel like creating an account there. I think the Async results are all that I find meaningful in their tests. Maybe they'll figure out what kind of coding is required to enable Async on Maxwell 2, if it can perform the task. That's what I'm looking for.


----------



## epic1337

This makes me wonder: will Nvidia hurry the release of their successor card?
They're in a pitfall with Maxwell, a dead end you might say, since there's no clear route for "refining drivers" to make it perform better in the future.
Of course, they're still great cards for DX11, just not for future games that will use Vulkan or DX12 as their primary API.

They've been pretty laid-back with their releases; look at how long it took them to release the GTX 960 and GTX 950.


----------



## semitope

Quote:


> Originally Posted by *Mahigan*
> 
> I'm not sure if Fable Legends will make heavy use of Async or not.


Hopefully does. They were one of the devs praising it.


----------



## Clocknut

Quote:


> Originally Posted by *epic1337*
> 
> This makes me wonder: will Nvidia hurry the release of their successor card?
> They're in a pitfall with Maxwell, a dead end you might say, since there's no clear route for "refining drivers" to make it perform better in the future.
> Of course, they're still great cards for DX11, just not for future games that will use Vulkan or DX12 as their primary API.
> 
> They've been pretty laid-back with their releases; look at how long it took them to release the GTX 960 and GTX 950.


It will depend on only two things:

1. The speed of DirectX 12 adoption
2. The time it takes for Pascal (assuming it has at least GCN 1.0's asynchronous compute capability)

*On point 2, I'm not even sure Pascal has that parallelism feature. If Pascal is already well past the design phase and still doesn't have GCN 1.0's capability, then God help them; they'll have to bake something in to miraculously make it work within Pascal's architecture.


----------



## Paul17041993

Quote:


> Originally Posted by *spacin9*
> 
> And a few things of note in my ongoing assessment:
> 
> In a DX11 Win 10 environment, all 12 threads of my hex-core CPU are being stressed and will go as high as 80-100%. So I'm seeing higher CPU usage in DX11; it sits around 50% most of the time in DX12. I do not see a real gain for DX11 in Win 10. It seems to be a bit worse, if anything.
> 
> In a DX11 Win 7 environment, 4 threads are not used at all... it seems to use only 7-8 threads of my hex-core CPU. I don't know if that's expected or not... I just thought it was interesting that there seems to be evidence of the promise of DX12.


I'd say this is due to the change in how DX11 is handled in Win 10: more of an emulated approach inside DX12. It very likely needs more driver optimisation to get up to scratch with Win 7/8 again, and there are likely a lot of safety checks enabled.

I myself haven't seen a difference between 8 and 10 with my 290X and the small handful of games I play, but I don't have any usage data for my FX-8150 either; there could very well be more threads in use, but with a total of 8 (integer) cores it hasn't made a performance impact...


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> Hopefully does. They were one of the devs praising it.


Ahhh, nice. Thanks for sharing.

Fable Legends and Ashes of the Singularity both look like very nice titles. All I need now is a nice FPS and I'm set.


----------



## semitope

Quote:


> Originally Posted by *Clocknut*
> 
> It will depend on only two things:
> 
> 1. The speed of DirectX 12 adoption
> 2. The time it takes for Pascal (assuming it has at least GCN 1.0's asynchronous compute capability)
> 
> *On point 2, I'm not even sure Pascal has that parallelism feature. If Pascal is already well past the design phase and still doesn't have GCN 1.0's capability, then God help them; they'll have to bake something in to miraculously make it work within Pascal's architecture.


Async is not essential to success. I'm still not sure whether it uses idle resources between graphics tasks or runs completely in parallel with graphics regardless of what's going on there, but it benefits AMD because their hardware is stronger than the Nvidia counterparts. The maximum computational power is higher on the AMD chips.

The effects the developers choose to pass through async could all, or mostly, be done the traditional ways. If Nvidia were to lack async but somehow managed to pack more power into their cards than AMD, they could overcome the deficiency. Highly unlikely without async, though. AMD is already ahead in the teraflops game.

Basically, I think we are missing the important detail here, and something AMD mentioned: they have the highest TFLOPS/mm^2 architectures. They are literally simply faster, and async allows them to tap into more of that power. GCN is stronk


----------



## Anna Torrent

The Beyond3D results actually show good Async performance for GCN, but it might not be optimal.
Also, remember that the number of threads *does not* equal efficiency.


----------



## Forceman

Quote:


> Originally Posted by *semitope*
> 
> I think we are missing the important detail here, and something AMD mentioned: they have the highest TFLOPS/mm^2 architectures. They are literally simply faster, and async allows them to tap into more of that power.


Well, as we see from the Fury X/290X results (that Mahigan likes to reference), there's more to performance than just TFlops.


----------



## semitope

Quote:


> Originally Posted by *Forceman*
> 
> Well, as we see from the Fury X/290X results (that Mahigan likes to reference), there's more to performance than just TFlops.


Yeah, there can be bottlenecks in other places. I don't know where the bottlenecks are for some of the things we've heard async could be used for, though. With asynchronous compute, the teraflops matter more, from my understanding.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> I think they only show that the GPU is busy doing something, not that it is doing it efficiently for its architecture. It takes 64 threads to form a single wavefront on GCN. Each CU is most efficient when processing full wavefronts. If you're feeding your CUs 32 tasks per cycle, they will still show up as 100% in use, but they won't be doing all the work they can do.


1. Are you sure about the wavefront? I mean, it might be that a CU can be 100% busy doing 32 tasks, no?

2. About monitoring/HWiNFO - yeah, it seems like what I previously thought to be GPU-bound really is not (or, not necessarily). And it's 2016

I think we need a good open-source cooperative for GPUs/hardware/laptops. Some company whose capital comes directly from people's money, with no profits (except salaries), where everything in the hardware is open-sourced, as safe as can be, and without marketing.


----------



## epic1337

Quote:


> Originally Posted by *semitope*
> 
> Async is not essential to success. I'm still not sure whether it uses idle resources between graphics tasks or runs completely in parallel with graphics regardless of what's going on there, but it benefits AMD because their hardware is stronger than the Nvidia counterparts. The maximum computational power is higher on the AMD chips.
> 
> The effects the developers choose to pass through async could all, or mostly, be done the traditional ways. If Nvidia were to lack async but somehow managed to pack more power into their cards than AMD, they could overcome the deficiency. Highly unlikely without async, though. AMD is already ahead in the teraflops game.
> 
> Basically, I think we are missing the important detail here, and something AMD mentioned: they have the highest TFLOPS/mm^2 architectures. They are literally simply faster, and async allows them to tap into more of that power. GCN is stronk


Async helps in reducing stutters caused by pipeline stalls; in a way, we're already seeing minimum framerates getting better in DX12.
Combined with FreeSync, we might even see stutter-free gaming at its finest. It could've been better with G-Sync's implementation though; AMD needs to work on that.


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> Async helps in reducing stutters caused by pipeline stalls; in a way, we're already seeing minimum framerates getting better in DX12.
> Combined with FreeSync, we might even see stutter-free gaming at its finest. It could've been better with G-Sync's implementation though; AMD needs to work on that.


It also reduces the impact of rendering real-time shadows, light, and fog (smoke). Anything that has to be rendered partly separately can be done without hindrance to the base frames. So basically, a scene full of particle effects and details will hardly affect the game's overall performance if utilized asynchronously. You can produce lifelike scenes without a loss of frames.


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> Async is not essential to success. I'm still not sure whether it uses idle resources between graphics tasks or runs completely in parallel with graphics regardless of what's going on there, but it benefits AMD because their hardware is stronger than the Nvidia counterparts. The maximum computational power is higher on the AMD chips.
> 
> The effects the developers choose to pass through async could all, or mostly, be done the traditional ways. If Nvidia were to lack async but somehow managed to pack more power into their cards, they could overcome the deficiency.
> 
> I think we are missing the important detail here, and something AMD mentioned: they have the highest TFLOPS/mm^2 architectures. They are literally simply faster, and async allows them to tap into more of that power.


Yes... Async helps them achieve what is in this slide...


Latency becomes hidden by overlapping executions of wavefronts. That's why GCN retains the same degree of latency as you throw more and more kernels at it. GCN is far more parallel than competing architectures. I wouldn't say it is faster; it's just able to take on far more computational workloads (threads) at any given time.

If you throw too much work at Maxwell/2, it begins to bottleneck. We see this result in the staircase effect, on nVIDIA's architecture, in Beyond3D's graphs. So while Maxwell 2 can compute a kernel containing 32 threads in 25ms, GCN can compute a kernel containing 64 threads (twice the commands) in 38-50ms. The problem is that if you throw a kernel containing 32 threads at GCN, it will take the same 38-50ms. This is the result Beyond3D is getting, leading some there (Jawed, for example) to conclude that Maxwell 2 is superior at compute.

If you add Async to the mix, you have that same 64-thread kernel taking 38-50ms alongside a parallel graphics task. So if we do the math, Maxwell 2 would take 50ms to handle a kernel with 64 threads plus the 8-12ms it takes to handle the graphics task.
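A rough sketch of that math, using the post's illustrative timings (upper bounds quoted in the discussion, not measurements of my own):

```python
# Illustrative timings from the post.
gcn_compute_ms = 50       # 64-thread kernel on GCN (upper end of 38-50 ms)
maxwell_compute_ms = 50   # same-size workload handled serially on Maxwell 2
graphics_ms = 12          # upper end of the quoted 8-12 ms graphics task

# With async, GCN overlaps graphics with compute: the total is the longer job.
gcn_total = max(gcn_compute_ms, graphics_ms)        # 50 ms, graphics hidden
# Without async, Maxwell runs them back to back: the totals add.
maxwell_total = maxwell_compute_ms + graphics_ms    # 62 ms

print(gcn_total, maxwell_total)
```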

I think the folks at Beyond3D are CUDA programmers; if true, you can't fault them for not knowing.

At the end of this, Beyond3D will likely conclude that Oxide did something wrong when, in fact, they themselves did something wrong in their tests.


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> 1. Are you sure about the wavefront? I mean, it might be that a CU can be 100% busy doing 32 tasks, no?
> 
> 2. About monitoring/HWiNFO - yeah, it seems like what I previously thought to be GPU-bound really is not (or, not necessarily). And it's 2016
> 
> I think we need a good open-source cooperative for GPUs/hardware/laptops. Some company whose capital comes directly from people's money, with no profits (except salaries), where everything in the hardware is open-sourced, as safe as can be, and without marketing.


1. That's what I mean. 32 tasks in a CU will show the CU as 100% occupied, when the CU could have had 64 tasks thrown at it and completed them in the same amount of time. Bundling 32 tasks per kernel is a waste of resources on GCN. I think that's what Beyond3D is doing, though.

2. Exactly.


----------



## umeng2002

Quote:


> Originally Posted by *semitope*
> 
> Async is not essential to success. I'm still not sure whether it uses idle resources between graphics tasks or runs completely in parallel with graphics regardless of what's going on there, but it benefits AMD because their hardware is stronger than the Nvidia counterparts. The maximum computational power is higher on the AMD chips.
> 
> The effects the developers choose to pass through async could all, or mostly, be done the traditional ways. If Nvidia were to lack async but somehow managed to pack more power into their cards than AMD, they could overcome the deficiency. Highly unlikely without async, though. AMD is already ahead in the teraflops game.
> 
> Basically, I think we are missing the important detail here, and something AMD mentioned: they have the highest TFLOPS/mm^2 architectures. They are literally simply faster, and async allows them to tap into more of that power. GCN is stronk


Idk, console devs using the GCN design think it's key to improving performance given the consoles' limited power.

It's not really a matter of absolute performance; it's also about efficiency. Letting parts idle is inefficient... like having a V10 fed fuel to only 7 pistons because the fuel injection system can't handle more.


----------



## tpi2007

If I'm getting this correctly the better analogy isn't AMD's GCN having the equivalent of Hyperthreading but rather having the ability to use its integer and floating point units at the same time and Nvidia doesn't.


----------



## Mahigan

Quote:


> Originally Posted by *tpi2007*
> 
> If I'm getting this correctly the better analogy isn't AMD's GCN having the equivalent of Hyperthreading but rather having the ability to use its integer and floating point units at the same time and Nvidia doesn't.


That's a better analogy. Yes


----------



## tpi2007

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *tpi2007*
> 
> If I'm getting this correctly the better analogy isn't AMD's GCN having the equivalent of Hyperthreading but rather having the ability to use its integer and floating point units at the same time and Nvidia doesn't.
> 
> 
> 
> That's a better analogy. Yes


I bet that Nvidia is watching this unfold and preparing a statement to explain things. Unless the explanation isn't very good for the shareholders in which case they may want to focus on getting other types of games that aren't affected by this out the door - even in beta benchmark mode - as soon as possible.

Having an isometric RTS as the sole talking example isn't doing anyone any favours anyhow.

Edit: I was just thinking about this: if you load up a compute task (that both can run, so not written in CUDA) in the background and then a game that doesn't rely much on compute, the performance of either the game or the compute task (or both) would see a pronounced decrease on Nvidia's cards while AMD's would perform with a much less pronounced impact. Has anybody tested this?


----------



## ZealotKi11er

Quote:


> Originally Posted by *tpi2007*
> 
> I bet that Nvidia is watching this unfold and preparing a statement to explain things. Unless the explanation isn't very good for the shareholders in which case they may want to focus on getting other types of games that aren't affected by this out the door - even in beta benchmark mode - as soon as possible.
> 
> Having an isometric RTS as the sole talking example isn't doing anyone any favours anyhow.
> 
> Edit: I was just thinking about this: if you load up a compute task (that both can run, so not written in CUDA) in the background and then a game that doesn't rely much on compute, the performance of either the game or the compute task (or both) would see a pronounced decrease on Nvidia's cards while AMD's would perform with a much less pronounced impact. Has anybody tested this?


This is only a problem if Nvidia's Pascal has problems, etc. With Maxwell, $, market share, and influence they can cope alone and beat AMD just fine.


----------



## Paul17041993

Quote:


> Originally Posted by *tpi2007*
> 
> If I'm getting this correctly the better analogy isn't AMD's GCN having the equivalent of Hyperthreading but rather having the ability to use its integer and floating point units at the same time and Nvidia doesn't.


ALUs, and GCN having twice the amount of them per CU compared to Maxwell 2. All 64 vector ALUs per CU can be utilised per clock by two matrix multiplies per CU, but that's a rare occurrence in itself.

Basically, a lot of the time the vector and branch ALUs in each CU end up idle simply because they're not being fed enough tasks, whereas with async compute it's like having 64 HT virtual cores for each CU: tasks can be crammed together so the ALUs don't run idle.

Each task is basically enter pipe > do task > exit pipe, and the CUs have nothing to do during entry and exit, but when shoving in multiple tasks at a time you get an effect like that of continuously adding cores to a CPU: the performance increases linearly until the CUs are completely maxed out with tasks to do each cycle.
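That enter-pipe/exit-pipe picture can be played with in a toy Python model (all numbers invented for illustration, not measurements of any real GPU):

```python
import math

# Toy latency-hiding model: every task takes LATENCY cycles end to end
# (enter pipe + do task + exit pipe), but a CU can keep `in_flight`
# tasks going at once, so batches of tasks complete together.
LATENCY = 12  # illustrative cycles per task

def makespan(n_tasks: int, in_flight: int) -> int:
    """Cycles to finish n_tasks with `in_flight` concurrent slots."""
    return LATENCY * math.ceil(n_tasks / in_flight)

serial = makespan(64, 1)  # one task at a time: 64 * 12 = 768 cycles
for slots in (1, 16, 64, 128):
    print(slots, serial / makespan(64, slots))
# speedup: 1x, 16x, 64x, 64x -> linear until the CU is maxed out
```

The speedup grows linearly with the number of tasks in flight and then flattens once every slot is full each cycle, which is the behaviour described above.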


----------



## Mahigan

Quote:


> Originally Posted by *Paul17041993*
> 
> ALUs and having twice the amount of them per CU compared to maxwell 2. All 64 vector ALUs per CU can be utilised per clock by two matrix multiplies per CU, but that's a rare occurrence in itself.
> 
> Basically a lot of the time the vector and branch ALUs in each CU end up idle simply because they're not being fed enough tasks, whereas with async compute its like having 64 HT virtual cores for each CU and tasks can be crammed together and not let the ALUs run idle.
> 
> Each task is basically enter pipe > do task > exit pipe and the CUs have nothing to do during entry and exit, but when shoving in multiple tasks at a time you get an effect like that of continuously adding cores to a CPU, *the performance increases linearly until the CUs are completely maxed out with tasks to do each cycle.*


Yep, 2,560 threads is the max per CU (or 40 wavefronts). The issue with DX11 was keeping the CUs fed rather than idling.
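The arithmetic behind that figure, for anyone checking along (constants are GCN's; the script is just a sanity check):

```python
# GCN schedules work in 64-wide wavefronts, and each CU can track up to
# 40 wavefronts in flight, which gives the per-CU thread maximum.
WAVEFRONT_WIDTH = 64
MAX_WAVEFRONTS_PER_CU = 40

max_threads = WAVEFRONT_WIDTH * MAX_WAVEFRONTS_PER_CU
print(max_threads)  # 2560
```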


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> Yep, 2,560 Threads is the max per CU (or 40 Wavefronts). The issue with DX11 was keeping the CUs fed, rather than Idling.


Oh how I wish CPUs could be designed this way.


----------



## Forceman

Quote:


> Originally Posted by *SpeedyVT*
> 
> Oh how I wish CPUs could be designed this way.


Well, they could, but then they'd be GPUs.


----------



## vloeibaarglas

Nvidia fanboys can bench all day trying to prove that GCN doesn't do async or Nvidia does async. I'm sure Ox
Quote:


> Originally Posted by *tpi2007*
> 
> I bet that Nvidia is watching this unfold and preparing a statement to explain things. Unless the explanation isn't very good for the shareholders in which case they may want to focus on getting other types of games that aren't affected by this out the door - even in beta benchmark mode - as soon as possible.
> 
> Having an isometric RTS as the sole talking example isn't doing anyone any favours anyhow.
> 
> Edit: I was just thinking about this: if you load up a compute task (that both can run, so not written in CUDA) in the background and then a game that doesn't rely much on compute, the performance of either the game or the compute task (or both) would see a pronounced decrease on Nvidia's cards while AMD's would perform with a much less pronounced impact. Has anybody tested this?


Ummm, so can we run Beyond3D's Nvidia-favored benchmark plus some kind of OpenCL bench?


----------



## Mahigan

Gotta love it when someone suggests a problem with the test and they instantly point to a thread where that person made mistakes while trying to figure out what was happening in Ashes of the Singularity:


Let's not even address the concerns raised; let's just point to past errors. I'm quite certain that this counts as a logical fallacy.

I want to know how many threads per kernel they're using. 32 or 64? If it is 32, then they're doing it wrong. If it is 64, then I stand corrected.


Quote:


> Kernel occupancy is a measure of the utilization of the resources of a compute unit on a GPU, the utilization being measured by the number of in-flight wavefronts, for a given kernel, relative to the number of wavefronts that could be launched given the ideal kernel dispatch configuration (*dependent on the work-group size and resource utilization in the kernel*).


Source: http://developer.amd.com/tools-and-sdks/archive/amd-app-profiler/user-guide/app-profiler-kernel-occupancy/
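Read literally, that definition can be sketched like this (a simplification: it uses GCN's 64-wide wavefronts and 40-wavefront CU limit and ignores the register/LDS pressure the quote also mentions):

```python
import math

WAVEFRONT_WIDTH = 64        # GCN wavefront size
MAX_WAVEFRONTS_PER_CU = 40  # per-CU in-flight limit

def kernel_occupancy(workgroup_size: int, in_flight_wavefronts: int) -> float:
    """In-flight wavefronts relative to the max launchable for this
    work-group size (resource limits not modeled)."""
    wf_per_group = math.ceil(workgroup_size / WAVEFRONT_WIDTH)
    # max wavefronts from whole workgroups that fit on the CU
    max_wf = (MAX_WAVEFRONTS_PER_CU // wf_per_group) * wf_per_group
    return in_flight_wavefronts / max_wf

print(kernel_occupancy(64, 40))  # 1.0
print(kernel_occupancy(32, 40))  # 1.0 -- "fully occupied" even though
                                 # half of each wavefront's lanes are idle
```

Note how a 32-item work group still reports 100% occupancy, which is exactly the complaint above: occupancy measures wavefront slots, not SIMD lanes doing useful work.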


----------



## dogen1

Quote:


> Originally Posted by *Mahigan*
> 
> I think that Beyond3D are CUDA programmers, if true, you can't fault them for not knowing.
> 
> At the end of this, Beyond3D will likely conclude that Oxide did something wrong when, in fact, they did something wrong in their tests.


Beyond3D is just a forum. There are some developers who frequent there, including some console, PC, and likely some CUDA programmers. Hopefully a GCN expert will give his input.


----------



## Slaughterem

Quote:


> Originally Posted by *ZealotKi11er*
> 
> This is only a problem if Nvidias Pascal has problems etc. With Maxwell, $, Market-share, influence they can cope alone and beat AMD just fine.


So what market share are you talking about? The 82% of dGPU for PC? You seem to forget that consoles make up the bulk of the market. Factor this in and I would say that AMD has the larger market share. Add to this that M$ is looking to unify games to be played on both platforms with DX12: games made for Xbox One would not need to be changed to be played on PC. That is the bigger picture.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Slaughterem*
> 
> So what market share are you talking about? The 82% of dGpu for PC? You seem to forget that consoles make up the bulk of the market. Factor this in and I would say that AMD has the larger market share. Add to this that M$ is looking to unify games to be played on both platforms, with DX12. Games made for Xbox one would not need to be changed to be played on PC. That is the bigger picture.


The market share that brings in money. Consoles bring no money to AMD, or else they would not have a $1.80 stock.


----------



## Paul17041993

Quote:


> Originally Posted by *SpeedyVT*
> 
> Oh how I wish CPUs could be designed this way.


Well, that's what HSA processors are. As they continue to rise, we'll see a lot of mathematics offloaded to their CUs, allowing for even more potential performance alongside the dedicated GPUs. The biggest plus for them is the lack of latency and the ability to use the full RAM space, making them much easier to use.


----------



## Slaughterem

Quote:


> Originally Posted by *ZealotKi11er*
> 
> They Market Share that gives money. Consoles bring no money to AMD or else whey would not have $1.80 stocks.


Wow, I did not know you were Series 6 and 7 licensed to advise about a company's stock. Please tell me more about how consoles have not provided income to AMD.


----------



## Mahigan

Quote:


> Originally Posted by *dogen1*
> 
> Beyond3d is just a forum. There are some developers who frequent there, including some console, pc, and likely some cuda programmers. Hopefully a gcn expert will give his input. Sebbbi said he would have made a micro benchmark to test exactly this, along with other dx12 features, including execute indirect, and multi adapter, but he's too busy with a newborn.


Well... now they question my concerns because I'm a former ATi employee... and they bring up that I might be gaining extra change...



Same attacks they leveled against me over at HardOCP. Looks like I can't expect them to perform a fair test. Oh well.

I remember Sebbbi, from way back when I used to frequent Beyond3D. I didn't mean to say they're all CUDA programmers; I used to be a member there. I meant to say that the people conducting the test appear to be running the same workloads for both nVIDIA and AMD architectures.


----------



## vloeibaarglas

Quote:


> Originally Posted by *Slaughterem*
> 
> So what market share are you talking about? The 82% of dGpu for PC? You seem to forget that consoles make up the bulk of the market. Factor this in and I would say that AMD has the larger market share. Add to this that M$ is looking to unify games to be played on both platforms, with DX12. Games made for Xbox one would not need to be changed to be played on PC. That is the bigger picture.


Never thought about it that way. What does this even mean?

Every game in the future will be optimized for GCN 1.0 Xbone/W10? What would become the difference between an Xbox One game and a W10 game?


----------



## Slaughterem

Quote:


> Originally Posted by *vloeibaarglas*
> 
> Never thought about it that way. What does this even mean?
> 
> Every game in the future will be optimized for GCN 1.0 Xbone/W10? What would become the difference between an Xbox One game and a W10 game?


IMO games would be made to play at 1080p on consoles at a minimum of 60 FPS. You could then use a 4K TV to upscale the 1080p signal to give you a 4K display on that TV. PC gamers would be able to run 4K monitors at high minimum frame rates because of the brute force of their dGPUs.


----------



## Mahigan

Quote:


> Originally Posted by *vloeibaarglas*
> 
> Never thought about it that way. What does this even mean?
> 
> Every game in the future will be optimized for GCN 1.0 Xbone/W10? What would become the difference between an Xbox One game and a W10 game?


That's sort of Microsoft's end game:
http://arstechnica.com/gaming/2015/06/microsoft-xbox-on-windows-10-wont-be-like-games-for-windows-live/
Quote:


> Our vision is to unify platforms so gamers can play the games they want on any Windows 10 device - PC, Xbox One, or otherwise," Spencer said.


Quote:


> We want to make clear that when we talk about Xbox going forward, we're talking about gaming on all Windows 10 devices - PCs, tablets, phones, Xbox One, and HoloLens,


- Microsoft head of Xbox Phil Spencer

On the PC, you'll have access to better resolutions, more nifty post-processing effects, larger textures, etc. So the games will look much better than on the consoles, but, in the end, you're looking at titles which will be optimized for GCN. That doesn't mean nVIDIA's architectures won't run those titles better.



The first big title is Fable Legends: http://www.polygon.com/2015/3/5/8158353/fable-legends-cross-play-will-be-entirely-blind-to-platform-according
Quote:


> the game will be optimized for each device and its input scheme.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> That's sort of Microsoft's end game:
> http://arstechnica.com/gaming/2015/06/microsoft-xbox-on-windows-10-wont-be-like-games-for-windows-live/
> 
> - Microsoft head of Xbox Phil Spencer
> 
> On the PC, you'll have access to better resolutions, more nifty post processing effects, larger textures etc. So the games will look much better than on the consoles but, in the end, you're looking at titles which will be optimized for GCN. That doesn't mean nVIDIAs architectures won't run those titles better.
> 
> The first big title is Fable Legends: http://www.polygon.com/2015/3/5/8158353/fable-legends-cross-play-will-be-entirely-blind-to-platform-according


Microsoft has pushed for this for at least a decade. Many years ago, back with Dell Computers, the folks at MS had been showing us their early efforts.


----------



## Mahigan

Gotta love the drama this whole thing has created...


https://www.reddit.com/r/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/

I think I know why the program is doing this but nobody will listen to me...


----------



## JunkoXan

Quote:


> Originally Posted by *Forceman*
> 
> Updated Fury benchmark from over at Beyond3d. Apparently the graph posted yesterday was really a 390X, not a Fury X. This new chart shows all three. Interesting that the Fury has so many spikes where it seems to be running serially.


Tahiti looks so smooth!



----------



## pharma57

Quote:


> Originally Posted by *Mahigan*
> 
> Gotta love the drama this whole thing has created...
> 
> 
> __
> https://www.reddit.com/r/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/
> 
> I think I know why the program is doing this but nobody will listen to me...


I would think they would listen to a former ATI employee ...


----------



## Mahigan

Quote:


> Originally Posted by *pharma57*
> 
> I would think they would listen to a former ATI employee ...


And what is wrong with having worked for ATi? Have you worked for McDonalds before? Would you then be biased towards McDonalds over, say, Burger King?

Seriously...


----------



## Clocknut

Quote:


> Originally Posted by *Mahigan*
> 
> On the PC, you'll have access to better resolutions, more nifty post processing effects, larger textures etc. So the games will look much better than on the consoles but, in the end, you're looking at titles which will be optimized for GCN. That doesn't mean nVIDIAs architectures won't run those titles better.


Nvidia have to either design around GCN via hardware or get optimization via driver/software. Microsoft is playing into AMD's hands here.


----------



## swiftypoison

Quote:


> Originally Posted by *Mahigan*
> 
> And what is wrong with having worked for ATi? Have you worked for McDonalds before? Would you then be biased towards McDonalds over, say, Burger King?
> 
> Seriously...


I don't think he meant it in a bad way. I think he meant that your knowledge from working at ATI should come in handy when discussing GPU architectures.
In any case, @Mahigan, your work is extremely appreciated! Can I send you a case of beer?


----------



## Mahigan

Quote:


> Originally Posted by *swiftypoison*
> 
> I dont think he meant it in a bad way. I think he meant that your knowledge for contributing to ATI should come in handy when discussing GPU architectures.
> In any case, @Mahigan you work is extremely appreciated! Can I send you a case of beer?


I believe he meant it in a malicious manner, because I think he is the same guy from Beyond3D making statements that I'm bought. Suffice it to say, I am tired of having that accusation leveled at me.


Beer? The shipping costs would be enormous


----------



## Devnant

Back to the discussion: some interesting results from PadyEos running a 980 Ti with TDR disabled on the Beyond3D forums.





Notice how there are more bars overlapping, suggesting async compute support.

Source: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-13#post-1869393

Also, when forcing a single command list (forced synchronous?), latency is massively increased from 97.38 ms to 2294.69 ms. Any thoughts?


----------



## Forceman

Quote:


> Originally Posted by *Devnant*
> 
> Any thoughts?


That programming for this stuff is not as simple as people think?

For example, why does Tahiti have perfect async scaling on this little test?



And why does this 290X seem to defy physics by being so concurrent that it runs time backwards?


----------



## SpeedyVT

Quote:


> Originally Posted by *Clocknut*
> 
> Nvidia have to either design around GCN via hardware or get optimization via driver/software. Microsoft is playing into AMD's hands here.


More like AMD playing to Nintendo, with the original graphics design used in the GameCube but super-scaled to a vast number of shaders and such. Yes, GCN is very similar to Hollywood.

With more bells and whistles, distinctively, of course.

It progressively evolved from Nintendo's Hollywood to the VLIW types to GCN. A slow progression.

Ultimately, graphics technology in PCs has been shaped by consoles.

Microsoft is smart unifying the best aspects of a console with the best aspects of a PC.


----------



## Baasha

So what's the first big-budget DX12 game coming out? Is AC Syndicate going to be DX12?


----------



## Forceman

Quote:


> Originally Posted by *Baasha*
> 
> So what's the first big-budget DX12 game coming out? Is AC Syndicate going to be DX12?


Fable. Then should be Battlefront, I guess.


----------



## Paul17041993

Quote:


> Originally Posted by *Clocknut*
> 
> Nvidia have to either design around GCN via hardware or get optimization via driver/software. Microsoft is playing into AMD's hands here.


Considering how similar they made Maxwell to GCN, they don't have to do much. Once it's optimised for GCN, Maxwell 2 should be able to run it quite efficiently except for this graphics issue; the only other note is that Maxwell having only 32 vector ALU/FPU/IPUs per unit means the performance gain drops off quicker than on GCN.

"GCN 2.0" and Pascal, well, we'll only know when they arrive. AMD could decide to halve the vector ALUs per unit and/or nvidia could double theirs, or something else like 4 branch ALUs vs 1 (each of which is equivalent to 4 vectors)...


----------



## ZealotKi11er

Quote:


> Originally Posted by *Forceman*
> 
> Fable. Then should be Battlefront, I guess.


I don't think BF will be DX12 on launch.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> Back to the discussion. some interesting results from PadyEos running a 980 TI with TDR disabled on the beyond3d forums.
> 
> 
> 
> 
> 
> Notice how there are more bars overlapping, suggesting async computing support.
> 
> Source: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-13#post-1869393
> 
> Also, forcing single commandlist (forced synchronized?) latency is massively increased from 97.38ms to 2294.69ms. Any thoughts?


I saw the same behavior with the GTX 750 Ti in the previous test. All of this leads me to believe that...

1. The coding is wrong
or
2. Maxwell 2 can't do async

As for GCN, 50 ms latency is absurd... so there's something wrong in the way GCN is executing that code. This leads me to believe the coding is bad.

I'm not a programmer, so I'll leave that to these guys. I've spent the past few weeks analyzing GPU architectures and brushing up on all of the new changes.


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> I saw the same behavior with the GTX 750 Ti in the previous test. All of this leads me to believe that...
> 
> 1. The coding is wrong
> or
> 2. Maxwell 2 can't do Async
> 
> As for GCN, 50ms latency is absurd... so there's something wrong in the way GCN is executing that code. This leads me to believe the coding is bad.
> 
> I'm not a programmer. So I'll leave that to these guys to handle that. I've spent the past few weeks analyzing GPU architectures and brushing up on all of the new changes.


Well, if Maxwell 2 can't do async, there shouldn't be any bars overlapping at all. I mean, that's the whole point of this simple benchmark, AFAIK.

From the chart maker:

"Each bar in the chart shows the time it took for the async compute to finish.
The red block that floats to the top is the time it would take for the compute, by itself, to finish.
The blue block at the bottom is the time it would take for the graphics, by itself, to finish.

What we want here is for the red and blue to overlap, this signifies the async compute running faster than if you were to run the compute and graphics separately.
Sometimes we see a white gap between the 2 colors, this signifies that the async compute run is slower than it would have been if the two were run separately."

Also, the 750 Ti (Maxwell 1) hardly benefits, according to what I see here:



In the best-case scenario there's a 2-3 ms lower latency, while the 980 Ti (Maxwell 2) more regularly shows benefits of around 12 ms lower latency when async seems to be working properly.

Seems like there is some async support for Maxwell 2, but the GCN architecture is simply way superior for that job.
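The chart-maker's description quoted above boils down to one subtraction; here is a small sketch of it (function name and sample timings are mine, invented for illustration):

```python
# If the combined async run finishes faster than compute + graphics run
# separately, the red and blue blocks "overlap" on the chart; if it
# finishes slower, a white gap appears between them.

def async_gain_ms(t_compute: float, t_graphics: float, t_async: float) -> float:
    """Positive = overlap (async helped); negative = gap (async hurt)."""
    return (t_compute + t_graphics) - t_async

print(async_gain_ms(10.0, 20.0, 22.0))  # 8.0  -> bars overlap
print(async_gain_ms(10.0, 20.0, 33.0))  # -3.0 -> white gap
```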


----------



## glr123

Quote:


> Originally Posted by *Mahigan*
> 
> Gotta love the drama this whole thing has created...
> 
> 
> __
> https://www.reddit.com/r/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/
> 
> I think I know why the program is doing this but nobody will listen to me...


That's me!

So do you think there could be any other explanations for the high latency on the GCN cards? I thought I read somewhere that the compute command goes CPU -> GPU -> CPU; is that what is going on? Could this sequence just be faster on Nvidia, and it isn't a latency effect we ever experience because that isn't a typical compute process during gaming?

(pardon my ignorance!!)


----------



## Klocek001

Quote:


> Originally Posted by *Baasha*
> 
> So what's the first big-budget DX12 game coming out? Is AC Syndicate going to be DX12?


I think we'll first see some DX12 patches to already existing games; developing a game from scratch takes years.


----------



## SlackerITGuy

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I don't think BF will be DX12 on launch.


With the way DICE has always lived on the bleeding edge of API implementation (one of the first to DirectX 11 with BFBC2, the absolute first to DirectX 11.1 with BF4, and they created Mantle with AMD for BF4), one would assume Battlefront will indeed launch with at least DirectX 12 or Vulkan ready to go.


----------



## ZealotKi11er

Quote:


> Originally Posted by *SlackerITGuy*
> 
> With the way DICE has always lived on the bleeding edge regarding API implementation (one of the first to DirectX 11 with BFBC2, the absolutely first one to DirectX 11.1 with BF4 and created Mantle with AMD for BF4) one would assume Battlefront will indeed launch with at least DirectX 12 or Vulkan ready to go.


I hope so but they said something about 2016.


----------



## SpeedyVT

Quote:


> Originally Posted by *Devnant*
> 
> Well, if Maxwell can't do async there shouldn't be any bars overlapping at all. I mean, that's the whole point of this simple benchmark AFAIK.
> 
> From the chart maker:
> 
> "Each bar in the chart shows the time it took for the async compute to finish.
> The red block that floats to the top is the time it would take for the compute, by itself, to finish.
> The blue block at the bottom is the time it would take for the graphics, by itself, to finish.
> 
> What we want here is for the red and blue to overlap, this signifies the async compute running faster than if you were to run the compute and graphics separately.
> Sometimes we see a white gap between the 2 colors, this signifies that the async compute run is slower than it would have been if the two were run separately."
> 
> Also, 750 TI (Maxwell 1) hardly benefits according to what I see here:
> 
> 
> 
> Best case scenario there's a 2-3 ms lower latency. While the 980 TI (Maxwell 2) shows more regularly benefits of around 12 ms lower latency when async seems to be working properly.
> 
> Seems like there is some async support, but the GCN architecture is simply way superior for that job.


Are you sure that's not just because the 980 Ti is faster than the 750 Ti, so it would have enough headroom to reduce latency anyway? It's like comparing a 270 to a 290. It could also be that the 980 Ti can do the operations in loads, just not asynchronous loads.


----------



## SlackerITGuy

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I hope so but they said something about 2016.


Source on that mate?


----------



## Devnant

Quote:


> Originally Posted by *SpeedyVT*
> 
> You sure that's not just because the 980 ti is faster than the 750 ti that there would be enough to reduce latency. It's like comparing a 270 to a 290.


Almost, because according to this:

"On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. *Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once - early implementations of HyperQ cannot be used in conjunction with graphics*. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage."

Source: http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading
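To make the Anandtech quote concrete, here's a toy model of the two cases (illustrative only; it ignores that concurrent queues still compete for the same execution units, so real gains are smaller):

```python
# A part that must run its graphics queue and compute queues serially
# (Fermi/Kepler/Maxwell 1 per the quote) pays roughly the sum of their
# times; one that can run graphics alongside compute (Maxwell 2's
# 1 graphics + 31 compute queues, or GCN's ACEs) pays roughly the max.

def serial_time(graphics_ms: float, compute_ms: list) -> float:
    return graphics_ms + sum(compute_ms)

def concurrent_time(graphics_ms: float, compute_ms: list) -> float:
    return max(graphics_ms, *compute_ms) if compute_ms else graphics_ms

work = (16.0, [4.0, 4.0, 4.0])
print(serial_time(*work))      # 28.0 -> serial-only style
print(concurrent_time(*work))  # 16.0 -> multi-queue style
```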


----------



## ZealotKi11er

Quote:


> Originally Posted by *SlackerITGuy*
> 
> Source on that mate?


http://www.overclock3d.net/articles/cpu_mainboard/dice_wants_win_10_plus_dx12_as_minimum_specs_for_holiday_2016_frostbite_games/1


----------



## SpeedyVT

Quote:


> Originally Posted by *Devnant*
> 
> Almost, because according to this:
> 
> "On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. *Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once - early implementations of HyperQ cannot be used in conjunction with graphics*. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage."
> 
> Source: http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading


Well, then that explains it right there. It's able to do the queues, but not asynchronously. That's why it overlaps, then doesn't, then overlaps again, back to back: it's running the queues when it can dispatch them, so it will appear asynchronous in the benchmark to an extent. I think the idea is to see blue and red completely overlapping.


----------



## Paul17041993

Quote:


> Originally Posted by *Mahigan*
> 
> As for GCN, 50ms latency is absurd... so there's something wrong in the way GCN is executing that code. This leads me to believe the coding is bad.


Either a lot of data going back and forth, a lot of teeny tasks that can only use one or a few ALUs at a time, or a combination of both.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> 1. That's what I mean. 32 tasks in a CU will show the CU as being 100% occupied when the CU could have had 64 tasks thrown at it and completed it in the same amount of time. Bundling 32 Tasks, per Kernel, is a waste of resources on GCN. I think that's what Beyond3D is doing though.
> 
> 2. Exactly.


Not sure - I'm still trying to wrap my head around what SIMDs and threads really mean in practice. I'm behind on this.


----------



## Vesku

Quote:


> Originally Posted by *Mahigan*
> 
> Well... now they question my concerns because I'm a former ATi employee... and they bring up that I might be gaining extra change...
> 
> 
> 
> Same attacks they leveled against me over at HardOCP. Looks like I can't expect them to perform a fair test. Oh well.
> 
> I remember Sebbbi. From way back when I used to frequent Beyond3D. Didn't mean to say they'll are CUDA programmers. I used to be a member there. I mean't to say that the people conducting the test appear to be running the same workloads for both nVIDIA and AMD architectures.


I wouldn't get bogged down in a discussion over the performance of Nvidia vs AMD in this kind of simple test of Async Compute. It's just a distraction from determining whether Maxwell 2 actually has any useful Async capabilities. So far even GCN 1.0 seems better equipped.


----------



## Kpjoslee

Quote:


> Originally Posted by *Vesku*
> 
> I wouldn't get bogged down in a discussion over the performance of Nvidia vs AMD in this kind of simple test of Async Compute. It's just a distraction from determining whether Maxwell 2 actually has any useful Async capabilities. So far even GCN 1.0 seems better equipped.


That is basically the only thing we can assume at this point. Pretty amusing to see that this drama keeps on going when a game engine fully running async is probably years away, lol.


----------



## SpeedyVT

Quote:


> Originally Posted by *Kpjoslee*
> 
> That is basically the only thing we can assume at this point. Pretty amusing to see that this drama keeps on going when the game engine fully running async is probably years away lol.


I can't wait for a fully async game; at least by then NVidia users should have a card compatible with async.

But developing a card to support async compromises its ability at serial workloads, because it has to be designed to handle parallel loads.


----------



## Kpjoslee

Quote:


> Originally Posted by *SpeedyVT*
> 
> I can't wait for a fully async game, atleast NVidia by then for NVidia users should have a card compatible with async.
> 
> But developing a card to support async compromises it's ability at serial because it has to be designed to handle parrellel loads.


At least DX12 is looking pretty good for AMD so far. The DX11 generation hasn't been good for AMD, and their market share suffered because of it. It was 4 years too soon. I hope they at least recover in the next few years.


----------



## SpeedyVT

Quote:


> Originally Posted by *Kpjoslee*
> 
> At least DX12 is looking pretty good for AMD so far. The DX11 generation hasn't been good for AMD, and their market share suffered because of it. It was 4 years too soon. Hope they at least recover in the next few years.


Whether or not AMD or NVidia fans agree, it's key that both survive to keep each other competitive. There isn't another company that would benefit from producing PC graphics the way these two do; anyone else would be forced into a confined business sector. Even if others disagree, we see this happen all the time, especially when you consider the Walmart chain killing all the other department stores. Eventually Walmart will be the only one and their prices will increase; actually, they are already starting to increase in my area.

It's like the Yankees vs the Red Sox: you can't have a good game without the two throwing punches. Whoever represents whom is irrelevant.


----------



## Devnant

Quote:


> Originally Posted by *SpeedyVT*
> 
> Well then that explains it right there. It's able to do the queues, but not asynchronously. It's why it overlaps, then doesn't, then overlaps again, back to back. It's doing the queues when it can dispatch them. So it will appear asynchronous in the benchmark to an extent. I think the idea is to see blue and red completely overlapping.


Yeah, that's Maxwell 1. But Maxwell 2 can do both queues according to Anandtech. It's overlapping more often than not on 2 compared to 1. But GCN is just way more consistent.

Could also be a driver issue on Maxwell 2.


----------



## provost

Quote:


> Originally Posted by *Devnant*
> 
> Yeah, that's Maxwell 1. But Maxwell 2 can do both queues according to Anandtech. It's overlapping more often than not on 2 compared to 1. But GCN is just way more consistent.
> 
> Could also be a driver issue on Maxwell 2.


Let's assume for a moment that you are running Nvidia. Your company has an 80% market share of the discrete consumer GPU segment, and your only other competitor in this segment is going through its own issues (restructuring, etc.), so it is not in a position to inflict any meaningful pain on your earnings for a sustained period of time. What would you do to ensure that you get your loyal customer base excited about buying an Nvidia card again? Do you offer them more performance on the cards they already have? No, you offer them performance on the new cards. I have used a very basic example here, but this is how simple it is.


----------



## ku4eto

Quote:


> Originally Posted by *Devnant*
> 
> Yeah, that's Maxwell 1. But Maxwell 2 can do both queues according to Anandtech. It's overlapping more often than not on 2 compared to 1. But GCN is just way more consistent.
> 
> Could also be a driver issue on Maxwell 2.


This is a simple program, not a game. The driver most probably has close to zero effect on this matter.


----------



## Devnant

Quote:


> Originally Posted by *ku4eto*
> 
> This is a simple program, not a game. The driver most probably has close to zero effect on this matter.


Yeah, but it's not a light test. It's doing up to 1024 instructions per kernel (very few shaders go past 500 instructions in games?). You could be right about the driver, though.
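
For what it's worth, the inference behind these overlap graphs can be sketched as a toy model: measure graphics alone, compute alone, and both submitted together, then see which bound the combined time lands near. The 10% tolerance and the sample timings below are arbitrary illustrative numbers, not anything taken from the actual Beyond3D test.

```python
# Toy classifier for async-compute overlap: given measured times for
# graphics alone, compute alone, and both submitted together, decide
# whether the two workloads actually ran concurrently.

def classify(graphics_ms, compute_ms, combined_ms, tol=0.10):
    serial_ms = graphics_ms + compute_ms     # no overlap at all
    ideal_ms = max(graphics_ms, compute_ms)  # perfect overlap
    if combined_ms <= ideal_ms * (1 + tol):
        return "async"
    if combined_ms >= serial_ms * (1 - tol):
        return "serial"
    return "partial"  # intermittent overlap, like the on/off pattern described above

print(classify(10.0, 8.0, 10.5))  # async
print(classify(10.0, 8.0, 17.8))  # serial
print(classify(10.0, 8.0, 14.0))  # partial
```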


----------



## Paul17041993

Went and took a look back at Maxwell 2's engine counts compared to GCN. GCN has 8 + 1 engines, with the 8 compute engines having 8 queues each, allowing a total of 64 compute tasks alongside graphics; Maxwell 2 has a total of 32 individual engines/queues, with one dedicated to graphics. Given the complexity of synchronising the units between each engine, I doubt there are actually 32 of them; more likely 4 or 8 engines with 4 or 8 queues each, with one queue in one of those engines reserved for graphics.

I find this makes the most sense given the lack of graphics + compute parallelism: for that one queue to run, its engine must switch to graphics mode and pause its other 3/7 queues, and while this one engine is in graphics mode the rest of them likely also have to pause to prevent conflicts. Or at least temporarily, until the drivers fix it.

Or it could just be, you know, 32 queues in one massive engine, which of course means it can only ever do compute or graphics at any point in time, but I'd doubt even nvidia would want to pull that one...
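
A crude way to see why a "pause everything for graphics mode" design would hurt: a toy frame-time model. All numbers here are purely hypothetical; real GPU scheduling is far more complicated.

```python
# Toy frame-time model (not real GPU code): compare a frame where
# graphics and compute overlap against one where the hardware must
# serialize them.

def frame_time(graphics_ms, compute_ms, can_overlap):
    """Total frame time under a crude two-workload scheduling model."""
    if can_overlap:
        # True async: compute queues run alongside the graphics queue,
        # so the frame is bound by whichever workload takes longer.
        return max(graphics_ms, compute_ms)
    # Serialized: compute queues pause while the engine is in
    # "graphics mode", so the two costs simply add up.
    return graphics_ms + compute_ms

graphics, compute = 12.0, 5.0  # hypothetical per-frame costs in ms
print(frame_time(graphics, compute, can_overlap=True))   # 12.0
print(frame_time(graphics, compute, can_overlap=False))  # 17.0
```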


----------



## Devnant

I'm just wondering. Would it be theoretically possible to reduce NVIDIA's poor async computing by using SLI?


----------



## DeathMade

Hi Mahigan. I just wanted to say THANKS for all the work you do here. Whether you are an ATI employee or not, you still add a lot of actual links and sources to support your claims. IDK how this will end, but I'm sure you did a great job getting people talking about it.

Keep up the good work, please


----------



## Klocek001

all that makes me want a Zen APU so much. 16 core with R9 Nano class GPU with integrated HBM


----------



## mutantmagnet

Quote:


> Originally Posted by *Klocek001*
> 
> all that makes me want a Zen APU so much. 16 core with R9 Nano class GPU with integrated HBM


Their 16-core CPU is for servers. Unless you like to blow a lot of cash, the best you can hope for is an 8-core CPU with 16 threads.


----------



## ToTTen

Quote:


> Originally Posted by *Devnant*
> 
> I'm just wondering. Would it be theoretically possible to reduce NVIDIA's poor async computing by using SLI?


On SLI, both cards have to render either half the screen (SFR) or alternate full frames (AFR), meaning they have a constant rendering task anyway.
So no, SLI wouldn't solve the performance degradation caused by the lack of Async Compute in Maxwell 2's hardware (which isn't a sure thing _yet_).


----------



## Klocek001

Quote:


> Originally Posted by *mutantmagnet*
> 
> Their 16-core CPU is for servers. Unless you like to blow a lot of cash, the best you can hope for is an 8-core CPU with 16 threads.


Yeah, I heard they're planning HT. 8c/16t is great too, and even 2560 cores of R9 290-class iGPU with HBM should do exceptionally well in DX12. You can pair it with whatever GPU you like if you still want more performance, assuming multi-adapter works well.
If the Zen APU delivers good CPU and iGPU performance, you could probably just spend big bucks on the APU and that's all: buy a cheap (but decent) mATX board with 1 PCI-E slot and put in a used 290/290X to run in multi-adapter / CFX with your APU.


----------



## GorillaSceptre

*IF* this turns out to be true, then people should be allowed to return them imo. That's false advertising. Legally they'll probably get away with it as they do technically support it, but it gives worse performance with it on







Reminds me of the 970 fiasco, technically there's 4GB, but.. well you know.

Makes you wonder if they do _really_ support some of the DX12 features too..


----------



## SlackerITGuy

Quote:


> Originally Posted by *ZealotKi11er*
> 
> http://www.overclock3d.net/articles/cpu_mainboard/dice_wants_win_10_plus_dx12_as_minimum_specs_for_holiday_2016_frostbite_games/1


Yeah I thought you might be referring to that.

He's not saying Holiday 2016 will be the first time we see DirectX 12 games, just that he would like to require DirectX 12 as the minimum spec for his games by that time.

Even his most recent games use DirectX 10 as the minimum spec.


----------



## Kpjoslee

Quote:


> Originally Posted by *GorillaSceptre*
> 
> 
> 
> *IF* this turns out to be true, then people should be allowed to return them imo. That's false advertising. Legally they'll probably get away with it as they do technically support it, but it gives worse performance with it on
> 
> 
> 
> 
> 
> 
> 
> Reminds me of the 970 fiasco, technically there's 4GB, but.. well you know.
> 
> Makes you wonder if they do _really_ support some of the DX12 features too..


Sadly, the end justifies the means in the tech world. The 970 fiasco was largely irrelevant unless run in very extreme scenarios. If the performance ends up fine with it, it wouldn't mean much, and that is what I think might end up happening anyway.


----------



## Klocek001

Quote:


> Originally Posted by *Kpjoslee*
> 
> Sadly, the end justifies the means in the tech world. The 970 fiasco was largely irrelevant unless run in very extreme scenarios. If the performance ends up fine with it, it wouldn't mean much, and that is what I think might end up happening anyway.


this.


----------



## CrazyElf

Quote:


> Originally Posted by *Mahigan*
> 
> GCN was built for Parallelism, not serial workloads like nVIDIAs architectures. This is why you don't see GCN taking a hit with 512 Kernels.
> 
> What did Oxide do? They built two paths. One with Shaders Optimized for CUDA and the other with Shaders Optimized for GCN. On top of that GCN has Async working. Therefore it is not hard to determine why GCN performs so well in Oxide's engine. It's a better architecture if you push it and code for it. If you're only using light compute work, nVIDIAs architectures will be superior.
> 
> This means that the burden is on developers to ensure they're optimizing for both. In the past, this hasn't been the case. Going forward... I hope they do. As for GameWorks titles, don't count them being optimized for GCN. That's a given. Oxide played fair, others... might not.


Most console ports will do so, simply because the consoles are GCN by nature, so at the very least they will be optimized for GCN. Since so many games these days are console ports, that may not be a huge flaw. Any games that use Vulkan will likely be GCN-optimized too.

Any PC exclusives made with Nvidia's help, not so much, as you've noted. Plus, due to Nvidia's sheer market share, developers might have to optimize for them anyway.

This is one big reason why we need a larger sample size to judge. That means more games to compare with.

Quote:


> Originally Posted by *Forceman*
> 
> Updated Fury benchmark from over at Beyond3d. Apparently the graph posted yesterday was really a 390X, not a Fury X. This new chart shows all three. Interesting that the Fury has so many spikes where it seems to be running serially.


There is definitely something bottlenecking the Fury there. Either way, it is not looking like a good purchase, not unless the price goes down a lot.

In theory, the massive shader advantage should lead to close to a 45% gain, but in reality we see spikes and a much smaller gain.
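
For reference, here is where a ~45% theoretical figure could come from, assuming it refers to raw stream-processor counts (Fury X: 4096, R9 390X: 2816) at similar clocks. This ignores every other bottleneck, which is exactly the point: the realized gain is much smaller.

```python
# Theoretical shader-throughput gain from stream-processor counts alone.
fury_x_sp = 4096    # Fury X stream processors
r9_390x_sp = 2816   # R9 390X stream processors

theoretical_gain = fury_x_sp / r9_390x_sp - 1
print(f"{theoretical_gain:.1%}")  # 45.5%
```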

Quote:


> Originally Posted by *gamervivek*
> 
> Theoretical specifications mean next to nothing.
> 
> 
> 
> If the theoretical specifications mattered you'd see Fury fall way behind at 4k.


+ Rep.

A very important observation that I overlooked. Previously I'd always assumed that actual results would be a percentage of the theoretical maximums, but it's looking like that is not the case.

The reason I thought it might have been the triangles is what the Oxide developer said. He strongly implied that the weaker triangle performance might be the issue, but judging by this benchmark, that may not be the case at all.

@Mahigan:
We still need to figure out what is bottlenecking the Fury X.

It is looking like this graph may not have the answers.



See above: there is evidence that it is not the triangles.
The ROPs: you mentioned the superior color compression offsets that.
The rasterizer does not seem to be it, as the Fury X can sustain more draw calls than the 980 Ti.

What does that leave us? I still think it could be the HBM that is bottlenecking the Fury X.

Should I email Joel Hruska about this? I have his email and I'd love to hear his thoughts on this. Maybe, if possible, it would also be worth getting in touch with David Kanter.

Quote:


> Originally Posted by *provost*
> 
> Let's assume for a moment that you are running Nvidia. Your company has an 80% market share of the discreet consumer GPU segment, and your only other competitor in this segment is going through its own issues (restructuring, etc), so it is not in a position to inflict any meaningful pain on your earnings for a sustainable period of time. What would you do to ensure that you get your loyal customer base excited about buying a Nvidia card again? Do you offer them more performance on the cards they already have? No, you offer them performance on the new cards. I have used a very basic example here, but this is how simple it is.


Actually, there's another way. Look at the type of response we get in this thread. Many do not "want" to believe that Nvidia has a temporary disadvantage. The main arguments against Mahigan have been basically "we don't want to believe you because you made Nvidia look bad".

Now that isn't to say that's every argument. For example, the person who pointed out that the rasterizer was not the bottleneck due to draw calls, and gamervivek's observations above on real vs theoretical performance, are very good arguments as to why Mahigan's hypothesis is in some ways not accurate. Those were great observations and could lead to serious modifications of his hypothesis. Also, the one-game point remains valid: we need more games to figure out what is happening here.

Nvidia still has a huge monetary advantage, more connections with developers, dominates the compute market (although that is being challenged now by Intel), and perhaps most importantly, mind share.

If they make their next generation super parallel (and Mahigan's tessellation analogy comes to mind here - introduced by the 5870 but vastly overtaken by Nvidia), then it will be sold to gamers that way. It may also prove to be a huge boon to CUDA.

I think that AMD is still very much the underdog here. Very much so.


----------



## provost

Actually, I think it does make the Fury (or any AMD card from the 290 onward) a better buy, since it provides a lot more future-proofing insurance than Maxwell 2, if everything theorized here turns out to be true.

As for me, I am too long on Nvidia with 4 OG Titans, 1 780 TI KPE, 1 690 and Nvidia in my laptop.... Lol

I had ordered the Fury X a while back, and Amazon kept delaying, so I ended up removing it from my pre-order. However, if I were being completely candid, I don't know how I wouldn't pick one of the new AMD cards over Maxwell 2 if I were building today (assuming everything said here is accurate). But that's just me... and my opinion and a nickel would get me a cup of coffee.

Edit: we can't discount Nvidia's DX11 performance of today vs the DX12 performance of the same cards in the future. So everyone may have their own preference. Since I am already overweight Nvidia, a little AMD in the mix would be fun for me anyway... Lol


----------



## Mahigan

To lighten up the thread...

A little humor
I find it funny that this is turning out to be a massive issue, with people returning their GTX 980 Ti cards when we don't yet have a response from nVIDIA. The folks at Beyond3D are still working hard to see whether or not the information we uncovered in this thread is true.

I'm still being patient. I'm not willing to commit to a conclusion just yet.


----------



## Klocek001

Quote:


> Originally Posted by *provost*
> 
> I don't know how I don't pick one of new AMD cards over Maxwell 2, if I were building today (assuming everything said here is accurate)


Fallout 4, The Forest, Black Ops III, Rise of the Tomb Raider, AC: Syndicate, Elex, Hitman, Star Wars, DayZ - are any of those DX12? Those are just the ones I'd like to try out; there are many more launching this year/2016.

Quote:


> Originally Posted by *Mahigan*
> 
> I find it funny that this is turning out to be a massive issue, people are returning their GTX 980 Ti cards when we don't yet have a response from nVIDIA. We still have the folks working hard at Beyond3D to see whether or not the information we uncovered in this thread is true.


Like this dude says:
Quote:


> Originally Posted by *Kpjoslee*
> 
> Sadly, the end justifies the means in the tech world. The 970 fiasco was largely irrelevant unless run in very extreme scenarios. If the performance ends up fine with it, it wouldn't mean much, and that is what I think might end up happening anyway.


----------



## Dudewitbow

Quote:


> Originally Posted by *Klocek001*
> 
> Fallout 4, The Forest, Black Ops III, Rise of the Tomb Raider, AC: Syndicate, Elex, Hitman, Star Wars, DayZ - are any of those DX12? Those are just the ones I'd like to try out; there are many more launching this year/2016.


None are directly confirmed. Rise of the Tomb Raider likely will, due to Microsoft publishing it; Hitman may get it if AMD pushes for it (if the game actually needs it). Battlefront will likely get something similar (not necessarily day one), as Vulkan and DX12 are being built into Frostbite 3 (which also applies to the other upcoming Frostbite 3 games, such as Mirror's Edge Catalyst and NFS).


----------



## PostalTwinkie

Quote:


> Originally Posted by *Dudewitbow*
> 
> None are directly confirmed. Rise of the Tomb Raider likely will, due to Microsoft publishing it; Hitman may get it if AMD pushes for it (if the game actually needs it). Battlefront will likely get something similar (not necessarily day one), as Vulkan and DX12 are being built into Frostbite 3 (which also applies to the other upcoming Frostbite 3 games, such as Mirror's Edge Catalyst and NFS).


I remember reading that Battlefront will get DX 12 support, but I believe post launch.


----------



## Mahigan

Quote:


> Originally Posted by *glr123*
> 
> That's me!
> 
> So do you think there could be any other explanations for the slow latency on the GCN cards? I thought I read somewhere that the compute command goes CPU -> GPU -> CPU, is that what is going on? Could this sequence just be faster on nvidia, and it isn't a latency effect we ever experience because that isn't a typical compute process during gaming?
> 
> (pardon my ignorance!!)


It could be the sequence they're running and sending to the graphics card. It is hard to say, because the author hasn't made his test open source, which quite frankly he should have done.


----------



## Mahigan

Quote:


> Originally Posted by *Dudewitbow*
> 
> None are directly confirmed. Rise of the Tomb Raider likely will, due to Microsoft publishing it; Hitman may get it if AMD pushes for it (if the game actually needs it). Battlefront will likely get something similar (not necessarily day one), as Vulkan and DX12 are being built into Frostbite 3 (which also applies to the other upcoming Frostbite 3 games, such as Mirror's Edge Catalyst and NFS).


Tomb Raider has been confirmed and will use Async Compute. Hitman is confirmed as well. Battlefront may not launch with DX12 but will get support via a patch. Deus Ex is confirmed, Fable Legends is confirmed... I had a list done up a few threads back with confirmed DX12 and Async Compute titles... all of them major titles.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> To lighten up the thread...
> 
> 
> 
> 
> A little humor
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I find it funny that this is turning out to be a massive issue, people are returning their GTX 980 Ti cards when we don't yet have a response from nVIDIA. We still have the folks working hard at Beyond3D to see whether or not the information we uncovered in this thread is true.
> 
> I'm still being patient. I'm not willing to commit to a conclusion just yet.


Um, you are responsible for adding fuel to the fire, don't you think? Putting strongly conclusive words on what is otherwise a yet-to-be-proven hypothesis is usually a recipe for a massive thread war. And you say "people" are returning their 980 Ti cards, when all I see is one guy on Hardforum pondering returning his 980 Ti to save some money for a 290X and wait it out instead. Usually that kind of exaggeration is what invites fanboys lol.


----------



## Mahigan

If this is all true, this is why Asynchronous Compute matters:

Mirror's Edge Catalyst will be released on *February 23, 2016* for Xbox One, PS4, and PC.
Quote:


> In order to fit a slew of heavy processing tasks into its tight budgets, DICE has employed new techniques specific to modern graphics processors and the new console generations. *By taking advantage of Asynchronous Compute*, the developer is now capable of reaching new levels of in-depth optimizations, with which it has been able to squeeze more work out of the graphics pipeline.


Read more: http://www.vcpost.com/articles/87174/20150826/mirrors-edge-catalyst-boasts-advanced-rendering-techniques-reflection-technologies-glass-city.htm#ixzz3kSkxnueB

Rise of the Tomb Raider Q1 2016
Quote:


> Of all the rendering techniques used in the game, the most fascinating is its *use of asynchronous compute* for the generation of advanced volumetric lights. For this purpose, the developer has employed a resolution-agnostic voxel method, which allows volumetric lights to be rendered using asynchronous compute after the rendering of shadows, with correctly handled transparency composition.


Read more: http://gearnuke.com/rise-of-the-tomb-raider-uses-async-compute-to-render-breathtaking-volumetric-lighting-on-xbox-one/

Deus Ex: Mankind Divided Q1 2016
Quote:


> Deus Ex: Mankind Divided to use *async compute to enhance Pure Hair simulation*. During its SIGGRAPH 2015 presentation, Eidos Montreal revealed that Deus Ex: Mankind Divided will be the first title to make use of Pure Hair technology for hair simulation. For the uninitiated, Pure Hair is the successor to AMD's TressFX technology, which was first seen in Tomb Raider. The new hair solution has been created in collaboration between AMD and Eidos Montreal's research and development lab.


Read more: http://gearnuke.com/deus-ex-mankind-divided-use-async-compute-enhance-pure-hair-simulation/

Just three titles on the way. That's without mentioning Fable Legends and others...

That's why this is a very big deal, if all this is true, because Pascal won't arrive before Q2 2016, by early estimates.

Quote:


> Originally Posted by *Kpjoslee*
> 
> Um, you are responsible for adding fuel to the fire, don't you think? Putting strongly conclusive words on what is otherwise a yet-to-be-proven hypothesis is usually a recipe for a massive thread war. And you say "people" are returning their 980 Ti cards, when all I see is one guy on Hardforum pondering returning his 980 Ti to save some money for a 290X and wait it out instead. Usually that kind of exaggeration is what invites fanboys lol.


I don't think it is fuel to the fire, that's why I wrote my commentary at the bottom. This is how people are acting and I find it to be quite funny considering we don't yet have any info from nVIDIA.

If anything, nVIDIA are responsible for the way people are acting by not releasing a press statement to alleviate people's fears. Fears will just keep growing until they do. I've said this from day one and I'll keep saying it... nVIDIA need to respond. By ignoring this, they're adding fuel to the fire.

Reddit threads are filled with people returning their cards, as are ExtremeTech's comment section and Guru3D's. YouTube also has several videos of people thinking about returning their cards. I've been across the tech forums, reading various threads on this topic... it is far more prevalent than you think.


----------



## Klocek001

Spoiler: Warning: Spoiler!



Quote:


> Originally Posted by *Mahigan*
> 
> If this is all true, this is why Asynchronous Compute matters:
> 
> Mirror's Edge Catalyst will be released on *February 23, 2016* for Xbox One, PS4, and PC.
> Read more: http://www.vcpost.com/articles/87174/20150826/mirrors-edge-catalyst-boasts-advanced-rendering-techniques-reflection-technologies-glass-city.htm#ixzz3kSkxnueB
> 
> Rise of the Tomb Raider Q1 2016
> Read more: http://gearnuke.com/rise-of-the-tomb-raider-uses-async-compute-to-render-breathtaking-volumetric-lighting-on-xbox-one/
> 
> Deus Ex: Mankind Divided Q1 2016
> Read more: http://gearnuke.com/deus-ex-mankind-divided-use-async-compute-enhance-pure-hair-simulation/
> 
> Just three titles on the way. That's without mentioning Fable Legends and others...
> 
> That's why this is a very big deal because Pascal won't arrive before, early estimates, Q2 2016.





yikes


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> If this is all true, this is why Asynchronous Compute matters:
> 
> Mirror's Edge Catalyst will be released on *February 23, 2016* for Xbox One, PS4, and PC.
> Read more: http://www.vcpost.com/articles/87174/20150826/mirrors-edge-catalyst-boasts-advanced-rendering-techniques-reflection-technologies-glass-city.htm#ixzz3kSkxnueB
> 
> Rise of the Tomb Raider Q1 2016
> Read more: http://gearnuke.com/rise-of-the-tomb-raider-uses-async-compute-to-render-breathtaking-volumetric-lighting-on-xbox-one/
> 
> Deus Ex: Mankind Divided Q1 2016
> Read more: http://gearnuke.com/deus-ex-mankind-divided-use-async-compute-enhance-pure-hair-simulation/
> 
> Just three titles on the way. That's without mentioning Fable Legends and others...
> 
> That's why this is a very big deal, if all this is true, because Pascal won't arrive before, early estimates, Q2 2016.
> I don't think it is fuel to the fire, that's why I wrote my commentary at the bottom. This is how people are acting and I find it to be quite funny considering we don't yet have any info from nVIDIA.
> 
> If anything, nVIDIA are responsible for the way people are acting by not releasing a press statement to alleviate people's fears. Fears will just keep growing until they do. I've said this from day one and I'll keep saying it... nVIDIA need to respond. By ignoring this, they're adding fuel to the fire.
> 
> Reddit threads are filled with people returning their cards, ExtremeTech's comment section as well as Guru3D. Youtube also has several videos with people thinking about returning their cards. I've been across the tech forums, reading various threads on this topic... it is far more prevalent than you think.


You didn't mention the part where the uses you are listing for ASC have alternatives that work as well.

In other words: there's more than one way to skin a cat. That is, of course, assuming the statements I read yesterday are true.

Time shall tell!


----------



## Klocek001

Quote:


> Originally Posted by *Mahigan*
> 
> we don't yet have a response from nVIDIA.


"we lied."

nah, don't take this seriously







but it's like nvidia's trying to bill a consumer twice (980 and 980 Ti) before giving us a real DX12 card with async compute, which Pascal might be.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> If this is all true, this is why Asynchronous Compute matters:
> 
> Mirror's Edge Catalyst will be released on *February 23, 2016* for Xbox One, PS4, and PC.
> Read more: http://www.vcpost.com/articles/87174/20150826/mirrors-edge-catalyst-boasts-advanced-rendering-techniques-reflection-technologies-glass-city.htm#ixzz3kSkxnueB
> 
> Rise of the Tomb Raider Q1 2016
> Read more: http://gearnuke.com/rise-of-the-tomb-raider-uses-async-compute-to-render-breathtaking-volumetric-lighting-on-xbox-one/
> 
> Deus Ex: Mankind Divided Q1 2016
> Read more: http://gearnuke.com/deus-ex-mankind-divided-use-async-compute-enhance-pure-hair-simulation/
> 
> Just three titles on the way. That's without mentioning Fable Legends and others...
> 
> That's why this is a very big deal, if all this is true, because Pascal won't arrive before, early estimates, Q2 2016.
> I don't think it is fuel to the fire, that's why I wrote my commentary at the bottom. This is how people are acting and I find it to be quite funny considering we don't yet have any info from nVIDIA.
> 
> If anything, nVIDIA are responsible for the way people are acting by not releasing a press statement to alleviate people's fears. Fears will just keep growing until they do. Reddit threads are filled with people returning their cards, as are ExtremeTech's comment section and Guru3D's. YouTube also has several videos of people thinking about returning their cards. I've been across the tech forums, reading various threads on this topic... it is far more prevalent than you think.


But the point is, async compute is limited to post-processing effects, which means current engines are far from fully utilizing async-capable graphics shaders. I think your hypothesis would be more relevant in a situation where the game engine fully utilizes async compute for every aspect of its shaders, but from Ashes of the Singularity to the titles coming out in Q1, they are only using async to a limited extent, so it won't be as advantageous as you would want people to believe.

As for Reddit, they are known for knee-jerk reactions on almost every issue, and YouTube just wants your clicks lol.

To me, this looks like one of those situations that might end up not being much of a deal in the next year or two.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Kpjoslee*
> 
> But the point is, async compute is limited to post-processing effects, which means current engines are far from fully utilizing async-capable graphics shaders. I think your hypothesis would be more relevant in a situation where the game engine fully utilizes async compute for every aspect of its shaders, but from Ashes of the Singularity to the titles coming out in Q1, they are only using async to a limited extent, so it won't be as advantageous as you would want people to believe.
> 
> As for Reddit, they are known for knee-jerk reactions on almost every issue, and YouTube just wants your clicks lol.
> 
> To me, this looks like one of those situations that might end up not being much of a deal in the next year or two.


No problem with Pascal.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> You didn't mention the part where the uses you are listing for ASC have alternatives that work as well.
> 
> In other words; more than one way to skin a cat. That, of course, being true if the statements I read yesterday are true.
> 
> Time shall tell!


That's what Oxide did; if Oxide hadn't worked closely with nVIDIA, we'd be seeing even lower performance under Ashes of the Singularity. The performance we see is due to the vendor-ID-specific path Oxide implemented, in conjunction with the optimized shader code nVIDIA provided them.

Therefore Maxwell/Maxwell 2 could still play the games... but the performance hit might be quite noticeable. Only time will tell for sure, but I think we're getting ahead of ourselves. We still don't have a definitive conclusion from either nVIDIA or Beyond3D (or any other source).


----------



## PostalTwinkie

Quote:


> Originally Posted by *Kpjoslee*
> 
> But the point is, async compute is limited to post-processing effects, which means current engines are far from fully utilizing async-capable graphics shaders. I think your hypothesis would be more relevant in a situation where the game engine fully utilizes async compute for every aspect of its shaders, but from Ashes of the Singularity to the titles coming out in Q1, they are only using async to a limited extent, so it won't be as advantageous as you would want people to believe.
> 
> As for Reddit, they are known for knee-jerk reactions on almost every issue, and YouTube just wants your clicks lol.
> 
> To me, this looks like one of those situations that might end up not being much of a deal in the next year or two.


I kind of expect the big DX12 push to come with Pascal and AMD's next big release. That should be when we see this ASC stuff come to life in full view.

Any game devs around who can comment on how easy a move to DX12 is? I've heard it's "easy", but we have all heard a lot lately.


----------



## UtopiA

Quote:


> Originally Posted by *Mahigan*
> 
> Reddit threads are filled with people returning their cards, ExtremeTech's comment section as well as Guru3D. Youtube also has several videos with people thinking about returning their cards. I've been across the tech forums, reading various threads on this topic... it is far more prevalent than you think.


How? I talked to Newegg yesterday and they aren't giving refunds because nothing is officially confirmed. In their eyes a bunch of forum posts wasn't enough to get a reaction from them. Maybe if major tech sites start running tests and come to the same conclusion, they will change their mind. My 980 Ti is still within the 30-day window and they won't even give me a refund because they don't offer refunds on video cards.

Ripping a system apart over a single alpha-stage benchmark and hearsay on internet forums is insane. If this all turns out to be fluff, then I hope people will take some time and realize how much damage these pitchfork-tier overreactions can cause. Sort of like how, leading up to the Fury X's launch, 4 GB of VRAM was going to cripple the card, right? And nothing came of that. We had about 2 months of VRAM outrage earlier this summer and it ended up fizzling out. First the Fury X was crippled by 4 GB of VRAM, then it launched and turned out to be a weak performer, and now today it's a gift from the DX12 gods? People can't even make up their minds. A week from today, maybe the story will be flipped again?!

Would love to hear what Nvidia has to say about this though.


----------



## Klocek001

Quote:


> Originally Posted by *Kpjoslee*
> 
> But the point is, async compute is limited to post-processing effects, which means current engines are far from fully utilizing async-capable graphics shaders.


AMD's GameWorks?


----------



## HalGameGuru

Quote:


> Originally Posted by *Kpjoslee*
> 
> But the point is, async compute is limited to post-processing effects, which means current engines are far from fully utilizing async-capable graphics shaders. I think your hypothesis would be more relevant in a situation where the game engine fully utilized async compute for every aspect of its shaders, but from Ashes of the Singularity to titles coming out in Q1, they are only using async to an extent where it isn't as advantageous as you would want people to believe.
> 
> As for Reddit, they are known for knee-jerk reactions to almost every issue, and Youtube just wants your clicks lol.
> 
> To me, this looks like one of those situations that might end up not being much of a deal in a year or two.


As it's been said that it is possible, but not easy ATM, to completely render graphics through compute, I would actually like to see someone take a stab at that with current hardware, just to see it done and the results.


----------



## sugarhell

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I kind of expect the big DX12 push to be with Pascal and AMD's next big release. Should be when we see this ASC stuff come to life in full view.
> 
> Any game dev around who can comment on how easy a move it is to DX12? I've heard it was "easy", but we have all heard a lot lately.


I haven't looked into it too much, but the jump is not that easy. Developing a game for DX12 requires a completely different mindset than DX11.

We have to go from reducing triangle counts and optimizing the meshes of 3D models around DX11's draw call limits to a point where we don't care about that much anymore. So it changes the whole development process.

Now, as an API, you must choose if you really need it. I can see most indie groups using DX11 (or DX12, if the graphics engine supports it), but the big companies will use DX12. They have capable engineering teams that are big enough to make the jump easily.
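To put a rough number on the draw-call point, here is a toy back-of-the-envelope model. The per-call costs are made-up illustrative figures, not measurements; the point is only that an order-of-magnitude drop in per-call CPU overhead multiplies how many individual draws fit in a frame.

```python
# Toy model of draw-call budgets. The per-call costs below are
# illustrative assumptions, not benchmarked values.

def max_draw_calls(frame_budget_ms: float, cost_per_call_ms: float) -> int:
    """Approximate how many draw calls fit in one frame at a given per-call CPU cost."""
    return round(frame_budget_ms / cost_per_call_ms)

FRAME_BUDGET_MS = 16.0  # roughly a 60 FPS frame

print(max_draw_calls(FRAME_BUDGET_MS, 0.01))   # higher-overhead API: 1600 calls
print(max_draw_calls(FRAME_BUDGET_MS, 0.001))  # lower-overhead API: 16000 calls
```

Under the high-overhead assumption, artists must merge meshes to stay inside the budget; under the low-overhead one, far more objects can be submitted individually, which is the mindset change described above.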


----------



## dogen1

Quote:


> Originally Posted by *HalGameGuru*
> 
> As it's been said that it is possible, but not easy ATM, to completely render graphics through compute, I would actually like to see someone take a stab at that with current hardware, just to see it done and the results.


That's what Media Molecule is doing.


----------



## GnarlyCharlie

Quote:


> Originally Posted by *UtopiA*
> 
> Ripping a system apart over a single Alpha stage benchmark and hearsay on internet forums is insane.


Not when it comes to something as life threatening as a video card, evidently.


----------



## PostalTwinkie

Quote:


> Originally Posted by *sugarhell*
> 
> I haven't looked into it too much, but the jump is not that easy. Developing a game for DX12 requires a completely different mindset than DX11.
> 
> We have to go from reducing triangle counts and optimizing the meshes of 3D models around DX11's draw call limits to a point where we don't care about that much anymore. So it changes the whole development process.
> 
> Now, as an API, you must choose if you really need it. I can see most indie groups using DX11 (or DX12, if the graphics engine supports it), but the big companies will use DX12. They have capable engineering teams that are big enough to make the jump easily.


Hmm...

Maybe they were saying it was easy to go from Console to DX 12 PC...

Quote:


> Originally Posted by *GnarlyCharlie*
> 
> Not when it comes to something as life threatening as a video card, evidently.


Other than a few fluffs, things have been going pretty good in here. We are having some serious conversation about future technologies and what it means.

What is wrong with that?


----------



## sugarhell

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Hmm...
> 
> Maybe they were saying it was easy to go from Console to DX 12 PC...
> Other than a few fluffs, things have been going pretty good in here. We are having some serious conversation about future technologies and what it means.
> 
> What is wrong with that?


Yeah, it is much easier to go from a console API (the PS4's, anyway; the Xbone's API sucks) to DX12 on PC. They can use most of their shaders without changes.


----------



## GnarlyCharlie

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Hmm...
> 
> Other than a few fluffs, things have been going pretty good in here. We are having some serious conversation about future technologies and what it means.
> 
> What is wrong with that?


Not a thing, I was addressing the guy I quoted (you'd need to read the portion of his post that I quoted for context). He mentioned people ripping apart a perfectly viable system and returning video cards over a single benchmark.

My comment was that people were acting as if the async compute deal was somehow life threatening. I remember when I got my GTX 680s, the AMD cards of the day like the 7970 were ruling the benchmark charts. Yet I didn't feel the need to jump off a metaphorical cliff. Maybe setting my hair on fire and RMAing the crap out of those 680s would have been seen as some as the more rational approach, can't say. What video card I run isn't life threatening, to me anyway.

So AMD gets back on top, that's fine by me. Competition is great! I think Nvidia will likely counter at some point, and if I'm running AMD cards when they do, it's doubtful I'll rip those out, either.


----------



## ZealotKi11er

Quote:


> Originally Posted by *sugarhell*
> 
> I haven't looked into it too much, but the jump is not that easy. Developing a game for DX12 requires a completely different mindset than DX11.
> 
> We have to go from reducing triangle counts and optimizing the meshes of 3D models around DX11's draw call limits to a point where we don't care about that much anymore. So it changes the whole development process.
> 
> Now, as an API, you must choose if you really need it. I can see most indie groups using DX11 (or DX12, if the graphics engine supports it), but the big companies will use DX12. They have capable engineering teams that are big enough to make the jump easily.


Considering big companies like EA use the same engine across many games, it would be much easier for them to get DX12 out.


----------



## sugarhell

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Considering big companies like EA use the same engine across many games, it would be much easier for them to get DX12 out.


Still, the projects that will release until late 2016 or early 2017 will have been built with a DX11 mindset. We will see better performance, but the true evolution, I believe, will come after that.


----------



## PostalTwinkie

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Considering big companies like EA use the same engine across many games, it would be much easier for them to get DX12 out.


Oh God, we might actually get a benefit from having these Giants? One big step into DX 12!

In the event people aren't aware, I am all about moving to DX 12 in full as soon as possible!


----------



## Kand

Then you have ports like Arkham Knight that were coded using the ps4 version.....


----------



## PontiacGTX

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Oh God, we might actually get a benefit from having these Giants? One big step into DX 12!


Even owners of old nvidia cards?


----------



## sugarhell

Quote:


> Originally Posted by *Kand*
> 
> Then you have ports like Arkham Knight that were coded using the ps4 version.....


Exactly this. They just reused their PS4 code wholesale for the PC version. But the APIs are completely different, and DX11 needs special treatment, which they didn't do. They even disabled some shader effects; as you can see, the PS4 version has more features and special effects.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kand*
> 
> Then you have ports like Arkham Knight that were coded using the ps4 version.....


Mahigan brought up an interesting point(i think it was him) regarding AK.

What if the poor performance/broken mess was due to Async? If it was ported from the PS4 then it could be that.


----------



## JunkoXan

Quote:


> Originally Posted by *UtopiA*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Mahigan*
> 
> Reddit threads are filled with people returning their cards, ExtremeTech's comment section as well as Guru3D. Youtube also has several videos with people thinking about returning their cards. I've been across the tech forums, reading various threads on this topic... it is far more prevalent than you think.
> 
> 
> 
> _*How? I talked to Newegg yesterday and they aren't giving refunds because nothing is officially confirmed.*_ In their eyes a bunch of forum posts wasn't enough to get a reaction from them. Maybe if major tech sites start running tests and come to the same conclusion, they will change their mind. My 980 Ti is still within the 30-day window and they won't even give me a refund because they don't offer refunds on video cards.
> 
> Ripping a system apart over a single Alpha stage benchmark and hearsay on internet forums is insane. If this all turns out to be fluff then I hope people will take some time and realize how much damage these pitchfork-tier overreactions can cause. Sort of like how leading up to Fury X's launch, 4 GB VRAM was going to cripple the card right? And nothing came of that. We had about 2 months of VRAM outrage earlier this summer and it ended up fizzling out. First the Fury X was crippled with 4 GB VRAM, then it launched and turned out to be a weak performer, and now today it's a gift from the DX12 gods? People can't even make up their minds. A week from today maybe the story will be flipped again?!
> 
> Would love to hear what Nvidia has to say about this though.

A Microcenter near me did refunds over the weekend over this situation.

People can't make up their minds because of how fast video card technology is moving right now, with Pascal, DX12, and so on so close on the horizon. We'll see.


----------



## Klocek001

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Mahigan brought up an interesting point(i think it was him) regarding AK.
> 
> What if the poor performance/broken mess was due to Async? If it was ported from the PS4 then it could be that.


didn't it run awful on amd too?


----------



## sugarhell

Quote:


> Originally Posted by *Klocek001*
> 
> didn't it run awful on amd too?


DX11 doesn't support async.


----------



## xxdarkreap3rxx

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Oh God, we might actually get a benefit from having these Giants? One big step into DX 12!
> 
> In the event people aren't aware, I am all about moving to DX 12 in full as soon as possible!


https://en.wikipedia.org/wiki/List_of_Unreal_Engine_games
https://en.wikipedia.org/wiki/List_of_CryEngine_games
https://en.wikipedia.org/wiki/List_of_Frostbite_games

So many games


----------



## Klocek001

Quote:


> Originally Posted by *xxdarkreap3rxx*
> 
> https://en.wikipedia.org/wiki/List_of_Unreal_Engine_games
> https://en.wikipedia.org/wiki/List_of_CryEngine_games
> https://en.wikipedia.org/wiki/List_of_Frostbite_games
> 
> So many games


looking forward to that Plants vs. Zombies: Garden Warfare 2, that will be a game changer.


----------



## dogen1

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Mahigan brought up an interesting point(i think it was him) regarding AK.
> 
> What if the poor performance/broken mess was due to Async? If it was ported from the PS4 then it could be that.


Well, async compute doesn't exist in DX11.
More likely it was issues with streaming. Tons of detail + no load screens + a fast Batmobile is probably tough to pull off on PC. On console you have lots of goodies to make this easier: unified memory (faster texture uploads) plus explicit GPU memory management.


----------



## PontiacGTX

Quote:


> Originally Posted by *xxdarkreap3rxx*
> 
> https://en.wikipedia.org/wiki/List_of_Unreal_Engine_games
> https://en.wikipedia.org/wiki/List_of_CryEngine_games
> https://en.wikipedia.org/wiki/List_of_Frostbite_games
> 
> So many games


Only that the devs are the ones who decide which API to use; not all CE/UE4 games will use DX12.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Klocek001*
> 
> didn't it run awful on amd too?


Quote:


> Originally Posted by *dogen1*
> 
> Well, async compute doesn't exist in dx11.
> More likely it was issues with streaming. Tons of detail + no load screens + fast batmobile is probably tough to pull off on pc. On console you have lots of goodies to make this easier. Unified memory(faster texture uploads) plus explicit gpu memory management.


That might have been why it was such a mess: the game was built around async and then ported to DX11.

It was unusually messed up. Just a thought.


----------



## Noufel

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Klocek001*
> 
> didn't it run awful on amd too?
> 
> 
> 
> Quote:
> 
> 
> 
> Originally Posted by *dogen1*
> 
> Well, async compute doesn't exist in dx11.
> More likely it was issues with streaming. Tons of detail + no load screens + fast batmobile is probably tough to pull off on pc. On console you have lots of goodies to make this easier. Unified memory(faster texture uploads) plus explicit gpu memory management.
> 
> 
> That might have been why it was such a mess: the game was built around async and then ported to DX11.
> 
> It was unusually messed up. Just a thought.

Why would a dev that makes a game with GimpWorks use async, knowing that it would be a mess on nvidia??


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That might have been why it was such a mess: the game was built around async and then ported to DX11.
> 
> It was unusually messed up. Just a thought.


That's what I was thinking as a possibility too. Sending many asynchronous commands over an API not built to handle the load in parallel would lead to terrible performance. Just a theory, though.


----------



## airfathaaaaa

Quote:


> Originally Posted by *PontiacGTX*
> 
> Only that the devs are the ones who decide which API to use; not all CE/UE4 games will use DX12.


But if nvidia moves to, let's say, 80% DX12 acceptance on Pascal, how will they be able to provide good numbers on DX11? (Looking at how AMD did it, there is no way to support being serial and truly parallel at the same time.)


----------



## sugarhell

Quote:


> Originally Posted by *Noufel*
> 
> Why would a dev that makes a game with GimpWorks use async, knowing that it would be a mess on nvidia??


Because the studio that made the PS4 version didn't do the PC port, and little budget was spent on the port. Also, they got paid by nvidia (not in money, but mostly in dev support for the PC port so they could add those nvidia features).


----------



## GorillaSceptre

Quote:


> Originally Posted by *Noufel*
> 
> Why would a dev that makes a game with GimpWorks use async, knowing that it would be a mess on nvidia??


Quote:


> Originally Posted by *Mahigan*
> 
> That's what I was thinking as a possibility too. Sending many Asynchronous commands over an API not built to handle the load in Parallel would lead to terrible performance. Just a theory though.


Quote:


> Originally Posted by *sugarhell*
> 
> Because the studio that made the PS4 version didn't do the PC port, and little budget was spent on the port. Also, they got paid by nvidia (not in money, but mostly in dev support for the PC port so they could add those nvidia features).


----------



## PontiacGTX

Quote:


> Originally Posted by *airfathaaaaa*
> 
> But if nvidia moves to, let's say, 80% DX12 acceptance on Pascal, how will they be able to provide good numbers on DX11? (Looking at how AMD did it, there is no way to support being serial and truly parallel at the same time.)


Because AMD has less budget and fewer workers to optimize all the DX11 games.

Really, GCN can manage both well; it's just that Nvidia started to get more devs on their side, who might use nvidia-optimized shaders, or GameWorks crippling cards with less tessellation power than Maxwell, like Mahigan said.


----------



## delboy67

People are returning cards? (facepalm) Anyone want to sell me a cheap 980 Ti? People love a good fanboy war/pitchfork session. I'm not sure that was the intent of this thread, but it's probably the reason we get next to no info or responses like this from devs on other games and technologies!


----------



## Noufel

Quote:


> Originally Posted by *sugarhell*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Noufel*
> 
> Why would a dev that makes a game with GimpWorks use async, knowing that it would be a mess on nvidia??
> 
> 
> 
> Because the studio that made the PS4 version didn't do the PC port, and little budget was spent on the port. Also, they got paid by nvidia (not in money, but mostly in dev support for the PC port so they could add those nvidia features).

Didn't know that it was another studio that made the pc port

Nvidia and its shady tactics... I think I will sell my second 980 Ti (just like the first one) and use my 2x 290 spare cards, just on principle.


----------



## Kpjoslee

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Mahigan brought up an interesting point(i think it was him) regarding AK.
> 
> What if the poor performance/broken mess was due to Async? If it was ported from the PS4 then it could be that.


Nah, nothing to do with it. They just released PC version out the door when they clearly knew it was not ready. Reason why they postponed all the PC sales until they fix the issues.


----------



## sugarhell

Quote:


> Originally Posted by *Noufel*
> 
> Didn't know that it was another studio that made the pc port
> 
> Nvidia and its shady tactics... I think I will sell my second 980 Ti (just like the first one) and use my 2x 290 spare cards, just on principle.


I don't think nvidia uses shady tactics that much; they just push their influence on the devs more. Maybe they use over-tessellation, which also hurts their own cards, but most of their features you can simply turn off.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kpjoslee*
> 
> Nah, nothing to do with it. They just released PC version out the door when they clearly knew it was not ready. Reason why they postponed all the PC sales until they fix the issues.


Oh, they definitely rushed it out, but Iron Galaxy handled the Arkham Origins port too, and with a similar time crunch. They handled that port fine (not great, but not Arkham Knight bad). Why is the patch taking so long? I think there's a lot more to that story.

Does anyone know if Arkham Knight was using Async on PS4? Has RockSteady commented about it at all?


----------



## Kpjoslee

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Oh, they definitely rushed it out, but Iron Galaxy handled the Arkham Origins port too, and with a similar time crunch. They handled that port fine (not great, but not Arkham Knight bad). Why is the patch taking so long? I think there's a lot more to that story.
> 
> Does anyone know if Arkham Knight was using Async on PS4? Has RockSteady commented about it at all?


Not sure if they use async or not, but seems unrelated.

http://kotaku.com/sources-warner-bros-knew-that-arkham-knight-pc-was-a-1714915219


----------



## PostalTwinkie

Quote:


> Originally Posted by *Klocek001*
> 
> looking forward to that Plants vs. Zombies: Garden Warfare 2, that will be a game changer.


Imagine all the pollen VFX we can get with more draw calls!!!!!!


----------



## Kand

Arkham Knight uses Unreal 3 which does not support a shred of async.


----------



## sugarhell

Quote:


> Originally Posted by *Kand*
> 
> Arkham Knight uses Unreal 3 which does not support a shred of async.


A custom Unreal 3. And if the API supports a feature, you can do it through there easily; you don't need the graphics engine to do it for you if you can do it through the API.


----------



## Xuper

According to the slide in Post 1846, the graphics pipeline is *not* designed for this *abuse*. Who is responsible? The dev, or GameWorks?


----------



## Kand

Quote:


> Originally Posted by *sugarhell*
> 
> A custom Unreal 3. And if the API supports a feature, you can do it through there easily; you don't need the graphics engine to do it for you if you can do it through the API.


I find that farfetched.


----------



## Kpjoslee

Quote:


> Originally Posted by *Kand*
> 
> I find that farfetched.


Thief is based on a custom Unreal Engine 3, so it is not really far-fetched, although I think it is unrelated to AK's problems.


----------



## sugarhell

Quote:


> Originally Posted by *Kand*
> 
> I find that farfetched.


A graphics engine has nothing to do with API hardware features. Unreal supports an implementation of DX11 and can support some features; if you want more performance, you need custom work on the API support within the graphics engine. That's why we have custom engines even when the base is Unreal 3/Unity/CryEngine. Take this as an example: Unreal 3 doesn't support the PS4 API, so they already had to add support for that API. That means more features, and so more custom elements in the engine. My point is that you can't claim Arkham Knight doesn't support async just from the fact that stock Unreal 3 doesn't.


----------



## infranoia

Quote:


> Originally Posted by *Kpjoslee*
> 
> But the point is, *async compute is limited to post-processing effects, which means current engines are far from fully utilizing async-capable graphics shaders.* I think your hypothesis would be more relevant in a situation where the game engine fully utilized async compute for every aspect of its shaders, but from Ashes of the Singularity to titles coming out in Q1, they are only using async to an extent where it isn't as advantageous as you would want people to believe.
> 
> As for Reddit, they are known for knee-jerk reactions to almost every issue, and Youtube just wants your clicks lol.
> 
> To me, this looks like one of those situations that might end up not being much of a deal in a year or two.


Quote:


> Originally Posted by *Kollock*
> 
> [ ... ]*I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post process.* Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more then 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders. For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was the main motivating factors behind the Larabee project. The main problem with Larabee wasn't that it wasn't fast, it was that they failed to be able to map dx9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years time Nitrous's rendering pipeline is 100% implemented via compute shaders.


You raise a good point, echoed by Kollock's assessment above. Current engines won't be optimized for async shaders for some time.


----------



## Noufel

http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/
Nice article (so wccf can do that!)


----------



## Mahigan

More fodder...

On the Beyond3D test...

*AMD's senior graphics PR manager Antal Tungler*
Quote:


> The results produced by the benchmark do, in fact, illustrate that Maxwell is not capable of asynchronously executing graphics and compute. If you look at the Maxwell async compute results, you will see that the bar heights are the result of adding graphics and compute together. This indicates that the workloads are being done serially, not asynchronously. Compare that to the AMD results, where the async compute results show graphics and compute being processed simultaneously with no noticeable rise in overall frame latency.
> 
> If Maxwell supported asynchronous compute, their results would look like the GCN results. Remember that asynchronous compute isn't about whether or not a GPU can do compute and graphics across a long workload; it's whether or not the GPU can perform these workloads simultaneously without affecting the frame latency. MDolenc's benchmark clearly shows that only GCN can do this.


Source: http://www.pcgamesn.com/amd-respond-to-nvidia-dx12-async-controversy-maxwell-is-not-capable-of-asynchronously-executing-graphics-and-compute

PS. This is what I stated over at HardForums. nVIDIA can perform the tasks synchronously (in order) but not Asynchronously (out of order). Anyone who looks at the block diagrams of both GPUs can clearly deduce this.
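The "bar heights add up" observation quoted above boils down to a simple timing pattern, sketched here with invented millisecond figures (not numbers from the actual benchmark): serial execution stacks the two workloads, while truly asynchronous execution approaches the longer of the two.

```python
# Toy model of the Beyond3D-style result described above. The timings
# are invented for illustration; they are not measured GPU numbers.

def frame_time_serial(graphics_ms: float, compute_ms: float) -> float:
    """Graphics then compute, one after the other: the times add up."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute overlapping: total approaches the longer job."""
    return max(graphics_ms, compute_ms)

g, c = 10.0, 6.0
print(frame_time_serial(g, c))  # 16.0 -> the "bar heights add up" pattern
print(frame_time_async(g, c))   # 10.0 -> no noticeable rise in frame latency
```

That difference is what the benchmark's bar charts make visible: if measured frame latency matches the sum, the hardware serialized the queues; if it matches the max, the work genuinely overlapped.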


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> More fodder...
> 
> On the Beyond3D test...
> 
> *AMD's senior graphics PR manager Antal Tungler*
> Source: http://www.pcgamesn.com/amd-respond-to-nvidia-dx12-async-controversy-maxwell-is-not-capable-of-asynchronously-executing-graphics-and-compute
> 
> PS. This is what I stated over at HardForums. nVIDIA can perform the tasks synchronously (in order) but not Asynchronously (out of order). Anyone who looks at the block diagrams of both GPUs can clearly deduce this.


I think the more important issue is how that is going to be reflected in DX12 titles in the near future. Async implementation is still minor, and async is not a required path in the DX12 API. If DX12 titles in the near future (notably a few major titles in Q1) definitely show the performance difference, then we can declare it a major issue. But right now, the whole exchange about having a truly async-capable GPU is pointless unless we have more samples proving that to be the case.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Mahigan*
> 
> More fodder...
> 
> On the Beyond3D test...
> 
> *AMD's senior graphics PR manager Antal Tungler*
> Source: http://www.pcgamesn.com/amd-respond-to-nvidia-dx12-async-controversy-maxwell-is-not-capable-of-asynchronously-executing-graphics-and-compute
> 
> PS. This is what I stated over at HardForums. nVIDIA can perform the tasks synchronously (in order) but not Asynchronously (out of order). Anyone who looks at the block diagrams of both GPUs can clearly deduce this.


So this is good for VR?


----------



## p4inkill3r

Quote:


> Originally Posted by *ZealotKi11er*
> 
> So this is good for VR?


They've been claiming that VR is in their cone of focus for a while now, so I'm assuming that it is.


----------



## Mahigan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> So this is good for VR?


It allows for much lower latency, which is required for VR. That being said, a good VR experience will likely require far more powerful graphics cards than those available today. So if this is all true, nVIDIA has ample time to catch up.


----------



## Paul17041993

Quote:


> Originally Posted by *GorillaSceptre*
> 
> 
> 
> *IF* this turns out to be true, then people should be allowed to return them imo. That's false advertising. Legally they'll probably get away with it as they do technically support it, but it gives worse performance with it on
> 
> Reminds me of the 970 fiasco, technically there's 4GB, but.. well you know.
> 
> Makes you wonder if they do _really_ support some of the DX12 features too..


notice how it says async compute, but doesn't specifically mention graphics...


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> It allows for much lower latency which is required for VR. That being said... a good VR experience will likely require far more powerful Graphics cards than those available today. So if this is all true, nVIDIA have ample time to catch up.


I wouldn't say Nvidia is much behind in that aspect. They have been working on their own set of APIs for quite a while, and they have their own feature set to tackle latency. A truly great VR experience is years away, I agree.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Paul17041993*
> 
> notice how it says async compute, but doesn't specifically mention graphics...


Similar to saying "notice how it says 4 GB VRAM, but doesn't specifically mention fast VRAM."

Which is why I said they'll probably be fine legally. It's still slimy, though. *If*.


----------



## Tojara

Quote:


> Originally Posted by *Kpjoslee*
> 
> I wouldn't say Nvidia is much behind in that aspect. They have been working on their own set of APIs for quite a while. They do have their own feature set to tackle on latency. Truly great VR experience will be years away I agree.


It might not be that far off. There are a lot of companies working on it, and 14nm could push graphics cards to where they need to be. Consumable content might be the holdup, as usual.


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/
> Nice article ( so wccf can do that
> 
> 
> 
> 
> 
> 
> 
> )


They did what I did at first: they looked at the documentation. According to the documentation, Maxwell 2.0 supports async compute/shaders through HyperQ. It is very hard to find documentation on Maxwell 2 because nVIDIA has published no white paper for it. This is why I used the Kepler papers; wccf did the same.

The problem is that Beyond3D's test has shown that Maxwell 2 isn't using mixed mode at all. It is behaving like Maxwell instead, and Maxwell doesn't support mixed mode. The Oxide developer stated that, as far as he knew, Maxwell couldn't do async.

The controversy has grown, rather than shrunk, with the tests Beyond3D are doing, because their findings now point towards Maxwell 2 being able to handle compute loads asynchronously (32 compute) but not graphics + compute (31 compute + 1 graphics). This is what is most interesting thus far. I don't think WCCFtech has grasped just what is happening right now.

As for a software scheduler, I also mentioned this in relation to Kepler. One of the big changes from Fermi to Kepler was the removal of a hardware scheduler. This is why Kepler, Maxwell and Maxwell 2 use less power (not because they're designed more efficiently per se). By placing the scheduler in software, you can fine tune it through the driver. This is why nVIDIA has far more leg room under DX11 to fine tune the driver and derive a boost in performance. GCN, on the other hand, relies on a hardware scheduler. A hardware scheduler is better for DX12, because the API is closer to the metal, leaving less room for shader replacements and other forms of driver intervention.

What we have with Kepler/Maxwell/Maxwell 2 are cards fine tuned for DX11. What we have with GCN are cards fine tuned for Vulkan, Mantle and DX12.

Now, while nVIDIA's Maxwell/2 cards can support more DX12 features (the same was true of the GeForce 6800 Ultra), that doesn't necessarily translate into better performance. The nVIDIA architectures lack the compute parallelism now unlocked by the new APIs. For the first DX12 titles, this might not be too much of a problem, but Pascal would need to be a completely revamped architecture on this front. We saw nVIDIA take the first steps in that direction with Maxwell/2. AMD, on the other hand, are set to strike with an architecture which will further boost their lead in this area.

We can't speculate as to whether Greenland or Pascal will be better, but we can note that AMD need far fewer architectural changes in order to derive incredible Vulkan and DX12 performance. nVIDIA, on the other hand, needs a huge overhaul of its architecture to achieve the same result.


----------



## Kand

Quote:


> Originally Posted by *Mahigan*
> 
> What we have with GCN are cards fine tuned for Vulkan, Mantle and DX12.


I wouldn't go so far as to call them "fine tuned". More like, better compatible.


----------



## Mahigan

Quote:


> Originally Posted by *Kand*
> 
> I wouldn't go so far as to call them "fine tuned". More like, better compatible.


Fair enough


----------



## provost

Quote:


> Originally Posted by *Kand*
> 
> I wouldn't go so far as to call them "fine tuned". More like, better compatible.


I guess since he is using the term "fine tune" in reference to the drivers, "compatible" may be a better term to describe hardware "harmony" with DX12.
However, if AMD's hardware has better compatibility with DX12, and the results based on this one benchmark are representative of the performance increase over DX11 for all DX12 games, then one would think that "refinement" of drivers would only bring further performance increases (?). I guess I would put this as a question to both you and Mahigan.


----------



## SlackerITGuy

Best case scenario for all of this would be:

- NVIDIA circumvents this issue altogether through clever driver-side programming and/or by working extremely closely with game devs, and at the same time ramps up Pascal's development so it launches Q1 2016 lol.


----------



## UtopiA

What can they "ramp up"? They are at the whim of Hynix + TSMC.


----------



## infranoia

Well, DX12 as an API is bare-metal. Natively there isn't much room for shenanigans as there is in DX11, with all its driver-based shader recompilations. More than likely there are optimizations still possible within the Maxwell software scheduler, but at the end of the day it's going to have to try to fit an elephant into a Ferrari. It's not going to be pretty, what Nvidia's scheduler is going to do to that poor elephant.


----------



## ToTheSun!

Quote:


> Originally Posted by *SlackerITGuy*
> 
> NVIDIA [...] ramps up Pascal's development so it launches Q1 2016 lol.


Man, that would be sweet.


----------



## Paul17041993

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Similar to saying "notice how it says 4Gb vram, but doesn't specifically mention fast vram"
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Which is why i said they'll probably be fine legally. It's still slimy though, *If*.


exactly


----------



## Mahigan

I seem to remember mentioning this last night...

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-18#post-1869645


Quote:


> a single work item runs each SIMD at 1/4 throughput over time
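A rough model of where that 1/4 figure can come from, assuming the commonly documented GCN layout of 16-lane SIMDs executing a 64-thread wavefront over 4 cycles (an illustrative sketch, not output from the b3d test):

```python
# Toy GCN SIMD issue-rate model (assumed: 16-lane SIMD, 64-thread
# wavefront issued over 4 cycles; purely illustrative numbers).
SIMD_LANES = 16
WAVEFRONT = 64
CYCLES_PER_WAVE = WAVEFRONT // SIMD_LANES  # 4 cycles per wavefront instruction

def results_per_cycle(active_threads):
    """Useful lane-results per cycle for one wavefront with the given
    number of active threads; an underfilled wavefront still takes the
    full 4 cycles to issue."""
    return active_threads / CYCLES_PER_WAVE

print(results_per_cycle(64))  # full wavefront: 16.0 lane-results/cycle
print(results_per_cycle(1))   # single work item: 0.25 -> the "1/4 throughput"
```

So a single work item keeps the SIMD busy for all 4 issue cycles but produces only one result, which matches the quoted "1/4 throughput over time".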


----------



## Paul17041993

are they seriously expecting a single thread running a for-loop to be even remotely fast on a GPU...?


----------



## Mahigan

Quote:


> Originally Posted by *Paul17041993*
> 
> are they seriously expecting a single thread running a for-loop to be even remotely fast on a GPU...?


At first their goal was to test for Async Shading... when Maxwell 2 couldn't do this... they resorted to turning it into a performance benchmark. This, to me, appears to be an attempt to divert from the argument. Many of those people started off by attacking the messenger (me, for a time); when that didn't work and the message gained traction, they figured they'd discredit the message. Now that this hasn't worked, they figure they'll find a way to prove that Maxwell 2 performs so well at serial tasks that its lack of parallelism is a moot point.

The problem is, they don't appear to know what they're doing. Sebbi chimed in but they quickly brushed him off (Sebbi knows how to code for GCN).

So now it's sort of an attempt to save face. They're actually extending the soap opera and drama surrounding this. Can Maxwell 2 perform Asynchronous Compute? According to their test... No.

End of discussion.

The nVIDIA response is all that is left now. I think it will hit tomorrow.


----------



## Forceman

I don't think they were trying to turn it into a performance test, they were just trying to discern whether the test was actually testing what they thought it was testing. But since they don't really know what they are doing, it isn't going well.


----------



## pengs

Quote:


> Originally Posted by *infranoia*
> 
> More than likely there are optimizations still possible within the Maxwell software scheduler, but at the end of the day it's going to have to try to fit an elephant into a Ferrari. It's not going to be pretty, what Nvidia's scheduler is going to do to that poor elephant.


Good analogy








Quote:


> Originally Posted by *Mahigan*
> 
> At first their goal was to test for Async Shading... when Maxwell 2 couldn't do this... they resorted to turning it into a performance benchmark. This, to me, appears to be an attempt to divert from the argument. Many of those people started off by attacking the messenger, which was me for a time, when that didn't work, and the message gained traction, they figured they'd discredit the message. Now that this hasn't worked, they figure they'll find a way to prove that Maxwell 2 can perform so well at serial tasks, that its lack of parallelism is a moot point.
> 
> The problem is, they don't appear to know what they're doing. Sebbi chimed in but they quickly brushed him off (Sebbi knows how to code for GCN).
> 
> So now its sort of like an attempt to save face. They're actually extending the soap opera and drama surrounding this. Can Maxwell 2 perform Asynchronous Compute? According to their test... No.
> 
> End of discussion.
> 
> The nVIDIA response is all that is left now. I think it will hit tomorrow.


The problem is that NVIDIA has already encapsulated the early, unoptimized demonstration into a pill for its supporters to swallow with that little P.S.A. they made.


----------



## provost

I don't think Nvidia's non-response is due to Nvidia being worried about whiny Maxwell owners, just as Nvidia wasn't worried about the whiny 970 owners, whiny Kepler owners, etc. Everyone who buys a GPU these days should be aware of caveat emptor; it is not Nvidia's job to alert every customer to what's around the corner, or to interpret its marketing material (this should be the job of the tech journalism sites, but we have already established that's not happening, because no one has any incentive to poke the golden goose). And quite frankly, everyone buying a GPU these days should be doing their own due diligence if they really feel strongly about this entertainment hobby.

I think Nvidia is quiet because it does not want to tip its hand on what's coming next and how they are planning to solve this DX12 riddle.

From a strategic perspective, I can at least understand that, otherwise Nvidia's silence just seems odd.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> At first their goal was to test for Async Shading... when Maxwell 2 couldn't do this... they resorted to turning it into a performance benchmark. This, to me, appears to be an attempt to divert from the argument. Many of those people started off by attacking the messenger, which was me for a time, when that didn't work, and the message gained traction, they figured they'd discredit the message. Now that this hasn't worked, they figure they'll find a way to prove that Maxwell 2 can perform so well at serial tasks, that its lack of parallelism is a moot point.
> 
> The problem is, they don't appear to know what they're doing. Sebbi chimed in but they quickly brushed him off (Sebbi knows how to code for GCN).
> 
> So now its sort of like an attempt to save face. They're actually extending the soap opera and drama surrounding this. Can Maxwell 2 perform Asynchronous Compute? According to their test... No.
> 
> End of discussion.
> 
> The nVIDIA response is all that is left now. I think it will hit tomorrow.


I apologize for asking what may sound silly, still heavily medicated from a surgery, but....

What would the original Titan look like in this situation? Since it is compute heavy.

Or am I talking two entirely different forms of compute?


----------



## Forceman

Quote:


> Originally Posted by *provost*
> 
> I don't think Nvidia's non response is due to Nvidia being worried about whiny Maxwell owners, just as Nvidia wasn't worried about the whiny 970 owners, whiny Kepler owners, etc. Everyone who buys a GPU these days should be aware of caveat emptor, and it is not Nvidia's job to alert every customer of what's around the corner, or interpret its marketing material (this should be the job of the tech journalist sites, but we have already established that's not happening because no one has any incentive to poke the golden goose) And, quite frankly everyone buying a gpu these days should be doing their own due diligence if they really feel strongly about this entertainment hobby.
> 
> I think Nvidia is quiet because it does not want to tip its hand on what's coming next and how they are planning to solve this DX12 riddle.
> 
> From a strategic perspective, I can at least understand that, otherwise Nvidia's silence just seems odd.


Or they may be waiting for some other game/bench they know is coming that may portray things in a different light, and draw attention away from this.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I apologize for asking what may sound silly, still heavily medicated from a surgery, but....
> 
> What would the original Titan look like in this situation? Since it is compute heavy.
> 
> Or am I talking two entirely different forms of compute?


The new Maxwell architecture improved upon the older Kepler derived Titan in terms of compute efficiency. At least that's what nVIDIA claims.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Forceman*
> 
> Or they may be waiting for some other game/bench they know is coming that may portray things in a different light, and draw attention away from this.


Which is reasonable, this is one early title.

The only reason it made this big of a stink is that Mahigan came into the picture and started talking. Even though several people have "called his BS", no one has actually done that. I have heard "Oh, he is wrong," but not once have I seen someone actually show why. So I think the scale of this is down to the conversation leader doing the work. Work that has yet to be contested, at least in this scenario.

But, as even Mah has said, only time will tell for sure!

Quote:


> Originally Posted by *Mahigan*
> 
> The new Maxwell architecture improved upon the older Kepler derived Titan in terms of compute efficiency. At least that's what nVIDIA claims.


I thought they whacked the snot out of compute. Pretty sure they did, because a lot of people flipped out that the Titan was the last big Nvidia compute card for consumers.


----------



## Forceman

Titan was big on double precision compute, but I'm pretty sure Maxwell is better at single precision, which is what is used in games (and these tests).


----------



## PostalTwinkie

Quote:


> Originally Posted by *Forceman*
> 
> Titan was big on double precision compute, but I'm pretty sure Maxwell is better at single precision, which is what is used in games (and these tests).


Ah, thanks. That is what I was looking for!


----------



## Mahigan

SilverforceG
Quote:


> Mahigan & I were the OPs that began this whole exposé on r/pcgaming. We now have more info, enough to make a judgement call based on programmers at b3d analyzing the results, as well as what Oxide & AMD have to say.
> Let me start with this: MAXWELL DOES SUPPORT ASYNC SHADERS/COMPUTE.
> But it software-emulates it. The driver is sending the compute workload to the CPU for it to process while the GPU is processing graphics (link below). It's a clever trick to claim feature "support", one that breaks down when a game either needs those CPU cycles or has so much Async Compute that it floods the CPU, causing a massive performance loss.
> This is why Oxide had to disable Async Compute in their test. It would have tanked performance even harder.
> http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1400#post_24360916
> just some unfortunate complex interaction between software scheduling trying to emulate it which appeared to incur some heavy CPU costs
> For a basic understanding of what Async Compute is: https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-16#post-1869561
> 2 cars are on the road, let's call them Car 1 (Compute) and Car 2 (Graphics). Both cars are trying to go from A -> B.
> The time it takes for Car 1 to travel the journey is 1 hour. The time it takes for Car 2 to travel the journey is 2 hours.
> The question is, how long does it take for both cars to reach destination B?
> 1. Both cars can travel on the road together, simultaneously, starting at the same time: 2 hours.
> 2. Only ONE car can be on the road at once, so Car 1 goes first (order doesn't matter), finishes, then Car 2 starts. Thus, both cars reach their destination in: 3 hours.
> Minor variations aside, that should be the expected behavior, correct? #1 would therefore be Async Mode, and #2 is not.
> This is the official explanation, please have a look and compare:
> This is Serial Compute (currently DX11): http://images.anandtech.com/doci/9124/Async_DX11_575px.png
> This is Async Compute: http://images.anandtech.com/doci/9124/Async_DX12_575px.png
> Its purpose is to take a serial task and make it parallel so that it can bypass traffic jams in the pipeline. By doing so, graphics is not blocked by compute tasks (& vice versa), leading to faster performance. The more compute that is used, the greater the performance gains.
> Links: As for the confusion with the b3d program and how to interpret it, it's been resolved here and also at b3d, thanks to Sebbi, ToTTenTranz & others:
> https://forum.beyond3d.com/posts/1869578/
> http://forums.anandtech.com/showpost.php?p=37674878&postcount=820
> http://forums.anandtech.com/showpost.php?p=37675312&postcount=829
> The program only tests whether the Async Compute function works or not. As it currently is, it's not a benchmark tool; this is what confused some people initially. They see lower ms on NV GPUs and assume it's working, but that's the incorrect analysis for the purpose of the tool.
> What does all this mean for us gamers? Firstly, understand a few points:
> DX12 is an amazing API that has great benefits for ALL GPUs due to a few key features which are hardware-agnostic (works on everything): lower API overhead (fewer CPU bottlenecks!), multi-threaded rendering (higher GPU efficiency, less shader idling), multi-adapter support (think CF/SLI sharing vram, giving smoother frametimes).
> Other features such as Async Compute and FL12.1 (which Maxwell 2 has and GCN does not) will be game dependent.
> The likely scenario is that if DX12 games use Async Compute, it has to be disabled by devs for NV GPUs; thus, there will be no performance gains. If a lot of compute is used in a serial pipeline, it will cause traffic jams for graphics, leading to a performance loss.
> If games do not use much compute, it's fine: all GPUs will benefit.
> Now, what if games use FL12.1, will it tank GCN GPUs? No. Because AMD GPUs do not support 12.1 at all, they cannot run the code. The effects will be disabled for AMD GPUs. Think GPU PhysX: AMD users won't enjoy the fancy effects, but Maxwell 2 owners will.
> NV is still being silent on this.




__
https://www.reddit.com/r/3jfgs9/maxwell_does_support_async_compute_but_with_a/
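The two-car analogy in the quote reduces to a tiny timing model (an illustrative sketch only; the hour figures are the analogy's, not real GPU timings):

```python
# Toy timing model for the two-cars (compute/graphics) analogy.
def concurrent_time(compute_hours, graphics_hours):
    # Async mode: both queues run side by side, so the job finishes
    # when the longer of the two does.
    return max(compute_hours, graphics_hours)

def serial_time(compute_hours, graphics_hours):
    # No async: one queue must fully drain before the other starts.
    return compute_hours + graphics_hours

print(concurrent_time(1, 2))  # 2 hours -> behavior expected with Async Compute
print(serial_time(1, 2))      # 3 hours -> behavior of a serialized pipeline
```

This is exactly the signature the b3d testers were looking for: total time near max() suggests real overlap, total time near the sum suggests serialization.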


----------



## infranoia

Regarding "saving face" and the async test becoming a benchmark, I think at some point that will be a valid line of inquiry, at least from a competitive viewpoint. B3D reads a bit like a bunch of GameWorks developers beginning a strategy for DX12. Understand that when the dust settles, 3 out of 4 new consumer GPUs are still Nvidia's, and no game studio will want to gimp the majority of their users out of principle. So there will be an effort to find a sweet spot: the async shader batch size that puts Maxwell in the best possible light relative to GCN. The stuff at B3D sounds a lot like an early shot at this.


----------



## SpeedyVT

Quote:


> Originally Posted by *Devnant*
> 
> Yeah, but it's not a light test. It's doing up to 1024 instructions (very few shaders go past 500 instructions in games?). You could be right about the driver though.


In a DX11 game.

The 780 Ti will have no overlaps. The 750 Ti has fewer overlaps, and the 980 Ti has even more, but not complete overlaps. Above all, no overlap is complete, which says there are no asynchronous compute cycles, only serialized compute cycles that happen to occur simultaneously during a dispatch of the queue. An emulated feature.

Possibly it could be improved with a driver, but not made truly asynchronous. A driver could improve the dispatching, but that may put more overhead on the CPU, making it less efficient elsewhere.
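The "overlap" talk above boils down to checking whether a graphics interval and a compute interval actually ran concurrently. A minimal sketch of that check (illustrative only; the real b3d tool's output format is different):

```python
# Interval-overlap check, as used informally when reading the b3d timings.
def overlap(a_start, a_end, b_start, b_end):
    """Length of time two intervals (e.g. a graphics pass and a compute
    dispatch) ran concurrently; 0.0 means fully serialized."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

# Fully serialized queues: compute only starts once graphics has ended.
print(overlap(0.0, 2.0, 2.0, 3.0))  # 0.0
# Truly async queues: the compute dispatch overlaps the graphics pass.
print(overlap(0.0, 2.0, 0.5, 1.5))  # 1.0
```

Per the post above, the Maxwell cards show partial but never complete overlaps, which is what points at emulation rather than hardware async.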


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> SilverforceG
> 
> __
> https://www.reddit.com/r/3jfgs9/maxwell_does_support_async_compute_but_with_a/


The CPU emulation part is pure speculation. I ran the test on a 960 with a weak quad-core and it never exceeded 1% CPU usage, so it certainly isn't putting much stress on the CPU. And as someone else pointed out, CPU emulation would probably play holy heck with the latency and destroy that even stair step you see with Maxwell.


----------



## Slaughterem

Mahigan something that is different for Fury http://forums.anandtech.com/showpost.php?p=37656793&postcount=204
Quote:


> All newer GCN 1.2 cards have this configuration. There are 4 core ACEs. The two HWS units can do the same work as 4 ACEs, which is why AMD refer to 8 ACEs in some presentations. The HWS units are just smarter and can support more interesting workloads, but AMD don't talk about these right now. I think it has something to do with the HSA QoS feature. Essentially, the GCN 1.2 design is not just an efficient multitasking system, but also good for multi-user environments.
> 
> Most GPUs are not designed to run more than one program, because these systems are not optimized for latency. They can execute multiple GPGPU programs, but executing a game while a GPGPU program is running won't give you good results. This is why HSA has a graphics preemption feature. These GCN 1.2 GPUs can prioritize all graphics tasks to provide low-latency output. QoS is just one level further. It can run two games, or a game and a GPGPU app, simultaneously for two different users, and the performance/experience will be really good with these HWS units.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> The CPU emulation part is pure speculation. I ran the test on a 960 with a weak quad-core and it never exceeded 1% CPU usage, so it certainly isn't putting much stress on the CPU. And as someone else pointed out, CPU emulation would probably play holy heck with the latency and destroy that even stair step you see with Maxwell.


Did you disable TDR and use the new test?

As for what Sebbi said:
Quote:


> Benchmarking thread groups that are under 256 threads on GCN is not going to lead into any meaningful results, as you would (almost) never use smaller thread groups in real (optimized) applications. I would suspect a performance bug if a kernel thread count doesn't belong to {256, 384, 512}. Single lane thread groups result in less than 1% of meaningful work on GCN. Why would you run code like this on a GPU (instead of using the CPU)? Not a good test case at all. No GPU is optimized for this case.


And he's right:

Slide 12: http://www.slideshare.net/DevCentralAMD/gcn-performance-ftw-by-stephan-hodes
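Sebbi's group-size point can be seen with a little arithmetic, assuming GCN's 64-thread wavefronts (a sketch; real occupancy rules are more involved than this):

```python
import math

WAVE = 64  # GCN wavefront width (assumed)

def lane_utilization(group_size):
    """Fraction of allocated SIMD lanes doing useful work for one thread
    group; a partial wavefront still occupies a full 64-lane slot."""
    waves = math.ceil(group_size / WAVE)
    return group_size / (waves * WAVE)

print(lane_utilization(256))  # 1.0 -> why multiples of 64 (256, 384, 512) are recommended
print(lane_utilization(1))    # ~0.0156 -> a single-lane group wastes 98%+ of the lanes
```

This is why benchmarking tiny thread groups tells you almost nothing about GCN: the hardware is simply never meant to run them.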


----------



## Mahigan

Quote:


> Originally Posted by *Slaughterem*
> 
> Mahigan something that is different for Fury http://forums.anandtech.com/showpost.php?p=37656793&postcount=204


Not that weird when you consider this:
http://www.extremetech.com/extreme/213278-amds-new-multiuser-gpu-will-slug-it-out-with-nvidias-geforce-grid


----------



## provost

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Ah, thanks. That is what I was looking for!


Good to know that my old Titan dinos are still good for a bit of binomial and Monte Carlo here and there, even if Nvidia doesn't want to show 'em a lot of gaming optimization love...









Quote:


> Originally Posted by *Forceman*
> 
> Or they may be waiting for some other game/bench they know is coming that may portray things in a different light, and draw attention away from this.


Well, if they have to work that hard just to come up with a benchmark for the purposes of deflecting attention, the core issue of which arch will be better for DX12 may yet linger a bit longer, while reinforcing Mahigan's case.








I was hoping that they would have a new card in the offing as a definitive response, but oh well.... Lol


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> SilverforceG
> 
> __
> https://www.reddit.com/r/3jfgs9/maxwell_does_support_async_compute_but_with_a/


I think your friend had an error where he said
Quote:


> *Async Compute* and FL12.1 (which Maxwell 2 has and not GCN)


Maybe he means
Conservative Rasterization Tier 1, Rasterizer Ordered Views.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> I think your friend had an error where he said
> Maybe he means
> Conservative Rasterization Tier 1, Rasterizer Ordered Views.


Yep, he made a mistake LOL

No biggie, I do the same... we all do.

On another note.. who is this TaintedSquirrel guy?

__
https://www.reddit.com/r/3jfcny/nvidia_directx_12_and_asynchronous_compute_dont/

Is he one of us here at OCN? He posts so much misinformation that I can't keep up with him. The guy is running around everywhere (the squirrel name fits), misinterpreting the benchmark results at Beyond3D.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> Not that weird when you consider this:
> http://www.extremetech.com/extreme/213278-amds-new-multiuser-gpu-will-slug-it-out-with-nvidias-geforce-grid


Could the HWS on AMD be the equivalent of the AWS on Nvidia?


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Did you disable TDR and use the new test?


New and old, but I stopped the new one before the delays got too long. Otherwise I'd still be watching it - it takes forever.
Quote:


> Originally Posted by *provost*
> 
> Well, if they have to work that hard to just come up with the benchmark for the purposes of deflecting attention, the core issue of what arch will be better for DX12 may yet linger a bit longer while reinforcing Mahigan's case


I didn't mean they are coming up with a benchmark, but Ark is supposed to drop soon and that's a Gameworks game, so it could show significantly different results.

And has AMD ever explained why the Fiji-based Fury block diagram has 8 ACEs while the Fiji-based Nano diagram only has 4?



http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/4

Edit: Or was that some kind of leaked slide, because now that I'm looking for it I can't find it on any of the legit tech sites.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> the HWS on AMD could be the equivalent for AWS on nvidia?


HWS?


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> HWS?


Provost linked you to a site where there was a diagram of Fury Nano which shows 4 ACEs and 2 HWS


----------



## Slaughterem

Quote:


> Originally Posted by *Mahigan*
> 
> HWS?


Look at the link I have a few posts up


----------



## Slaughterem

Quote:


> Originally Posted by *Mahigan*
> 
> On another note.. who is this TaintedSquirrel guy?
> 
> __
> https://www.reddit.com/r/3jfcny/nvidia_directx_12_and_asynchronous_compute_dont/
> 
> Is he one of us here at OCN? He posts so much mis-information that I can't keep up with him. The guy is running around everywhere, squirrel name fits, mis-interpreting the benchmark results at Beyond3D.


Look for Hard OCP forum


----------



## Forceman

Quote:


> Originally Posted by *PontiacGTX*
> 
> Provost linked you to a site where there was a diagram of Fury Nano which shows 4 ACEs and 2 HWS


The only place I can find that slide is WCCF. None of the normal tech sites seem to be using it. Anand's preview, for example, uses the normal Fiji slide that is the same as the Fury X.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> Provost linked you to a site where there was a diagram of Fury Nano which shows 4 ACEs and 2 HWS
> 
> 
> Spoiler: Warning: Spoiler!


Ahhh...

I never caught that. It seems to relate to the "Compute Wave Switch" they're talking about, which would tie back to the slide 12 I linked. Sneaky AMD... they've solved a problem in their GCN architecture, it would seem.

Read:
Quote:


> Don't switch between compute/rasterization too frequently


Seems to me that those devices will solve that problem. This would allow for far faster context switching. That's interesting.


----------



## error-id10t

Someone just tell me which lolly tastes nicer, red or green?

I know, I know... I've tried to keep up, ignoring stuff I have no idea about and that nobody knows at the moment, but we're all hoping this evens the game up for everyone's benefit.


----------



## ZealotKi11er

Quote:


> Originally Posted by *error-id10t*
> 
> Someone just tell me which lolly tastes nicer, red or green?
> 
> I know know.. I've tried to keep up ignoring stuff I have no idea about and nobody knows at the moment but we're all hoping this evens the game up for everyone's benefit.


Red last longer while Green has a stronger initial punch.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> Ahhh...
> 
> I never caught that. Seems it relates to the "Compute Wave Switch" they're talking about. Which would relate to the slide 12 I linked. Sneaky AMD... they've solved a problem in their GCN architecture it would seem.
> 
> Read:
> Seems to me that those devices will solve that problem. This would allow for far faster context switching. That's interesting.


Then could it be similar to what the AWS does on Nvidia?


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> then could be similar to what AWS does in nvidia?


AWSes only handle compute tasks. If what I think AMD has done is true, they'd allow a Nano to switch between contexts (Graphics, Compute, Copy) at an accelerated rate. Maybe that's one of the reasons for the power reduction of the Nano: they cut down on the ACEs and implemented these devices instead. A lot of people have claimed that 8 ACEs was overkill... this may be the answer to those claims.

We should know on September 3rd, when the Nano reviews start being published.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> AWS's only handle Compute tasks. If what I think AMD has done is true... they'd allow a Nano to switch between contexts (Graphics, Compute, Copy) at an accelerated rate. Maybe one of the reasons for the power reduction of the Nano. They cut down on the ACEs and implemented these devices instead. A lot of people have claimed that 8 ACEs was overkill... this may be the answer to those claims.
> 
> We should know on September 3rd, when the Nano reviews start being published.


I can't believe they'd have redesigned the Fiji core just for the Nano, given its likely minuscule sales numbers.


----------



## Paul17041993

Quote:


> Originally Posted by *Forceman*
> 
> And has AMD ever explained why the Fiji-based Fury block diagram has 8 ACEs while the Fiji-based Nano diagram only has 4?


I'd say they just have half of them disabled for binning and/or power saving, seeing as the nano is for low profile systems.

Quote:


> Originally Posted by *error-id10t*
> 
> Someone just tell me which lolly tastes nicer, red or green?
> 
> I know know.. I've tried to keep up ignoring stuff I have no idea about and nobody knows at the moment but we're all hoping this evens the game up for everyone's benefit.


red at this point, I'd add "as usual" but that'd sound a bit fanboishy...


----------



## spacin9

My first question is: What if c-a-t spelled dog?

My second question is: What if AMD delivers this huge increase in performance with async compute, but is still a micro-stuttering mess, as per usual?


----------



## p4inkill3r

Quote:


> Originally Posted by *spacin9*
> 
> What if AMD delivers this huge increase in performance with async compute, but is still a micro-stuttering mess, as per usual?


Seems like it has been a while since you've used an AMD GPU.


----------



## SpeedyVT

Quote:


> Originally Posted by *spacin9*
> 
> My first question is: What if c-a-t spelled dog?
> 
> My second question is: What if AMD delivers this huge increase in performance with async compute, but is still a micro-stuttering mess, as per usual?


Microstuttering is caused by latency and the queues getting jammed up. You won't experience microstuttering in an asynchronous system, per se, unless you exceed the thread load, and that's nearly impossible at this point. However, the best way to remove microstuttering in a non-async system was just to mask it. Everything, including the best from Nvidia, microstuttered.


----------



## FLaguy954

Quote:


> Originally Posted by *sugarhell*
> 
> I haven't seen it too much, but the jump is not that easy. Developing a game for dx12 requires a completely different mindset from dx11.
> 
> From reducing triangles and optimizing the meshes of the 3d models for the dx11 draw call limits, we get to the point where we don't have to care that much anymore. So it changes the whole development project.
> 
> Now, as an API, you must choose if you really need it. I can see most indie groups using dx11 (or dx12 if the graphics engine supports it), but the big companies will use dx12. They have capable engineering teams, big enough that they can make the jump easily.


I feel like more people will have more respect for the devs who choose to take on the challenge of DX12 development. With DX12, the devs have a lot more responsibility in reducing rendering mistakes and controlling bottlenecks, which is more work.

Similarly to Mantle, DX12 allows a dev to actually see what they have done wrong and are better equipped to handle the problem instead of passing the bulk of the work unto Nvidia or AMD's driver teams.

You can already see a real-world example of this with AMD's latest 15.8 driver where there was no noticeable improvement on AotS/DX12 because the devs have already done most of the optimizations themselves.


----------



## gamervivek

Quote:


> Originally Posted by *Mahigan*
> 
> Yep, he made a mistake LOL
> 
> No biggie, I do the same... we all do.
> 
> On another note.. who is this TaintedSquirrel guy?
> 
> https://www.reddit.com/r/3jfcny/nvidia_directx_12_and_asynchronous_compute_dont/
> 
> Is he one of us here at OCN? He posts so much mis-information that I can't keep up with him. The guy is running around everywhere, squirrel name fits, mis-interpreting the benchmark results at Beyond3D.


He is a [H]ardOCP regular and also posts on reddit. He has gone from AMD to a 980 Ti recently; this async compute business has him worried.
Quote:


> Originally Posted by *Forceman*
> 
> New and old, but I stopped the new one before the delays got too long. Otherwise I'd still be watching it - it takes forever.
> I didn't mean they are coming up with a benchmark, but Ark is supposed to drop soon and that's a Gameworks game, so it could show significantly different results.
> 
> And has AMD ever explained why the Fiji-based Fury block diagram has 8 ACEs while the Fiji-based Nano diagram only has 4?
> 
> 
> 
> http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/4
> 
> Edit: Or was that some kind of leaked slide, because now that I'm looking for it I can't find it on any of the legit tech sites.


Dave Baumann replied with a cryptic answer here,

https://forum.beyond3d.com/threads/amd-pirate-islands-r-3-series-speculation-rumor-thread.55600/page-136

So they've probably improved them so that they don't need 8 now.

Or the HWS units are doing ACE duty, besides perhaps handling hardware virtualization.
Quote:


> There are 4 core ACEs. The two HWS units can do the same work as 4 ACEs, which is why AMD refers to 8 ACEs in some presentations. The HWS units are just smarter and can support more interesting workloads, but AMD doesn't talk about these right now. I think it has something to do with the HSA QoS feature. Essentially the GCN 1.2 design is not just an efficient multitasking system, but also good for multi-user environments.


http://forums.anandtech.com/showpost.php?p=37656793&postcount=204


----------



## Clocknut

Quote:


> Originally Posted by *error-id10t*
> 
> Someone just tell me which lolly tastes nicer, red or green?
> 
> I know, I know... I've tried to keep up, ignoring stuff I have no idea about and nobody knows at the moment, but we're all hoping this evens the game up for everyone's benefit.


You play DirectX 9-11 games? Get Nvidia.

You're buying a GPU to play games designed for the PS4/Xbox One? Buy AMD.


----------



## Paul17041993

Quote:


> Originally Posted by *spacin9*
> 
> My first question is: What if c-a-t spelled dog?


catdog
Quote:


> Originally Posted by *spacin9*
> 
> My second question is: What if AMD delivers this huge increase in performance with async compute, but is still a micro-stuttering mess, as per usual?


apart from 2013 crossfire I have no idea what you're talking about...


----------



## SpeedyVT

Quote:


> Originally Posted by *Clocknut*
> 
> u play DirectX9-11 games? = Get Nvidia.
> 
> U buying GPU for playing games design for PS4/Xbox One? Buy AMD.


I would say rather that NVidia and AMD have quite different strengths. Buying either is a good idea, but one should consider the intent and use of their card. Current NVidia hardware plays hard on older software and is probably more ideal for anyone wanting to play older games, while AMD has a prolonged life expectancy thanks to its support for DX12. For AMD, raw performance isn't as important as its fluidity and flexibility in handling the newly adopted API.

Both vendors have dramatically increased their draw call throughput, though draw calls become less relevant once a game utilizes asynchronous shaders.

Should DX12 become popular, AMD should also provide the better VR experience.


----------



## velocityx

I wonder if we're ever gonna see combined setups, Nvidia and AMD in one rig; DX12 is supposedly capable of running that well. It would be the perfect rig for a gamer: Fury X and 980 Ti under one hood. ;]


----------



## spacin9

Quote:


> Originally Posted by *Paul17041993*
> 
> catdog
> apart from 2013 crossfire I have no idea what you're talking about...


2014 Crossfire.

Hey look I'm glad to see Crossfire fixed. I'll go with two AMD next gen if it's fixed.

Why don't I think it's fixed? Maybe DX12 will fix it. Async compute... according to this thread, asynchronous workloads on one GPU perhaps get complicated when two GPU cores have to do it in tandem? I dunno.

Just playing devil's advocate.


----------



## Paul17041993

Quote:


> Originally Posted by *spacin9*
> 
> 2014 Crossfire.
> 
> Hey look I'm glad to see Crossfire fixed. I'll go with two AMD next gen if it's fixed.
> 
> Why don't I think it's fixed? Maybe DX12 will fix it. Async compute... according to this thread, asynchronous workloads on one GPU perhaps get complicated when two GPU cores have to do it in tandem? I dunno.
> 
> Just playing devil's advocate.


The microstuttering issue was fixed for DX10 and 11, particularly for the 7990, back in 2013, so what exactly are you referring to...?

Keep in mind that brand-new games will always have problems with Crossfire and SLI until they get their profiles.


----------



## spacin9

Quote:


> Originally Posted by *Paul17041993*
> 
> The microstuttering issue was fixed for DX 10 and 11 particularly for the 7990 back in 2013, so what exactly are you referring to...?
> 
> Keeping in mind that spanking new games will always have problems with crossfire and SLI untill they get their profiles.


Yeah... I wanna believe that. But it's not fixed. I have had many Crossfire rigs since the XFX 1900, or whatever it was with the black dongle. I had a 4850 X2, and 6970 Crossfire, which was actually okay in some games; 7970 Crossfire was pretty much a disaster. I was big into Skyrim, and a 7970 stuttered with one card, which I attributed to driver cheats, but I don't know. I'd just never seen one card microstutter before. And two were even worse. I think the rumor was Tri-Fire would actually get rid of the stuttering, and... no. Then of course I had R9 290s, which I very much wanted to keep because they were cheap... and there's a reason they were cheap: they were hot, and they microstuttered in just about every game except Battlefield; they were pretty good with that.

I keep looking for those side-by-side YouTube reviews of Fury X Crossfire vs GTX 980 Ti SLI where they run, say, Crysis 3 and you can tell which one is smoother. With the 7970 and frame pacing, the stutter was diminished, but it was still there.

I run SLI for a reason. It works... most of the time. I've got 150 hours in The Witcher 3 @ 4K with HairWorks on, and it's smooth and lovely, and I hope Ashes can run the same way.

But this thread does concern me that I bought gimped cards... but on second thought, perhaps it was NV who was thinking ahead, in the sense that they knew dum-dums like me are going to buy two GPUs; maybe if they introduced hardware-level async, it would mess up SLI. That's complete conjecture... but you know, when you're selling beasts like Titans, they know most buyers are going to want more than one, and maybe they figured it would be easier to optimize an architecture that only did one workload well, instead of two workloads that might be harder to synchronize for multi-GPU, which AMD isn't really good at anyway.

Like I said, I have no idea. I know this is NGreedia we're talking about. But I bought into it... and at least it works. Mostly. I just hope Ashes works well... so far I've seen it work well for me with acceptable visuals. If they can get SLI working reasonably, this'll be another game I play for years. Even without SLI... I'm quite sure I'll be enjoying it.


----------



## Paul17041993

Quote:


> Originally Posted by *spacin9*
> 
> -snip-


Well, I have no idea; I've never had microstutter on various cards, including my more modern 7970 and 290X, though neither have I done any Crossfire setups. The only thing that comes to mind about your single-card stutter would be that the card was unstable; wouldn't surprise me if it were an ASUS.

Oh and multi-GPU compute is completely up to the developer to decide, and for the most part it wouldn't be that hard. DX12 basically makes both crossfire and SLI completely redundant and I don't even think either company will implement automatic scaling to multiple GPUs...


----------



## Klocek001

What does this async compute do in AotS? Is it like 1-2 optional post-processing effects, or an important graphics option? How does AMD compare to Nvidia with this option disabled?
Quote:


> Originally Posted by *Clocknut*
> 
> u play DirectX9-11 games? = Get Nvidia.


So that leaves AMD for dx8.1 or earlier


----------



## moey1974

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Well, that is the possible concern. As it appears now, Nvidia isn't invested heavily into this Async Compute situation. We don't know what Pascal will be yet, so we can't say what they are going to do. However, it has been argued that developers will have more control over the actual performance of the game, and less control is given to the actual GPU manufacturer. So if Nvidia decides they want to take another approach to it, whatever those options might be (who knows), we could have two very different philosophies. Maybe even more so than now...
> 
> Why that could all be a potential concern is that if AMD goes heavy support Async Compute, and Nvida does XYZ SomethingForUs, you now have developers with two very clear paths. Do they have the funding to support full development and optimization for both of those unique paths? Did Nvidia, with their trucks of cash, flat buy out a developer?
> 
> If Nvidia and AMD can't make huge impacts with drivers, what happens if two clear paths emerge and a developer takes just one? This isn't even a path of two different APIs going to war, but different paths within a single API.
> 
> It leaves an extreme amount of room for developer bias. If DX12 locks out the GPU manufacturer as much as some claim in terms of performance. We think we see heavy bias now in games, I can't imagine what it would look like if a developer didn't give equal treatment, _and the left out party couldn't make extreme driver improvements on their own_.
> Actually AMD has their own validation requirements and tests they run specific to FreeSync, as Freesync is specific to AMD. What the default DP spec for AdaptiveSync does isn't enough for FreeSync to work as FreeSync is marketed. It requires a hell of a lot of R&D and tuning to get done; Nixeus has commented on this heavily.


Quote:


> Originally Posted by *Remij*
> 
> No. It's too early. People aren't running out and buying AMD gpus at breakneck speeds based off this one benchmark. The people who are claiming this early victory for AMD are more than likely AMD fanboys. I remember well when Mantle was gonna destroy Nvidia in games that supported both Mantle and DX11, and we saw how that turned out. Nvidia's already said to expect the same thing that happened with DX11 to happen with DX12.
> 
> But I'm sure it will come full circle. In the near future once DX12 is out and Nvidia is ahead again, people will cite all the technical reasons why it shouldn't be so and claim Nvidia sabotages their competitors performance with proprietary features/code and their stranglehold on the market..
> 
> It would be cool to see AMD smash the hell out of Nvidia and show them they aren't invincible, but even these early tests aren't painting that picture, so I wouldn't expect it, but would rather be pleasantly surprised if it does happen.


Oh Remi, stop calling people fanboys; it's so childish, dude. You sound like the fanboy for bringing that up.

Listen, fellas... as the unbiased gamer that I am: Nvidia might have the truckloads of cash, but AMD has been quietly sitting back in an almost "sleeping giant" kind of way. The reason I say this is that a ton of you guys, hell, about 97% of you in here, have missed one HUGE thing. We are talking about technology that is specifically for getting better performance out of videogames... and where do 99% of PC AAA games come from? The consoles. AMD hit a home run when they inked those deals with both Microsoft and Sony for the Xbox One and PS4... all developers will take full advantage of GCN, since both systems, including (they say) the next Nintendo system, have AMD GCN tech under the hood. I suppose you can do the rest of the math here; it's not rocket science. But make no mistake about it: whatever the consoles dictate will have a big impact on the PC side of gaming, including what developers harness. It's not a matter of what developers will decide to go with... we already know all console games will be fine-tuned to the teeth for AMD GCN technology... it's simply a matter of when Nvidia will blink. Like I said, AMD inked a nice deal with the consoles that will help them a bit in the long run... it may not be a grand slam for AMD, but it surely is a home run, because it instantly assures them that THEIR TECH WILL NOT DIE....


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> SilverforceG
> 
> https://www.reddit.com/r/3jfgs9/maxwell_does_support_async_compute_but_with_a/
> 
> Now, concerning the big elephant in the room everyone is ignoring. Why is Fury X performing the same as a 290 on the Aots benchmark? Lack of async compute could explain Maxwell 2 poor performance, but what explains the failings of the top AMD card right now?


----------



## provost

Quote:


> Originally Posted by *Devnant*
> 
> Thanks for posting this!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I've just noticed the jump on CPU usage to 50% on the 980 TI test and this indeed suggests nVidia is trying to emulate the lack of hardware Async Compute through the CPU.
> 
> Source: https://forum.beyond3d.com/posts/1869789/
> 
> Now, concerning the big elephant in the room everyone is ignoring. Why is Fury X performing the same as a 290 on the Aots benchmark? Lack of async compute could explain Maxwell 2 poor performance, but what explains the failings of the top AMD card right now?


My guess (and it's just that): the Fury X is driver/BIOS gimped for reasons known only to AMD... lol

And who says AMD didn't learn from the best practices of Nvidia... J/K


----------



## Klocek001

Quote:


> Originally Posted by *Devnant*
> 
> Thanks for posting this!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I've just noticed the jump on CPU usage to 50% on the 980 TI test and this indeed suggests nVidia is trying to emulate the lack of hardware Async Compute through the CPU.
> 
> Source: https://forum.beyond3d.com/posts/1869789/
> 
> Now, concerning the big elephant in the room everyone is ignoring. Why is Fury X performing the same as a 290 on the Aots benchmark? Lack of async compute could explain Maxwell 2 poor performance, but what explains the failings of the top AMD card right now?


Probably still rasterization. As good as async compute looks on GCN, it still has a worse rasterization rate than a 780 Ti, not to mention the Fury X is 35% slower in that area compared to a stock 980 Ti.
Quote:


> Originally Posted by *provost*
> 
> My guess (and it's just that), Furyx is driver/bios gimped for reasons known only to AMD... lol


They're gimping their 2015 flagship card to bring it down to their 2013 flagship's level. Once they ungimp it in drivers, AMD is gonna crush Nvidia in DX11 as well. They're just waiting for the right moment.


----------



## gamervivek

If Nvidia cards are doing it on the CPU, then the times won't add up for graphics and compute shaders when running them together.


----------



## Slaughterem

Quote:


> Originally Posted by *provost*
> 
> My guess (and it's just that), Furyx is driver/bios gimped for reasons known only to AMD... lol
> And, who says AMD didn't learn from best practices of Nvidia ...
> 
> 
> 
> 
> 
> 
> 
> . J/K


It would be interesting to see results with the new 15.8 driver released a couple of days ago. Has anyone tested Fury with 15.8 on AotS?


----------



## provost

Quote:


> Originally Posted by *Klocek001*
> 
> probably still resterization. as good as async compute looks on gcn it still has worse rasterizatin rate than 780ti, not to mention Fury X is 35% slower in that area compared to a stock 980ti.
> they're gimping their 2015 flagship card to bring it more to their 2013 flagship level. Once they ungimp it in drivers AMD is gonna crush nvidia in dx11 as well. They're just waiting for the right moment.


LMAO...









Ok. That was funny. No, my guess is, if they are indeed gimping, it has to do with what they have in the pipeline as future releases. This is just a hunch.

As for crushing this or that, I think both Nvidia and AMD have arrived at a high-level mutual understanding of how the industry should evolve, given the pressures that are stressing them both. Yeah, they will continue to compete, but that doesn't mean there isn't room for a general understanding of how things should be, i.e. survival....


----------



## airfathaaaaa

So wait, the Nvidia driver is sending the packets it can't process to the CPU and then back to the card?

I wonder what the absolute lowest CPU is that can drive that program before it crashes.


----------



## Devnant

Actually, forget what I said. I created an account on the Beyond3D forums just to run the latest version of the test and monitor CPU usage. The CPU was hardly stressed for the duration of the test.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> Actually, forget what I said. I created an account on Beyond 3D forums just to run the latest version of the test and monitor CPU usage. CPU was hardly stressed during duration of the test.


Did you disable TDR?

With the latest version you need to perform a regedit and disable TDR; that's what I've seen them do in order to get some degree of async working.
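For anyone wanting to try the same regedit, here is a minimal sketch. The key path and the `TdrLevel` value come from Microsoft's documented TDR (Timeout Detection and Recovery) registry settings, not from this thread; the helper just builds the `reg add` command to paste into an elevated prompt (a reboot is needed for it to take effect).

```python
# Builds the `reg add` command used to toggle Windows' GPU watchdog (TDR).
# TdrLevel = 0 disables timeout detection; 3 is the Windows default.
TDR_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

def tdr_command(level: int) -> str:
    """Return the reg.exe command that sets TdrLevel to the given value."""
    return (f'reg add "{TDR_KEY}" /v TdrLevel /t REG_DWORD '
            f'/d {level} /f')

print(tdr_command(0))  # disable TDR before running the async test
print(tdr_command(3))  # restore the default afterwards
```

Remember to set it back to 3 when done: with TDR off, a genuinely hung GPU will freeze the whole machine instead of recovering.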


----------



## Kana-Maru

Quote:


> Originally Posted by *Klocek001*
> 
> 
> 
> 
> 
> 
> 
> 
> 
> they're gimping their 2015 flagship card to bring it more to their 2013 flagship level. Once they ungimp it in drivers AMD is gonna crush nvidia in dx11 as well. They're just waiting for the right moment.


Do you really believe this? I guess it's possible since the voltage is STILL locked on the Fury X cards.


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> Did you disable TDR?
> 
> With the latest version you need to perform a regedit task and disable TDR, that's what I've seen them do in order to get some degree of Async working.


No, because I thought it shouldn't matter, since it only crashes during the single-commandlist test (I don't know what the purpose of that test is). But I will disable it and run the test again.


----------



## Mahigan

Quote:


> Originally Posted by *Paul17041993*
> 
> Well I have no idea, I've never had microstutter ever on various cards including my more modern 7970 and 290X, though neither have I done any crossfire setups. Only thing that pops to mind about your single-card stutter would be that it was unstable, wouldn't surprise me if it were an ASUS.
> 
> Oh and *multi-GPU compute is completely up to the developer to decide, and for the most part it wouldn't be that hard*. DX12 basically makes both crossfire and SLI completely redundant and I don't even think either company will implement automatic scaling to multiple GPUs...


Exactly!
Quote:


> Originally Posted by *moey1974*
> 
> -snip-


We've discussed this at great length at several points during this ginormous thread. It's just that it gets lost with the amount of content being created here. This thread blew up in popularity lol

Quote:


> Originally Posted by *spacin9*
> 
> -snip-


For DX12,

We're talking Multi-Adapter. It is a far more efficient way to do SLI/Crossfire. It uses SFR (Split Frame Rendering) rather than AFR (Alternate Frame Rendering). The bonus, of using SFR, is that you don't create redundant textures in your Graphic Cards memory buffer. You effectively Split every frame into two segments and all of the textures for split frame a is in GPU1 Memory Buffer, textures for split frame b is in GPU2 Memory buffer. This means that if GPU1 has 4GB RAM and GPU2 has 4GB RAM then you now have 8GB of memory buffer. With AFR you would have still had only 4GB.
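The memory math above can be sketched as a toy model. This is the ideal case only: real SFR still has to share some resources (render targets, common textures) across GPUs, so the numbers here are illustrative, not measurements.

```python
# Toy model of the AFR vs SFR texture-memory argument: AFR duplicates
# every resource on every GPU, so the usable pool stays at one card's
# VRAM; ideal SFR gives each GPU only its own slice's resources, so
# the pools add up.

def afr_budget(per_gpu_vram_gb: int, gpu_count: int) -> int:
    # Alternate Frame Rendering: full duplication, capped at one card.
    return per_gpu_vram_gb

def sfr_budget(per_gpu_vram_gb: int, gpu_count: int) -> int:
    # Split Frame Rendering (ideal): each GPU holds a distinct share.
    return per_gpu_vram_gb * gpu_count

print(afr_budget(4, 2))  # 4 -> two 4GB cards still behave like 4GB
print(sfr_budget(4, 2))  # 8 -> ideal SFR exposes the combined 8GB
```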


----------



## Klocek001

Quote:


> Originally Posted by *Kana-Maru*
> 
> Do you really believe this? I guess it's possible since the voltage is STILL locked on the Fury X cards.


They did the same to the 290 when it came out in 2013 to replace the 7970; they're still holding back optimized DX11 drivers for it. There's no point in releasing them now, gotta wait for a good moment too.
Quote:


> Originally Posted by *Mahigan*
> 
> Exactly!
> We've discussed this at great length at several points during this ginormous thread. It's just that it gets lost with the amount of content being created here. This thread blew up in popularity lol
> For DX12,
> 
> We're talking Multi-Adapter. It is a far more efficient way to do SLI/Crossfire. It uses SFR (Split Frame Rendering) rather than AFR (Alternate Frame Rendering). The bonus, of using SFR, is that you don't create redundant textures in your Graphic Cards memory buffer. You effectively Split every frame into two segments and all of the textures for split frame a is in GPU1 Memory Buffer, textures for split frame b is in GPU2 Memory buffer. This means that if GPU1 has 4GB RAM and GPU2 has 4GB RAM then you now have 8GB of memory buffer. With AFR you would have still had only 4GB.


I wanna see 3x 7950 run against a single 390X @ 4K in that case. Just out of pure curiosity.


----------



## Xuper

Mahigan, is this SFR an AMD thing, or Microsoft's, part of DX12?


----------



## Klocek001

Quote:


> Originally Posted by *Xuper*
> 
> Mahigan , this SFR is AMD thing ? or Microsoft , part of DX12 ?


DX12, and for both AMD and Nvidia. They said it's replacing CFX and SLI.


----------



## Forceman

Quote:


> Originally Posted by *airfathaaaaa*
> 
> so wait nvidia driver is sending the packet it cant process to the cpu and then back to the card?
> 
> i wonder what is the absolute lowest cpu that can drive that program to crash


No, it isn't. How do these rumors keep getting started? That test doesn't stress the CPU at all, and the latency would be obvious if it were offloading to the CPU.

It's maybe/probably (definitely?) doing the scheduling in software, but it isn't running the actual compute task in software. It times out and TDRs because, by trying to force the cards to run the commands synchronously, they accidentally made Nvidia run all the commands sequentially, which ends up taking hundreds of times longer.


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> Did you disable TDR?
> 
> With the latest version you need to perform a regedit task and disable TDR, that's what I've seen them do in order to get some degree of Async working.


Here you go: with TDR disabled, CPU usage is still pretty low. No crashes this time.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> Here you go, with TDR disabled CPU usage is still pretty low. No crashes this time.
> 
> 
> Spoiler: Warning: Spoiler!


Cool. Now you need to do like they did at Beyond3D: look at how the CPU behaves when you see some asynchronous commands come into play. It is a rare occurrence; it happens every so often. The best thing for you to do is submit your results to Beyond3D so they can add them to the graph. From that point you will be able to compare, visually, when your GPU is doing two compute commands asynchronously and the effect it has on your CPU.

According to Beyond3D, they see a spike in CPU usage when this happens. It only appears as a small spike because their test isn't as strenuous as the Oxide engine.

I see a few spikes potentially caused by asynchronous compute commands...


Spoiler: Warning: Spoiler!


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> Cool, now you need to do like they did at Beyond3D, Look at how the CPU behaves when you see some Asynchronous commands come into play. It is a rare occurrence, happens ever so often. The best thing for you to do is to submit your results to Beyond3D so they add them to the graph. From this point you will be able to compare, visually, when your GPU is doing two compute commands Asynchronously and the effect it has on your CPU.
> 
> According to Beyond3D, they see a spike in CPU usage when this happens. It only appears as a small spike, because their test isn't as strenuous as the Oxide engine.
> 
> I see a few potential spikes caused by Asynchronous Compute commands...
> 
> 
> Spoiler: Warning: Spoiler!


In my case, they analysed the data and there was no evidence of async compute at all. Compute and graphics latency add up almost perfectly. Maybe Nvidia hasn't developed async compute emulation for my CPU yet?
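The "latencies add up" check can be sketched in a few lines. Assuming you have timed a graphics pass and a compute pass separately and then together (the numbers below are invented): true overlap puts the combined time near max(g, c), while serialization puts it near g + c.

```python
# Classify a combined graphics+compute timing as overlapped (async
# working) or serialized (latencies adding up) by checking which
# prediction the measurement sits closer to.

def looks_async(graphics_ms: float, compute_ms: float,
                combined_ms: float) -> bool:
    overlapped = max(graphics_ms, compute_ms)   # ideal async prediction
    serialized = graphics_ms + compute_ms       # no-overlap prediction
    return abs(combined_ms - overlapped) < abs(combined_ms - serialized)

print(looks_async(10.0, 8.0, 11.0))   # True  -> overlap, async working
print(looks_async(10.0, 8.0, 17.8))   # False -> times added up, no overlap
```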


----------



## Klocek001

Quote:


> Originally Posted by *Mahigan*
> 
> Cool, now you need to do like they did at Beyond3D, Look at how the CPU behaves when you see some Asynchronous commands come into play. It is a rare occurrence, happens ever so often. The best thing for you to do is to submit your results to Beyond3D so they add them to the graph. From this point you will be able to compare, visually, when your GPU is doing two compute commands Asynchronously and the effect it has on your CPU.
> 
> According to Beyond3D, they see a spike in CPU usage when this happens. It only appears as a small spike, because their test isn't as strenuous as the Oxide engine.
> 
> I see a few potential spikes caused, potentially, by Asynchronous Compute commands...
> 
> 
> Spoiler: Warning: Spoiler!


Can a few spikes like this really influence avg fps that much?


----------



## Mahigan

Quote:


> Originally Posted by *Klocek001*
> 
> Can a few spikes like this really influence avg fps that much?


It can, if an engine makes use of full-fledged asynchronous compute to process post-processing effects. This is what the Oxide dev told us, and when nVIDIA, on site with Oxide, noticed this... they asked that the feature be shut off for their architecture.

You have to remember that AotS maxes out the CPU already. Any additional usage of the CPU would hinder performance. GCN has a hardware scheduler, so it doesn't make any additional use of the CPU.

This won't occur in every single title; we have to remember that AotS is an RTS with a lot going on at any given time (many units on screen). They process a lot of AI (what they call smart Battlegroups), where your battlegroup (a bunch of units assigned to work together as a squad) supports each other intelligently. This means the gamer doesn't have to control each individual unit themselves; the AI does it for them. AI is CPU intensive.

I think that all of these tests, as well as Oxide's own words, dispel the bias allegations some have leveled at Oxide. Their game is just more parallel than others.


----------



## airfathaaaaa

Quote:


> Originally Posted by *Forceman*
> 
> No, it isn't. How do these rumors keep getting started? That test doesn't stress the CPU at all, and the latency would be obvious if it was offloading it to the CPU.
> 
> It's maybe/probably (definitely?) doing the scheduling in software, but it isn't running the actual compute take in software. It times out and TDRs because by trying to force the cards to run the commands synchronously they accidentally made Nvidia run all the commands sequentially, which eventually ends up taking hundreds of times longer.


I can hardly keep up with the amount of info (I'm trying) I'm getting from all the forums, including 4chan and reddit....

Some posted diagrams where the CPU had very high utilization, others not; it's really getting frustrating trying to keep up with that insane amount of information. Also, the best place to be right now is the Nvidia forums: mucho crying, and also mucho misunderstanding of misinformation.


----------



## Forceman

Quote:


> Originally Posted by *airfathaaaaa*
> 
> I can hardly keep up with the amount of info (I'm trying) I'm getting from all the forums, including 4chan and reddit. Some posted diagrams showing very high CPU utilization, others didn't; it's getting really frustrating trying to keep up with that insane amount of information.
> 
> Also, the best place to be right now is the nvidia forums, mucho cry, and also mucho misunderstanding of misinformation.


Which is why people need to stop jumping to conclusions like "it's emulating compute in the CPU". Testing is good, making theories is fine, but these "pronouncements" should stop. All it does is confuse the issue.

And Oxide never said that's what they think is happening (at least not that I saw), they said Nvidia had poor performance with async enabled, which could be caused by lots of things.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> Which is why people need to stop jumping to conclusions like "it's emulating compute in the CPU". Testing is good, making theories is fine, but these "pronouncements" should stop. All it does is confuse the issue.
> 
> And Oxide never said that's what they think is happening (at least not that I saw), they said Nvidia had poor performance with async enabled, which could be caused by lots of things.


They stated that nVIDIA has a higher CPU overhead. That's what Beyond3D has found as well thus far.


----------



## airfathaaaaa

Quote:


> Originally Posted by *Forceman*
> 
> Which is why people need to stop jumping to conclusions like "it's emulating compute in the CPU". Testing is good, making theories is fine, but these "pronouncements" should stop. All it does is confuse the issue.
> 
> And Oxide never said that's what they think is happening (at least not that I saw), they said Nvidia had poor performance with async enabled, which could be caused by lots of things.


Well, that's why I inserted a little "?" at the end. I asked because I wasn't sure if that was what people were saying.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> They stated that nVIDIA has a higher CPU overhead. That's what Beyond3D has found as well thus far.


That was in regard to Tier 2/Tier 3, not async. His comments about async were just that it affected performance.
Quote:


> Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster in terms of performance and conformance so we shut it down on their hardware. As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports.


Also
Quote:


> Still, DX12 CPU overhead is still far far better on Nvidia, and we haven't even tuned it as much as DX11.


No one has yet explained why, if it is emulating compute on the CPU:

1. it only happens for some people, some of the time,
2. the CPU load comes in spikes, even though the test runs async commands continuously,
3. it has no effect on latency, and
4. TDR makes any difference to it.

It's pretty much based on one post at Beyond3D, and the guy who found that re-ran the test and didn't see high CPU use.
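For what it's worth, the measurement idea behind tests like the Beyond3D one can be sketched with plain CPU threads standing in for GPU queues (all durations here are invented): if two workloads really run concurrently, the wall-clock total sits near the longer of the two; if the driver quietly serializes them, it sits near the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor

GRAPHICS_MS = 200  # stand-in duration for a "graphics" workload
COMPUTE_MS = 200   # stand-in duration for a "compute" workload

def workload(ms):
    # A sleep plays the role of a queue chewing through work.
    time.sleep(ms / 1000.0)

def run_serial():
    # One queue after the other: total is roughly G + C.
    start = time.perf_counter()
    workload(GRAPHICS_MS)
    workload(COMPUTE_MS)
    return (time.perf_counter() - start) * 1000

def run_overlapped():
    # Two queues in flight at once: total is roughly max(G, C).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(workload, GRAPHICS_MS),
                   pool.submit(workload, COMPUTE_MS)]
        for f in futures:
            f.result()
    return (time.perf_counter() - start) * 1000

serial_ms = run_serial()
overlap_ms = run_overlapped()
print(f"serial:     {serial_ms:.0f} ms")
print(f"overlapped: {overlap_ms:.0f} ms")
```

That timing gap is what people are arguing about; the above is only the shape of the measurement, not the actual benchmark.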


----------



## CrazyElf

From what I understand, the reason AMD was unable to do as many driver optimizations as Nvidia is that they simply lacked the money to do it. Then again, you could just as easily blame Nvidia for not building hardware with more "brute force" the way AMD does. It's for that reason that AMD seems to be "aging" better (the 7970 and 290X made relative gains compared to the 680 and 780 Ti). For optimal DX11 performance, what you would want is a company with both the brute-force hardware and the money for lots of driver optimizations. The close-to-the-metal nature of DX12 takes that advantage away from Nvidia. The point is starting to become moot with DX12, though.

This driver reliance, I suspect, is also a reason why, when overclocking Nvidia GPUs, you don't see a linear gain in frame rate versus clock speed. You do with AMD GPUs, I find.

Either way, Nvidia is in no way the underdog. They still have massive monetary and R&D resources to do as they see fit. They've got market share, mind share, and lots of $$$ (more than AMD) to make a huge comeback, which I think they will.

I'm worried about Intel and Nvidia monopolies here. That's a loss for us consumers.
Quote:


> Originally Posted by *provost*
> 
> Actually, I think that it does make the Fury (or any AMD card starting with 290) a better buy, since it provides a lot more future proof insurance than Maxwell 2, if everything theorized here turns out to be true.


You may have a point.

The Oxide developer did say this:





Quote:


> Originally Posted by *Kollock*
> 
> Regarding trying to figure out bottlenecks on GPUS, it's important to note that GPUs do not scale simply by adding more cores to it, especially graphics tasks which have alot of serial points. My $.02 is that GCN is a bit triangle limited, which is why you see greater performance on 4k, where the average triangle size is 4x the triangle size of 1080p.
> 
> I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post process. *Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders.* For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was the main motivating factor behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast, it was that they failed to be able to map dx9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years time Nitrous's rendering pipeline is 100% implemented via compute shaders.






If this happens, then we could see huge gains in the Fury X relative to Nvidia's 980Ti.

The biggest advantages of the Fury X are the massive shader performance and the bandwidth of HBM. This would suggest that their future engines are more shader oriented. We'll need to wait and see on Oxide's next release. Besides, we need more games before drawing conclusions.

Earlier, I suggested that AMD might be playing the long game too with the Fury X. Kind of like how the 7970 was widely regarded as worse than the 680, but now, look at the performance.

Most of the people I have spoken to that I consider credible suggest that now is not a good time to buy, simply because of the leap we expect at 16nm.

Quote:


> Originally Posted by *sugarhell*
> 
> I haven't seen it mentioned much, but the jump is not that easy. Developing a game for dx12 requires a completely different mindset than a dx11 one.
> 
> From reducing triangle counts and optimizing the meshes of the 3d models around dx11 draw-call limits, we move to a point where we don't have to care about that as much anymore. So it changes the whole development project.
> 
> Now, as an API, you must choose if you really need it. I can see most indie groups using dx11 (or dx12 if their graphics engine supports it), but the big companies will use dx12. They have capable engineering teams, big enough that they can make the jump easily.


I'm expecting to see relatively rapid DX12 adoption, more so than with, say, DX11.

On net, for the large studios it might actually save effort. It will be "easier" to port between the consoles, DX12 on PC, and, if they want, a Linux port via Vulkan. I just hope that we see greater adoption from smaller studios (it could take a few years) and perhaps greater Vulkan adoption too (to push up Linux).

Normally I'm not a fan of big studios. I think their business practices are quite disagreeable and at times they have caused innovation to stagnate, preferring to "milk" what they have, but this may be an exception.

Quote:


> Originally Posted by *error-id10t*
> 
> Someone just tell me which lolly tastes nicer, red or green?
> 
> I know know.. I've tried to keep up ignoring stuff I have no idea about and nobody knows at the moment but we're all hoping this evens the game up for everyone's benefit.


Quote:


> Originally Posted by *ZealotKi11er*
> 
> Red last longer while Green has a stronger initial punch.


This. And Red gets stronger relative to Green over time. Green stops getting stronger once the new architecture comes out and begins to weaken relative to the competition.

If you plan to keep the GPU for more than 1 year, go with Team Red. If you upgrade a lot, go with Green. I suppose if one company closely works with the developer for a game you play a lot, and if it's your top game by far, go with that color.

Quote:


> Originally Posted by *spacin9*
> 
> My first question is: What if c-a-t spelled dog?
> 
> My second question is: What if AMD delivers this huge increase in performance with async compute, but is still a micro-stuttering mess, as per usual?


That hasn't been an issue since the frame-pacing drivers. Crossfire 290Xs did better than the 780 Ti for that reason, and scaled better.

Fury X Crossfire at 4K will actually outperform a Titan X, provided VRAM is not an issue. This, despite the Titan X being 5-10% stronger as a single GPU at 4K (and the gap is bigger at 2560).

Quote:


> Originally Posted by *Forceman*
> 
> Which is why people need to stop jumping to conclusions like "it's emulating compute in the CPU". Testing is good, making theories is fine, but these "pronouncements" should stop. All it does is confuse the issue.
> 
> And Oxide never said that's what they think is happening (at least not that I saw), they said Nvidia had poor performance with async enabled, which could be caused by lots of things.


The problem is, nobody has been able to come up with a good alternative hypothesis that challenges Mahigan's with data supporting it. Not saying he's 100% right here (I think there is a lot of room for modification because we simply don't have all the information to work with here), but I am saying he has the best hypothesis so far.


----------



## ToTTen

Quote:


> Originally Posted by *Devnant*
> 
> Now, concerning the big elephant in the room everyone is ignoring: why is the Fury X performing the same as a 290 in the AotS benchmark? Lack of async compute could explain Maxwell 2's poor performance, but what explains the failings of the top AMD card right now?


Where have you seen a 290 vs. Fury X direct standoff in AotS?

If the benchmark is bottlenecked by the number of async compute queues a GPU can handle, note that Hawaii and Fiji have similar clocks and both support 64 concurrent queues (8 queues × 8 ACEs on Hawaii; 8 queues × 4 ACEs + 16 queues × 2 HWEs on Fiji).
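Quick arithmetic on those queue counts (the per-chip figures are the numbers stated above, not an official spec sheet):

```python
# Concurrent compute-queue totals as described in the post.
hawaii_queues = 8 * 8           # 8 queues per ACE x 8 ACEs
fiji_queues = 8 * 4 + 16 * 2    # 4 ACEs at 8 queues each, plus 2 HWEs at 16 queues each
print(hawaii_queues, fiji_queues)
```

Both work out to 64, which is why queue count alone can't separate Hawaii from Fiji here.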


----------



## Forceman

Quote:


> Originally Posted by *CrazyElf*
> 
> The problem is, nobody has been able to come up with a good alternative hypothesis that challenges Mahigan's with data supporting it. Not saying he's 100% right here (I think there is a lot of room for modification because we simply don't have all the information to work with here), but I am saying he has the best hypothesis so far.


It is an interesting theory, and it needs more testing. My issue is with people accepting it as fact before it has been adequately tested. That's not happening here on OCN so much, but it is happening all over the Internet (specifically reddit). That's the problem with making sensational (not sensational on purpose, but still sensational) claims - people have a strong tendency to take them as fact without actually checking or testing them.

Edit: I just realized I did my 960 testing on driver 353.62; re-running it now to see if there's a change with the new drivers.


----------



## Devnant

Quote:


> Originally Posted by *ToTTen*
> 
> Where have you seen a 290 vs. Fury X direct standoff in AotS?
> 
> If the benchmark is bottlenecked by the number of async compute queues that a GPU can do, then both Hawaii and Fiji have similar clocks and both support 64 concurrent queues (8 queues * 8 ACEs on Hawaii, 8 queues * 4 ACEs + 16 queues * 2 HWEs in Fiji).


ExtremeTech has some benchmarks comparing the Fury X to the 980 Ti here:
http://www.extremetech.com/gaming/212314-directx-12-arrives-at-last-with-ashes-of-the-singularity-amd-and-nvidia-go-head-to-head

They are performing about the same. The 290X is also performing pretty close to both the 980 Ti and the Fury X in DX12, with Oxide even mentioning that the 290X has better minimum framerates.


----------



## provost

Quote:


> Originally Posted by *Forceman*
> 
> Which is why people need to stop jumping to conclusions like "it's emulating compute in the CPU". Testing is good, making theories is fine, but these "pronouncements" should stop. All it does is confuse the issue.
> 
> And Oxide never said that's what they think is happening (at least not that I saw), they said Nvidia had poor performance with async enabled, which could be caused by lots of things.


I hear what you all are saying, but here is what I am struggling with:

What's stopping Nvidia from coming out and explaining how its cards do in fact handle CPU overhead for DX12, and why exactly there is a difference in the performance jump, in detail? So far, I have only heard vague references to architectural differences, and that statement in itself seems like an excuse rather than a reason. Nvidia is not shy about PR spin and counter-spin; actually, they are one of the best I have seen in any industry (and I mean that as a compliment).

So, let me put on three different hats as follows:

If I am an unbiased, undecided potential GPU buyer, I am really mystified by the lack of a response from Nvidia, which leads me to the natural conclusion that Nvidia doesn't really have a credible response to the theories being propagated here;

If I am a biased Nvidia customer, I am feeling really anxious about my existing cards and also upset as to why Nvidia can't come back with an intelligent counter-argument to put these AMD guys in their place. (I am role playing here, so don't kill me pls.)

If I am an AMD-biased customer or a potential buyer, I am feeling pretty good about the future-proofing potential of my purchase (both real and perceived), especially relative to any Nvidia cards. And I am thinking, yeah, it serves all those Nvidia Maxwell guys right who keep beating up on us; they should have known better after the "970 misunderstanding," because of Nvidia's shady practices, etc. etc. (again, just role playing here.)


----------



## Mahigan

*What we do know is this...*

*HyperQ*:
The Grid Management Unit is a software implementation.
The Work Distributor is a software implementation.
The Asynchronous Warp Schedulers are in the hardware.

Ergo, what is feeding the AWSs is a software-side scheduler.

*Ever since Kepler (AnandTech as source):*
Quote:


> GF114, owing to its heritage as a compute GPU, had a rather complex scheduler. Fermi GPUs not only did basic scheduling in hardware such as register scoreboarding (keeping track of warps waiting on memory accesses and other long latency operations) and choosing the next warp from the pool to execute, but Fermi was also responsible for scheduling instructions within the warps themselves. While hardware scheduling of this nature is not difficult, it is relatively expensive on both a power and area efficiency basis as it requires implementing a complex hardware block to do dependency checking and prevent other types of data hazards. And since GK104 was to have 32 of these complex hardware schedulers, the scheduling system was reevaluated based on area and power efficiency, and eventually stripped down.


Quote:


> The end result is an interesting one, if only because by conventional standards it's going in reverse. With GK104 NVIDIA is going back to static scheduling. Traditionally, processors have started with static scheduling and then moved to hardware scheduling as both software and hardware complexity has increased. Hardware instruction scheduling allows the processor to schedule instructions in the most efficient manner in real time as conditions permit, as opposed to strictly following the order of the code itself regardless of the code's efficiency. This in turn improves the performance of the processor.
> 
> However based on their own internal research and simulations, in their search for efficiency NVIDIA found that hardware scheduling was consuming a fair bit of power and area for few benefits. In particular, since Kepler's math pipeline has a fixed latency, hardware scheduling of the instruction inside of a warp was redundant since the compiler already knew the latency of each math instruction it issued. So NVIDIA has replaced Fermi's complex scheduler with a far simpler scheduler that still uses scoreboarding and other methods for inter-warp scheduling, but moves the scheduling of instructions in a warp into NVIDIA's compiler. In essence it's a return to static scheduling.
> 
> Ultimately it remains to be seen just what the impact of this move will be. Hardware scheduling makes all the sense in the world for complex compute applications, which is a big reason why Fermi had hardware scheduling in the first place, and for that matter why AMD moved to hardware scheduling with GCN. At the same time however when it comes to graphics workloads even complex shader programs are simple relative to complex compute applications, so it's not at all clear that this will have a significant impact on graphics performance, and indeed if it did have a significant impact on graphics performance we can't imagine NVIDIA would go this way.
> 
> What is clear at this time though is that NVIDIA is pitching GTX 680 specifically for consumer graphics while downplaying compute, which says a lot right there. Given their call for efficiency and how some of Fermi's compute capabilities were already stripped for GF114, this does read like an attempt to further strip compute capabilities from their consumer GPUs in order to boost efficiency. Amusingly, whereas AMD seems to have moved closer to Fermi with GCN by adding compute performance, NVIDIA seems to have moved closer to Cayman with Kepler by taking it away.
> 
> With that said, in discussing Kepler with NVIDIA's Jonah Alben, one thing that was made clear is that NVIDIA does consider this the better way to go. *They're pleased with the performance and efficiency they're getting out of software scheduling*, going so far to say that had they known what they know now about software versus hardware scheduling, they would have done Fermi differently. But whether this only applies to consumer GPUs or if it will apply to Big Kepler too remains to be seen.


http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3
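To illustrate the AnandTech point that fixed-latency math makes runtime dependency checking redundant, here is a toy scheduling model (instruction set and latencies invented; this is not a model of any real GPU): a compiler that knows every latency can precompute the same stalls a hardware scoreboard would discover at runtime, so a "dumb" in-order executor with no dependency checking finishes in the same number of cycles.

```python
# Each entry is (name, dependencies, latency_in_cycles); latencies are fixed.
PROGRAM = [
    ("a", [], 4),
    ("b", [], 4),
    ("c", ["a"], 2),
    ("d", ["b", "c"], 2),
]

def scoreboard(program):
    """Hardware-style: issue in order, stalling until operands are ready."""
    ready, cycle = {}, 0
    for name, deps, lat in program:
        issue = max([cycle] + [ready[d] for d in deps])
        ready[name] = issue + lat
        cycle = issue + 1
    return max(ready.values())

def compile_stalls(program):
    """Compiler-style: with fixed latencies, every stall is known at compile time."""
    ready, cycle, stalls = {}, 0, []
    for name, deps, lat in program:
        issue = max([cycle] + [ready[d] for d in deps])
        stalls.append(issue - cycle)   # encoded into the instruction stream
        ready[name] = issue + lat
        cycle = issue + 1
    return stalls

def dumb_executor(program, stalls):
    """No dependency checking at all: just honor the precomputed stall counts."""
    ready, cycle = {}, 0
    for (name, _deps, lat), stall in zip(program, stalls):
        cycle += stall
        ready[name] = cycle + lat
        cycle += 1
    return max(ready.values())

hw_cycles = scoreboard(PROGRAM)
sw_cycles = dumb_executor(PROGRAM, compile_stalls(PROGRAM))
print(hw_cycles, sw_cycles)
```

The totals match, which is the efficiency argument for moving scheduling into the compiler. The flip side, and the whole debate above, is what happens when latencies are *not* fixed, e.g. when independent compute work arrives asynchronously.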

*As for the CPU usage in AotS*:




----------



## Forceman

Maybe Nvidia doesn't want to tip their hand for whatever reason. Or maybe they are crafting a thought-out explanation that is working its way through headquarters. Or maybe they are exactly as evil as people seem to think and are hoping this all blows over. Could be anything, really.

Retesting now on 355.82, and there isn't any difference in CPU load: still about 3% total and 1% for the actual program. The test seems to take longer with 355.82, but the results look the same so far.

Edit: Great. Defender just kicked in and totally wrecked my chart.


----------



## swiftypoison

Quote:


> Originally Posted by *Forceman*
> 
> Maybe Nvidia doesn't want to tip their hand to what they are doing, for whatever reason. Or maybe they are crafting a thought out explanation that is working it's way through headquarters. Or maybe they are exactly as evil as people seem to think and are hoping this all blows over. Could be anything really.
> 
> Retesting now in 355.82 and there isn't any difference in CPU load, still about 3% total and 1% for the actual program. Seems to be taking longer with 355.82, but although the results seem the same so far.
> 
> Edit: Great. Defender just kicked in and totally wrecked my chart.


It seems to me that they are probably crafting a thoughtful explanation. They know there are smart and talented contributors like Mahigan, so it's not like they can just say "sorry, we didn't communicate" like the whole 3.5GB deal. I fully expect a technical explanation because, really, that's the only way you can explain what is going on here.


----------



## semitope

Quote:


> Originally Posted by *Clocknut*
> 
> You play DirectX 9-11 games? Get Nvidia.
> 
> You're buying a GPU to play games designed for PS4/Xbox One? Buy AMD.


Not true. If you're getting a card over $600, probably Nvidia. Below that, check benchmarks for the games you play: Fury > 980 even in dx11, 390X > 970 even in dx11, 390X vs. 980 depends on resolution most of the time, etc. It's not a wash in dx11 in most games.


----------



## Casey Ryback

Quote:


> Originally Posted by *semitope*
> 
> not true. If you are getting a card over $600, probably nvidia. Below that check benchmarks for the games you play. Fury > 980 even in dx11, 390x >970 even in dx11. 390x vs 980 depends on resolution most of the time. etc etc. It's not a wash regarding dx11 in most games.


Exactly. Even though AMD's architecture is said to 'struggle' in DX11, they still match Nvidia on price/performance across a large majority of the GPU market.

As usual with AMD, you also get more VRAM and higher memory bandwidth for high-res gaming (bar the top tier, due to HBM). It's people's mindset that is the problem for AMD, rather than non-competitive products.


----------



## semitope

Quote:


> Originally Posted by *CrazyElf*
> 
> From what I understand, the reason why AMD was unable to do more driver optimizations compared to Nvidia was because they simply lacked the money to do it. Actually, you could just as easily blame Nvidia for not building hardware that had more "brute force" the way AMD does. It's for that reason that AMD seems to be "aging" better (the 7970 and 290X made relative gains compared to the 680 and 780Ti). For optimal DX11 performance, what would have been wanted is a company with both the brute force hardware and the money for lots of driver optimizations. The close to the metal nature of DX12 prevents this from Nvidia. The point is starting to get moot now with DX12 though.
> 
> This driver reliance I suspect is also a reason why when overclocking Nvidia GPUs, you don't see a linear gain in frame rates vs clockspeed. You do with AMD GPUs I find.
> 
> Either way, Nvidia is no way the underdog. They still have the massive monetary and R&D resources to do as they see fit. They've got marketshare, mind share, and lots of $$$ (more than AMD) to make a huge comeback, which I think they will.


"Driver optimizations" are a waste of money. Hardware drivers should not change so much if things are working right. If the hardware is not changing and the OS is the same on the same API, what's with all the drivers? Hopefully DX12 fixes that. Drivers should be for adding features, not patching individual games.


----------



## Devnant

Quote:


> Originally Posted by *Forceman*
> 
> Maybe Nvidia doesn't want to tip their hand to what they are doing, for whatever reason. Or maybe they are crafting a thought out explanation that is working it's way through headquarters. Or maybe they are exactly as evil as people seem to think and are hoping this all blows over. Could be anything really.
> 
> Retesting now in 355.82 and there isn't any difference in CPU load, still about 3% total and 1% for the actual program. Seems to be taking longer with 355.82, but although the results seem the same so far.
> 
> Edit: Great. Defender just kicked in and totally wrecked my chart.


The only thing I'm 100% sure of so far is this: *if* Maxwell 2 is indeed capable of async compute at all, there is no concrete evidence so far showing it can (as far as we all know).

The way I see it, Nvidia is guilty until proven innocent right now.

I mean, Oxide said it can't, Beyond3D's test showed it can't, Mahigan presented plenty of plausible theories as to why it can't, etc.


----------



## ZealotKi11er

This game sure is getting a lot of publicity. Maybe this was their goal all along.


----------



## Kana-Maru

Quote:


> Originally Posted by *Klocek001*
> 
> they did the same to 290 when it came out in 2013 to replace 7970, they're still holding optimized dx11 drivers for it, there is no point for releasing them now, gotta wait for a good moment too.


I hope AMD doesn't hold on to the optimized DX11 drivers too long. No complaints here, though. I came from dual GTX 670s and decided to go with AMD [-R9 Radeon Fury X-] this time around. I'm no fanboy for either company, and I actually have principles. AMD is still driving plenty of innovation and standards. There were a lot of negative things Nvidia had been doing that made me consider AMD this time around instead of the GTX 980 Ti I wanted.

I have a friend with a 7970 GHz 6GB, and he says the drivers have been great over the past few years. I couldn't say the same for my Kepler GTX 670s after the 900 series launched: performance diminished and I started having all kinds of driver crashes. It seems DX12 is getting a lot of initial support from some of the biggest developers in the market.


----------



## p4inkill3r

Quote:


> Originally Posted by *ZealotKi11er*
> 
> This game sure is getting a lot of publicity. Maybe this was their goal all along.


Being the first is usually a notable thing, I'm sure they knew what they were doing.


----------



## Forceman

Quote:


> Originally Posted by *Devnant*
> 
> The only thing that I'm 100% sure so far is that: *if* Maxwell 2 is indeed capable at all to do async compute, there are no concrete evidence so far showing it can (as far as we all know).
> 
> The way I see it, Nvidia is guilty until proven innocent right now.
> 
> I mean, Oxide said they can't, Beyond3d's test showed they can't, Mahigan presented plenty of plausible theories as to why they can't, etc.


Not being able to do async compute is not the same thing as emulating async compute on the CPU. The first is pretty conclusive, the second is far from it.

Edit: Here's my 960/355.82 run with TDR off.

Of note, I started (but didn't finish) the test about 5 times, and one of those times it seemed to go crazy. The output display was all over the place, scrolling really fast and without correct formatting, and the CPU usage was through the roof. So it's possible there is some issue where it occasionally goes haywire and that's where the high CPU use came from. Out of every run I've done (about 10 now, with different drivers and TDR on/off), it's only done that once.

Edit: Caught a screen shot of it bugging out. Bugged out on the left, normal on the right. That's at about the same point in the test.


----------



## airfathaaaaa

Quote:


> Originally Posted by *Devnant*
> 
> The only thing that I'm 100% sure so far is that: *if* Maxwell 2 is indeed capable at all to do async compute, there are no concrete evidence so far showing it can (as far as we all know).
> 
> The way I see it, Nvidia is guilty until proven innocent right now.
> 
> I mean, Oxide said they can't, Beyond3d's test showed they can't, Mahigan presented plenty of plausible theories as to why they can't, etc.


And that is the problem Nvidia is facing: they can't get a trusted site to say "here, we've done it, it's good, we all see it, move along." If this were a small problem they would just shut the door and that would be enough, but now every major tech forum is talking about it.


----------



## Devnant

Quote:


> Originally Posted by *Forceman*
> 
> Not being able to do async compute is not the same thing as emulating async compute on the CPU. *The first is pretty conclusive, the second is far from it*.


True.

The async compute emulation theory is actually based on only one weird test result with odd CPU spikes. No one with Maxwell 2 has been able to reproduce those results so far.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> True.
> 
> The async compute emulation theory is actually based on only one weird test result with odd CPU spikes. No one with Maxwell 2 has been able to reproduce those results so far.


It would be better for nVIDIA if that were the case, because the alternative is that they can't do graphics + compute in parallel, and that would be an even worse conclusion for them.


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> It would be better for nVIDIA if this was the case... because the alternative is they can't do Graphics + compute in Parallel... that would be an even worse conclusion for them.


Is forcing them to run in sequence (or disabling async altogether) really worse than running GPU compute tasks on the CPU? Running anything GPU-related on the CPU seems like a bad idea. Heck, the whole reason this came up was because Oxide disabled async due to performance problems, which led people to believe it was emulated; if that were the case, the better answer would seem to be to just disable it altogether (which is what they did).

Anyway, ran the test again, same general results.


----------



## SimBy

I don't expect a response from Nvidia at this point. I think the damage from not responding and keeping people guessing is far smaller than the damage of confessing they lied about async compute capability.

With 970 at least they didn't flat out lie about having 4GB.


----------



## JohnLai

Quote:


> Originally Posted by *SimBy*
> 
> I don't expect a response from Nvidia at this point. I think the damage from not responding and keeping people guessing is far smaller than the damage of confessing they lied about async compute capability.
> 
> With 970 at least they didn't flat out lie about having 4GB.


Actually, I think Nvidia will only reply to this async issue if management thinks it will affect future sales of the GTX 900 series.

That's assuming the Fudzilla report can be trusted:
http://www.fudzilla.com/news/graphics/37938-geforce-gtx-970-issue-destroyed-sales-in-february


----------



## ToTTen

Quote:


> Originally Posted by *Devnant*
> 
> The async compute emulation theory is actually based on *only one weird test result* with odd CPU spikes. No one with Maxwell 2 has been able to reproduce those results so far.


The emulation theory came from some very clear statements made by the Oxide developer, well before that test appeared in B3D.

Quote:


> Originally Posted by *Forceman*
> 
> Is running forcing them to run in sequence (or disabling it altogether) really worse than running GPU compute tasks on CPU?


According to Mark Cerny (from 2 years ago!) and the claims from many developers working on DX12, you're missing out on 30-50% of your GPU's compute potential.

nVidia Maxwell cards won't be slower in DX12. It's just that AMD cards will be quite a bit faster.
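The 30-50% figure can be illustrated with a toy queue-overlap model. This is a minimal sketch, assuming idealized full overlap between the graphics and compute queues; the millisecond figures are illustrative, not from any benchmark:

```python
# Toy timing model of async compute: if graphics and compute work
# can overlap instead of running back-to-back, frame time shrinks.

def frame_time_ms(gfx_ms: float, compute_ms: float, async_compute: bool) -> float:
    if async_compute:
        # Idealized full overlap: the longer queue hides the shorter one.
        return max(gfx_ms, compute_ms)
    # Serialized: compute waits for graphics (or vice versa).
    return gfx_ms + compute_ms

serial = frame_time_ms(10.0, 4.0, async_compute=False)   # 14.0 ms
overlap = frame_time_ms(10.0, 4.0, async_compute=True)   # 10.0 ms
print(f"gain: {(serial - overlap) / serial:.0%}")          # ~29% in this example
```

The more compute work a frame carries (post-processing, lighting, physics), the closer the gain creeps toward the 30-50% range developers have been quoting.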


----------



## Klocek001

Quote:


> Originally Posted by *Kana-Maru*
> 
> I hope AMD doesn't hold on to the optimized DX11 drivers too long. Well, no complaints here. I came from dual GTX 670s and decided to go with AMD [-R9 Radeon Fury X-] this time around. I'm no fanboy for either company and I actually have principles. AMD is still making plenty of innovations and standards. There were a lot of negative things Nvidia had been doing that made me consider AMD this time around instead of the GTX 980 Ti I wanted.
> 
> I have a friend with a 7970GHz 6GB and he says the drivers have been great over the past few years. I couldn't say the same for my Kepler GTX 670s after the 900 series released. The performance diminished and I started having all kinds of driver crashes. It seems DX12 is getting a lot of initial support from some of the biggest developers in the market.


I'm just poking at your funny bone.

Yes, I had an R9 290 Tri-X, and while the 980 I had later ran smoother, AMD's DX11 drivers aren't so bad; they're just not as fast as the 980's. From what I'm reading here, the reason the 290X comes close to or outperforms the 980 in DX12 lies more in the green side not being able to do async compute well.


----------



## Falknir

Quote:


> Originally Posted by *ZealotKi11er*
> 
> This game sure is getting a lot of publicity. Maybe this was their goal all along.


Probably. It certainly takes attention away from the marketing material saying how it "redefines the possibilities of RTS with the unbelievable scale" and such, when every video, screenshot, and alpha experience shows it being nothing more than a very scaled-down theater, combat experience, and overall composition compared to Planetary Annihilation and Supreme Commander: Forged Alliance. They have, however, clearly demonstrated their superiority in the art of turning many units into obscenely inefficient particle-FX fountains (especially with DX12).


----------



## delboy67

Quote:


> Originally Posted by *Klocek001*
> 
> I'm just poking at your funny bone.
> 
> Yes, I had an R9 290 Tri-X, and while the 980 I had later ran smoother, AMD's DX11 drivers aren't so bad; they're just not as fast as the 980's. From what I'm reading here, the reason the 290X comes close to or outperforms the 980 in DX12 lies more in the green side not being able to do async compute well.


It's not just async that gives the boost; Oxide have already said AotS isn't even big on it. The GCN cards get their boost from basically being fed properly by the CPU in DX12. Async and other features from both companies, specific to their architectures, are just the gravy on top. Somewhere lost in all this discussion are the games, games that should in theory run better.

Off topic, but who all thought the consoles were maxed out at release? I admit I did! Seems GCN had at least some secret sauce not being used on PC/DX11.


----------



## Devnant

Quote:


> Originally Posted by *ToTTen*
> 
> The emulation theory came from some very clear statements made by the Oxide developer, well before that test appeared in B3D.


Fair enough *but*:

http://forums.anandtech.com/showpost.php?p=37675312&postcount=829



Everything said there is just not true. According to the B3D folks:


----------



## provost

Quote:


> Originally Posted by *ToTTen*
> 
> The emulation theory came from some very clear statements made by the Oxide developer, well before that test appeared in B3D.
> According to Mark Cerny (from 2 years ago!) and the claims from many developers working on DX12, you're missing out on 30-50% of your GPU's compute potential.
> 
> nVidia Maxwell cards won't be slower in DX12. It's just that AMD cards will be quite a bit faster.


Well sure, and I would guess it is by design on Maxwell cards, not an oversight or a blunder. Here, write this down, memorize it or whatever:

"Nvidia is in the business of selling performance", not giving it away for free. The next gen of Nvidia cards will have complete control over how DX12 performance is distributed across the respective pricing tiers of the cards. This is not rocket science, folks... Lol

It's just that AMD is giving it away more generously because they have to (or at least that's my theory anyway).


----------



## PontiacGTX

Quote:


> Originally Posted by *Clocknut*
> 
> u play DirectX9-11 games? = Get Nvidia.
> 
> U buying GPU for playing games design for PS4/Xbox One? Buy AMD.


It isn't like that... AMD does well in most DX9 games (without CrossFire on the newest cards) and DX10 games, and some of the newest DX11 games are just biased toward Nvidia for obvious reasons.

Also, the console ports under DX11 may be designed for consoles as well, and they carry that bias no matter the API.
Quote:


> Originally Posted by *CrazyElf*
> 
> That hasn't been an issue since the Frame pacing drivers. Crossfire 290X did better than the 780Ti for that reason - and scaled better.


xDMA was the reason it scaled better, but improved CrossFire drivers helped too.
Quote:


> Originally Posted by *Forceman*
> 
> Is forcing them to run in sequence (or disabling it altogether) really worse than running GPU compute tasks on the CPU? Running anything GPU-related on the CPU seems like a bad idea. Heck, the whole reason this came up was because Oxide disabled async because of performance problems, which led people to believe it was emulated - seems like the better answer would be to just disable it altogether if that was the case (which is what they did).
> 
> Anyway, ran the test again, same general results.


what happens if you disable HT?


----------



## Forceman

Quote:


> Originally Posted by *PontiacGTX*
> 
> what happens if you disable HT?


It's a 3470S, no HT.


----------



## Klocek001

Quote:


> Originally Posted by *provost*
> 
> It's just that AMD is giving it away more generously because they have to (or at least that's my theory anyway)


I can't think of a better investment than buying 2500k + 7970 back in early 2012.


----------



## airfathaaaaa

Just an informative video about DX12 and async; maybe it adds 2 or 3 things to the table (or not, I don't know).


----------



## ToTheSun!

Quote:


> Originally Posted by *Klocek001*
> 
> Quote:
> 
> 
> 
> Originally Posted by *provost*
> 
> It's just that AMD is giving it away more generously because they have to (or at least that's my theory anyway)
> 
> 
> 
> I can't think of a better investment than buying 2500k + 7970 back in early 2012.

I can: 2600K.


----------



## Mahigan

*Just to remove some of the confusion surrounding my first assessments of Fiji in AotS...*

As most of you know, at first I was under the impression that Fiji was bound by its Fill rate. This was my view until a user pointed me towards the Color Compression scheme found in Fiji relative to Hawaii. From that moment on, my hypothesis has centered on the triangle throughput (Gtris/s) rate. When the Oxide developer arrived, he concluded the same. Why is it logical to claim that Fiji and Hawaii are raster bound?
Well, that has to do with what one can observe in the Ashes of the Singularity benchmark. We see ample use of Tessellation in conjunction with the triangles required to draw all of the 1000s of units onto the screen. These two elements alone aren't that big of a deal, that is, until you look at the Shading and Terrain Shading Samples. When you add all three together, you get the scenario described in the Hawaii White Papers:


Spoiler: Warning: Spoiler!









In terms of Tessellation, you're looking at a Geometry Processor working with the Rasterizer to draw ever smaller triangles, and drawing smaller triangles means drawing more triangles, in an ever more inefficient manner. Tahiti, Hawaii, Tonga and Fiji were not designed for this sort of abuse. They usually achieve their best performance-to-visual-quality ratio at a Tessellation factor of around 15/16x.
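As a rough sketch of why higher tessellation factors explode triangle counts (idealized uniform tessellation of a single triangle patch; real hardware tessellators handle edge and interior factors more subtly):

```python
# For a uniformly tessellated triangle patch, the number of
# sub-triangles grows with the square of the tessellation factor:
# each edge is split into `factor` segments, yielding factor**2
# small triangles. Going from 16x to 64x means 16x more geometry.

def tris_per_patch(factor: int) -> int:
    return factor * factor  # uniform tessellation of one triangle

for f in (1, 8, 16, 64):
    print(f"factor {f:>2}x -> {tris_per_patch(f):>5} triangles per patch")
```

At 64x a patch produces 4096 triangles versus 256 at 16x, which is why GCN's sweet spot around 15/16x matters so much here.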

But where's the Shader aspect of this equation? Have a glance at the AotS benchmark settings (marked in yellow):


Spoiler: Warning: Spoiler!







The combination of all three is what is mentioned in the Hawaii White Paper slides. This is why this has been my hypothesis for quite some time now. I mention this because many people are still quoting my original theory, and that is leading to a lot of misinformation being spread. I may be right, I may be wrong, but this is a far more likely scenario than a Pixel Fill Rate bottleneck. I say this because this is the result one obtains from the 3DMark Pixel Fill benchmark:


Spoiler: Warning: Spoiler!







The Color compression on Fiji appears to be doing its job quite nicely.

Peace.


----------



## Klocek001

Quote:


> Originally Posted by *ToTheSun!*
> 
> I can: 2600K.


I'd take 2500k + 60GB SSD over 2600k (back when SSDs weren't so widespread)


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> *Just to remove some of the confusion surrounding my first assessments of Fiji in AotS...*
> 
> As most of you know, at first, I was under the impression that Fiji was bound by its Fill rate. This was my view until a user pointed me towards the Color Compression scheme found in Fiji, relative to Hawaii. From that moment on, my hypothesis has been set to the Gtris/s rate. When the Oxide developer arrived, he too concluded the same. Why is it logical to claim that Fiji and Hawaii are raster bound?
> Well that has to do with what one can observe from the Ashes of the Singularity benchmark. We see an ample use of Tessellation in conjunction with the triangles required to draw all of the 1000s of units onto the screen. These two elements alone aren't that big of a deal, that is until you look at the Shading and Terrain Shading Samples. When you add all three together, you get a scenario described in the Hawaii White Papers:
> 
> In terms of Tessellation, you're looking at a Geometry Processor working with the Rasterizer in order to draw ever smaller triangles. This means more triangles being drawn in an ever more inefficient manner. This issue is further compounded by the Tessellation rate which draws smaller and smaller triangles. Drawing smaller triangles means drawing more triangles. Tahiti, Hawaii, Tonga and Fiji were not designed for this sort of abuse. They usually achieve their best performance to visual quality at a Tessellation rate of around x15/16.
> 
> But where's the Shader aspect of this equation? Have a glance at the AotS benchmark settings (marked in yellow):
> 
> The combination of all three is what is mentioned in the Hawaii White Paper slides. This is why, this has been for quite some time now, my hypothesis. I mention this because many people are still quoting my original theory and this is leading to a lot of mis-information being spread. I may be right, I may be wrong, but this is a far more likely scenario than a Pixel Fill Rate bottleneck. I say this because this is the result one obtains from the 3D Mark Pixel Fill benchmark:
> 
> The Color compression on Fiji appears to be doing its job quite nicely.
> 
> Peace.


Interesting. But isn't Fiji supposed to be superior at tessellation?
Quote:


> As with our R9 285 review, I took the time to quickly run TessMark across the x8/x16/x32/x64 tessellation factors just to see how tessellation and geometry performance scales on AMD's cards as the tessellation factor increases. Keeping in mind that all of the parts here have a 4-wide geometry front-end, the R9 285, R9 290X, and R9 Fury X all have the same geometry throughput on paper, give or take 10% for clockspeeds. What we find is that Fury X shows significant performance improvements at all levels, beating not only the Hawaii based R9 290X, but even the Tonga based R9 285. Tessellation performance is consistently 33% ahead of the R9 290X, while against Tonga it's anywhere between a 33% lead at high factors to a 130% lead at low tessellation factors, showing the influence of AMD's changes to how tessellation is handled with low factors.


Source: http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/2


----------



## ZealotKi11er

Quote:


> Originally Posted by *Klocek001*
> 
> I'd take 2500k + 60GB SSD over 2600k (back when SSDs weren't so widespread)


Meh, the 2600K was a better investment than the 2500K.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> Interesting. But isn't Fiji supposed to be superior at tessellation?
> Source: http://www.anandtech.com/show/9390/the-amd-radeon-r9-fury-x-review/2


In tessellation alone? Yes. But when you combine Tessellation, Rasterization and Shading... it creates a bottleneck. That's why I didn't mention a Tessellation bottleneck but rather a Raster-bound scenario.


----------



## spacin9

Quote:


> Originally Posted by *Mahigan*
> 
> Exactly!
> We've discussed this at great length at several points during this ginormous thread. It's just that it gets lost with the amount of content being created here. This thread blew up in popularity lol
> For DX12,
> 
> We're talking Multi-Adapter. It is a far more efficient way to do SLI/Crossfire. It uses SFR (Split Frame Rendering) rather than AFR (Alternate Frame Rendering). The bonus, of using SFR, is that you don't create redundant textures in your Graphic Cards memory buffer. You effectively Split every frame into two segments and all of the textures for split frame a is in GPU1 Memory Buffer, textures for split frame b is in GPU2 Memory buffer. This means that if GPU1 has 4GB RAM and GPU2 has 4GB RAM then you now have 8GB of memory buffer. With AFR you would have still had only 4GB.


This is interesting, because I was able to get SLI working in DX11 for Ashes via an option I had ignored for a while, "AFRGPU", in the options menu. SLI works in the Ashes benchmark in DX11 because of this; it works horribly, though. The option does not work in DX12.

This leads me to believe, obviously, that they are working on multi-GPU support. Would they have one multi-GPU option for DX11 and a separate one for DX12?


----------



## Digidi

Quote:


> Originally Posted by *Mahigan*
> 
> *Just to remove some of the confusion surrounding my first assessments of Fiji in AotS...*
> 
> As most of you know, at first, I was under the impression that Fiji was bound by its Fill rate. This was my view until a user pointed me towards the Color Compression scheme found in Fiji, relative to Hawaii. From that moment on, my hypothesis has been set to the Gtris/s rate. When the Oxide developer arrived, he too concluded the same. Why is it logical to claim that Fiji and Hawaii are raster bound?
> Well that has to do with what one can observe from the Ashes of the Singularity benchmark. We see an ample use of Tessellation in conjunction with the triangles required to draw all of the 1000s of units onto the screen. These two elements alone aren't that big of a deal, that is until you look at the Shading and Terrain Shading Samples. When you add all three together, you get a scenario described in the Hawaii White Papers:
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> In terms of Tessellation, you're looking at a Geometry Processor working with the Rasterizer in order to draw ever smaller triangles. This means more triangles being drawn in an ever more inefficient manner. This issue is further compounded by the Tessellation rate which draws smaller and smaller triangles. Drawing smaller triangles means drawing more triangles. Tahiti, Hawaii, Tonga and Fiji were not designed for this sort of abuse. They usually achieve their best performance to visual quality at a Tessellation rate of around x15/16.
> 
> But where's the Shader aspect of this equation? Have a glance at the AotS benchmark settings (marked in yellow):
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> 
> 
> 
> 
> The combination of all three is what is mentioned in the Hawaii White Paper slides. This is why, this has been for quite some time now, my hypothesis. I mention this because many people are still quoting my original theory and this is leading to a lot of mis-information being spread. I may be right, I may be wrong, but this is a far more likely scenario than a Pixel Fill Rate bottleneck. I say this because this is the result one obtains from the 3D Mark Pixel Fill benchmark:
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> 
> 
> 
> 
> The Color compression on Fiji appears to be doing its job quite nicely.
> 
> Peace.


I think AMD's bottleneck was feeding the shaders. Remember the 3DMark draw call test; there you can see how many triangles have to be turned into pixels.
I calculate:
AMD manages about 18,000,000 draw calls/s, and each draw call contains about 112 triangles, so that's about 2,000,000,000 triangles/s. We know 3DMark pushes the polygon output until the GPU drops to 30 fps, so at 30 fps the GPU can output 2,000,000,000 triangles/s, which is about 66,666,666 triangles per frame. At UHD that's about 8x more triangles than pixels. That's really abuse.

So the bottleneck is the handover to the shaders.
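Digidi's arithmetic can be checked with a quick sketch. The draw-call rate and triangles-per-call figures are the ones cited in the post above, not independently measured:

```python
# Rough check of the triangles-per-pixel estimate from the post above.
# Assumed inputs (from the post, not measured here):
draw_calls_per_s = 18_000_000   # 3DMark API overhead result cited for AMD
tris_per_call = 112             # triangles per draw call, per the post
fps = 30                        # 3DMark pushes draw calls until fps hits 30

tris_per_s = draw_calls_per_s * tris_per_call
tris_per_frame = tris_per_s / fps

uhd_pixels = 3840 * 2160        # 8,294,400 pixels at UHD

ratio = tris_per_frame / uhd_pixels
print(f"{tris_per_s:,.0f} triangles/s")       # ~2.0 billion
print(f"{tris_per_frame:,.0f} triangles/frame")
print(f"{ratio:.1f} triangles per pixel at UHD")
```

The ratio comes out to roughly 8 triangles per pixel, which matches the "8x more triangles than pixels" claim: far more geometry than the screen can even resolve.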


----------



## Mahigan

Quote:


> Originally Posted by *spacin9*
> 
> This is interesting, because I was able to get SLI working in DX 11 for Ashes via an option that I had ignored for a while "AFRGPU" in the options menu. SLi works in the Ashes benchmark in DX 11 because of this, works horribly tho. The option does not work in DX 12.
> 
> This leads me to believe, obviously, they are working on multi-GPU support. Would they have a multi-GPU option for DX 11 and separate multi-GPU option for DX 12?


Considering how both APIs handle Multi-GPU configurations differently, I would say, yes.


----------



## Mahigan

Quote:


> Originally Posted by *Digidi*
> 
> I think AMD's bottleneck was feeding the shaders. Remember the 3DMark draw call test; there you can see how many triangles have to be turned into pixels.
> I calculate:
> AMD manages about 18,000,000 draw calls/s, and each draw call contains about 112 triangles, so that's about 2,000,000,000 triangles/s. We know 3DMark pushes the polygon output until the GPU drops to 30 fps, so at 30 fps the GPU can output 2,000,000,000 triangles/s, which is about 66,666,666 triangles per frame. At UHD that's about 8x more triangles than pixels. That's really abuse.
> 
> *So the bottleneck is the handover to the shaders*.


I think you have a point there


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> Considering how both APIs handle Multi-GPU configurations differently, I would say, yes.


...

What about the rumor (has it been confirmed?) that DX12 will support multiple GPU vendors at once? Would it be possible to slap an Nvidia GPU and an AMD GPU in your rig, then in situations where ASC was needed/wanted, send that work to the AMD card and run the other goodies off the Nvidia card?

Think of the old Physx card hack for AMD users a few years back, but for ASC and Nvidia.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> ...
> 
> What about the rumor (has it been confirmed?) that DX12 will support multiple GPU vendors at once? Would it be possible to slap an Nvidia GPU and an AMD GPU in your rig. Then in situations where ASC was needed/wanted, send that to the AMD card, and then run the other goodies off the Nvidia card?
> 
> Think of the old Physx card hack for AMD users a few years back, but for ASC and Nvidia.


I think that because Multi-Adapter uses Split Frame Rendering, both adapters would have to synchronize at some point. Half a frame is rendered by GPU1 and the other half by GPU2. Therefore I think that in order for two GPUs to function in the way you're describing, the developer would have to explicitly code for this. I don't think devs will bother with that but I could be wrong.
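The AFR-versus-SFR memory point quoted earlier in the thread can be sketched as a toy model. This is an idealization: real D3D12 explicit multi-adapter memory behavior depends entirely on how the engine partitions its resources:

```python
# Toy model of why SFR can pool VRAM while AFR duplicates it.

def effective_vram(mode: str, gpu_vram_gb: list[float]) -> float:
    if mode == "AFR":
        # Alternate Frame Rendering: every GPU renders whole frames,
        # so each one needs its own full copy of the textures.
        return min(gpu_vram_gb)
    if mode == "SFR":
        # Split Frame Rendering: each GPU holds only the resources
        # for its portion of the frame (idealized: zero duplication).
        return sum(gpu_vram_gb)
    raise ValueError(mode)

print(effective_vram("AFR", [4, 4]))  # 4 GB usable
print(effective_vram("SFR", [4, 4]))  # up to 8 GB usable
```

In practice resources near the split boundary get duplicated anyway, so the SFR number is an upper bound rather than a guarantee.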


----------



## spacin9

Quote:


> Originally Posted by *Mahigan*
> 
> It can if an engine made use of full fledged Asynchronous Compute in order to process Post Processing effects. This is what the Oxide dev told us and when nVIDIA, on site with Oxide, noticed this... they asked that the feature be shut down for their architecture.
> *
> You have to remember that AotS maxes out the CPUs already. Any additional usage of the CPU would hinder performance. GCN has a hardware scheduler, therefore it doesn't make any additional use of the CPU.*
> 
> This won't occur in every single title, we have to remember that AotS is an RTS with a lot going on at any given time (many units permeate the screen). *They process a lot of AI (what they call smart Battlegroups) where your battlegroup (a bunch of units assigned to work together as a squad) support each other intelligently. This means the gamer doesn't have to control each individual unit him/her self, the AI does it for them. AI is CPU intensive.*
> 
> I think that all of these tests, as well as Oxide's words, remove any claimed bias allegations some have leveled towards Oxide. Their game is just more parallel than others.


From what I've seen, CPU usage is higher in DX11 (Windows 10), and in a DX11 environment in Windows 7 there are times when some CPUs aren't being used at all. In DX12 I'm seeing mostly 50% usage. Of course this could jump dramatically with 4-8 players, which, if the pattern holds, would leave DX11 completely overloaded, making it much more advantageous to play in DX12. Of course it's an alpha and has not been optimized, and I guess it would affect NV users more.

*edit* just a thought. I would love if Sins of a Solar Empire could be coded with this engine somehow. Sins would be completely wicked
with more efficient *multi-threaded optimizations*. Just a note to Stardock if you are watching.









*and another edit on combat grouping* Combat grouping in Sins is more for organizational purposes. Leaving a group or fleet to fight another just on auto-attack can be very hazardous. What makes Sins so wonderfully involving is that the player must maneuver in combat himself to maximize his fleet's effectiveness. Doing this while having to watch and maintain your whole empire of several planets in real time makes hours of your life disappear without you even noticing. I've seen the combat groupings in Ashes, and I'm sure I'll want to control my combat groups myself for maximum effectiveness; that's why full strategic zoom is absolutely necessary. Another note to Oxide and Stardock...


----------



## infranoia

Quote:


> Originally Posted by *airfathaaaaa*
> 
> 
> 
> 
> 
> Just an informative video about DX12 and async; maybe it adds 2 or 3 things to the table (or not, I don't know).


I recommend this video to everyone on the thread. It's basic stuff and we're further in the weeds now, but it's a great overview of why async compute will matter so much.

I liked his analogy of the different parts of the GPU heating up at different times: textures make your VRAM hot, particle shaders make your cores hot, while async shading maximizes your GPU resources. That's a bit simplistic, of course (and refers more to parallel tasks, which Nvidia does do well), but it's a good presentation.

All this reminds me of the old preemptive vs. cooperative multitasking 'debates' of ye olde tymes.


----------



## infranoia

Quote:


> Originally Posted by *spacin9*
> 
> *edit* just a thought. I would love if Sins of a Solar Empire could be coded with this engine somehow. Sins would be completely wicked
> with more efficient *multi-threaded optimizations*. Just a note to Stardock if you are watching.


Except that Sins is so crazy fast and butter-smooth with any CPU/GPU combo from 2008 on, even with large maps, that it would take a sequel to bump up the graphics to the point where it would matter.

Mmm... Sins sequel... That's not Ashes, unfortunately. I'm hoping the Ashes gameplay and lore gets deeper as it develops, but remember that Sins was pretty shallow when it was first launched. I expect a bunch of DLC / add-ons for Ashes, there's a minimum number guaranteed for the top tier of Founders.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> I think that because Multi-Adapter uses Split Frame Rendering, both adapters would have to synchronize at some point. Half a frame is rendered by GPU1 and the other half by GPU2. Therefore I think that in order for two GPUs to function in the way you're describing, the developer would have to explicitly code for this. I don't think devs will bother with that but I could be wrong.


For clarity I should have stated "if the developer wants", as the original rumor is that the dual GPU thing is entirely up to them. You are right though, I don't see them doing it.


----------



## airfathaaaaa

Quote:


> Originally Posted by *infranoia*
> 
> Except that Sins is so crazy fast and butter-smooth with any CPU/GPU combo from 2008 on, even with large maps, that it would take a sequel to bump up the graphics to the point where it would matter.
> 
> Mmm... Sins sequel... That's not Ashes, unfortunately. I'm hoping the Ashes gameplay and lore gets deeper as it develops, but remember that Sins was pretty shallow when it was first launched. I expect a bunch of DLC / add-ons for Ashes, there's a minimum number guaranteed for the top tier of Founders.


Oh no, if you create a realistic map, let's say with 900 systems with variable numbers of planets, I can assure you that no system can handle it.


----------



## Mahigan

Quote:


> Originally Posted by *airfathaaaaa*
> 
> Oh no, if you create a realistic map, let's say with 900 systems with variable numbers of planets, I can assure you that no system can handle it.


Time to move to 64bit on the CPU side for that


----------



## airfathaaaaa

Quote:


> Originally Posted by *Mahigan*
> 
> Time to move to 64bit on the CPU side for that


Huh, what do you mean, SoaSE isn't 64-bit? Well, if so, that explains some weird stuff I was seeing after playing it a while...


----------



## CasualCat

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Meh, the 2600K was a better investment than the 2500K.


This, especially considering the 60GB SSD you would have bought then is a pretty lousy investment given prices now for even larger SSDs. The 2600K is still great; 60GB is tiny.


----------



## spacin9

Quote:


> Originally Posted by *infranoia*
> 
> Except that *Sins is so crazy fast and butter-smooth with any CPU/GPU combo from 2008 on, even with large maps*, that it would take a sequel to bump up the graphics to the point where it would matter.
> 
> Mmm... Sins sequel... That's not Ashes, unfortunately. I'm hoping the Ashes gameplay and lore gets deeper as it develops, but remember that Sins was pretty shallow when it was first launched. I expect a bunch of DLC / add-ons for Ashes, there's a minimum number guaranteed for the top tier of Founders.


lol... this is bait right?

Um, no. It's a slideshow late game. Single-threaded. The graphics are easily rendered; that much is true. But thousands of units slow it to a crawl, no matter what rig you have.

Even Sup Com is multi-threaded to some extent... Core Maximizer makes it a bit smoother and more evenly distributed. But that still comes to a crawl, even with the latest CPU and video card.

Sins is shallow because there is little back-story, no campaign, and only a text tutorial. But Sins doesn't need a back story. It's a bug race, a military race and a religious zealot race. There's the story, and here's the game. It has hypnotic lasers, huge gratuitous space battles, and funny, corny one-liners. "*This* is how space junk gets born." Go play it.

And I have since 2008. Sins 2 needs some form of this engine if it ever hopes to be "butter smooth".


----------



## SlackerITGuy

Quote:


> Originally Posted by *ToTheSun!*
> 
> I can: 2600K.


Yeah, that HT sure does help in gaming


----------



## JunkoXan

Quote:


> Originally Posted by *CasualCat*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ZealotKi11er*
> 
> Meh 2600K better investment then 2500K.
> 
> 
> 
> This especially considering a 60GB SSD you would have bought then is a pretty lousy investment given prices now for even larger SSDs. The 2600K is still great, 60GB is tiny.

I'm over here still rocking an i7 2700K (ES), running at stock speed of course; never really needed to OC it.







Best $200 investment I ever could have made, and it came with a Cooler Master 612 heatsink (brand new) thrown in for free. That, aside from my $160 280X.








Quote:


> Originally Posted by *SlackerITGuy*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ToTheSun!*
> 
> I can: 2600K.
> 
> 
> 
> Yeah, that HT sure does help in gaming

Helps with BF4, DA:I, and other games I play online.


----------



## ZealotKi11er

Quote:


> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
> 
> 
> 
> 
> 
> 
> 
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free , aside from my $160 280x.
> 
> 
> 
> 
> 
> 
> 
> 
> helps with BF4,DA:I and other games I play online with.


I paid $270 CAD for my 3770K BNIB. For reference, a 6700K costs $540 CAD after tax, lol.


----------



## JunkoXan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Quote:
> 
> 
> 
> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
> 
> 
> 
> 
> 
> 
> 
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free , aside from my $160 280x.
> 
> 
> 
> 
> 
> 
> 
> 
> helps with BF4,DA:I and other games I play online with.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I payed $270 CAD for my 3770K BNIB. For reference a 6700K costs $540 CAD after TAX lol.

lol, I can't really buy stuff new, way too expensive. I have to find used or free.


----------



## SlackerITGuy

Quote:


> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
> 
> 
> 
> 
> 
> 
> 
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free , aside from my $160 280x.
> 
> 
> 
> 
> 
> 
> 
> 
> *helps with BF4,DA:I and other games I play online with.
> 
> 
> 
> 
> 
> 
> 
> *


Is that right?

Just an FYI, just because Task Manager shows usage in all 8 threads doesn't mean you're actually benefiting from it FPS wise.

HT would probably benefit from DirectX 11's multi-threaded rendering, though; too bad only one game uses that.

Post edited by Blitz6804 to remove response to flame bait.


----------



## spacin9

I see this game better now. It seems like spamming units and power control nodes; not too much turtling going on. Don't know how Titans or super-weapons figure in. Would be nice to have a Monkeylord or something; I guess they might need something for a late-game stalemate. It would still be nice to have full zoom, even to see the terrain you haven't explored yet. Skipping around with the minimap doesn't feel natural. It's pretty fun, keeps you on your toes.









*edit*

I'm sure by changing victory conditions you can dramatically change the game. If it's built with that in mind, this could be a very versatile game!


----------



## Mahigan

Quote:


> Originally Posted by *airfathaaaaa*
> 
> Huh, what do you mean, SoaSE isn't 64-bit? Well, if so, that explains some weird stuff I was seeing after playing it a while...


A lot of games aren't built with 64-bit memory addressing in mind. I know... 2015 and still not the case









Open worlds of that sort would require a LOT of RAM, with 64-bit memory addressing as a bare minimum. I hear that Star Citizen might even be 64-bit only.

Sources:
http://starcitizen.wikia.com/wiki/Star_Citizen

https://www.reddit.com/r/3eiz4v/64bit_how_much_larger_worlds_could_it/
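The arithmetic behind the 64-bit point above can be sketched quickly. This is an editorial illustration, not from the thread; the function name is invented.

```python
# Why 32-bit memory addressing caps what an open-world game can keep
# resident at once: a flat pointer of N bits can address at most 2**N bytes.

def addressable_bytes(pointer_bits: int) -> int:
    """Maximum bytes a flat pointer of the given width can address."""
    return 2 ** pointer_bits

GIB = 1024 ** 3
print(addressable_bytes(32) // GIB)   # 4 GiB total address space
# In practice a 32-bit Windows process sees only 2 GiB (up to ~4 GiB with
# large-address-aware), which is why big streamed worlds push studios to
# 64-bit builds.
print(addressable_bytes(64) // 2 ** 60)  # 16 (EiB) -- effectively unlimited
```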


----------



## Paul17041993

Quote:


> Originally Posted by *Mahigan*
> 
> I think that because Multi-Adapter uses Split Frame Rendering, both adapters would have to synchronize at some point. Half a frame is rendered by GPU1 and the other half by GPU2. Therefore I think that in order for two GPUs to function in the way you're describing, the developer would have to explicitly code for this. I don't think devs will bother with that but I could be wrong.


I would


----------



## HalGameGuru

Quote:


> Originally Posted by *Mahigan*
> 
> I think that because Multi-Adapter uses Split Frame Rendering, both adapters would have to synchronize at some point. Half a frame is rendered by GPU1 and the other half by GPU2. Therefore I think that in order for two GPUs to function in the way you're describing, the developer would have to explicitly code for this. I don't think devs will bother with that but I could be wrong.


I'm not sure multi-GPU is going to be so black and white; you already have demos where asymmetrical rendering is taking place, i.e. the lion's share is done on the dGPU and the iGPU handles a minority of the work, in multiple areas.

I think, with proper implementation, there would have to be a way for disparate GPUs to provide asymmetrical rendering AND asymmetrical compute, with each frame having specific render or compute tasks done by specific GPUs per their strengths.

I don't know if it will be easy, or if the support will be that granular, but the possibility follows from what has been seen and alluded to.
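The per-strength task split described above can be sketched as a greedy scheduler. Everything here is invented for illustration: adapter names, scores, and task names are hypothetical, and a real engine would balance by measured timings, not static scores.

```python
# Hypothetical sketch: per-frame tasks go to whichever adapter currently
# offers the best strength-per-load ratio for that kind of work.

from dataclasses import dataclass

@dataclass
class Adapter:
    name: str
    scores: dict          # e.g. {"render": 10.0, "compute": 8.0}
    load: float = 0.0     # work units already assigned this frame

def assign(tasks, adapters):
    """Greedy split: each task goes to the adapter with the best
    score/(1+load) for its kind; that adapter's load then grows."""
    plan = {}
    for task, kind in tasks:
        best = max(adapters, key=lambda a: a.scores[kind] / (1.0 + a.load))
        plan[task] = best.name
        best.load += 1.0
    return plan

dgpu = Adapter("dGPU", {"render": 10.0, "compute": 8.0})
igpu = Adapter("iGPU", {"render": 1.5, "compute": 3.0})

frame = [("geometry", "render"), ("lighting", "render"),
         ("post_fx", "compute"), ("particles", "compute")]
print(assign(frame, [dgpu, igpu]))
# the weak iGPU still picks up post-processing once the dGPU is loaded
```

The design point is the one HalGameGuru raises: the split only works if the scheduler knows each adapter's relative strengths per task kind, which is exactly the granularity that is in question.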


----------



## Kana-Maru

Quote:


> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
>
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free , aside from my $160 280x.
>
> helps with BF4,DA:I and other games I play online with.


Don't feel bad. I'm still rocking 2008-X58 and still have no need to upgrade. I recently retired my GTX 670s in 2-way SLI for a single AMD Fury X. Speed isn't an issue when it comes to SSDs either; PCIe 2.0 still performs great.


----------



## SpeedyVT

Quote:


> Originally Posted by *HalGameGuru*
> 
> I'm not sure multi-GPU is going to be so black and white, you already have demos where asymmetrical rendering is taking place, i.e. the lion's share is done on dGPU and iGPU handles a minority of the work, in multiple areas.
> 
> I think, with proper implementation, there would have to be a way for disparate GPU's to provide asymmetrical rendering AND Asymmetrical compute. Each frame having specific render or compute tasks done by specific GPU's as per their strengths.
> 
> I don't know if it will be easy, or if the support will be that granular, but the possibility follows from what has been seen and alluded to.


Another possibility is just to offload shader or compute operations to the second GPU while the dGPU focuses on the primary rendering. That could dramatically raise frame rates and reduce latency even more.

This could further the viability of APUs.


----------



## HalGameGuru

Quote:


> Originally Posted by *SpeedyVT*
> 
> Another possibility is just to offload shader or computational operations while the dGPU focuses on the primary rendering. This will dramatically raise frames and reduce latency even more.
> 
> This could further viability in APUs.


Same vein, definitely. DX12 is going to open a lot of doors to strange new places for devs and hardware vendors.


----------



## JunkoXan

Quote:


> Originally Posted by *SlackerITGuy*
> 
> Quote:
> 
> 
> 
> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
>
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free , aside from my $160 280x.
>
> *helps with BF4, DA:I and other games I play online with.*
> 
> 
> 
> Is that right?
> 
> Just an FYI, just because Task Manager shows usage in all 8 threads doesn't mean you're actually benefiting from it FPS wise.
> 
> HT would probably benefit from DirectX 11's multi-threaded rendering though, too bad only 1 game uses that.
> Quote:
> 
> 
> 
> Originally Posted by *ToTheSun!*
> 
> Hi! How's 2009?
> 
> 
> Lol, I wasn't aware debating whether HT actually helped or not in games was settled for good, care to educate me on that?

While I enjoy the mention, FPS isn't my concern entirely; it's more the general smoothness I get. Compared to my AMD 955 quad-core clocked at 4.2 GHz in the same games, I was experiencing hitching in multiplayer games on the 955. I did, however, see a 10-15% higher average FPS with the i7, depending on the in-game situation.

Mileage varies person to person, set up to set up.

also, I play at 1680x1050 if that has ANY real bearing.
Quote:


> Originally Posted by *Kana-Maru*
> 
> Quote:
> 
> 
> 
> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
>
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free, aside from my $160 280x.
>
> helps with BF4, DA:I and other games I play online with.
>
> Don't feel bad. I'm still rocking 2008-X58 and still have no need to upgrade. I recently retired my GTX 670s 2-Way SLI for a single AMD Fury X. Speed isn't a issue when it comes to SSDs either. PCIe 2.0 still performs great.

Oh, I'm not feeling bad about it, but rather happy that the hardware I have is actually going to last a few years longer than I anticipated. I do plan on going to a 390/X if I can get one cheap enough. I can really put 8 GB to good use, tbh.


----------



## PontiacGTX

Quote:


> Originally Posted by *Kana-Maru*
> 
> Don't feel bad. I'm still rocking 2008-X58 and still have no need to upgrade. I recently retired my GTX 670s 2-Way SLI for a single AMD Fury X. Speed isn't a issue when it comes to SSDs either. PCIe 2.0 still performs great.


Essentially, the great future-proofing performance of Westmere came from the slightly larger cache, higher overclocks, Hyper-Threading plus the extra cores, slightly higher IPC, and the fact that newer games (2011 and later) do indeed perform better with HT.


----------



## Kana-Maru

Quote:


> Originally Posted by *JunkoXan*
> 
> Oh, I'm not feeling bad about it, but rather happy that the hardware I have is actually going to last a few years longer than I anticipated. I do plan on going to a 390/X if I can get one cheap enough. I can really put 8 GB to good use, tbh.


Same here. I'm glad that the older tech still has legs. Intel is dragging their feet, so hopefully AMD Zen can make Intel competitive again. I've seen a lot of 390X benchmarks and it performs well.

Quote:


> Originally Posted by *PontiacGTX*
> 
> Essentially, the great future-proofing performance of Westmere came from the slightly larger cache, higher overclocks, Hyper-Threading plus the extra cores, slightly higher IPC, and the fact that newer games (2011 and later) do indeed perform better with HT.


What up, PontiacGTX. Yes, Westmere definitely saved the X58. DX12 is only going to make things better.


----------



## PontiacGTX

Quote:


> Originally Posted by *Kana-Maru*
> 
> Same here. I'm glad that the older tech still has legs. Intel is dragging their feet so hopefully AMD Zen can make Intel competitive again. I've seen a lot 390X benchmarks and it performs well.
> What up PontiacGTX. Yes Westmere's definitely saved the X58. DX12 is only going to make things better.


The only thing is that console ports might be more limited if they don't have better multithreading on PC, so the extra cores/threads wouldn't help as much unless they support more than 4c/8t, like Crysis 3, Battlefield 4, Dragon Age: Inquisition, and others do.


----------



## PostalTwinkie

Quote:


> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
>
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free, aside from my $160 280x.
>
> helps with BF4,DA:I and other games I play online with.


I was rocking the 2600K until my board died around a month ago, and I just went X99. You are still fine; don't feel any need to move.


----------



## JunkoXan

Quote:


> Originally Posted by *Kana-Maru*
> 
> Quote:
> 
> 
> 
> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
>
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free, aside from my $160 280x.
>
> helps with BF4, DA:I and other games I play online with.
>
> Don't feel bad. I'm still rocking 2008-X58 and still have no need to upgrade. I recently retired my GTX 670s 2-Way SLI for a single AMD Fury X. Speed isn't a issue when it comes to SSDs either. PCIe 2.0 still performs great.

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Quote:
> 
> 
> 
> Originally Posted by *JunkoXan*
> 
> I'm over here still rocking a i7 2700k (ES) Running stock speed of course never really needed to OC it.
>
> best $200 Investment I ever could have done which also came with a Coolermaster 612 Heat sink (brand new) thrown in for free, aside from my $160 280x.
>
> helps with BF4, DA:I and other games I play online with.
>
> I was rocking the 2600K until my board died around a month ago, and I just went X99. You are still fine; don't feel any need to move.

I'll keep going with this 2700K until the platform isn't up to the task and becomes a bottleneck.


----------



## SlackerITGuy

Quote:


> Originally Posted by *JunkoXan*
> 
> While I enjoy the mention, FPS isn't my concern entirely; it's more the general smoothness I get. Compared to my AMD 955 quad-core clocked at 4.2 GHz in the same games, I was experiencing hitching in multiplayer games on the 955. I did, however, see a 10-15% higher average FPS with the i7, depending on the in-game situation.
> 
> Mileage varies person to person, set up to set up.
>
> also, I play at 1680x1050 if that has ANY real bearing.


Yeah but you were going from K10 to Sandy Bridge, that in itself is a huge jump performance/smoothness wise, doesn't mean the difference came from the 4 extra threads.

HT is useful for certain workloads, but certainly not for gaming. Most games, if not all, stop seeing increased performance after ~3 cores, that's just the way DirectX is, with its single threaded rendering.

That will change with DirectX 12 though.
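The "stops scaling after ~3 cores" claim is essentially Amdahl's law applied to a mostly serial render thread. A quick sketch; the 70% parallel fraction is an assumption for illustration, not a measured number from any game.

```python
# Amdahl's law: the serial part (e.g. single-threaded render submission)
# never shrinks, so extra cores stop paying off early.

def speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speedup when only parallel_fraction of the work divides
    across the given number of cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for n in (1, 2, 3, 4, 6, 8):
    print(n, round(speedup(0.70, n), 2))
# scaling flattens out quickly: going 4 -> 8 cores gains far less
# than going 1 -> 2 did
```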


----------



## Clocknut

DirectX 12 still doesn't distribute work across cores perfectly. You still see Core 0 at 10-20% higher usage than the rest, and DirectX 12 seems to support about six cores only.


----------



## SlackerITGuy

Quote:


> Originally Posted by *Clocknut*
> 
> DirectX 12 still doesn't distribute work across cores perfectly. You still see Core 0 at 10-20% higher usage than the rest, and DirectX 12 seems to support about six cores only.


That doesn't sound right, in theory it should scale beyond that, at least that was my understanding from reading so much about Mantle.

As for Core 0 seeing increased usage vs. the other cores, maybe there's a reason behind it.

That sure isn't the case in the early showings of Vulkan:


----------



## PontiacGTX

Quote:


> Originally Posted by *SlackerITGuy*
> 
> Most games, if not all, stop seeing increased performance after ~3 cores, that's just the way DirectX is, with its single threaded rendering.


Nope. Maybe the games you know use up to 2-3 cores, but that doesn't have anything to do with DX11; it's how the developers choose to use those cores. For example:

Crysis 3 and Ryse get a good performance increase going from 4c/4t to 4c/8t to 6c/12t, as do DA:I and Battlefield 4.


----------



## SlackerITGuy

Quote:


> Originally Posted by *PontiacGTX*
> 
> Nope. Maybe the games you know use up to 2-3 cores, but that doesn't have anything to do with DX11; it's how the developers choose to use those cores. For example:
> 
> Crysis 3 and Ryse get a good performance increase going from 4c/4t to 4c/8t to 6c/12t, as do DA:I and Battlefield 4.


Most, if not all DirectX 9-11 games stop seeing any significant performance increase after ~3 cores, that has been proven over and over again.

That's certainly the case in Crysis 3 (Ryse should be the same as it uses the same engine):



As for Battlefield 4, it might be the case in *extreme MP scenarios* (like the unoptimized China Rising maps), but in most cases, you won't be seeing significant performance increase with higher core count CPUs.

It's worth remembering that DICE started implementing DirectX 11's Multi-Threaded Rendering into their engine back in 2011 IIRC, but they dropped it completely because they weren't seeing the results they wanted. So it's still single threaded rendering in the latest Frostbite games.

Finally, and I'm gonna mention this again, just because you're seeing a certain game use all of your threads in Task Manager doesn't mean the game is benefiting performance wise from that.

~4 cores/threads is what you really need for today's DirectX 9-11 games, anything over that is not going to get you any significant performance increase.

This should change with Mantle/DirectX 12/Vulkan games.


----------



## Clocknut

Quote:


> Originally Posted by *SlackerITGuy*
> 
> That doesn't sound right, in theory it should scale beyond that, at least that was my understanding from reading so much about Mantle.
> 
> As for Core 0 seeing increased usage vs. the other cores, maybe there's a reason behind it.
> 
> That sure isn't the case in the early showings of Vulkan:


I wasn't sure about Vulkan or Mantle, but the benchmarks on the net are showing diminishing returns as soon as we hit six cores with DirectX 12.

The higher Core 0 usage on DirectX 12 is still a problem. It means the other cores are not going to reach 100% usage, due to bottlenecks on Core 0.


----------



## SlackerITGuy

Quote:


> Originally Posted by *Clocknut*
> 
> I wasn't sure about Vulkan or Mantle, but the benchmarks on the net are showing diminishing returns as soon as we hit six cores with DirectX 12.
> 
> *The higher Core 0 usage on DirectX 12 is still a problem. It means the other cores are not going to reach 100% usage, due to bottlenecks on Core 0.*


Not to be disrespectful mate but that can't be the case.

These low-level APIs are all about getting the CPU out of the way, so I don't see how still-high Core 0 usage (I'd like to see some evidence of this, btw) would come close to being "a problem" in these scenarios.


----------



## Vesku

Most games still require some sort of "job manager" CPU process that makes sure everything is working properly. In most games there will eventually be some point where only more CPU speed on the core running that task will allow for more performance. What DX12 and Vulkan do is allow for that management to be much more lightweight at least when it comes to handling the graphics portion of the game.
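Vesku's "job manager" point can be sketched as a toy model: one manager thread pays a fixed submission cost per job, workers do the rest, and a thinner graphics API mostly shrinks the manager's per-job cost. All names and cost numbers here are invented for illustration.

```python
# Toy model of a job-manager frame: the manager's time scales with
# per-job submission overhead, which is what DX12/Vulkan-style APIs cut.

from concurrent.futures import ThreadPoolExecutor

def simulate_frame(jobs, submit_cost_units):
    """Manager 'pays' submit_cost_units per job; a worker pool runs the
    actual work (a stand-in doubling here). Returns (manager_time, results)."""
    manager_time = len(jobs) * submit_cost_units
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda j: j * 2, jobs))
    return manager_time, results

jobs = list(range(8))
heavy, _ = simulate_frame(jobs, submit_cost_units=5)  # thick-driver cost
light, _ = simulate_frame(jobs, submit_cost_units=1)  # thin-API cost
print(heavy, light)  # same work done, far less time spent on the manager core
```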


----------



## ZealotKi11er

Quote:


> Originally Posted by *SlackerITGuy*
> 
> Most, if not all DirectX 9-11 games stop seeing any significant performance increase after ~3 cores, that has been proven over and over again.
> 
> That's certainly the case in Crysis 3 (Ryse should be the same as it uses the same engine):
> 
> 
> 
> As for Battlefield 4, it might be the case in *extreme MP scenarios* (like the unoptimized China Rising maps), but in most cases, you won't be seeing significant performance increase with higher core count CPUs.
> 
> It's worth remembering that DICE started implementing DirectX 11's Multi-Threaded Rendering into their engine back in 2011 IIRC, but they dropped it completely because they weren't seeing the results they wanted. So it's still single threaded rendering in the latest Frostbite games.
> 
> Finally, and I'm gonna mention this again, just because you're seeing a certain game use all of your threads in Task Manager doesn't mean the game is benefiting performance wise from that.
> 
> ~4 cores/threads is what you really need for today's DirectX 9-11 games, anything over that is not going to get you any significant performance increase.
> 
> This should change with Mantle/DirectX 12/Vulkan games.


GTX680 is trash tier. 2 x 290X will eat the CPU alive.


----------



## spacin9

I've been doing more research and some more benches. I used my g-sync 4 k monitor with g-sync off and v-sync off. 355.82 drivers

With the High preset @ 4K in Windows 10 Oc'd Titan X 1500; 3900, 5820K @ 4.35 Ghz, RAM 3000 Mhz--

DX 12 Avg. 58.5 fps.

heavy batches 55 fps.

DX 11 Avg. 67 fps.

heavy batches 58 fps.

So an 8.5 fps jump in the average, 3 fps jump in heavy batches with DX 11.

I'm seeing more CPU usage in DX 11. I'm not sure how that's related, but yeah, DX 12 isn't looking hot for Maxwell 2, at least not in this bench at the moment. I'm actually surprised at the difference.

The game still works great in DX 12; maybe we'll see some better optimizations for NV coming later. I'm sure there's still an advantage somewhere for DX 12 and NV. Not necessarily over AMD, but better optimization for multi-GPU in DX 12 might make a huge difference. Then again, if two Fury Xs or 390Xs in Crossfire are stout without the microstutter, this is not so hot for NV.


----------



## crazycrave

Too many pages to read, but a few of us have been testing the benchmark over at HardOCP and we found out that the benchmark has no set limit for batches; it is in fact unlimited and runs in real time, so comparing front-page results is useless. You have to go into the breakdown of a scene like Low Vista and look at the batch demand placed on the system, since it can differ: the benchmark pushes the system to its limits, the batch count is all that the system can render, and within that limit you get your average FPS.

So it would look like this:



Look at the Low Vista scene: the benchmark demanded 37031 batches, or draw calls, and the system rendered it at 25.6 fps.

Now, just because the fps was lower does not mean the performance was also lower, as it could be rendering twice as many batches as another system with a higher average fps but fewer batches.
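The comparison crazycrave describes can be made concrete: what matters is batch throughput (draw calls per second), not FPS alone. The 37031-batch / 25.6 fps figures are from the post; the second system is a made-up example.

```python
# Batch throughput: draw calls actually submitted per second.

def batch_throughput(batches_per_frame: float, fps: float) -> float:
    return batches_per_frame * fps

a = batch_throughput(37031, 25.6)   # the Low Vista run quoted above
b = batch_throughput(15000, 40.0)   # hypothetical "faster-looking" run
print(round(a), round(b))
# the lower-FPS run is actually pushing far more draw calls per second
```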


----------



## diggiddi




Quote:


> Originally Posted by *SlackerITGuy*
> 
> Most, if not all DirectX 9-11 games stop seeing any significant performance increase after ~3 cores, that has been proven over and over again.
> 
> That's certainly the case in Crysis 3 (Ryse should be the same as it uses the same engine):
> 
> 
> 
> As for Battlefield 4, it might be the case in *extreme MP scenarios* (like the unoptimized China Rising maps), but in most cases, you won't be seeing significant performance increase with higher core count CPUs.
> 
> It's worth remembering that DICE started implementing DirectX 11's Multi-Threaded Rendering into their engine back in 2011 IIRC, but they dropped it completely because they weren't seeing the results they wanted. So it's still single threaded rendering in the latest Frostbite games.
> 
> Finally, and I'm gonna mention this again, just because you're seeing a certain game use all of your threads in Task Manager doesn't mean the game is benefiting performance wise from that.
> 
> ~4 cores/threads is what you really need for today's DirectX 9-11 games, anything over that is not going to get you any significant performance increase.
> 
> This should change with Mantle/DirectX 12/Vulkan games.






In BF3 multi-player, 8 cores is always better than 6, no review will trump my own personal experience


----------



## SpeedyVT

Quote:


> Originally Posted by *Clocknut*
> 
> I wasn't sure about Vulkan or Mantle, but the benchmarks on the net are showing diminishing returns as soon as we hit six cores with DirectX 12.
> 
> The higher Core 0 usage on DirectX 12 is still a problem. It means the other cores are not going to reach 100% usage, due to bottlenecks on Core 0.


That's because there is no overhead like in DX11. However, the benefit of more than six cores depends entirely on the extent of the CPU demand in the game. You've got two halves: one for graphics and one for game operations. There are other portions, like audio, but they're clearly not as important as the other variables.


----------



## airfathaaaaa

Quote:


> Originally Posted by *Mahigan*
> 
> A lot of games aren't built with 64-bit memory addressing in mind. I know... 2015 and still not the case
>
> Open worlds of that sort would require a LOT of RAM and 64-bit memory addressing as a bare minimum. I hear that Star Citizen might even be 64-bit only.
> 
> Sources:
> http://starcitizen.wikia.com/wiki/Star_Citizen
> 
> https://www.reddit.com/r/3eiz4v/64bit_how_much_larger_worlds_could_it/


Yeah, I never really thought of that. I always assumed they had 32-bit and 64-bit versions, but at least for SoaSE that is not the case. It seems like they abandoned the game for quite some time now, and that explains why the game with the Stargate mod was so horrible when the volumetrics began to add up in fights...

(Though this game was horrible even with only graphics mods added.)
(As for SC, I too think 64-bit is the only way for it, and for No Man's Sky too.)


----------



## Paul17041993

Quote:


> Originally Posted by *Clocknut*
> 
> DirectX 12 still doesn't distribute work across cores perfectly. You still see Core 0 at 10-20% higher usage than the rest, and DirectX 12 seems to support about six cores only.


That's just the Windows scheduler: it mostly loads the first cores first and leaves the last ones more open, which is particularly noticeable with 8-core CPUs, as cores 6 and 7 will be left idle a lot of the time. With HT it will try to load only one thread per core until the CPU really peaks out. Both effects stack, so Core 0 usage will always look very high on HT processors.

So just ignore the per-core/thread usage; if you really want to see thread behaviour, open something like Process Explorer and look at the Threads tab for the process.


----------



## spacin9

Quote:


> Originally Posted by *crazycrave*
> 
> Too many pages to read, but a few of us have been testing the benchmark over at HardOCP and we found out that the benchmark has no set limit for batches; it is in fact unlimited and runs in real time, so comparing front-page results is useless. You have to go into the breakdown of a scene like Low Vista and look at the batch demand placed on the system, since it can differ: the benchmark pushes the system to its limits, the batch count is all that the system can render, and within that limit you get your average FPS.
> 
> So it would look like this:
> 
> 
> 
> Look at the Low Vista scene: the benchmark demanded 37031 batches, or draw calls, and the system rendered it at 25.6 fps.
> 
> Now, just because the fps was lower does not mean the performance was also lower, as it could be rendering twice as many batches as another system with a higher average fps but fewer batches.


This is a good find thx. I was looking for something like this.

Low Vista on the High preset (no AA at all) @ 4K gave me about 22,000 draw calls in both DX 11 and DX 12. The batch count is near the same for most tests; some differ, with fewer batches in DX 12, and one, "FCC2", is dramatically different: 31089 batches in DX 11 @ 58.6 FPS vs only 6758 batches and 47.7 fps in DX 12. All of my DX 12 tests were bested by DX 11.

The breakdown shows fewer details in DX 11 vs DX 12. But in all the DX 12 tests I am completely GPU bound, 100 percent; the suggested CPU FPS is much higher than the actual FPS. It looks like a multi-GPU option would be great for the green team.


----------



## Forceman

So some folks at Beyond3d have started looking into the async compute test with GPUView and have made some interesting discoveries. It appears as though the test is not actually accessing both the graphics and compute queues on Maxwell cards, but instead doing everything in the graphics queue, while using both on GCN. Other games and OpenCL programs do access the compute queue on Maxwell, so it is available, it just isn't working right with this test. But since accessing both queues is a prerequisite for async computing, it may be that test isn't really showing anything one way or the other. Hopefully more to come.


----------



## Silent Scone

Quote:


> Originally Posted by *Forceman*
> 
> So some folks at Beyond3d have started looking into the async compute test with GPUView and have made some interesting discoveries. It appears as though the test is not actually accessing both the graphics and compute queues on Maxwell cards, but instead doing everything in the graphics queue, while using both on GCN. Other games and OpenCL programs do access the compute queue on Maxwell, so it is available, it just isn't working right with this test. But since accessing both queues is a prerequisite for async computing, it may be that test isn't really showing anything one way or the other.


Shocker. Sarcasm aside, it should be plain to all that these canned tests should be taken with several granules.


----------



## SlackerITGuy

Quote:


> Originally Posted by *diggiddi*
> 
> 
> In BF3 multi-player, 8 cores is always better than 6, no review will trump my own personal experience


*Maybe* if you stream or have a lot of background apps while you play, but with a single threaded rendering engine like the Frostbite 2 engine, there's no way you'd see any significant performance increase going from Thuban to Bulldozer.

Proof (and this is on 720p):


----------



## Noufel

Quote:


> Originally Posted by *Forceman*
> 
> So some folks at Beyond3d have started looking into the async compute test with GPUView and have made some interesting discoveries. It appears as though the test is not actually accessing both the graphics and compute queues on Maxwell cards, but instead doing everything in the graphics queue, while using both on GCN. Other games and OpenCL programs do access the compute queue on Maxwell, so it is available, it just isn't working right with this test. But since accessing both queues is a prerequisite for async computing, it may be that test isn't really showing anything one way or the other. Hopefully more to come.


Apparently Maxwell 2.0 can do async compute in other PhysX games and OpenCL benchmarks
https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-26#post-1870028


----------



## Xuper

All the slides about async compute were AMD's slides; where are NV's slides? If Maxwell really can do AC, then can it run workloads in parallel? We know that AC is the ability to execute a graphics and a compute queue in parallel, right? Why can't I see it in any valid test? I've only seen AC results on GCN. We've waited at least a week and still no response from NV.


----------



## ToTheSun!

Quote:


> Originally Posted by *SlackerITGuy*
> 
> Quote:
> 
> 
> 
> Originally Posted by *diggiddi*
> 
> 
> In BF3 multi-player, 8 cores is always better than 6, no review will trump my own personal experience
> 
> 
> 
> *Maybe* if you stream or have a lot of background apps while you play, but with a single threaded rendering engine like the Frostbite 2 engine, there's no way you'd see any significant performance increase going from Thuban to Bulldozer.
> 
> Proof (and this is on 720p):
> 
> 

Yeah, let's ignore the minimum FPS difference between the 3570K and the 3770K.
Let's also ignore the fact that a 60-70 FPS benchmark is meaningless for 100+ Hz monitor users.


----------



## Forceman

Summary of the findings from the GPUView analysis:
Quote:


> In the async compute test, I believe the crashing behavior is just a driver bug or something it was never designed to handle. It's seeing a heavy graphics workload (made worse by the fact that the compute is also in the graphics queue), but there's not any graphics activity on the screen. And since it's running in a window, it has to compete with rendering the graphics for the Windows desktop. In that single-command-list test you can see the time spent processing the DWM command gets longer and longer as it goes on, and there's a corresponding increase in CPU usage in csrss.exe; both things that are a tiny sliver when they're not run alongside the benchmark get stretched out to extraordinary lengths. It's almost as if the driver isn't able to properly preempt the benchmark to run the DWM, and it's just burning away CPU cycles as it switches between the DWM and an ever-increasing test load. To me it doesn't look like this is really revealing anything about whether or not Maxwell supports async compute; either it's just a bug or a normal reaction to an abnormal load, one that GCN happens to handle more gracefully.
> 
> The primary thing that GPUView revealed is that GCN considers the compute portion of the test as compute, while Maxwell still considers it graphics. This either tells us that A) Maxwell has a smaller or different set of operations that it considers compute, B) Maxwell needs to be addressed a certain/different way to consider it compute, or C) it's another corner case or driver bug. And it's possible that whatever was happening in the Ashes benchmark that was causing the performance issues is the same thing that's happening here. But we've got enough examples of stuff actually using the compute queue, from CUDA to OpenCL, so it's absolutely functional.
> 
> So first we need to find some way to send, in DX12, a concurrent graphics workload alongside a compute workload that Maxwell recognizes as compute, and see if there's any async behavior in that context. Unless the async test can be modified to do this, I think its utility has run its course, and it's revealed a lot along the way.


https://forum.beyond3d.com/posts/1870090/

So it doesn't appear that the async compute test is really telling us anything about whether Maxwell can do async compute or not.


----------



## Cyro999

Quote:


> Originally Posted by *diggiddi*
> 
> 
> In BF3 multi-player, 8 cores is always better than 6, no review will trump my own personal experience


You are using an FX CPU based on the Bulldozer architecture. If you have "6 cores" enabled, you have 3 modules available and can only run 3 threads at full speed.

By enabling "8 cores", you're turning on the fourth module, so even in a 4-threaded workload your performance would improve.

It's common with those CPUs that while a regular CPU (Intel, Phenom II, etc.) will stop scaling at 3 cores, the Bulldozer-based CPU will need 3 MODULES. So it's no surprise if an Intel CPU needs 4 cores for BF4 but you need 4 modules (which enables 8 threads) on the AMD side.

Your experience would thus not carry over to people with 5+ full cores.
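Cyro999's module/core distinction can be written down as a coarse model. This is a deliberate simplification for illustration (it ignores the partial sharing inside a module); the function is hypothetical.

```python
# Coarse model: on Bulldozer-style FX chips, two "cores" share one module's
# front end, so roughly one thread per module runs at full speed; on
# Intel / Phenom II, each core is independent.

def full_speed_threads(enabled_cores: int, arch: str) -> int:
    if arch == "bulldozer":
        return enabled_cores // 2   # one full-speed thread per module
    return enabled_cores

print(full_speed_threads(6, "bulldozer"))  # 3 -- why "6 cores" scales like 3
print(full_speed_threads(8, "bulldozer"))  # 4
print(full_speed_threads(4, "intel"))      # 4
```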


----------



## ku4eto

Quote:


> Originally Posted by *ToTheSun!*
> 
> Yeah, let's ignore the minimum FPS difference between the 3570K and the 3770K.
> Let's also ignore the fact that a 60-70 FPS benchmark is meaningless for 100+ Hz monitor users.


Please excuse the vast majority of peasant gamers who are running on 60 Hz monitors.


----------



## Cyro999

The lower the FPS you want to play at, the less relevant the CPU is.
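A toy Python model of why that holds (assuming, for illustration, that CPU and GPU work pipeline, so the frame time is roughly the maximum of the two):

```python
# Frame rate under a simple pipelined model: the slower of CPU and GPU
# sets the pace. The millisecond costs below are made-up illustrations.
def fps(cpu_ms, gpu_ms):
    return 1000.0 / max(cpu_ms, gpu_ms)

# 60 FPS-class settings: GPU-bound, so halving CPU time changes nothing.
print(fps(cpu_ms=6.0, gpu_ms=16.0))  # 62.5
print(fps(cpu_ms=3.0, gpu_ms=16.0))  # 62.5
# 144 Hz-class settings: CPU-bound, so the CPU decides the frame rate.
print(fps(cpu_ms=6.0, gpu_ms=5.0))   # ~167
print(fps(cpu_ms=3.0, gpu_ms=5.0))   # 200.0
```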


----------



## Noufel

if only mixing gpus was possible


----------



## ToTheSun!

Quote:


> Originally Posted by *ku4eto*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ToTheSun!*
> 
> Yeah, let's ignore the minimum FPS difference between the 3570K and the 3770K.
> Let's also ignore the fact that a 60-70 FPS benchmark is meaningless for 100+ Hz monitor users.
> 
> 
> 
> Please excuse the vast majority of peasant gamers that are running on 60Hz monitor.

I'm not sure how your setup or your comment pertain to the debate, but, ok, you're excused.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Noufel*
> 
> if only mixing gpus was possible


Could never do it. OCD.


----------



## Noufel

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Noufel*
> 
> if only mixing gpus was possible
> 
> 
> Could never do it. OCD.

Imagine 290 CFX for the async compute stuff and a 980 Ti for the eye-candy DX12 features...


----------



## Devnant

Quote:


> Originally Posted by *Forceman*
> 
> Summary of the findings from the GPUView analysis:
> https://forum.beyond3d.com/posts/1870090/
> 
> So it doesn't appear that the async compute test is really telling us anything about whether Maxwell can do async compute or not.


Yes. Darius and trandoanhung1991 showed concrete evidence Maxwell 2 can do async compute while running Physx titles and OpenCL benchmarks on DX11.

Here is Batman Arkham Knight:



Here is PLAbenchmark:



Notice the hardware queue and compute queue working in parallel?

And this is what happens on that async compute test:



GPUView shows both the compute and graphics queues running on the same hardware queue, with only a tiny brown dot on the compute queue. And that explains why the compute + graphics latencies add up almost perfectly on that test.


----------



## Noufel

Quote:


> Originally Posted by *Devnant*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Forceman*
> 
> Summary of the findings from the GPUView analysis:
> https://forum.beyond3d.com/posts/1870090/
> 
> So it doesn't appear that the async compute test is really telling us anything about whether Maxwell can do async compute or not.
> 
> 
> 
> Yes. Darius and trandoanhung1991 showed concrete evidence Maxwell 2 can do async compute while running Physx titles and OpenCL benchmarks on DX11.
> 
> Here is Batman Arkham Knight:
> 
> 
> 
> Here is PLAbenchmark:
> 
> 
> 
> Notice the hardware queue and compute queue working in parallel?
> 
> And this is what happens on that async compute test:
> 
> 
> 
> GPUView shows both compute and graphics queues are running on the same hardware queue, with only a tiny brown dot on the compute queue. And that explains why compute + graphics are adding up latency almost perfectly on that test.

Can Mahigan give us his thoughts on this?


----------



## CasualCat

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Could never do it. OCD.


Assuming it is visual, replace the coolers with full blocks and back plates. Now they look the same.


----------



## ToTheSun!

Quote:


> Originally Posted by *CasualCat*
> 
> Now they look the same.


To the untrained eye!


----------



## Devnant

Quote:


> Originally Posted by *crazycrave*
> 
> Too many pages to read, but a few of us have been testing the benchmark over at HardOCP, and we found that the benchmark has no set limit for batches; it is in fact unlimited and runs in real time, so comparing front-page results is useless. You have to go into the breakdown of a scene like Low Vista and look at the batch demand placed on the system, as it can differ: the benchmark pushes the system to its limits, the batch count is all that the system can render, and within that limit you get your average FPS.
> 
> So it would look like this:
> 
> 
> 
> Look at the Low Vista scene: the benchmark demanded 37,031 batches, or draw calls, and the system rendered it at 25.6 fps.
> 
> Now, just because the FPS was lower does not mean the performance was also lower, as it could be rendering twice as many batches as another system with a higher average FPS but fewer batches.


Thanks for posting this. I only wish I had access to the benchmark to check the demand for batches on my system.


----------



## airfathaaaaa

so PhysX is tapping into compute and thus is actually making use of the
Quote:


> Originally Posted by *Devnant*
> 
> Yes. Darius and trandoanhung1991 showed concrete evidence Maxwell 2 can do async compute while running Physx titles and OpenCL benchmarks on DX11.
> 
> Here is Batman Arkham Knight:
> 
> 
> 
> Here is PLAbenchmark:
> 
> 
> 
> Notice the hardware queue and compute queue working in parallel?
> 
> And this is what happens on that async compute test:
> 
> 
> 
> GPUView shows both compute and graphics queues are running on the same hardware queue, with only a tiny brown dot on the compute queue. And that explains why compute + graphics are adding up latency almost perfectly on that test.


Any GCN card screenshots of heavy compute use?


----------



## semitope

Quote:


> Originally Posted by *Forceman*
> 
> Summary of the findings from the GPUView analysis:
> https://forum.beyond3d.com/posts/1870090/
> 
> So it doesn't appear that the async compute test is really telling us anything about whether Maxwell can do async compute or not.


I highly doubt batman and the other software being used to show compute working are trying to do the tasks asynchronously. It's not supported in dx11.

Also not sure why it's a big deal that the compute queue is working. They seemed to think it was a big deal in the B3D thread. Maxwell can do compute... duh. Ashes will probably show a similar result, since they put the compute tasks through the GPU. They just don't do it asynchronously.


----------



## crazycrave

I have no idea what most of these people are talking about; they are way off base from the benchmark of this topic. Now I only wish I had a 980 Ti to compare to my 290X in the Low Vista scene, or any of the scenes, just to see how many batches it is rendering at a given FPS, as it may be even worse than reports showing only the average FPS of heavy batches. But the test would have to be on the same computer, just to see which vendor renders the most batches using the same settings.


----------



## Mahigan

Quote:


> Originally Posted by *airfathaaaaa*
> 
> so PhysX is tapping into compute and thus is actually making use of the
> Any GCN card screenshots of heavy compute use?


Hmmm...

PhysX runs separately from the DX path, does it not? It's like having one API and a set of libraries working in parallel, afaik. It could be that nVIDIA hasn't coded their work distributor (which I believe is in software, i.e. the driver) to function in parallel under DX12. There are three queues, in GCN and DX12, which you feed: graphics, compute, and copy. The work distributor is supposed to feed each queue based on the requests made by the developer in his/her code (for all we know, this hasn't been done by the person who coded the test). You're supposed to time your async coding in between executions.

GCN can do this on its own (though you'll get horrible results if you don't code it properly, as per many dev statements in talks I've listened to). Maxwell 2 doesn't appear to be able to do that. AMD mentioned "slow context switching"; I think they meant that nVIDIA does this in software, and this would take up CPU cycles with the amount of batches AotS throws at the driver (which transfers the tasks to the GPU). I think there is credence to the software emulation theory, but rather than being something malevolent, as some have stated, it is due to the fact that, since Kepler, nVIDIA has relied on a software driver scheduler. When working on serial tasks, it's quite easy to code a driver to work this way (that's why it works in DX11): you have fewer CPU threads to worry about.

It could be that the silence from nVIDIA is because they're working on a driver, for DX12, particularly on the Work Distributor portion of the driver. I think nVIDIA has to re-write their entire software scheduler in order to get Async working. I'm not a developer or a programmer. I only understand the hardware engineering side of things. Since there are no White Papers, showing block diagrams of Maxwell 2, I can't see if any changes have been made to the hardware which would allow it to behave more like GCN (the last diagrams I've seen were of changes to the SMM/SMX nothing else). To me, since nVIDIA refer to the Kepler documentation when speaking on Maxwell, it seems that most of the changes made, Maxwell to Maxwell 2, were in software.

I wonder what would happen if they ran Arkham Knight with a card which doesn't support Async (say Kepler but especially Maxwell), if it behaves the same then you're still stuck at the starting point. If it doesn't behave the same then the issue is likely tied to either the way that benchmark was coded or the Software scheduler used by nVIDIA in their DX12 driver.
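As a toy illustration of the feed-three-queues idea described above (plain Python threads standing in for queues; none of this is real D3D12 API code, and the per-queue costs are made up), serial submission costs the sum of the queue times, while concurrent submission approaches the longest single queue:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_queue(name, seconds):
    # Simulated GPU work on one queue (graphics, compute, or copy).
    time.sleep(seconds)
    return name

jobs = {"graphics": 0.3, "compute": 0.2, "copy": 0.1}

# Serial submission: total time is the sum of all queue times (~0.6 s).
t0 = time.perf_counter()
for name, cost in jobs.items():
    run_queue(name, cost)
serial = time.perf_counter() - t0

# Concurrent submission: total time approaches the longest queue (~0.3 s).
t0 = time.perf_counter()
with ThreadPoolExecutor() as pool:
    list(pool.map(lambda kv: run_queue(*kv), jobs.items()))
parallel = time.perf_counter() - t0

print(f"serial ~{serial:.2f}s, concurrent ~{parallel:.2f}s")
```

That gap, sum versus max, is the whole performance argument around async compute in this thread.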


----------



## Blameless

Quote:


> Originally Posted by *Mahigan*
> 
> PhysX runs separately from the DX path does it not?


Since PhysX doesn't need DX at all, I would assume so.

Most PhysX benchmarks are based around OGL for the graphics portions.


----------



## airfathaaaaa

I have no idea *** I was typing; somehow part of my answer on Beyond3D came here, lol.

But yeah, that is what I thought: since PhysX, and thus every god-awful nVIDIA aftermarket API extension, isn't part of DX, there is not a single point in using this as a credible fact.


----------



## Devnant

Quote:


> Originally Posted by *airfathaaaaa*
> 
> I have no idea *** I was typing; somehow part of my answer on Beyond3D came here, lol.
> 
> But yeah, that is what I thought: since PhysX, and thus every god-awful nVIDIA aftermarket API extension, isn't part of DX, there is not a single point in using this as a credible fact.


They also ran OpenCL benchmarks and found the same behaviour.

My bad. They said the compute queue was accessed by OpenCL benchmarks, but not the async compute test.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> They also ran OpenCL benchmarks and found the same behaviour.


Is the behavior also found with a GTX 750 Ti? That's what I would test.


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> Is the behavior also found with a GTX 750 Ti? That's what I would test.


My bad Mahigan. That was concerning access to the compute queue. I've already corrected my previous post.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> My bad Mahigan. That was concerning access to the compute queue. I've already corrected my previous post.


No need to apologize.

We all make mistakes.


----------



## semitope

If it were so simple to get tasks done asynchronously, there'd be no need for the noise about it in DX12; not if all you had to do was send a compute and a graphics task at the same time in DX11.

A good check would be to see if Ashes shows the same pattern on nVIDIA as Batman does, for example. My guess is it would, since they run the compute tasks through the GPU on GCN as well.


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> If it was so simple to get tasks done asynchronously there'd be no need for the noise about it in dx12. If all you had to do was send a compute and graphics task at the same time in dx11.
> 
> A good check would be to see if ashes shows the same pattern on nvidia as batman does, for example. My guess is it would since they do the compute tasks done on GCN as well.


I like how devoted they are to get to the bottom of this issue.







This is how the PC Gaming and Enthusiast community used to be back in the day.

On a different note... if AMD keeps acting like the ethical and moral folks they'll go out of business...
Quote:


> We are actively promoting HBM and do not collect royalties - AMD


http://www.kitguru.net/components/graphic-cards/anton-shilov/amd-we-are-actively-promoting-usage-of-hbm-and-do-not-collect-royalties/
Quote:


> We are actively encouraging widespread adoption of all HBM associated technology on [Radeon R9] Fury products and there is no IP licensing associated.


----------



## Themisseble

Has anyone tried async shaders with an AMD APU like the A10-7850K?


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> I like how devoted they are to get to the bottom of this issue.
> 
> 
> 
> 
> 
> 
> 
> This is how the PC Gaming and Enthusiast community used to be back in the day.
> 
> On a different note... if AMD keeps acting like the ethical and moral folks they'll go out of business...
> http://www.kitguru.net/components/graphic-cards/anton-shilov/amd-we-are-actively-promoting-usage-of-hbm-and-do-not-collect-royalties/


They claimed they aren't collecting royalties; that doesn't mean they aren't collecting revenue generated off HBM sales.


----------



## JohnLai

Quote:


> Originally Posted by *PostalTwinkie*
> 
> They claimed they aren't collecting royalties, that doesn't mean they aren't collecting revenue generated off HBM sales.


How about the recent news of AMD having exclusivity on the first batch of HBM2?

Benchmark aside, I hope the AotS campaign will be good.


----------



## Mahigan

Quote:


> Originally Posted by *JohnLai*
> 
> How about the recent news on AMD having exclusivity for first batch of HBM2?
> 
> Benchmark aside, I hope AOTS campaign will be good.


This was in response to that recent news, if you read the article. They might get exclusivity (it's their tech, after all), but they aren't collecting any royalties.
Quote:


> Advanced Micro Devices owns a number of patents covering HBM, but as that intellectual property is a part of JEDEC's JESD235 standard, it has to be licensed to applicants desiring to implement the standard "either without compensation or under reasonable terms and conditions that are free of any unfair discrimination."


In other words... like Adaptive Sync, another Open Standard.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> This was in response to that recent news if you read the article. They might get exclusivity, it's their tech after-all, but they aren't collecting any royalties.
> In other words... like Adaptive Sync, another Open Standard.


Don't forget GDDR5.

Speaking of that.....

Did you have something to do with that back in the day? Were you engineering GPUs during the development and move to GDDR5?


----------



## GorillaSceptre

Any developments today? Nvidia respond yet?


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Any developments today? Nvidia respond yet?


Beyond3D have noticed some odd stuff happening with Arkham Knight under DX11. They believe they are seeing evidence of Async compute on Maxwell 2.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Forceman*
> 
> Summary of the findings from the GPUView analysis:
> https://forum.beyond3d.com/posts/1870090/
> 
> So it doesn't appear that the async compute test is really telling us anything about whether Maxwell can do async compute or not.


Okay... But why is GCN seeing the workload as compute while Maxwell sees it as graphics?

Doesn't that mean that Nvidia is using some sort of software implementation for Async?

Edit:

Seems the B3D folks are wondering the same.
Quote:


> Originally Posted by *Mahigan*
> 
> Beyond3D have noticed some odd stuff happening with Arkham Knight under DX11. They believe they are seeing evidence of Async compute on Maxwell 2.


How is that happening under DX11?


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Okay... But why is GCN seeing the workload as compute while Maxwell sees it as graphics?
> 
> Doesn't that mean that Nvidia is using some sort of software implementation for Async?
> 
> Edit:
> 
> Seems the B3D folks are wondering the same.
> How is that happening under DX11?


Exactly... but we'll see where this goes







I mean, I never thought Maxwell 2 couldn't perform async compute when I first wrote my theory. I was under the impression it couldn't perform it as well.

When the Oxide developer stated its performance was dreadful (queue in Simon Cowell, but also mirroring my theory) and he had to disable it... followed by "afaik Maxwell can't do Async Compute"... that's when that became the topic.


----------



## provost

Quote:


> Originally Posted by *Forceman*
> 
> So some folks at Beyond3d have started looking into the async compute test with GPUView and have made some interesting discoveries. It appears as though the test is not actually accessing both the graphics and compute queues on Maxwell cards, but instead doing everything in the graphics queue, while using both on GCN. Other games and OpenCL programs do access the compute queue on Maxwell, so it is available, it just isn't working right with this test. But since accessing both queues is a prerequisite for async computing, it may be that test isn't really showing anything one way or the other. Hopefully more to come.


I don't see how their explanation, albeit with my limited understanding, refutes anything regarding the benefit of having async compute/graphics over whatever convoluted scheme Maxwell is using.

Right now, Nvidia has every incentive under the sun to prove Mahigan a false prophet (sorry Mahigan, just making a point) and Oxide a bunch of nobodies trying to punch above their weight class (allegedly to help out AMD), by providing a detailed explanation of the DX12 performance delta that is materially accruing more benefit to AMD cards than Maxwell cards.

But, that is not happening. If there is/was a credible rebuttal to AMD, Mahigan and Oxide's claims, it would be coming directly from Nvidia, and not from any tech site (no matter how talented its members may be in the science and art of graphic card software and drivers)


----------



## Forceman

Quote:


> Originally Posted by *semitope*
> 
> Also not sure why its a big deal that the compute queue is working. Seemed they thought it was a big deal in the b3d thread. Maxwell can do compute... duh. Ashes probably will show a similar result since they put the compute tasks through the GPU. They just don't do it asynchronously.


It's a big deal because it's pretty hard to determine whether a card can do compute and graphics asynchronously if the card doesn't know it's supposed to be doing both compute and graphics. And for this test, that is what is happening: the card is handling everything as graphics for some reason.

Quote:


> Originally Posted by *Mahigan*
> 
> I like how devoted they are to get to the bottom of this issue.
> 
> 
> 
> 
> 
> 
> 
> This is how the PC Gaming and Enthusiast community used to be back in the day.


It's also a great indication of why people need to be cautious about taking these kinds of initial results (especially from not well understood tests) as the gospel truth and using them as the basis for sweeping proclamations.
Quote:


> Originally Posted by *GorillaSceptre*
> 
> Okay... But why is GCN seeing the workload as compute while Maxwell sees it as graphics?
> 
> Doesn't that mean that Nvidia is using some sort of software implementation for Async?
> 
> Edit:
> 
> Seems the B3D folks are wondering the same.


That appears to be the million-dollar question. The cards can obviously use the compute queue, so why does this test not access it properly? Probably a coding error of some kind, but it could be something in the drivers too. Until it gets figured out, it pretty much invalidates this program for Maxwell.
Quote:


> Originally Posted by *provost*
> 
> I don't see how their explanation, albeit with my limited understanding, refutes anything regarding the benefit of having async compute/graphics over whatever convoluted scheme Maxwell is using.


Why is the immediate assumption that Nvidia is using a convoluted scheme here? Seems much more likely that a guy who has little or no experience coding in DX12 simply wrote a program that isn't doing quite what he thinks it is supposed to be doing. More investigation and testing needs to be done before conclusions are drawn.


----------



## CasualCat

Quote:


> Originally Posted by *provost*
> 
> I don't see how their explanation, albeit with my limited understanding, refutes anything regarding the benefit of having async compute/graphics over whatever convoluted scheme Maxwell is using.
> 
> Right now, Nvidia has every incentive under the sun to prove Mahigan as a false prophet (sorry mahigan..just making a point
> 
> 
> 
> 
> 
> 
> 
> ) and oxide as a bunch of nobodies trying to punch above their weight class (allegedly to help out amd), by providing a detailed explanation of dx 12 performance delta that is materially accruing more benefit to AMD cards than Maxwell cards.
> 
> But, that is not happening. If there is/was a credible rebuttal to AMD, Mahigan and Oxide's claims, it would be coming directly from Nvidia, and not from any tech site (no matter how talented its members may be in the science and art of graphic card software and drivers)


And how many people will dismiss the first thing Nvidia says anyhow unless it is to confirm they can't do async compute? There is an outspoken portion of the community who (right or wrong) has made it abundantly clear they don't trust anything Nvidia does or says. Counter to that there'll be people who believe whatever they say without question.

I think third party investigation like this is best for everyone. I'm cheering on the B3D people as they're actively trying to figure it out themselves and appear to have the technical expertise to eventually get to an answer.


----------



## PostalTwinkie

Quote:


> Originally Posted by *CasualCat*
> 
> And how many people will dismiss the first thing Nvidia says anyhow unless it is to confirm they can't do async compute? There is an outspoken portion of the community who (right or wrong) has made it abundantly clear they don't trust anything Nvidia does or says. Counter to that there'll be people who believe whatever they say without question.
> 
> I think third party investigation like this is best for everyone. I'm cheering on the B3D people as they're actively trying to figure it out themselves and appear to have the technical expertise to eventually get to an answer.


Nvidia already said there was an issue with the AoS benchmark, but it was denied by the Dev repeatedly.

Once again, only time will tell us.


----------



## Casey Ryback

Quote:


> Originally Posted by *provost*
> 
> If there is/was a credible rebuttal to AMD, Mahigan and Oxide's claims, it would be coming directly from Nvidia, and not from any tech site (no matter how talented its members may be in the science and art of graphic card software and drivers)


I doubt it, because then people would claim they are biased in the matter and just trying to defend their product.


----------



## semitope

Quote:


> Originally Posted by *Casey Ryback*
> 
> I doubt it, because then people would claim they are biased in the matter and just trying to defend their product.


Why? All they have to do is show that it supports it.

I don't expect they will do that. They disabled it rather than get it to work well in Ashes. They had the opportunity there to make it right (they had access to the game code and were allowed to offer code to the devs, and they only removed it). If they could have, big bad huge-budget software masters Nvidia would have. Their DX11 performance is an example of that.


----------



## airfathaaaaa

Quote:


> Originally Posted by *Mahigan*
> 
> Exactly... but we'll see where this goes
> 
> 
> 
> 
> 
> 
> 
> I mean, I never thought Maxwell 2 couldn't perform async compute when I first wrote my theory. I was under the impression it couldn't perform it as well.
> 
> When the Oxide developer stated its performance was dreadful (queue in Simon Cowell, but also mirroring my theory) and he had to disable it... followed by "afaik Maxwell can't do Async Compute"... that's when that became the topic.


Didn't you say that it can't perform 1 compute and 32 graphics? Isn't that what we are seeing as "gaps" in the screenshots?


----------



## Mahigan

Quote:


> Originally Posted by *airfathaaaaa*
> 
> Didn't you say that it can't perform 1 compute and 32 graphics? Isn't that what we are seeing as "gaps" in the screenshots?


1 Graphics and 31 Compute in Parallel (Asynchronously) or 32 Compute.

Like this:


Spoiler: Warning: Spoiler!
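For reference, a tiny Python snippet that simply encodes the queue-slot claim being made here (1 graphics + up to 31 compute, or 32 pure compute). These figures restate the thread's claim; they are not taken from NVIDIA documentation.

```python
# Maxwell 2 queue-slot claim as discussed in this thread (assumption,
# not an official spec): 32 slots total, at most one of them graphics.
MAX_SLOTS = 32

def valid_config(graphics_queues, compute_queues):
    if graphics_queues not in (0, 1):
        return False  # at most one graphics queue in this model
    return graphics_queues + compute_queues <= MAX_SLOTS

print(valid_config(1, 31))  # True: mixed mode, graphics + 31 compute
print(valid_config(0, 32))  # True: pure compute mode
print(valid_config(1, 32))  # False: would need 33 slots
```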


----------



## PontiacGTX

Quote:


> Originally Posted by *SlackerITGuy*
> 
> Most, if not all DirectX 9-11 games stop seeing any significant performance increase after ~3 cores, that has been proven over and over again.


Misleading; it depends on how the game devs use those resources.
Quote:


> That's certainly the case in Crysis 3 (Ryse should be the same as it uses the same engine):
> 
> 
> Spoiler: Warning: Spoiler!


It seems you don't know how these things work. The average FPS won't show anything other than the average between the CPU and GPU work; at least a frametime graph or a minimum average would help more than a test that only shows a GPU-bound scenario.
Also, Ryse doesn't use the same version of CryEngine that Crysis 3 uses.
This is a valid comparison:


Spoiler: Crysis 3 Benchmark








Quote:


> As for Battlefield 4, it might be the case in *extreme MP scenarios* (like the unoptimized China Rising maps), but in most cases, you won't be seeing significant performance increase with higher core count CPUs.





Spoiler: Battlefield 4 MP DX11






Quote:


> It's worth remembering that DICE started implementing DirectX 11's Multi-Threaded Rendering into their engine back in 2011 IIRC


Nope, more than a quad core in BF3 is pointless...
Quote:


> So it's still single threaded rendering in the latest Frostbite games.


All games benefit from higher IPC.
Quote:


> Finally, and I'm gonna mention this again, just because you're seeing a certain game use all of your threads in Task Manager


Not really, because what's the point of a core being used if it isn't doing anything in the engine?


Spoiler: Crysis 3 CPU Usage










Spoiler: CPU Usage BF4







Quote:


> ~4 cores/threads is what you really need for today's DirectX 9-11 games, anything over that is not going to get you any significant performance increase.


That's dependent on the developer; if they decide to use 8 cores for all the engine tasks, they will get more performance than with a 4-core.


----------



## Mahigan

After that the Oxide dev was like... It was too slow so NVIDIA asked that we shut it down. And I was like.. That's what I figured. Then Oxide dev was like again Maxwell can't do Async compute.

Beyond3D were like... Yes it can. So they made a test to counter what the Oxide dev stated.

Test was all like.. No it can't.

And I was all like... Yes it can but it's software driven (software compiler). Peeps here were like... There's no evidence it is software driven.

Now Beyond3D is like... Yes it can, we think, let us look into it more...

And that's where we are today.


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> After that the Oxide dev was like... It was too slow so NVIDIA asked that we shut it down. And I was like.. That's what I figured. Then Oxide dev was like again Maxwell can't do Async compute.
> 
> Beyond3D were like... Yes it can. So they made a test to counter what the Oxide dev stated.
> 
> Test was all like.. No it can't.
> 
> And I was all like... Yes it can but it's software driven (software compiler). Peeps here were like... There's no evidence it is software driven.
> 
> Now Beyond3D is like... Yes it can, we think, let us look into it more...
> 
> And that's where we are today.


Could Oxide have been a little biased toward a certain GPU vendor and not put more effort into optimizing their engine for the other GPU vendor?


----------



## ku4eto

Quote:


> Originally Posted by *Mahigan*
> 
> This was in response to that recent news if you read the article. They might get exclusivity, it's their tech after-all, but they aren't collecting any royalties.
> In other words... like Adaptive Sync, another Open Standard.


AMD should have gotten a patent on HBM; money is money, and they need it for better R&D or software development (aka drivers). Nothing wrong with being open source, but if you go out of business, what's the point?


----------



## Mahigan

I don't think Oxide was biased. They put in more effort with NVIDIA than AMD.

I think NVIDIA's solution couldn't handle the parallel load or their software scheduler is borked.

Either way, even if it does end up working, it won't work as efficiently as GCN does, imo.


----------



## Xuper

Wow! Nvidia's Wikipedia page is under heavy pressure!

https://en.wikipedia.org/wiki/GeForce_900_series#Limited_DirectX_12_support

https://en.wikipedia.org/wiki/User_talk:210.187.222.146

https://en.wikipedia.org/w/index.php?title=GeForce_900_series&action=history


----------



## Mahigan

I wouldn't be surprised to see the CPU usage spike for NVIDIA's solution if it does work.

Oxide would have had a choice. Limit the CPU load (the amount of batches) thus hindering the gaming experience they want to deliver, or, keep the game as is and code a different vendor ID path to process the post processing effects for NVIDIA. In the end... Both architectures are playable. Oxide would have made the right decision.

NVIDIA was more concerned about looking bad in a benchmark than about ruining a game that, for an RTS, is shaping up to be very cool.


----------



## infranoia

And I'm like, Nvidia must just be sadistic, watching everyone spin in circles.









I have a feeling radio silence is just the way it's going to be...


----------



## Mahigan

Quote:


> Originally Posted by *infranoia*
> 
> And I'm like, Nvidia must just be sadistic, watching everyone spin in circles.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I have a feeling radio silence is just the way it's going to be...


Well their customers are panicking. Radio silence doesn't appear to be good customer service on their part. They could answer "We're aware of the issue and we're working to resolve it". Instead we get nothing. People are selling their cards... I'm not making this up... a LOT of people too.


----------



## ku4eto

Quote:


> Originally Posted by *Xuper*
> 
> wow! Nvidia in Wikipedia is under heavy pressure!
> 
> https://en.wikipedia.org/wiki/GeForce_900_series#Limited_DirectX_12_support
> https://en.wikipedia.org/wiki/User_talk:210.187.222.146
> https://en.wikipedia.org/w/index.php?title=GeForce_900_series&action=history


It's a fanboy from Malaysia, it seems; the IP belongs to a Malaysian ISP, and the Whois information for the name servers' domain doesn't show anything linked to nVidia.


----------



## Mahigan

Quote:


> Originally Posted by *ku4eto*
> 
> It's a fanboy from Malaysia, it seems; the IP belongs to a Malaysian ISP, and the Whois information for the name servers' domain doesn't show anything linked to nVidia.


I think that was his point. Some guy is trying to change the Wikipedia entries to reflect the current scandal. Way too soon to be doing that.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> I think that was his point. Some guy is trying to change the Wikipedia entries to reflect the current scandal. Way too soon to be doing that.


It really does highlight the idiots that circle around these scenarios, blindly lusting for one side or the other. Which really gets in the way of legitimate discussion.

Once upon a time, the Internet and Web was idiot free. Then I logged on!!!


----------



## CasualCat

Quote:


> Originally Posted by *Mahigan*
> 
> After that the Oxide dev was like... It was too slow so NVIDIA asked that we shut it down. And I was like.. That's what I figured. Then Oxide dev was like again Maxwell can't do Async compute.
> 
> *Beyond3D were like... Yes it can. So they made a test to counter what the Oxide dev stated.*
> 
> Test was all like.. No it can't.
> 
> And I was all like... Yes it can but it's software driven (software compiler). Peeps here were like... There's no evidence it is software driven.
> 
> Now Beyond3D is like... Yes it can, we think, let us look into it more...
> 
> And that's where we are today.


And part of that is a mischaracterization of what occurred. The first post regarding the test was this:
https://forum.beyond3d.com/posts/1868395/
Quote:


> Ok, so here's a little micro benchmark that I wrote. Maybe it will point out something interesting about whether async compute is the culprit for AotS results or not.
> 
> GTX 680:
> 1. 18.01ms 4.12e+000
> 2. 36.40ms 4.12e+000
> 3. 54.93ms 4.12e+000
> 4. 72.65ms 4.12e+000
> 5. 90.29ms 4.12e+000
> 6. 107.10ms 4.12e+000
> So no async compute.
> 
> HD 4600:
> 1. 57.36ms 4.12e+000
> 2. 114.61ms 4.12e+000
> 3. 172.10ms 4.12e+000
> 4. 229.67ms 4.12e+000
> 5. 286.98ms 4.12e+000
> 6. 344.57ms 4.12e+000
> Also no async compute.
> 
> Anyone willing to give it a go on Maxwell or GCN? Not sure it will actually work (Hey it's my first D3D12 app
> 
> 
> 
> 
> 
> 
> 
> ), but if it does you should not see time basically double when it goes from one kernel launch to two kernel launches and so on. It will output a perf.log file in executable directory when run (so you can kill it because it will hog your system if it's running on GPU also running a display).


That shows none of the intent you attribute to them in your summary, particularly not on the part of the test's creator.
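For anyone trying to follow what that micro-benchmark actually measures, here's a rough Python sketch of the detection logic. All names here are hypothetical and the real test is a D3D12 app; this only models the reasoning behind the numbers quoted above.

```python
# If total wall time grows linearly with the number of "simultaneous"
# kernel launches, the queue is being serialized (no async compute);
# if it stays roughly flat, the kernels overlapped.

def classify(timings_ms):
    """timings_ms[i] = wall time with i+1 concurrent kernel launches."""
    base = timings_ms[0]
    # Serialized execution: time for n launches is roughly n * base.
    serial_like = all(
        abs(t - (i + 1) * base) / ((i + 1) * base) < 0.25
        for i, t in enumerate(timings_ms)
    )
    return "serialized (no async compute)" if serial_like else "overlapping"

# The GTX 680 numbers quoted above: each extra launch adds ~18 ms.
gtx680 = [18.01, 36.40, 54.93, 72.65, 90.29, 107.10]
print(classify(gtx680))  # serialized (no async compute)

# A hypothetical GPU that truly overlaps would look closer to flat:
overlapped = [18.0, 18.9, 19.5, 20.2, 21.0, 21.8]
print(classify(overlapped))  # overlapping
```

The "should not see time basically double" remark in the quoted post is exactly this rule.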


----------



## Digidi

*New questions: what do the batches mean? Can somebody also post their output file from Ashes of the Singularity here?*


----------



## Assirra

Quote:


> Originally Posted by *PostalTwinkie*
> 
> It really does highlight the idiots that circle around these scenarios, blindly lusting for one side or the other. Which really gets in the way of legitimate discussion.
> 
> *Once upon a time, the Internet and Web was idiot free. Then I logged on!!!*


Are we talking about another dimension here?


----------



## PostalTwinkie

Quote:


> Originally Posted by *Assirra*
> 
> Are we talking about another dimension here?


Nah, just like 25 years ago, when only us "nerds" were getting "online". Although the idiot part of my statement probably hasn't changed.


----------



## HeavyUser

Quote:


> Originally Posted by *Mahigan*
> 
> People are selling their cards... I'm not making this up... a LOT of people too.


LOL, you're a funny thing


----------



## Mahigan

Quote:


> Originally Posted by *CasualCat*
> 
> And part of that is a mischaracterization of what occurred. The first post regarding the test was this:
> https://forum.beyond3d.com/posts/1868395/
> That shows none of the intent you attribute to them in your summary, particularly not on the part of the test's creator.


I would direct you to the HardOCP thread, where the folks there, who are also at Beyond3D, did take what I was saying personally, with personal insults flung towards me. Yes, the test was a means to try and prove that the Oxide dev was being dishonest in his statement (look at several statements from Jawed or Razor1, for example). Many claim that the Oxide dev was biased. Those are the accusations many of them flung his way and my way (people in the thread accusing me of being bought... that I'm looking to make "a little change" on the side). I didn't make the statement that Maxwell 2 couldn't do Async Compute. The Oxide dev did... but they attacked me for it.

I summarized it as I saw the whole issue blow up. I'm not upset at their testing; I'm actually happy about it. But the intent was to disprove the Oxide dev's statements. Maybe not the author's... but that of many folks participating in the testing.

The way I see it, the test is needed, if only to mitigate fears. At the same time, I question the way the test's first results were communicated: as if the GCN latency in the test were a sign that GCN was poor at compute. It took Sebbi stepping in to reiterate my concerns (which they had brushed off) about the way the test was programmed. I mentioned that batches needed to be threaded in increments of 64 (256 often being ideal). When someone posted over at Beyond3D that "Mahigan had concerns", someone hit back that I once worked for ATi, therefore I was biased and my concerns didn't matter.

There is a lot of hostility towards me. There has been from the moment I posted my theory. I'm not going to change the story, as it unfolded. Now on with the testing.


----------



## airfathaaaaa

i think i found the best picture to summarize what is asynchronous shaders


----------



## Mahigan

Quote:


> Originally Posted by *HeavyUser*
> 
> LOL, you're a funny thing


Many people are... a lot of people. In one Reddit thread, three people in a row discussed how they returned or sold their cards. Many others were on the edge, saying they'd wait for the next game, or for a statement from nVIDIA, before making a decision. And that was one small thread. On YouTube it's the same story in the comment sections. When people recommend graphics cards for DX12, this issue is brought up as a reason to choose AMD.

I'm not making this up. Now, I'm not saying that everyone is selling their cards, but a heck of a lot of people are panicking. Is there reason to panic? No. But it's not helping that nVIDIA haven't released a statement on the matter. And that was the point of my post.


----------



## GorillaSceptre

Quote:


> Originally Posted by *HeavyUser*
> 
> LOL, you're a funny thing


This thread alone has nearly 200k views. Of course this situation is going to make people who still have a chance to return their cards think about it.

I myself was about to pull the trigger on a 980 Ti, now there's no way i'm going to until Nvidia clarify what's going on.


----------



## Asmodian

Quote:


> Originally Posted by *Mahigan*
> 
> We're talking Multi-Adapter. It is a far more efficient way to do SLI/Crossfire. It uses SFR (Split Frame Rendering) rather than AFR (Alternate Frame Rendering). The bonus, of using SFR, is that you don't create redundant textures in your Graphic Cards memory buffer. You effectively Split every frame into two segments and all of the textures for split frame a is in GPU1 Memory Buffer, textures for split frame b is in GPU2 Memory buffer. This means that if GPU1 has 4GB RAM and GPU2 has 4GB RAM then you now have 8GB of memory buffer. With AFR you would have still had only 4GB.


What? SFR is very old (the first method used) and does not help with memory like that. What happens if you turn or strafe such that a texture moves to the other half of the screen?

DX12 has new ways of using multiple GPUs where maybe trees are rendered on one and animals on another. This would be entirely on the developer to implement so I am not expecting anything to use it anytime soon.


----------



## PostalTwinkie

Quote:


> Originally Posted by *GorillaSceptre*
> 
> This thread alone has nearly 200k views. Of course this situation is going to make people who still have a chance to return their cards think about it.
> 
> I myself was about to pull the trigger on a 980 Ti, now there's no way i'm going to until Nvidia clarify what's going on.


I wouldn't buy a 900 series right now, just because Pascal is not that far away. At least not at current pricing, drop about $200 off the 980 Ti and I would consider it for a go-between until Pascal. Actually, I probably wouldn't even do that, I would just get another 290X for my own rig - if I needed a GPU right now.

Then again, I have a 780 Ti, so a move to a 980 Ti wouldn't be THAT huge of a difference - probably a little noticeable, but in the long term not enough to justify it.


----------



## provost

Quote:


> Originally Posted by *PostalTwinkie*
> 
> It really does highlight the idiots that circle around these scenarios, blindly lusting for one side or the other. Which really gets in the way of legitimate discussion.
> 
> Once upon a time, the Internet and Web was idiot free. Then I logged on!!!


Ok, since this was a perfect layup, I couldn't resist...

Would you consider your wish for "AMD to just die" one of those idiotic moments? .... Lol

Because, according to a post I saw here last night, it looks like Santa may grant you that wish, but only by the year 2020.

Anyway, since we are having a lighthearted chat, consider this post just as such...


----------



## Digidi

*Nobody knows what the Batches are in the Benchmark????*


----------



## PostalTwinkie

Quote:


> Originally Posted by *provost*
> 
> Ok, since this was a perfect layup, I couldn't resist...
> 
> Would you consider your wish for "AMD to just die" one of those idiotic moments? .... Lol
> 
> Because, according to a post I saw here last night, it looks like Santa may grant you that wish, but only by the year 2020.
> 
> Anyway, since we are having a lighthearted chat, consider this post just as such...


I do want AMD to just die, because I don't believe their current management and structure will see them through. Going into Administration would help them a lot I think, maybe even new owners. Which could lead to AMD being AMD again. So the "just die" part isn't to say I want them gone, I don't, nor do I think they could go away. I want them to die so that they can be resurrected properly, and do what we know they can do. AMD did it in the past, they can do it again.

EDIT:

AMD holds enough IP that even if they ran their bank balance to zero, someone would buy them and keep them around. Samsung still seems the most likely buyer.


----------



## Mahigan

Quote:


> Originally Posted by *Asmodian*
> 
> What? SFR is very old (the first method used) and does not help with memory like that. What happens if you turn or strafe such that the texture moves to the other half of the screen.
> 
> DX12 has new ways of using multiple GPUs where maybe trees are rendered on one and animals on another. This would be entirely on the developer to implement so I am not expecting anything to use it anytime soon.


SFR was used in DX11. True. But since the driver didn't know or couldn't tell how to balance the texture load... textures were still replicated into the memory buffer.

Under DX12, that's not a problem, as the developer knows exactly what his code is doing and doesn't need to worry about the driver so much. The developer decides when to submit batches to the GPU, not the driver.



http://wccftech.com/amd-sheds-more-light-on-explicit-multiadapter-in-directx-12-in-new-slides/


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> SFR was used in DX11. True. But since the driver didn't know or couldn't tell how to balance the texture load... textures were still replicated into the memory buffer.
> 
> Under DX12, that's not a problem, as the developer knows exactly what his code is doing and doesn't need to worry about the driver so much. the developer decides when to submit batches to the GPU, not the driver.
> 
> 
> 
> http://wccftech.com/amd-sheds-more-light-on-explicit-multiadapter-in-directx-12-in-new-slides/


Are Devs actually going to use SFR? Or is it one of those hard coded things that DX12 just does with multiple GPUs?


----------



## CasualCat

nm OT


----------



## CasualCat

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I do want AMD to just die, because I don't believe their current management and structure will see them through. Going into Administration would help them a lot I think, maybe even new owners. Which could lead to AMD being AMD again. So the "just die" part isn't to say I want them gone, I don't, nor do I think they could go away. I want them to die so that they can be resurrected properly, and do what we know they can do. AMD did it in the past, they can do it again.
> 
> EDIT:
> 
> AMD holds enough in IP that even if they ran to 0 in the banks, someone would buy them and keep them around. Samsung still seems the most likely buyer.


You longing for old AMD, old ATI, or both?


----------



## Mahigan

Quote:


> Originally Posted by *Digidi*
> 
> *Nobody knows what the Batches are in the Benchmark????*


Nobody here has access to exactly what the benchmark is doing. We're just relying on the talent over at Beyond3D.


----------



## Vesku

Quote:


> Originally Posted by *Noufel*
> 
> Apparently maxwell 2.0 can do async compute in others physics game and opencl benchs
> https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-26#post-1870028


It's not surprising that pure compute programs such as LuxMark will fill Maxwell 2's compute queues. However, how Maxwell 2 determines what is compute seems to be in question, since the program B3D users are running doesn't seem to get the LuxMark treatment even when only the compute component is run. Perhaps the async test program is only running one compute "job" rather than the multiple that most compute apps would?

I'm wondering if PhysX is actually running in ALL queues, including the graphics one, and that's why it can fill up the compute queues. Given how long Nvidia has been promoting PhysX, I also wouldn't be surprised if there is some sort of PhysX scheduler implemented on the GPU die. I guess this is what happens when not enough people were upset over how the public had to prove the 970 3.5 GB issue; Nvidia appears to be in no hurry to provide more details on how Maxwell 2 is going to handle Async Compute.


----------



## GorillaSceptre

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I wouldn't buy a 900 series right now, just because Pascal is not that far away. At least not at current pricing, drop about $200 off the 980 Ti and I would consider it for a go-between until Pascal. Actually, I probably wouldn't even do that, I would just get another 290X for my own rig - if I needed a GPU right now.
> 
> Then again, I have a 780 Ti, so a move to a 980 Ti wouldn't be THAT huge of a difference - probably a little noticeable, but in the long term not enough to justify it.


Pascal is still probably a year away. September/October 2016 are the current rumors, although some think it will be near the end of 2016.

I'm sure i'm not the only one with an ancient GPU that needs to upgrade, if the Fury drops in price a bit it looks like a solid choice for someone like me. I nearly bought a 290x over a year ago, still kicking myself that i didn't.


----------



## Asmodian

Quote:


> Originally Posted by *Mahigan*
> 
> SFR was used in DX11. True. But since the driver didn't know or couldn't tell how to balance the texture load... textures were still replicated into the memory buffer.
> 
> Under DX12, that's not a problem, as the developer knows exactly what his code is doing and doesn't need to worry about the driver so much. the developer decides when to submit batches to the GPU, not the driver.


SFR and shared memory pools are not the same thing, they are not even related. Those are two separate sections in that article. Being able to access the other GPU's memory is nice but isn't the same as having the texture in local gpu memory. Normal SFR (much older than DX11, think 3DFX) has a smaller memory footprint for frame buffers but textures and other assets still have to be in memory on both GPUs.

SFR helps eliminate microstutter and reduces frame latency but it is not part of the shared memory feature.


----------



## PostalTwinkie

Quote:


> Originally Posted by *CasualCat*
> 
> You longing for old AMD, old ATI, or both?


I want the old AMD back for CPU, and they aren't too far behind in GPU. If AMD can come back on the CPU front, I think that will allow them to come around on the GPU front. Yes, I think AMD screwed up on Fury, Fury X, and Nano, but really that isn't my main concern. I think part of that had to do with just how damn long in the tooth 28nm has become.

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Pascal is still probably a year away. September/October 2016 are the current rumors, although some think it will be near the end of 2016.
> 
> I'm sure i'm not the only one with an ancient GPU that needs to upgrade, if the Fury drops in price a bit it looks like a solid choice for someone like me. I nearly bought a 290x over a year ago, still kicking myself that i didn't.


I just purchased a 290X a few weeks ago for my brother, great card for the money. If you need a GPU now, and can't wait a year for Nvidia's and AMD's new stuff, the 290X is the way to go I think. So much bang for the buck!

EDIT:

Just glanced down and seen you are on a 570. I would get a 290X, maybe a used one off the OCN Market. Even if you go new, it still will be a great upgrade for you. Then if you decide to upgrade with Pascal or AMD's new stuff, you could sell that 290X and probably not take that massive of a hit on the resell value.


----------



## Mahigan

Quote:


> Originally Posted by *Asmodian*
> 
> SFR and shared memory pools are not the same thing, they are not even related. Those are two separate sections in that article. Being able to access the other GPU's memory is nice but isn't the same as having the texture in local gpu memory. Normal SFR (much older than DX11, think 3DFX) has a smaller memory footprint for frame buffers but textures and other assets still have to be in memory on both GPUs.
> 
> SFR helps eliminate microstutter and reduces frame latency but it is not part of the shared memory feature.


It can be under DX12, if implemented that way, because DX12 lets you assign texture streaming across several GPUs. Under split-frame rendering you can therefore assign one part of the frame to receive certain textures and the other part of the frame to receive others. You can do the same for compute, copy, and other graphics tasks.

Basically the developer decides when to submit work to the GPU (when to draw etc) rather than the driver.

In other words you can have SFR + Combined memory pools if coded that way.
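To make the memory argument concrete, here's a toy Python model. This is my own illustration, not D3D12 API code, and it assumes the developer can pin each texture to exactly one GPU with no duplication, which is precisely the point being contested in this exchange.

```python
# Memory footprint of a 2-GPU setup: AFR duplicates every asset on both
# cards, while an idealized DX12 SFR setup pins each texture to one GPU.

def afr_footprint(texture_sizes_mb):
    # AFR: both GPUs render whole frames, so each card holds the full set.
    per_gpu = sum(texture_sizes_mb)
    return per_gpu, per_gpu

def ideal_sfr_footprint(texture_sizes_mb):
    # Idealized SFR with explicit multi-adapter: greedily pin each
    # texture to the less-loaded GPU, with no duplication at all.
    gpu = [0, 0]
    for size in sorted(texture_sizes_mb, reverse=True):
        gpu[gpu.index(min(gpu))] += size
    return tuple(gpu)

textures = [512, 256, 256, 128, 128, 64]  # MB, hypothetical scene
print(afr_footprint(textures))       # (1344, 1344): full set on each card
print(ideal_sfr_footprint(textures)) # (704, 640): roughly half per card
```

In practice a texture visible on both halves of the screen would still need to live on (or be fetched from) both GPUs, so the real savings sit somewhere between these two extremes.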


----------



## GorillaSceptre

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I just purchased a 290X a few weeks ago for my brother, great card for the money. If you need a GPU now, and can't wait a year for Nvidia's and AMD's new stuff, the 290X is the way to go I think. So much bang for the buck!
> 
> EDIT:
> 
> Just glanced down and seen you are on a 570. I would get a 290X, maybe a used one off the OCN Market. Even if you go new, it still will be a great upgrade for you. Then if you decide to upgrade with Pascal or AMD's new stuff, you could sell that 290X and probably not take that massive of a hit on the resell value.


I would def get one if it wasn't such a hassle to get a card where i live, and for a fair price. After going through the hassle of importing i may as well just get the Fury instead. Next year i could always get a second one if i need more performance, and that would probably be a better setup for VR too. Volta will probably be when the real monster chips drop anyway.

But yeah, if i was in the States/UK i would go for a 290x. That thing just keeps on going.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I want the old AMD back for CPU, and they aren't too far behind in GPU. If AMD can come back on the CPU front, I think that will allow them to come around on the GPU front. Yes, I think AMD screwed up on Fury, Fury X, and Nano, but really that isn't my main concern. I think part of that had to do with just how damn long in the tooth 28nm has become.


I would have preferred if AMD had never bought ATi. ATi was highly competitive with nVIDIA. But then again I was biased towards ATi at one point in my life. This bias didn't transfer over to AMD... mostly because AMD promised there would be no lay-offs... shortly after the takeover... lay-offs. Many of my former colleagues were laid off.

There is a degree of bad blood between AMD and I.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> I would have preferred if AMD had never bought ATi. ATi was highly competitive with nVIDIA. But then again I was biased towards ATi at one point in my life. This bias didn't transfer over to AMD... mostly because AMD promised there would be no lay-offs... shortly after the takeover... lay-offs. Many of my former colleagues were laid off.
> 
> There is a degree of bad blood between AMD and I.


Understandable.

How would you feel about AMD reviving the ATi name? Maybe put it on their flagship halo cards? I didn't really comment on ATi in my first reply, because that just leads me back to 3DFX as well and the glory days of the 90s.

.......I miss my CRT!


----------



## mav451

Quote:


> Originally Posted by *Mahigan*
> 
> It could be that the silence from nVIDIA is because they're working on a driver, for DX12, particularly on the Work Distributor portion of the driver. I think nVIDIA has to re-write their entire software scheduler in order to get Async working. I'm not a developer or a programmer. I only understand the hardware engineering side of things. Since there are no White Papers, showing block diagrams of Maxwell 2, I can't see if any changes have been made to the hardware which would allow it to behave more like GCN (the last diagrams I've seen were of changes to the SMM/SMX nothing else).
> 
> *To me, since nVIDIA refer to the Kepler documentation when speaking on Maxwell, it seems that most of the changes made, Maxwell to Maxwell 2, were in software.*
> 
> I wonder what would happen if they ran Arkham Knight with a card which doesn't support Async (say Kepler but especially Maxwell), if it behaves the same then you're still stuck at the starting point. If it doesn't behave the same then the issue is likely tied to either the way that benchmark was coded or the Software scheduler used by nVIDIA in their DX12 driver.


If that's the case, then I'm curious what nVidia can accomplish with the Work Distributor being revised/adapted for DX12.
(Assuming there is something being developed right now - just guessing)

Moreover, you mention that Kepler documentation continues to be referenced for the Maxwell/Maxwell 2 GPUs.
Are you suggesting that, should things be revised, the changes would affect Kepler, Maxwell, and Maxwell 2 cards similarly?

I am of course referring to how AMD has done the best they could with tessellation performance in their recent drivers. Now I know I'm comparing apples to oranges here, but it's a question I have.

What exactly can nVidia accomplish on the software level and what are realistic expectations?

@Postal - If we're going to reminisce about old cards, we should talk Matrox haha (look at my Build history for reference).


----------



## spacin9

Quote:


> Originally Posted by *Digidi*
> 
> *New questions: what do the batches mean? Can somebody also post their output file from Ashes of the Singularity here?*
> 
> Maybe a 980 or Titan X or Fury? I'm interested in the batches because they correspond with the FPS.
> 
> == Hardware Configuration ================================================
> GPU: AMD Radeon R9 200 Series
> CPU: GenuineIntel
> Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> Physical Cores: 4
> Logical Cores: 8
> Physical Memory: 8140 MB
> Allocatable Memory: 134217727 MB
> ==========================================================================
> 
> == Configuration =========================================================
> API: DirectX 12
> Resolution: 1920x1080
> Fullscreen: True
> Bloom Quality: High
> PointLight Quality: High
> Glare Quality: High
> Shading Samples: 16 million
> Terrain Shading Samples: 8 million
> Shadow Quality: High
> Temporal AA Duration: 6
> Temporal AA Time Slice: 2
> Multisample Anti-Aliasing: 1x
> Texture Rank : 1
> ==========================================================================
> 
> == Results ===============================================================
> BenchMark 0
> TestType: Full System Test
> == Sub Mark Normal Batch =================================================
> Total Time: 60.011837 ms per frame
> Avg Framerate: 54.839180 FPS (18.235138 ms)
> Weighted Framerate: 54.173260 FPS (18.459291 ms)
> CPU frame rate (estimated if not GPU bound): 67.433426 FPS (14.829440 ms)
> Percent GPU Bound: 66.996521 %
> *Driver throughput (Batches per ms): 1688.193115 Batches
> Average Batches per frame: 4495.382324* Batches
> == Sub Mark Medium Batch =================================================
> Total Time: 62.007164 ms per frame
> Avg Framerate: 50.913467 FPS (19.641169 ms)
> Weighted Framerate: 50.507751 FPS (19.798943 ms)
> CPU frame rate (estimated if not GPU bound): 65.788498 FPS (15.200226 ms)
> Percent GPU Bound: 89.867798 %
> *Driver throughput (Batches per ms): 2292.695068 Batches
> Average Batches per frame: 7936.875977 Batches*
> == Sub Mark Heavy Batch =================================================
> Total Time: 57.988411 ms per frame
> Avg Framerate: 43.939812 FPS (22.758404 ms)
> Weighted Framerate: 43.560501 FPS (22.956575 ms)
> CPU frame rate (estimated if not GPU bound): 54.549587 FPS (18.331945 ms)
> Percent GPU Bound: 95.388588 %
> *Driver throughput (Batches per ms): 4006.267822 Batches
> Average Batches per frame: 20135.652344 Batches*
> =========================================================================
> =========================================================================


Yeah, so... looks like you're using the High preset, so that's what I tested.

normal: 91.7 fps... driver throughput 2968.7, 5412.2

medium: 81.2 fps... driver throughput 3843.7, 8400.7

heavy: 67.8 fps... driver throughput 5729.7, 20722.4

The batches are about the same; my OC'd Titan X and 5820K seem to have higher driver throughput and a higher framerate. It looks like you're more CPU bound than I am. My average CPU frame rate is 127.3 FPS.

AMD is killing it in DX12, but NV's Maxwell 2 is such a monster overclocker that it at least levels the playing field in DX12 (even though it's still faster under DX11), whereas the 290X and Fury X don't OC very well.
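For what it's worth, the two logs can be put on a common footing: the output file reports average FPS and average batches per frame, so multiplying them gives batches issued per second. A quick Python sketch using the heavy-batch numbers from the two posts above (figures rounded from the logs):

```python
# Batches per second = average FPS * average batches per frame.
# Numbers taken from the two quoted runs (R9 200 series vs OC'd Titan X).

def batches_per_second(avg_fps, avg_batches_per_frame):
    return avg_fps * avg_batches_per_frame

runs = {
    "R9 200 heavy":  (43.94, 20135.65),
    "Titan X heavy": (67.8,  20722.4),
}
for name, (fps, batches) in runs.items():
    print(f"{name}: {batches_per_second(fps, batches):,.0f} batches/s")
    # ~0.88M batches/s for the R9 run, ~1.40M for the Titan X run
```

Since batches per frame are nearly identical between the runs, the per-second gap tracks the framerate gap, which is consistent with the R9 run being more CPU bound.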


----------



## Asmodian

Quote:


> Originally Posted by *Mahigan*
> 
> It will be under DX12, if implemented that way, because the way DX12 works is that you can assign texture streaming to several GPU sources. Therefore under Split frame rendering you can assign one aspect of the frame to receive certain textures and the other side of the frame to receive other textures. You can do the same for compute, copy and other graphic tasks.
> 
> Basically the developer decides when to submit work to the GPU (when to draw etc) rather than the driver.
> 
> In other words you can have SFR + Combined memory pools if coded that way.


Right, but SFR isn't what allows combined memory pools and you could do AFR + combined memory pools if you wanted to.


----------



## Serandur

Quote:


> Originally Posted by *Mahigan*
> 
> I would have preferred if AMD had never bought ATi. ATi was highly competitive with nVIDIA. But then again I was biased towards ATi at one point in my life. This bias didn't transfer over to AMD... mostly because AMD promised there would be no lay-offs... shortly after the takeover... lay-offs. Many of my former colleagues were laid off.
> 
> There is a degree of bad blood between AMD and I.


Layoffs... did AMD start slashing funding for ATi that early? I'm of the opinion that ATi were awesome and that AMD weren't in a position to fund ATi properly, hence the relative difficulties in product development and marketing we see now. [Insert obligatory 9700 Pro was awesome sentence here]. I'm curious, do you have any idea how much of ATi actually remains under AMD today? Were you aware of TeraScale/VLIW's development when you left?

I'm unsure of AMD's direct role in this, but Radeon market share/mindshare seems to have taken its first big hit right after ATi were acquired and Nvidia launched Tesla/the 8800 while the 2900XT wasn't ready in time and came out limping. Then AMD/ATi went with their small-die strategy and the Radeon brand never really recovered. I wonder just how much of that was directly AMD's fault and how much was just coincidental timing as Nvidia started doing their whole big-die/CUDA thing.


----------



## pengs

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Understandable.
> 
> How would you feel about AMD reviving the ATi name? Maybe put it on their flagship halo cards? I didn't really comment on ATi in my first reply, because that just leads me back to 3DFX as well and the glory days of the 90s.
> 
> .......I miss my CRT!


Oh no, not the aesthetics







Call me odd but I like AMD's no frills font with the two simple opposing arrows.

Yeah, 3DFX had it. I think it was the font and the two-tone white or black with yellow of the logo. The box art always looked so awesome, and the product description was sharp and simple.




----------



## CasualCat

Quote:


> Originally Posted by *Serandur*
> 
> Lay offs... did AMD start slashing funding for ATi so early? I'm of the opinion ATi were awesome and that AMD weren't fit to afford funding ATi hence the relative difficulties in product development and marketing we see now. [Insert obligatory 9700 Pro was awesome sentence here]. I'm curious, do you have any idea of how much of ATi actually remains under AMD today? Were you aware of TeraScale/ VLIW's development when you left?
> 
> I'm unsure of AMD's direct role in this, but Radeon market share/mindshare seems to have taken its first big hit right after ATi were acquired and Nvidia launched Tesla/the 8800 while the 2900XT wasn't ready in time and came out limping. Then AMD/ATi went with their small-die strategy and the Radeon brand never really recovered. I wonder just how much of that was directly AMD's fault and how much was just coincidental timing as Nvidia started doing their whole big-die/CUDA thing.


The market share graph (with its decline) definitely seems to align with the acquisition.


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> I would have preferred if AMD had never bought ATi. ATi was highly competitive with nVIDIA. But then again I was biased towards ATi at one point in my life. This bias didn't transfer over to AMD... mostly because AMD promised there would be no lay-offs... shortly after the takeover... lay-offs. Many of my former colleagues were laid off.
> 
> There is a degree of bad blood between AMD and I.


Only one problem: ATi was looking for an out and was going to sell no matter who came calling. Selling to NVidia, however, would've meant a monopoly.


----------



## Kollock

Wow, lots more posts here; there are just too many things to respond to, so I'll try to answer what I can.

/inconvenient things I'm required to ask or they won't let me post anymore
Regarding screenshots and other info from our game, we appreciate your support but please refrain from disclosing these until after we hit early access. It won't be long now.
/end

Regarding batches, we use the term because we are counting both draw calls and dispatch calls. Dispatch calls invoke compute shaders; draw calls invoke normal graphics shaders. Though sometimes people refer to dispatches as draw calls too, they are different things, so we thought we'd avoid the confusion by calling everything a batch.
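In other words (a trivial sketch of the terminology with hypothetical command names, not Oxide's actual code):

```python
# A "batch" counts both graphics draw calls and compute dispatch calls
# submitted for a frame; other command types (e.g. copies) are not batches.

def count_batches(frame_commands):
    return sum(1 for cmd in frame_commands if cmd in ("draw", "dispatch"))

frame = ["draw", "draw", "dispatch", "copy", "draw", "dispatch"]
print(count_batches(frame))  # 5: three draws + two dispatches, copy excluded
```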

Regarding CPU load balancing on D3D12, that's entirely the application's responsibility. So if you see a case where it's not load balancing, it's probably the application, not the driver/API. We've done some additional tuning on the engine even in the last month and can clearly see usage cases where we load 8 cores at maybe 90-95%. Getting to 90% on an 8-core machine makes us really happy. Keeping our application tuned to scale like this is definitely an ongoing effort.

Additionally, hitches and stalls are largely the application's responsibility under D3D12. In D3D12, essentially everything that could cause a stall has been removed from the API. For example, the pipeline objects are designed such that the dreaded shader recompiles won't ever have to happen. We also have precise control over how long a graphics command is queued up. This is pretty important for VR applications.
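The pipeline-object point can be sketched as a cache keyed by the complete state description: because all state is baked up front into one immutable object, a cache hit never triggers compilation mid-frame. The class and method names below are illustrative, not the D3D12 API:

```python
class PSOCache:
    """Toy model of pipeline-state-object caching: compile once per
    unique state combination, never during rendering of a cached state."""

    def __init__(self):
        self._cache = {}
        self.compiles = 0  # track the expensive step

    def _compile_pipeline(self, desc):
        self.compiles += 1  # stand-in for shader compilation cost
        return f"pso<{desc}>"

    def get(self, shader, blend, raster):
        key = (shader, blend, raster)  # the complete state description
        if key not in self._cache:     # compile once, e.g. at load time
            self._cache[key] = self._compile_pipeline(key)
        return self._cache[key]

cache = PSOCache()
cache.get("lit.hlsl", "opaque", "backface")
cache.get("lit.hlsl", "opaque", "backface")  # cache hit: no recompile, no hitch
```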

Also keep in mind that the memory model for D3D12 is completely different from D3D11, at an OS level. I'm not sure you can honestly compare things like memory load against each other. In D3D12 we have more control over residency, and we may, for example, intentionally keep something unused resident so that there is no chance of a micro-stutter if that resource is needed. There is no reliable way to do this in D3D11. Thus, comparing memory residency between the two APIs may not be meaningful, at least not until everyone's had a chance to really tune things for the new paradigm.
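A speculative sketch of the residency idea: the app can pin a currently unused resource resident so a later use can't micro-stutter. The manager below is illustrative only; it is not the actual MakeResident/Evict semantics of D3D12.

```python
class ResidencyManager:
    """Toy residency model: the app, not the OS, decides what stays in
    GPU memory, including deliberately keeping unused resources warm."""

    def __init__(self):
        self.resident = set()
        self.pinned = set()

    def make_resident(self, res, pin=False):
        self.resident.add(res)
        if pin:
            self.pinned.add(res)  # kept resident even while unused

    def evict_unused(self, in_use):
        # Evict only what is neither in use nor deliberately pinned.
        self.resident = {r for r in self.resident
                         if r in in_use or r in self.pinned}

mgr = ResidencyManager()
mgr.make_resident("terrain")
mgr.make_resident("old_menu_ui")
mgr.make_resident("explosion_fx", pin=True)  # unused now, but kept warm
mgr.evict_unused(in_use={"terrain"})
# "old_menu_ui" goes; "explosion_fx" stays so a later use can't stall.
```

This also shows why raw memory-usage comparisons between the APIs mislead: under this model, higher reported usage may be an intentional anti-stutter choice rather than waste.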

Regarding SLI and CrossFire situations, yes, support is coming. However, those options in the ini file probably do not do what you think they do, just FYI. Some posters here have been remarkably perceptive about the different multi-GPU modes that are coming, and let me just say that we are looking beyond just the standard CrossFire and SLI configurations of today. We think that multi-GPU situations are an area where D3D12 will really shine (once we get all the kinks ironed out, of course). I can't promise when this support will be unveiled, but we are committed to doing it right.

Regarding async compute, a couple of points on this. First, though we are the first D3D12 title, I wouldn't hold us up as the premier example of this feature; there are probably better demonstrations of it. This is a pretty complex topic, and fully understanding it requires a deep knowledge of the particular GPU in question that only an IHV can provide.

We actually just chatted with Nvidia about Async Compute; indeed, the driver hasn't fully implemented it yet, but it appeared as though it had. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.

Also, we are pleased that D3D12 support in Ashes should be functional on Intel hardware relatively soon (actually, it's functional now; it's just a matter of getting the right driver out to the public).

Thanks!


----------



## PostalTwinkie

Quote:


> Originally Posted by *Kollock*
> 
> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.


This should get interesting.


----------



## ku4eto

So it seems that Async Compute was indeed a driver problem.
Quote:


> Originally Posted by *PostalTwinkie*
> 
> This should get interesting.


So I guess Maxwell 2 has hardware Async Compute?


----------



## Kand

Thread closure requested. Discussion over.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Kand*
> 
> Thread closure requested. Discussion over.


Are you nuts? It is just getting started now.

Nvidia said there was an issue, developer said there wasn't, and now we are back at there being an issue and work being done on it.

EDIT:

In addition to that, the developer is also now saying that AoTS shouldn't be held up as the premier example, even though that is exactly what people have been doing for two weeks; some report that people have returned hardware over this.


----------



## CasualCat

Quote:


> Originally Posted by *Kollock*
> 
> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.
> 
> Also, we are pleased that D3D12 support on Ashes should be functional on Intel hardware relatively soon, (actually, it's functional now it's just a matter of getting the right driver out to the public).
> 
> Thanks!


Do we have a ballpark time frame for when we might see something? Not looking for a specific date; rather, is this weeks vs. months away in your estimation? Thanks.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kand*
> 
> Thread closure requested. Discussion over.


What?


----------



## Kand

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Are you nuts? It is just getting started now.
> 
> Nvidia said there was an issue, developer said there wasn't, and now we are back at there being an issue and work being done on it.
> 
> EDIT:
> 
> In addition to that, the developer is also now saying that AoTS shouldn't be held as the primer example. Even though that is exactly what people have been doing for two weeks, some report that people have returned hardware over this.


Exactly. It was misinterpreted. Oxide gave a mixed signal in their first post here, and AMD PR capitalized on it and fanned the flames.


----------



## Xuper

Hmm, one question: to access Async Compute on GCN, do you need the driver? I thought that in DX12 you have more control over the GPU; on the other hand, you talk directly to the GPU. So I think a driver problem doesn't necessarily mean that Maxwell has Async Compute?


----------



## SpeedyVT

Quote:


> Originally Posted by *PostalTwinkie*
> 
> This should get interesting.


Let's see if NVidia does anything.


----------



## SpeedyVT

Quote:


> Originally Posted by *Kollock*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Wow, lots more posts here, there is just too many things to respond to so I'll try to answer what I can.
> 
> /inconvenient things I'm required to ask or they won't let me post anymore
> Regarding screenshots and other info from our game, we appreciate your support but please refrain from disclosing these until after we hit early access. It won't be long now.
> /end
> 
> Regarding batches, we use the term batches just because we are counting both draw calls and dispatch calls. Dispatch calls are compute shaders, draw calls are normal graphics shaders. Though sometimes everyone calls dispatchs draw calls, they are different so we thought we'd avoid the confusion by calling everything a draw call.
> 
> Regarding CPU load balancing on D3D12, that's entirely the applications responsibility. So if you see a case where it's not load balancing, it's probably the application not the driver/API. We've done some additional tunes to the engine even in the last month and can clearly see usage cases where we can load 8 cores at maybe 90-95% load. Getting to 90% on an 8 core machine makes us really happy. Keeping our application tuned to scale like this definitely on ongoing effort.
> 
> Additionally, hitches and stalls are largely the applications responsibility under D3D12. In D3D12, essentially everything that could cause a stall has been removed from the API. For example, the pipeline objects are designed such that the dreaded shader recompiles won't ever have to happen. We also have precise control over how long a graphics command is queued up. This is pretty important for VR applications.
> 
> Also keep in mind that the memory model for D3d12 is completely different the D3D11, at an OS level. I'm not sure if you can honestly compare things like memory load against each other. In D3D12 we have more control over residency and we may, for example, intentionally keep something unused resident so that there is no chance of a micro-stutter if that resource is needed. There is no reliable way to do this in D3D11. Thus, comparing memory residency between the two APIS may not be meaningful, at least not until everyone's had a chance to really tune things for the new paradigm.
> 
> Regarding SLI and cross fire situations, yes support is coming. However, those options in the ini file probablly do not do what you think they do, just FYI. Some posters here have been remarkably perceptive on different multi-GPU modes that are coming, and let me just say that we are looking beyond just the standard Crossfire and SLI configurations of today. We think that Multi-GPU situations are an area where D3D12 will really shine. (once we get all the kinks ironed out, of course). I can't promise when this support will be unvieled, but we are commited to doing it right.
> 
> Regarding Async compute, a couple of points on this. FIrst, though we are the first D3D12 title, I wouldn't hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn't hold Ashes up as the premier example of this feature.
> 
> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.
> 
> Also, we are pleased that D3D12 support on Ashes should be functional on Intel hardware relatively soon, (actually, it's functional now it's just a matter of getting the right driver out to the public).
> 
> Thanks!


Will Ashes take advantage of additional shaders and computational power on APUs with dGPUs?


----------



## PostalTwinkie

Quote:


> Originally Posted by *SpeedyVT*
> 
> Let's see if NVidia does anything.


Kollock just said Nvidia is actively working on it with them, so there is that. I am interested to see what happens when Nvidia does respond.

Get the popcorn ready!










Quote:


> Originally Posted by *SpeedyVT*
> 
> Will Ashes take advantage of additional shaders and computational power on APUs with dGPUs?


That is a good question. AMD's APUs have been pretty impressive, at least the ones I have used. It would be cool to see some nifty DX 12 magic utilizing an APU.


----------



## poii

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Are you nuts? It is just getting started now.
> 
> Nvidia said there was an issue, developer said there wasn't, and now we are back at there being an issue and work being done on it.
> 
> EDIT:
> 
> In addition to that, the developer is also now saying that AoTS shouldn't be held as the primer example. Even though that is exactly what people have been doing for two weeks, some report that people have returned hardware over this.


Quote:


> Originally Posted by *Kollock*
> 
> Our use of Async Compute, however, pales in comparison to some of the things the console guys are starting to do. Most of those haven't made their way to the PC yet, but I've heard of developers getting 30% GPU performance by using Async Compute. Too early to tell, of course, but it could end up being pretty disruptive in a year or so as these GCN-built and optimized engines start coming to the PC. I don't think Unreal titles will show this very much, though, so likely we'll have to wait and see. Has anyone profiled Ark yet?


The developer said this in his very first post here.


----------



## spacin9

Exciting stuff. Thanks for the comment, Kollock.


----------



## Vesku

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Are you nuts? It is just getting started now.
> 
> Nvidia said there was an issue, developer said there wasn't, and now we are back at there being an issue and work being done on it.


The developer only said it didn't work properly and Nvidia told them not to use it. So apparently it didn't work, and we are still in a position where it's up to Nvidia to demonstrate it can work properly on Maxwell 2. The only update from before is that Nvidia has now told the developer that they are actively looking into it.


----------



## Devnant

Future results should get more and more interesting after they reach early access. I wonder if Maxwell 2 will actually get better DX12 than DX11 performance once NVIDIA fixes async compute in their drivers.


----------



## Noufel

The tables have turned; it's Nvidia's turn to be driver-limited.


----------



## Mahigan

Quote:


> Originally Posted by *Vesku*
> 
> Developer only said it didn't work properly and Nvidia told them to not use it. So apparently it didn't work and we are still in a position where it's up to Nvidia to demonstrate it can work properly on Maxwell 2. Only update from before is that Nvidia has now told the developer that they are actively looking into it.


Quote:


> *It could be that the silence from nVIDIA is because they're working on a driver, for DX12, particularly on the Work Distributor portion of the driver. I think nVIDIA has to re-write their entire software scheduler in order to get Async working*. I'm not a developer or a programmer. I only understand the hardware engineering side of things.


http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/2060#post_24378467

So it is software-driven, but that's not shocking. The nVIDIA scheduler has two segments in software, the Grid Management Unit and the Work Distributor. The Asynchronous Warp Schedulers, however, are in the SMMs (hardware).


----------



## PostalTwinkie

Quote:


> Originally Posted by *Noufel*
> 
> The table has turned, it's nvidia's turn to be driver limited


Might want to go back one page and read the post from Oxide.










Someone might be gagging the fat lady right now, so let's not get too far ahead of ourselves.


----------



## Xuper

Quote:


> Originally Posted by *Mahigan*
> 
> http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/2060#post_24378467
> 
> So it is software driven, but that's not shocking. The nVIDIA scheduler has two segment in software, the Grid Management Unit and the Work Distributor. The Asynchronous Warp Schedulers are in the SMMs however (hardware).


So it's not like GCN, which is fully hardware?


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Vesku*
> 
> Developer only said it didn't work properly and Nvidia told them to not use it. So apparently it didn't work and we are still in a position where it's up to Nvidia to demonstrate it can work properly on Maxwell 2. Only update from before is that Nvidia has now told the developer that they are actively looking into it.
> 
> 
> 
> Quote:
> 
> 
> 
> *It could be that the silence from nVIDIA is because they're working on a driver, for DX12, particularly on the Work Distributor portion of the driver. I think nVIDIA has to re-write their entire software scheduler in order to get Async working*. I'm not a developer or a programmer. I only understand the hardware engineering side of things.
> 
> 
> http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/2060#post_24378467
> 
> So it is software driven, but that's not shocking. The nVIDIA scheduler has two segment in software, the Grid Management Unit and the Work Distributor. The Asynchronous Warp Schedulers are in the SMMs however (hardware).

I don't know if the performance gain will be the same as GCN's with a simple driver optimization?


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> So It's not like CGN that is Fully hardware?


Nope. That's where you get the "Performance per Watt" figures of Kepler and Maxwell/2. Hardware schedulers take up a lot of power.

Things just got a whole lot more interesting


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> I don't know if the performance gain will be the same as GCN with a simple driver optimization ???


Slower context switching... like AMD said. It depends on the workload. At least now it might be working. If it works for Maxwell 2, in theory, they could make it work for Maxwell, unless Maxwell's Warp Schedulers don't function asynchronously.

Why didn't nVIDIA just admit to this in the first place? Why stay silent on the matter? Like I said earlier...

"We're aware of the issue and we're working to get it resolved as quickly as possible".


----------



## Mahigan

Spoiler: Warning: Spoiler!



Quote:


> Originally Posted by *Kollock*
> 
> Wow, lots more posts here, there is just too many things to respond to so I'll try to answer what I can.
> 
> /inconvenient things I'm required to ask or they won't let me post anymore
> Regarding screenshots and other info from our game, we appreciate your support but please refrain from disclosing these until after we hit early access. It won't be long now.
> /end
> 
> Regarding batches, we use the term batches just because we are counting both draw calls and dispatch calls. Dispatch calls are compute shaders, draw calls are normal graphics shaders. Though sometimes everyone calls dispatchs draw calls, they are different so we thought we'd avoid the confusion by calling everything a draw call.
> 
> Regarding CPU load balancing on D3D12, that's entirely the applications responsibility. So if you see a case where it's not load balancing, it's probably the application not the driver/API. We've done some additional tunes to the engine even in the last month and can clearly see usage cases where we can load 8 cores at maybe 90-95% load. Getting to 90% on an 8 core machine makes us really happy. Keeping our application tuned to scale like this definitely on ongoing effort.
> 
> Additionally, hitches and stalls are largely the applications responsibility under D3D12. In D3D12, essentially everything that could cause a stall has been removed from the API. For example, the pipeline objects are designed such that the dreaded shader recompiles won't ever have to happen. We also have precise control over how long a graphics command is queued up. This is pretty important for VR applications.
> 
> Also keep in mind that the memory model for D3d12 is completely different the D3D11, at an OS level. I'm not sure if you can honestly compare things like memory load against each other. In D3D12 we have more control over residency and we may, for example, intentionally keep something unused resident so that there is no chance of a micro-stutter if that resource is needed. There is no reliable way to do this in D3D11. Thus, comparing memory residency between the two APIS may not be meaningful, at least not until everyone's had a chance to really tune things for the new paradigm.
> 
> Regarding SLI and cross fire situations, yes support is coming. However, those options in the ini file probablly do not do what you think they do, just FYI. Some posters here have been remarkably perceptive on different multi-GPU modes that are coming, and let me just say that we are looking beyond just the standard Crossfire and SLI configurations of today. We think that Multi-GPU situations are an area where D3D12 will really shine. (once we get all the kinks ironed out, of course). I can't promise when this support will be unvieled, but we are commited to doing it right.
> 
> Regarding Async compute, a couple of points on this. FIrst, though we are the first D3D12 title, I wouldn't hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn't hold Ashes up as the premier example of this feature.
> 
> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.
> 
> Also, we are pleased that D3D12 support on Ashes should be functional on Intel hardware relatively soon, (actually, it's functional now it's just a matter of getting the right driver out to the public).
> 
> Thanks!






@Kollock Thank You


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Noufel*
> 
> I don't know if the performance gain will be the same as GCN with a simple driver optimization ???
> 
> 
> 
> Slower context switching... like AMD said. It depends on the work load. At least now it might be working. If it works for Maxwell 2, in theory, they could make it work for Maxwell. Unless Maxwell's Warp Schedulers don't function Asynchronously.
> 
> Why didn't nVIDIA just admit to this in the first place? Why stay silent on the matter. Like I said earlier..
> 
> "We're aware of the issue and we're working to get it resolved as quickly as possible".

They didn't know it would be a major issue, and thanks to people like you, Mahigan, they admitted it.


----------



## Xuper

Quote:


> Originally Posted by *Mahigan*
> 
> Nope. That's where you get the "Performance per Watt" figures of Kepler and Maxwell/2. Hardware schedulers take up a lot of power.
> 
> Things just got a whole lot more interesting


I seeeeeeee!


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> I don't know if the performance gain will be the same as GCN with a simple driver optimization ???


Probably not; that's what my theory was about... but it should help the GTX 980 Ti get a boost. Once Oxide works on more post-processing effects and optimizes for the Fury X, the GCN parts should get another boost in that direction as well (something Kollock mentioned).

At least now we know that Asynchronous Compute is in fact software-driven for Maxwell 2. We know it can be activated with a driver update; at least, that's what nVIDIA have stated. We're still in a wait-and-see phase, IMO.

Now watch as the internet explodes once again LOL


----------



## mav451

Quote:


> Originally Posted by *Mahigan*
> 
> Nope. That's where you get the "Performance per Watt" figures of Kepler and Maxwell/2. Hardware schedulers take up a lot of power.
> 
> Things just got a whole lot more interesting


I don't have a 750W PSU for a single CPU/GPU setup for nothing lol.

Also - I must thank Kollock for continuing to monitor and answer questions in this thread.
More importantly, answering with professionalism and restraint - it's a rarity in this thread haha.


----------



## Xuper

OK, quick question: does that mean Nvidia should implement this for every game? I think Nvidia would have to create a new profile for each game that uses heavy AC?


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> OK, quick question: does that mean Nvidia should implement this for every game? I think Nvidia would have to create a new profile for each game that uses heavy AC?


That's up to the developers. If they code a game to make use of a high amount of Asynchronous Compute... then, in theory, it should hit Maxwell 2 harder than GCN. GCN was built to do everything in hardware.

What I'm wondering is how the CPU load will look once nVIDIA activates it in their drivers. For GCN, the driver just sends the load to the GPU based on the queue the developer picks (Compute/Graphics/Copy), and the GCN schedulers (ACEs and Graphics Command Processor) handle the rest... it seems that for Maxwell 2, the driver actually plays a larger role in distributing the work to the various elements, without the graphics card being involved in the process. In theory, this should show up as higher CPU overhead for nVIDIA when handling Asynchronous Compute.

It also means more latency... and this is what the VR guys have been talking about. It is also what I mentioned in my original theory.

Now we know nVIDIA didn't lie... well... let's wait and see before we draw that conclusion. Software support is still valid, as many DX12 features are emulated in software in this generation of cards.
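The CPU-overhead prediction above can be sketched as back-of-envelope arithmetic. The costs below are made-up units purely to show the shape of the comparison, not measured figures for either vendor:

```python
def frame_cpu_cost(submissions, per_submit_cost, driver_schedule_cost):
    """CPU time spent per frame on queue submissions.

    Hardware scheduling: driver_schedule_cost ~ 0 (the GPU's schedulers
    take over once work is submitted).
    Software scheduling: the driver also distributes the work, and that
    extra cost is paid on the CPU for every submission.
    """
    return submissions * (per_submit_cost + driver_schedule_cost)

hw_sched = frame_cpu_cost(submissions=500, per_submit_cost=2, driver_schedule_cost=0)
sw_sched = frame_cpu_cost(submissions=500, per_submit_cost=2, driver_schedule_cost=3)
# sw_sched > hw_sched: driver-side scheduling shows up as higher CPU
# overhead, which matters most on low-core-count CPUs.
```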

Quote:


> Originally Posted by *Noufel*
> 
> they didn't know that it will be a major issue and thnx to people like you Mahigan they admited that


Don't thank me... thank everyone... everyone who made this into a big issue compelling a response


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> Probably not, that's what my theory was about... but it should help the GTX 980 Ti get a boost. Once Oxide work on more Post Processing effects and optimize for the Fury-X, another boost for the GCN parts in that direction as well (something Kollock mentioned).
> 
> At least now we know that Asynchronous Compute is in fact software driven for Maxwell 2. We know it can be activated with a driver update, at least that's what nVIDIA have stated. We're still at a wait and see phase imo.
> 
> Now watch as the internet explodes once again LOL


I don't understand: is it software-only? Is there nothing in the NV GPU hardware that could carry out such functionality at all?
It's hard to follow all these posts.


----------



## ku4eto

Quote:


> Originally Posted by *Mahigan*
> 
> That's up to the developers. If they code a game to make use of a high amount of Asynchronous Compute... then, in theory, it should hit Maxwell 2 harder than GCN. GCN was built to do everything in hardware.
> 
> What I'm wondering is how the CPU load will look once nVIDIA activate it in their drivers. For GCN, the driver just send the load to the GPU based on the queue the developer picks (Compute/Graphics/Copy) and the GCN schedulers (ACEs and Graphic Command Processor) handle the rest... it seems that for Maxwell 2... the driver actually plays a larger role in distributing the work to the various elements without the Graphics card being involved in the process. In theory, this should show up as higher CPU overhead for nVIDIA when handling Asynchronous Compute.
> 
> It also means more latency... and this is what the VR guys have been talking about. It is also what I mentioned in my original theory.
> 
> Now we know nVIDIA didn't lie... well... wait and see before we draw that conclusion. Software support is still valid as many DX12 features are emulated in software in this generation of cards.
> Don't thank me... thank everyone... everyone who made this into a big issue compelling a response


Well, they didn't lie, TECHNICALLY. If Async Compute is CPU-based on Maxwell 2, this would mean problems if the CPU has a low core count (i3 or i5), and those are already being pushed hard by the nature of the game.


----------



## Devnant

Quote:


> Originally Posted by *Mahigan*
> 
> That's up to the developers. If they code a game to make use of a high amount of Asynchronous Compute... then, in theory, it should hit Maxwell 2 harder than GCN. GCN was built to do everything in hardware.
> 
> What I'm wondering is how the CPU load will look once nVIDIA activate it in their drivers. For GCN, the driver just send the load to the GPU based on the queue the developer picks (Compute/Graphics/Copy) and the GCN schedulers (ACEs and Graphic Command Processor) handle the rest... it seems that for Maxwell 2... the driver actually plays a larger role in distributing the work to the various elements without the Graphics card being involved in the process. In theory, this should show up as higher CPU overhead for nVIDIA when handling Asynchronous Compute.
> 
> It also means more latency... and this is what the VR guys have been talking about. It is also what I mentioned in my original theory.
> 
> Now we know nVIDIA didn't lie... well... wait and see before we draw that conclusion. Software support is still valid as many DX12 features are emulated in software in this generation of cards.
> Don't thank me... thank everyone... everyone who made this into a big issue compelling a response


Cool!









But thanks anyway Mahigan!








Quote:


> Originally Posted by *ku4eto*
> 
> Well, they didn't lie TECHNICALLY. If Async Compute is CPU based on Maxwell 2, this would mean problems if the CPU is with low core amount ( i3 or i5), and they are already pushed due to the game character.


So us folks with 12 and 16 threads shouldn't worry AT ALL, right?


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> I don't understand - is it software only? There is nothing in the NV GPU hardware that could carry out such functionality? at all?
> It's hard to follow all these posts.


The Asynchronous Warp Schedulers are in the hardware. Each SMM (which is a shader engine in GCN terms) holds four AWSs. Unlike GCN, the scheduling aspect is handled in software for Maxwell 2. In the driver there's a Grid Management Queue which holds pending tasks and assigns the pending tasks to another piece of software which is the work distributor. The work distributor then assigns the tasks to available Asynchronous Warp Schedulers. It's quite a few different "parts" working together. A software and a hardware component if you will.

With GCN the developer sends work to a particular queue (Graphic/Compute/Copy) and the driver just sends it to the Asynchronous Compute Engine (for Async compute) or Graphic Command Processor (Graphic tasks but can also handle compute), DMA Engines (Copy). The queues, for pending Async work, are held within the ACEs (8 deep each)... and ACEs handle assigning Async tasks to available compute units.

Simplified...

Maxwell 2: Queues in Software, work distributor in software (context switching), Asynchronous Warps in hardware, DMA Engines in hardware, CUDA cores in hardware.
GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, Copy (DMA Engines) in hardware, CUs in hardware.
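That simplified breakdown can be written down as a small data sketch. To be clear, this encodes the enthusiast model described in this thread, not vendor documentation:

```python
# Which scheduling stages sit in software (driver, on the CPU) versus in
# GPU hardware, per the simplified model above.
PIPELINES = {
    "Maxwell 2": {
        "queues": "software",
        "work_distributor": "software",   # context switching in the driver
        "warp_schedulers": "hardware",    # Asynchronous Warp Schedulers in SMMs
        "dma_engines": "hardware",
        "execution_units": "hardware",    # CUDA cores
    },
    "GCN": {
        "queues": "hardware",             # held in the ACEs, 8 deep each
        "work_distributor": "hardware",   # ACEs / Graphics Command Processor
        "warp_schedulers": "hardware",
        "dma_engines": "hardware",
        "execution_units": "hardware",    # compute units
    },
}

def software_stages(arch):
    """Stages handled by the driver on the CPU: under this model, the
    source of the extra CPU overhead and latency discussed above."""
    return [stage for stage, where in PIPELINES[arch].items()
            if where == "software"]
```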


----------



## ku4eto

Quote:


> Originally Posted by *Devnant*
> 
> Cool!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> But thanks anyway Mahigan!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> So us folks with 12 and 16 threads shouldn't worry AT ALL right?


I am still on a 45nm Phenom II. I don't think I will see an AMD octa-core in the near future, let alone something that costs $1000.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> *So us folks with 12 and 16 threads shouldn't worry AT ALL right?*


Nope. I don't see it being a problem with a huge CPU like that. I also don't see many people pairing an i3 with a GTX 980 Ti either


----------



## FastEddieNYC

After reading through this thread I have learned a great deal about how DX12 and GPUs work. I own both brands and buy whatever gives me the best performance/cost ratio. I am sure team green will optimize their cards through the driver to limit the impact of AC on performance. What bothers me is that all they had to do was say whether Maxwell has a hardware scheduler or is CPU-based. Instead there is silence, which results in all the speculation we are seeing.


----------



## Mahigan

Quote:


> Originally Posted by *FastEddieNYC*
> 
> After reading through this thread I have learned a great deal about how DX12 and GPUs work. I own both brands and buy whatever gives me the best performance/cost ratio. I am sure team green will optimize their cards through the driver to limit the impact of AC on performance. What bothers me is that all they had to do was say whether Maxwell has a hardware scheduler or is CPU-based. Instead there is silence, which results in all the speculation we are seeing.


That's been bothering me from Day 1. We still only heard it from Oxide. Kollock is a great guy







A developer who seems to care about the PC Gaming community. So rare to see this these days. I think I'll be buying all of the games he works on from now on. I know who he is









I've literally got a smile from ear to ear


----------



## UtopiA

Sweet merciful Jesus, I hope this gets sorted out ASAP so we can bring this mess to a close. At least we have some clarity from Oxide.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> The Asynchronous Warp Schedulers are in the hardware. Each SMM (which is a shader engine in GCN terms) holds four AWSs. Unlike GCN, the scheduling aspect is handled in software for Maxwell 2. In the driver there's a Grid Management Queue which holds pending tasks and assigns the pending tasks to another piece of software which is the work distributor. The work distributor then assigns the tasks to available Asynchronous Warp Schedulers. It's quite a few different "parts" working together. A software and a hardware component if you will.
> 
> With GCN the developer sends work to a particular queue (Graphic/Compute/Copy) and the driver just sends it to the Asynchronous Compute Engine or Graphic Command Processor, DMA Engines. The queues, for pending work, are held within the ACEs (8 deep each)... and ACEs handle assigning tasks to available compute units.
> 
> Simplified...
> 
> Maxwell 2: Queues in Software, work distributor in software (context switching), Asynchronous Warps in hardware, CUDA cores in hardware.
> GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, CUs in hardware.


I see, so:

1. M2 does have async capabilities, if you can feed the SMMs correctly, but it has its costs, right? That's in contrast to not having the option at all (like the old VLIW).
2. Do we know the level of efficiency of the M2 mechanism?
3. Are we sure there is no hardware (like queues and a work distributor) that's simply not activated? Do we know for sure that M2 doesn't have the hardware itself? What is the source? NV developer docs?
4. Why won't you sum it all up in some sticky? The information is barely usable scattered like this.


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> I see, so:
> 
> 1. M2 does have async capabilities, if you can feed the SMMs correctly, but it has its costs, right? That's in contrast to not having the option at all (like the old VLIW).
> 2. Do we know the level of efficiency of the M2 mechanism?
> 3. Are we sure there is no hardware (like queues and a work distributor) that's simply not activated? Do we know for sure that M2 doesn't have the hardware itself? What is the source? NV developer docs?
> 4. Why won't you sum it all up in some sticky? The information is barely usable scattered like this.


Here's the HyperQ documentation...
http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf

Here's Hyper-Q on Kepler (but don't forget that Maxwell 2 adds one graphics task into this mix):


[Image: Hyper-Q on Kepler diagram]








More info here: http://electronicdesign.com/digital-ics/gpu-architecture-improves-embedded-application-support
Hyper-Q allows multiple CPUs to drive a GPU job queue, thereby reducing GPU idle time.
Quote:


> Hyper-Q is needed because the CPU/GPU combination these days is really a multicore CPU/multicore GPU combination. The CPU can be running multiple tasks that will initiate jobs on the GPU, and the GPU is running a large number of relatively or totally independent jobs. Hyper-Q simplifies the programmer's job because it eliminates the bottleneck in feeding jobs to the GPU.





[Image: Hyper-Q block diagram]




The blue squares are the CPU cores, the black squares are the multiple tasks held in the work distributor, and the green part is the SMMs on the GPU side (which contain the Asynchronous Warp Schedulers).
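As a loose CPU-side analogy to the Hyper-Q description above (a toy Python sketch, not GPU code; threads stand in for CPU cores and workers stand in for SMMs, and every name here is illustrative), the point is that several producers can feed work concurrently instead of serializing through a single submission point:

```python
import queue
import threading

def run_jobs(num_producers: int, jobs_per_producer: int, num_workers: int) -> int:
    """Toy model: several 'CPU core' threads push independent jobs into a
    shared work queue, and several 'SMM' workers drain it concurrently.
    Loosely analogous to Hyper-Q's multiple hardware queues removing the
    single-queue bottleneck between CPU and GPU."""
    work = queue.Queue()
    completed = []
    lock = threading.Lock()

    def producer(pid: int) -> None:
        for j in range(jobs_per_producer):
            work.put((pid, j))

    def worker() -> None:
        while True:
            try:
                # Drain until the queue has stayed empty for a moment.
                item = work.get(timeout=0.2)
            except queue.Empty:
                return
            with lock:
                completed.append(item)

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(num_producers)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(completed)

# 4 producers x 8 jobs each, drained by 4 concurrent workers
print(run_jobs(4, 8, 4))  # -> 32
```

The queue depths and worker counts are arbitrary; the takeaway is only the shape of the pipeline, not any performance claim about real hardware.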


----------



## Xuper

*Nvidia VR preemption "possibly catastrophic"*






David Kanter talks to Scott Wasson on the TR Podcast and confirms Nvidia's problems with Maxwell for VR.

He says that in terms of preemption they are "possibly catastrophic" and even behind pre-Skylake Intel.

Source: AnandTech forums


----------



## semitope

I hope Nvidia isn't just using them to calm the waves. They haven't directly made any promises and have left Oxide to communicate that there's hope.

If their driver hackery doesn't measure up, who is to blame?


----------



## Kand

Quote:


> Originally Posted by *Xuper*
> 
> *Nvidia VR preemption "possibly catastrophic"*
> 
> 
> 
> 
> 
> 
> David Kanter talks to Scott Wasson on the TR Podcast and confirms Nvidia's problems with Maxwell for VR.
> 
> He says that in terms of preemption they are "possibly catastrophic" and even behind pre-Skylake Intel.
> 
> Source : Forum Anandtech


I don't see VR picking up anytime soon without our minds being linked directly to the interface. No middleman of eyesight.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kand*
> 
> I don't see VR picking up anytime soon without our minds being linked directly to the interface. No middleman of eyesight.


... Really?









That's ridiculous.


----------



## ku4eto

Quote:


> Originally Posted by *Kand*
> 
> I don't see VR picking up anytime soon without our minds being linked directly to the interface. No middleman of eyesight.


The same thing happened with low-level APIs like Mantle: it was the base for what is now a whole other level, DX12 and Vulkan.
VR will become a thing in the future. If AMD makes it to the scene first with good performance, they can cash in really well.


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> *Nvidia VR preemption "possibly catastrophic"*
> 
> 
> 
> 
> 
> 
> David Kanter talks to Scott Wasson on the TR Podcast and confirms Nvidia's problems with Maxwell for VR.
> 
> He says that in terms of preemption they are "possibly catastrophic" and even behind pre-Skylake Intel.
> 
> Source : Forum Anandtech


Kanter mentioned the software scheduling too.







I feel vindicated.


----------



## Kand

Quote:


> Originally Posted by *ku4eto*
> 
> The same thing happened with low-level APIs like Mantle: it was the base for what is now a whole other level, DX12 and Vulkan.
> VR will become a thing in the future. If AMD makes it to the scene first with good performance, they can cash in really well.


The current implementation of VR, to me, is similar to 3D glasses. Relying on someone's eyesight will not be the way of the future.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Kand*
> 
> The current implementation of VR, to me, is similar to 3D glasses. Relying on someone's eyesight will not be the way of the future.


There are a lot of us that can't even use VR for health/medical/comfort reasons.


----------



## Kand

Quote:


> Originally Posted by *PostalTwinkie*
> 
> There are a lot of us that can't even use VR for health/medical/comfort reasons.


This. I'm pretty certain that farsighted people won't be able to make use of this without their corrective lenses.

And no, contacts are not a solution!


----------



## Hattifnatten

VR won't be mainstream any time soon, but it will be the must-have thing for enthusiasts. Oculus really blew me away with the DK1; I had never experienced anything like it. And now it seems Valve will do the same thing again, with freedom of movement and those nifty hand-held controllers. I do remember seeing someone do the same thing a year or two back with an OR and a Razer Hydra. It worked really, really well in Half-Life 2.
Quote:


> Originally Posted by *Kand*
> 
> The current implementation of VR, to me, is similar to 3D glasses. Relying on someone's eyesight will not be the way of the future.


I have never had any trouble with my glasses on the DK1 and the DK2. Sure, it was a tight fit, and sometimes more comfortable to take my glasses off (which worked fine; they include different optics for near- and farsighted people), but the experience was worth the small comfort sacrifice.


----------



## provost

Ok, I have caught up with the last few pages, and I can understand this is the moment for anyone holding one of AMD's newer (or older?) cards to gloat a little. But since I don't have one of AMD's newer cards, is there any good news here for Nvidia owners from the perspective of DX12, other than "don't worry about it, we've got you covered by the time DX12 goes full prime time"?

Does Nvidia have a plan? Any visibility on full DX12 compliance? Heck, if Intel's iGPU can get there before Nvidia, then....


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> Here's the HyperQ documentation...
> http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf
> 
> Here's Hyper-Q on Kepler (but don't forget that Maxwell 2 adds one graphics task into this mix):
> 
> 
> [Image: Hyper-Q on Kepler diagram]
> 
> 
> 
> 
> 
> 
> 
> 
> More info here: http://electronicdesign.com/digital-ics/gpu-architecture-improves-embedded-application-support
> Hyper-Q allows multiple CPUs to drive a GPU job queue, thereby reducing GPU idle time.
> 
> 
> [Image: Hyper-Q block diagram]
> 
> 
> 
> 
> The blue squares are the CPU cores, the black squares are the multiple tasks held in the work distributor, and the green part is the SMMs on the GPU side (which contain the Asynchronous Warp Schedulers).


You should really do some big sticky with table of contents and stuff.

So, they had some kind of DX12 functionality even before? I mean, they could do some level of async processing with CUDA (meaning: compute)?
Still, I don't see how we know the distributor is in software and not hardware, but I'm also not sure why that is so bad, apart from being a bit slower / having higher latency. It does have the advantage of being more flexible.

Frankly, I'm sure I don't see the whole picture, to say the least, though I try to read and understand.

And yeah, NVIDIA's reaction has been really poor. They need to switch context quickly and act in a mature way. I don't even understand the big issue: their GPUs are still very fast, and they can surely play with the pricing. They have many more iterations to work on.


----------



## FastEddieNYC

Quote:


> Originally Posted by *ku4eto*
> 
> Same thing was with Low-API like Mantle, but it was the base for as its now whole another level - DX12 and Vulkan.
> VR will become a thing in the future. If AMD makes it first to the scene with good performance, they can cash-in really good.


DX12 and Vulkan will certainly help AMD. Nvidia invested heavily in optimizing DX11 performance in a way that AMD simply could not afford to. With the new APIs the developer has more control of the resources, and with the current generation of cards it appears AMD's GCN can compete far better performance-wise.


----------



## GorillaSceptre

Funny thing: after Nvidia supposedly "working on DX12 for years", they still weren't ready with a driver and don't have a solid hardware implementation for it..

I'll put money on Pascal being in the same position as Maxwell too. AMD might finally get a break. I'd say none of this would matter if it weren't for the consoles being on GCN too; Nvidia has found themselves between a bit of a rock and a hard place.

If Pascal is more or less a 16nm Maxwell die shrink, then the green team might have a difficult year come 2016.


----------



## provost

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Funny thing: after Nvidia supposedly "working on DX12 for years", they still weren't ready with a driver and don't have a solid hardware implementation for it..
> 
> I'll put money on Pascal being in the same position as Maxwell too. AMD might finally get a break. I'd say none of this would matter if it weren't for the consoles being on GCN too; Nvidia has found themselves between a bit of a rock and a hard place.
> 
> If Pascal is more or less a 16nm Maxwell die shrink, then the green team might have a difficult year come 2016.


Yep, I was hoping that Nvidia would be able to provide some visibility.


----------



## Kand

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Funny thing: after Nvidia supposedly "working on DX12 for years", they still weren't ready with a driver and don't have a solid hardware implementation for it..
> 
> I'll put money on Pascal being in the same position as Maxwell too. AMD might finally get a break. I'd say none of this would matter if it weren't for the consoles being on GCN too; Nvidia has found themselves between a bit of a rock and a hard place.
> 
> If Pascal is more or less a 16nm Maxwell die shrink, then the green team might have a difficult year come 2016.


They have six months until the first DX12 games start rolling out. That's plenty of time to get their drivers in gear.

Who knows, they might have alphas and betas, and the drivers we see today might have been brewing since last year. It's also likely pure coincidence that something worked right for AMD.


----------



## Vesku

Quote:


> Originally Posted by *provost*
> 
> Ok, I have caught up with the last few pages, and I can understand this is the moment for anyone holding one of AMD's newer (or older cards?) to gloat a little. But, since I don't have one of AMD's newer cards, is there any good news here for Nvidia owners, from the perspective of DX12, other than "don't worry about it, we got you covered by the time DX 12 goes full prime time"?
> 
> Does Nvidia have a plan? Any visibility on full DX 12 compliance? Heck, if Intel Igpu can get there before Nvidia, then....


Actually GCN 1.0 is doing pretty well in this DX 12 game benchmark. Of course as Kollock mentioned they don't use what he'd consider a heavy amount of Async so it's possible GCN 1.0 will run out of steam on any heavy Async Compute titles. But I may have all Maxwell 2 owners as company ;p.

Fingers crossed I'll be able to hold out for the node shrink depending on whether my 7950 can handle remaining DX 11 releases like Fallout 4. Worst case I'll get a 290(X) to tide me over the ~1 year.

As for async compute on Maxwell 2, there are going to be some hard limitations on what a software scheduler can handle. There is going to be a lot of latency involved in keeping the CPU updated on the current state of compute jobs. It's going to be difficult to pull off, as can be seen from it not working properly even while there are DX12 AAA titles decently far along in development right now.
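As a rough illustration of that latency argument (a toy back-of-the-envelope model; all numbers are invented and measure nothing about any real GPU or driver), the cost of a driver-side scheduler can be thought of as a per-dispatch "hop" that a hardware scheduler largely avoids:

```python
def total_time_ms(num_dispatches: int, kernel_ms: float, hop_ms: float) -> float:
    """Each dependent dispatch costs its kernel time plus one scheduling
    'hop': on-chip dependency resolution for a hardware scheduler, or a
    CPU round-trip for a driver-based one. The numbers below are invented
    purely to show how the overhead scales with dispatch count."""
    return num_dispatches * (kernel_ms + hop_ms)

# 1000 short dependent dispatches of 0.05 ms of work each
hw = total_time_ms(1000, 0.05, 0.002)  # hypothetical on-chip hop
sw = total_time_ms(1000, 0.05, 0.050)  # hypothetical CPU round-trip
print(round(hw), round(sw))  # -> 52 100
```

With many short dispatches the hop cost roughly doubles the total in the software-scheduled case, which is the kind of erosion being described here; real overheads depend entirely on the driver and the workload.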


----------



## Devnant

An interesting post from Beyond3D. I hardly understand 50% of it, but it seems like Ext3h confirms everything Mahigan has been saying and even talks about some possible driver workarounds.



Source: https://forum.beyond3d.com/posts/1870218/
Quote:


> I wouldn't expect the Nvidia cards to perform THAT bad in the future, given that there are still possible gains to be made in the driver. I wouldn't exactly overestimate them either, though. AMD has just a far more scalable hardware design in this domain, and the necessity of switching between compute and graphic context in combination with the starvation issue will continue to haunt Nvidia as that isn't a software but a hardware design fault.


Also, from Jawed
Quote:


> After years of working on D3D12, NVidia has finally realised it needs to do async compute.


Funny stuff!


----------



## Paul17041993

Quote:


> Originally Posted by *Kand*
> 
> This. I'm pretty certain that farsighted people won't be able to make use of this without their corrective lenses.
> 
> And no, contacts are not a solution!


You either wear glasses inside the headset or have the lenses in the headset adjusted.


----------



## provost

Quote:


> Originally Posted by *Vesku*
> 
> Actually GCN 1.0 is doing pretty well in this DX 12 game benchmark. Of course as Kollock mentioned they don't use what he'd consider a heavy amount of Async so it's possible GCN 1.0 will run out of steam on any heavy Async Compute titles. But I may have all Maxwell 2 owners as company ;p.
> 
> Fingers crossed I'll be able to hold out for the node shrink depending on whether my 7950 can handle remaining DX 11 releases like Fallout 4. Worst case I'll get a 290(X) to tide me over the ~1 year.


Well, I ended up replacing my 7950 with one of the 690s I had lying around, if for no other reason than I hated seeing a more expensive card collecting dust rather than being used somewhere... Lol
The 7950 is somewhere in my attic or a drawer, and I'm not sure if I am going to dig it out (but I may have to for comparison's sake).

I am sorry, I know I haven't updated my sig rig in a while, which can be confusing.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Devnant*
> 
> An interesting post from Beyond3D. I hardly understand 50% of it but seems like Ext3h confirms everything Mahigan has been saying and even talks about some possible driver workarounds.
> 
> 
> 
> Source: https://forum.beyond3d.com/posts/1870218/
> Also, from Jawed
> Funny stuff!


"After years of working on D3D12, NVidia has finally realised it needs to do async compute."

Just what i was alluding to!

These past few days have been interesting to say the least.


----------



## Clocknut

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Funny thing: after Nvidia supposedly "working on DX12 for years", they still weren't ready with a driver and don't have a solid hardware implementation for it..
> 
> I'll put money on Pascal being in the same position as Maxwell too. AMD might finally get a break. I'd say none of this would matter if it weren't for the consoles being on GCN too; Nvidia has found themselves between a bit of a rock and a hard place.
> 
> If Pascal is more or less a 16nm Maxwell die shrink, then the green team might have a difficult year come 2016.


By the time Maxwell taped out, Nvidia didn't think async compute would get so much attention. With the consoles having underpowered CPUs, async compute is pretty much guaranteed to be heavily used by game developers, because they need every ounce of compute they can squeeze out of that small console box.

Pascal was clearly a Maxwell 3.0 the moment they announced it right after Volta. It seems to me that at the time Nvidia thought AMD would remain uncompetitive, and that the 16nm + FinFET + HBM combo would bring enough performance gain to be worth one generation of milking, so there was no need to bring Volta to an immature node (Intel's tick-tock strategy).

I just hope Pascal hadn't completed its design phase before all this async compute attention. Nvidia will have to either live with Pascal having no hardware async compute support, or push back the entire Pascal launch and go back to the drawing board to get at least some hardware support. Oh well, good luck Nvidia.


----------



## Mahigan

Quote:


> Originally Posted by *Devnant*
> 
> An interesting post from Beyond3D. I hardly understand 50% of it but seems like Ext3h confirms everything Mahigan has been saying and even talks about some possible driver workarounds.
> 
> 
> 
> Source: https://forum.beyond3d.com/posts/1870218/
> Also, from Jawed
> Funny stuff!


That's life. Now I have to apologize to the HardOCP staff as well as LinusTechTips, Tech Report, and Tom's Hardware for being so direct towards them. I have to apologize to Ryan Smith of AnandTech as well. And Beyond3D too. I probably need to stop being so harsh towards people, but the Glenn Greenwald in me is kinda in-your-face







. Something tells me... all this will be published now. Maybe that means I get to spend more time with my wife and less time on this issue?

Quote:


> Originally Posted by *Anna Torrent*
> 
> You should really do some big sticky with table of contents and stuff.
> 
> So, they had some kind of DX12 functionality even before? I mean, they could do some level of Async processing with CUDA (meaning:compute)?
> Still I don't see how we know the distributor is in SW and not HW, but I'm also not sure why it is that bad, except being a bit slower / having higher latency. It does have some advantage in the form of being more flexible.
> 
> Frankly, I'm sure I don't see the whole picture, to say the least, though I try to read and understand
> 
> And yea, NV's reacting is really low.. they need to switch context quickly and act in a mature way. I don't even understand the big issue - their GPUs are still very fast and they can surely play with the pricing. They have many more iteration to work on


I should do a sticky write-up on the whole thing, including everything we've learned since this issue first broke. I'll probably reduce it to simple terms to help with digesting the information. I've made so many errors throughout this whole thing... but the point was to get nVIDIA to acknowledge the problem









I'll probably work on that tomorrow as it's 1:20am here.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Clocknut*
> 
> By the time Maxwell taped out, Nvidia didn't think async compute would get so much attention. With the consoles having underpowered CPUs, async compute is pretty much guaranteed to be heavily used by game developers, because they need every ounce of compute they can squeeze out of that small console box.
> 
> Pascal was clearly a Maxwell 3.0 the moment they announced it right after Volta. It seems to me that at the time Nvidia thought AMD would remain uncompetitive, and that the 16nm + FinFET + HBM combo would bring enough performance gain to be worth one generation of milking, so there was no need to bring Volta to an immature node (Intel's tick-tock strategy).
> 
> I just hope Pascal hadn't completed its design phase before all this async compute attention. Nvidia will have to either live with Pascal having no hardware async compute support, or push back the entire Pascal launch and go back to the drawing board to get at least some hardware support. Oh well, good luck Nvidia.


At this point I honestly think Nvidia was hardly involved in DX12 at all, despite what Nvidia claims.

It seems like Mantle did exactly what AMD wanted. MS and AMD were obviously working on the X1 (probably in 2011), and AMD decided to invest in Mantle to benefit their PC side of things and play the long game.

It's not far-fetched to think Nvidia was left out in the cold. Nearly everything they did with Maxwell (and a lot of what we've heard about Pascal) shows they were assuming a DX11-type API was going to be around for a long time. Heck, most of Maxwell's power improvements come from throwing out most of the scheduling hardware. That looks to have backfired on them.


----------



## SpeedyVT

Quote:


> Originally Posted by *FastEddieNYC*
> 
> DX12 and Vulkan will certainly help AMD. Nvidia invested heavily in optimizing DX11 performance in a way that AMD simply could not afford to. With the new APIs the developer has more control of the resources, and with the current generation of cards it appears AMD's GCN can compete far better performance-wise.


I don't think it's a matter of affording it. When a company knows the end is nigh for one API, rather than spending time on something that already has enough performance, it's almost better to focus on what's coming, to give users the smoothest transition. Obviously NVIDIA spent too much time optimizing for the past and not the future.

Do you really need 240+ fps in an old game? Or do you just need around 120 fps?


----------



## FastEddieNYC

Quote:


> Originally Posted by *Clocknut*
> 
> By the time Maxwell taped out, Nvidia didn't think async compute would get so much attention. With the consoles having underpowered CPUs, async compute is pretty much guaranteed to be heavily used by game developers, because they need every ounce of compute they can squeeze out of that small console box.
> 
> Pascal was clearly a Maxwell 3.0 the moment they announced it right after Volta. It seems to me that at the time Nvidia thought AMD would remain uncompetitive, and that the 16nm + FinFET + HBM combo would bring enough performance gain to be worth one generation of milking, so there was no need to bring Volta to an immature node (Intel's tick-tock strategy).
> 
> I just hope Pascal hadn't completed its design phase before all this async compute attention. Nvidia will have to either live with Pascal having no hardware async compute support, or push back the entire Pascal launch and go back to the drawing board to get at least some hardware support. Oh well, good luck Nvidia.


It's almost impossible for Nvidia to backtrack. Changing the design is both costly and time-consuming. They have already taped out and have samples. If they simply do a node optimization and add some CUDA cores, it will give AMD a chance to match or even pull ahead in total performance. Supplying the APUs for both game consoles appears to be paying off now. It appears that my decision to go Crossfire 290X this time was a good investment.


----------



## Vesku

Fable Legends from Lionhead Studios will be interesting to watch. Lionhead is responsible for providing the first Async Compute support for Unreal 4 engine:
Quote:


> This feature was implemented by Lionhead Studios. We integrated it and intend to make use of it as a tool to optimize the XboxOne rendering.


https://docs.unrealengine.com/latest/INT/Programming/Rendering/ShaderDevelopment/AsyncCompute/index.html


----------



## Tojara

Quote:


> Originally Posted by *GorillaSceptre*
> 
> At this point I honestly think Nvidia was hardly involved in DX12 at all, despite what Nvidia claims.
> 
> It seems like Mantle did exactly what AMD wanted. MS and AMD were obviously working on the X1 (probably in 2011), and AMD decided to invest in Mantle to benefit their PC side of things and play the long game.
> 
> It's not far-fetched to think Nvidia was left out in the cold. Nearly everything they did with Maxwell (and a lot of what we've heard about Pascal) shows they were assuming a DX11-type API was going to be around for a long time. Heck, most of Maxwell's power improvements come from throwing out most of the scheduling hardware. That looks to have backfired on them.


This is pretty much them getting caught with their pants down, and unfortunately just before a node change, a memory change, and an architecture change, all well within the next year and a half. If they started designing Pascal early enough (which some of the tape-outs point towards), it might be a really rough few years ahead of them, with massive backtracking to be done. As unremarkable as the console wins and the change from VLIW to GCN seemed in any way, shape, or form, they have turned the tables, while Nvidia has willingly tossed every advantage they had in the consumer markets out of the window. Intel eradicating dGPUs both in desktops and mobile isn't going to help.


----------



## PostalTwinkie

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Funny thing: after Nvidia supposedly "working on DX12 for years", they still weren't ready with a driver and don't have a solid hardware implementation for it..
> 
> I'll put money on Pascal being in the same position as Maxwell too. AMD might finally get a break. I'd say none of this would matter if it weren't for the consoles being on GCN too; Nvidia has found themselves between a bit of a rock and a hard place.
> 
> If Pascal is more or less a 16nm Maxwell die shrink, then the green team might have a difficult year come 2016.


I take it you both missed the part where Oxide came back into the thread and said they identified an issue and are working on it with Nvidia.

See below.....
Quote:


> Originally Posted by *Kollock*
> 
> ...........We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.


EDIT:

That means the last two weeks haven't meant a thing in terms of performance. Sure, we all learnt a few great things, but as for the end result? Still completely up in the air. As of right now we know how AMD performs in this one specific pre-beta game, and that is about it.


----------



## Forceman

Quote:


> Originally Posted by *Tojara*
> 
> This is pretty much them getting caught with their pants down, and unfortunately just before a node change, a memory change, and an architecture change, all well within the next year and a half. If they started designing Pascal early enough (which some of the tape-outs point towards), it might be a really rough few years ahead of them, with massive backtracking to be done. As unremarkable as the console wins and the change from VLIW to GCN seemed in any way, shape, or form, they have turned the tables, while Nvidia has willingly tossed every advantage they had in the consumer markets out of the window. Intel eradicating dGPUs both in desktops and mobile isn't going to help.


I can't help but think that this is exactly what AMD wants people to believe, and that leads me to wonder if they had a hand in Oxide's decision to release this benchmark so far before the actual game's release. It seems like AMD is the only one who really stands to gain anything from it.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> I can't help but think that this is exactly what AMD wants people to believe, and that leads me to wonder if they had a hand in Oxide's decision to release this benchmark so far before the actual game's release. It seems like AMD is the only one who really stands to gain anything from it.


While anything is possible... this may lead to more competition. Maybe even leading to a drop in GPU prices.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> While anything is possible... this may lead to more competition. *Maybe even leading to a drop in GPU prices*.












Ah, you are funny sometimes.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> I would have preferred if AMD had never bought ATi. ATi was highly competitive with nVIDIA. But then again I was biased towards ATi at one point in my life. This bias didn't transfer over to AMD... mostly because AMD promised there would be no lay-offs... shortly after the takeover... lay-offs. Many of my former colleagues were laid off.
> 
> There is a degree of bad blood between AMD and I.


With GCN 1.0/1.1 they are as competitive as ATI was, or even more so. No other GPU could stay relevant for 2-3 years after release the way this improved architecture has.
Quote:


> Originally Posted by *Noufel*
> 
> The table has turned, it's nvidia's turn to be driver limited


The first time Nvidia doesn't release a proper driver on time is also when we find out that AMD has better DirectX 12 support.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I take it you both missed the part where Oxide came back into the thread and said they identified an issue, and are working on it with Nvidia.
> 
> See below.....
> EDIT:
> 
> That translates into the last two weeks not meaning a thing in terms of performance. Sure, we all learnt a few great things, but as for the end result? Still completely in the air. As of right now we know how AMD performs in this one specific pre-beta game currently, and that is about it.


Now we go back to my theory...

And David Kanter just confirmed it here: http://www.dsogaming.com/news/oculus-employees-preemption-for-context-switches-is-best-on-amd-nvidia-possibly-catastrophic/

Or at 1:18:00 into this podcast:


----------



## Tojara

Quote:


> Originally Posted by *Mahigan*
> 
> While anything is possible... this may lead to more competition. Maybe even leading to a drop in GPU prices.


For a certain portion of people. A large portion of the market is still very heavily swayed by whatever Nvidia calls the next best thing, be it a new proprietary technology or a specific performance point like memory amount. They simply don't care if there is a better product available, either due to straight-out bias or lack of knowledge.
Quote:


> Originally Posted by *Forceman*
> 
> I can't help but think that this is exactly what AMD wants people to believe, and that leads me to wonder if they had a hand in Oxide's decision to release this benchmark so far before the actual game's release. It seems like AMD is the only one who really stands to gain anything from it.


Funnily enough you completely miss the main benefit which is developers gaining new tricks to make their games better. When you start using Oxide pushing Nvidia back as a stepping stone it has quite a few wonderful implications, but because that's just complete theorycrafting at that point I'm not going to go further on it.

There were a couple of rather large "if"s there. The one thing stems from one of Nvidia's VPs saying that designing a GPU at this level is *** hard. It only gets worse from there when you try to control a software stack that has developers of various proficiency in programming (and even English) working on it. The other one is Nvidia willing to cut compute on consumer cards to sell their professional cards which is completely understandable from when they were working on the architecture, but what did they choose to do with it in regards to Pascal? Are they going to gimp effects to make it a non-issue via software or will they backtrack in hardware? Do they have enough market share to push through with it or are the consoles important enough to force them to backtrack? Were they even aware they might need it later on? The hardware decisions for a new GPU architecture are going to be in the pipeline for several years, the very first stepping stone for this was Mantle which was not long ago at all.

The whole scenario currently playing out might be just one guy on the driver team messing something up, or it could be a colossal mistake in communication, whether purposeful or unintentional. I guess we'll see soon enough.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> Now we go back to my theory...
> 
> And David Kanter just confirmed it here: http://www.dsogaming.com/news/oculus-employees-preemption-for-context-switches-is-best-on-amd-nvidia-possibly-catastrophic/
> 
> Or 1:18:00 in on this podcast:


Except he opens that opinion piece with incorrect information, so you can't put much stock in the rest of what he's saying, at least if he can't be bothered to stay on top of his chosen topic. Obviously the video will predate the latest information, but there's no reason for him not to have the latest information in the text, if he was indeed following closely.

The statement about preemption hinges on the claim that Nvidia can't do async compute. That's a problem, because we don't know whether Nvidia can or can't. Oxide just stated they thought async compute was working on Nvidia, but apparently it isn't.

So again, a whole bunch more assumptions based on incorrect information.

EDIT:

At this point I am more interested in getting to the bottom of this than in the Nano release! I am having a lot of fun with this one; pretty exciting.

EDIT 2:

Oh, and thanks for conducting this train Mahigan. You have been huge in this!


----------



## Vesku

Quote:


> Originally Posted by *PostalTwinkie*
> 
> The statement about preemption hinges on the fact that Nvidia can't do ASC. That is a problem because we don't know if Nvidia can or can't. Oxide just stated they thought ASC was working with Nvidia, but apparently it isn't.


No, Nvidia's pre-emption limitations do not hinge on whether Nvidia can do async compute. The design that resulted in those limitations is one of the reasons implementing Async Compute will be difficult for Nvidia; it's an aspect of their scheduler design. The information coalescing around Async Compute is that Nvidia's software engineers have yet to pull off a major magic trick to make up for hardware limitations, which, I agree, makes things interesting.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Except he opens that opinion piece with incorrect information, so you can't put much into the rest of what he is saying. At least if he can't be bothered to be on top of his chosen topic. Obviously I think the video will be older than the latest information, but no reason for him to not have the latest information - if he was indeed following closely - in the text.
> 
> The statement about preemption hinges on the fact that Nvidia can't do ASC. That is a problem because we don't know if Nvidia can or can't. Oxide just stated they thought ASC was working with Nvidia, but apparently it isn't.
> 
> So again, a whole bunch more assumptions based off incorrect information.
> 
> EDIT:
> 
> At this point I am more interested in getting to the bottom of this, than the Nano release! I am having a lot of fun with this one, pretty exciting.
> 
> EDIT 2:
> 
> Oh, and thanks for conducting this train Mahigan. You have been huge in this!


Slow context switches don't hinge on nVIDIA being unable to perform async. They hinge on the latency issue I mentioned in my theory; Oculus has mentioned it a few times. nVIDIA can do async, as my theory suggested, but it may not improve performance, and might even hinder it. This may not matter for AotS, which makes only mild use of asynchronous compute, but if games make heavy use of it... ouch.

That's what David Kanter is suggesting here. Oculus says the same thing.
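The latency argument above can be sketched with a toy timing model. This is purely illustrative: the millisecond figures and the two-switches-per-task assumption are made up for the example, not measured GPU behavior.

```python
# Toy timing model (illustrative only, not real GPU behavior): compare a GPU
# that overlaps graphics and compute ("async") against one that must pay an
# expensive context switch each time it interleaves the two queues.

def frame_time_async(graphics_ms, compute_ms):
    """Compute runs on otherwise-idle units alongside graphics; the frame
    takes as long as the larger of the two workloads."""
    return max(graphics_ms, compute_ms)

def frame_time_context_switch(graphics_ms, compute_tasks_ms, switch_ms):
    """Each compute task forces a switch away from graphics and back, so
    every task adds its own cost plus two context-switch penalties."""
    total = graphics_ms
    for task in compute_tasks_ms:
        total += task + 2 * switch_ms
    return total

graphics = 10.0                 # ms of graphics work per frame (made up)
tasks = [1.0, 1.0, 1.0, 1.0]    # four small compute jobs, 4 ms total (made up)
overlapped = frame_time_async(graphics, sum(tasks))
switched = frame_time_context_switch(graphics, tasks, switch_ms=0.5)
print(f"overlapped: {overlapped:.1f} ms, context-switched: {switched:.1f} ms")
# overlapped: 10.0 ms, context-switched: 18.0 ms
```

Scale the task list up and the overlapped path barely moves until compute exceeds graphics, while the context-switched path grows with every task; that is the "heavy use of it... ouch" scenario in the post above.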


----------



## semitope

Quote:


> You will find that the vast majority of DX12 titles in 2015/2016 are partnering with AMD. Mantle taught the development world how to work with a low-level API, the consoles use AMD and low-level APIs, and now those seeds are bearing fruit.


Definitely going to be important going forward


__
https://www.reddit.com/r/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/cuom7cc


----------



## ZealotKi11er

Quote:


> Originally Posted by *GorillaSceptre*
> 
> At this point i honestly think Nvidia was hardly involved in DX12 at all, despite what Nvidia claims.
> 
> It seems like Mantle did exactly what AMD wanted. MS and AMD were obviously working on the X1(probably in 2011) and AMD decided to invest in Mantle to benefit their PC side of things and play the long game.
> 
> It's not far fetched to think Nvidia were left out in the cold. Nearly everything they did with Maxwell( and a lot of what we've heard about Pascal) shows they were assuming a DX11 type API was going to be around for a long time. Heck, most of Maxwell's power improvements comes from them throwing out most of the scheduling hardware. That looks to have back fired on them.


That monster DX11 driver did not come out of thin air.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> Slow context switches don't hinge on nVIDIA being unable to perform async. They hinge on the latency issue I mentioned in my theory; Oculus has mentioned it a few times. nVIDIA can do async, as my theory suggested, but it may not improve performance, and might even hinder it. This may not matter for AotS, which makes only mild use of asynchronous compute, but if games make heavy use of it... ouch.
> 
> That's what David Kanter is suggesting here. Oculus says the same thing.


So async compute helps with context switching, since context switches carry a large amount of latency? Async compute counters that latency issue; instead of one being dependent on the other, one greatly helps the other.

Well, I had that completely backwards then!


----------



## Forceman

Quote:


> Originally Posted by *Tojara*
> 
> Funnily enough you completely miss the main benefit which is developers gaining new tricks to make their games better.


I'm not talking about DX12, I'm talking about releasing a benchmark for a pre-alpha game 6 months (?) before release, which conveniently serves to put Nvidia in a bad light soon after the "meh" launch of a new product line. Pretty lucky coincidence for AMD, I guess.


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> Definitely going to be important going forward
> 
> 
> __
> https://www.reddit.com/r/3iwn74/kollock_oxide_games_made_a_post_discussing_dx12/cuom7cc


----------



## Shivansps

Quote:


> Originally Posted by *Kollock*
> 
> Regarding Async compute, a couple of points on this. FIrst, though we are the first D3D12 title, I wouldn't hold us up as the prime example of this feature. There are probably better demonstrations of it. This is a pretty complex topic and to fully understand it will require significant understanding of the particular GPU in question that only an IHV can provide. I certainly wouldn't hold Ashes up as the premier example of this feature.
> 
> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.
> 
> Also, we are pleased that D3D12 support on Ashes should be functional on Intel hardware relatively soon, (actually, it's functional now it's just a matter of getting the right driver out to the public).
> 
> Thanks!


I just wanted to ask about that: isn't it possible, if someone has two DX12 devices (let's say an Intel iGPU that supports DX12, an AMD GCN iGPU, or maybe another GPU), to just send compute tasks to the device that isn't busy doing graphics?


----------



## Kand

Quote:


> Originally Posted by *PontiacGTX*
> 
> with GCN 1.0/1.1 they are as competitive as, or even more so than, ATI was. No GPU could stay relevant for 2-3 years on an improved architecture from release.


8800 GT
9800 GT (65nm G92)
GTS 250 (55nm G92b)

Hawaii is AMD's G92.


----------



## CasualCat

Quote:


> Originally Posted by *PostalTwinkie*
> 
> There are a lot of us that can't even use VR for health/medical/comfort reasons.


I don't even know if I'll have those issues... I need to find somewhere to demo one.

My last 3D-goggle-type experience was the Virtual Boy, and before that the SEGA Master System 3D glasses, which I really, really enjoyed.

What's weird to me, if this is a VR issue for Nvidia: I seem to remember Nvidia showing off Maxwell tech slides claiming they had worked out latency and other issues specifically for VR.


----------



## PontiacGTX

Quote:


> Originally Posted by *Kand*
> 
> 8800gt
> 9800gt (65nm g92)
> Gts 250 (55nm g92b)
> 
> Hawaii is AMDs g92.


and ATI?


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> and ATI?


Radeon 9700 Pro (R300)


----------



## Kand

Quote:


> Originally Posted by *PontiacGTX*
> 
> and ATI?


Honestly, there are more rebrands on the AMD side. Not counting the low end, just the ones that matter.

6870 > 7770 > 260x
6970 > 7870 > 270x
7970 > 280x > 380x
290x > 390x


----------



## Dudewitbow

Quote:


> Originally Posted by *Kand*
> 
> Honestly, there's more rebrands from the AMD side. Not counting the low end, just the ones that matter.
> 
> 6870 > 7770 > 260x
> 6970 > 7870 > 270x
> 7970 > 280x > 380x
> 290x > 390x


The 6870 wasn't GCN, and neither was the 6970; both are VLIW.

Should also mention that the 260X was a rebrand of the 7790, not the 7770.


----------



## Clocknut

Quote:


> Originally Posted by *Kand*
> 
> 8800gt
> 9800gt (65nm g92)
> Gts 250 (55nm g92b)
> 
> Hawaii is AMDs g92.


More like 7970 + 7790 + R9 285; those are the first of their GCN kind. Hawaii isn't. It's an oversized 7790.


----------



## PontiacGTX

Quote:


> Originally Posted by *Kand*
> 
> Honestly, there's more rebrands from the AMD side. Not counting the low end, just the ones that matter.
> 
> 6870 > 7770 > 260x
> 6970 > 7870 > 270x
> 7970 > 280x > 380x
> 290x > 390x


The 7770 isn't a 6870 or a 260/260X, nor is the 7870 a 6970... and the 380X should be GCN 1.2 2048-SP Tonga.
Quote:


> Originally Posted by *Mahigan*
> 
> Radeon 9700 Pro (R300)


do you know of something more recent?


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> the 7770 isnt a 6870 or 260/x,nor the 7870 is a 6970...and the 380x should be GCN1.2 2048Sp tonga
> do you know of something more recent?
> .


Sorry, I thought you guys were talking about GPUs which had a long run because they were built properly.

As for the topic of this thread... Page 31:


Spoiler: Warning: Spoiler!







https://developer.nvidia.com/sites/default/files/akamai/gameworks/vr/GameWorks_VR_2015_Final_handouts.pdf


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> Sorry, I thought you guys were talking about GPUs which had a long run because they were built properly.
> 
> As for the topic of this thread... Page 31:
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> 
> 
> 
> 
> https://developer.nvidia.com/sites/default/files/akamai/gameworks/vr/GameWorks_VR_2015_Final_handouts.pdf


Yeah, that was exactly what I asked: which GPUs could have the best performance over time while having features that would be used in the future.


Spoiler: Spoiler


----------



## CasualCat

Quote:


> Originally Posted by *Forceman*
> 
> I'm not talking about DX12, I'm talking about releasing a benchmark for a pre-alpha game 6 months (?) before release, which conveniently serves to put Nvidia in a bad light soon after the "meh" launch of a new product line. Pretty lucky coincidence for AMD, I guess.


But honestly, I thought Fury Xs were still hard to get, and when you can get one you're paying a premium? So how does that help AMD? It'd be different if they were collecting dust on store shelves, but that doesn't appear to be the case.

Stepping down in cost I'd rather have the Fury than the 980 (have never really liked the 980 at the price they're asking).

Maybe it helps with 390/390X?

I just don't see how they benefit financially. At best it is a PR win, but AMD needs sales not PR.


----------



## Kand

Quote:


> Originally Posted by *PontiacGTX*
> 
> the 7770 isnt a 6870 or 260/x,nor the 7870 is a 6970...and the 380x should be GCN1.2 2048Sp tonga
> do you know of something more recent?
> .


I wrote that on the fly. Was expecting to get corrected. Haha.


----------



## Forceman

Quote:


> Originally Posted by *CasualCat*
> 
> But honestly I thought FuryXs are still hard to get and when you can get them you're paying a premium? So how does that help AMD? It'd be different if they were collecting dust on store shelves, but that doesn't appear to be the case.
> 
> Stepping down in cost I'd rather have the Fury than the 980 (have never really liked the 980 at the price they're asking).
> 
> Maybe it helps with 390/390X?
> 
> I just don't see how they benefit financially. At best it is a PR win, but AMD needs sales not PR.


I think Fury are pretty widely available, at least in the States, and Fury X go in and out. Both are available on Newegg right now (or they were, looks like the Sapphire isn't anymore? - maybe I needed a page refresh). Anyway, the big gain is casting doubt on Nvidia, and trying to stop the steamroller of sales they are on. I'm not saying AMD pressured Oxide to release the benchmark (but I wouldn't be surprised if they did), but AMD obviously knows it makes them look good and they need all the positive press they can get. And you've got Oxide in here telling people not to post screenshots or anything, so it doesn't sound like they are looking for free press.

I don't know, just imagining how this would all be portrayed if it was a Nvidia sponsored game that released an Nvidia-favorable benchmark months before release.


----------



## diggiddi

Quote:


> Originally Posted by *SlackerITGuy*
> 
> *Maybe* if you stream or have a lot of background apps while you play, but with a single threaded rendering engine like the Frostbite 2 engine, there's no way you'd see any significant performance increase going from Thuban to Bulldozer.
> 
> Proof (and this is on 720p):
> 
> 
> Spoiler: Warning: Spoiler!











Not trying to be rude, but let me repeat once again: I don't care what that review or any other says, I tried it for myself. I disabled 2 cores on my 4.6GHz 8350, and that V6 could not max out my overclocked HD 7950 @ 1165/1300; the V8 could, as could my Phenom II X4 at 4.2GHz.

IMO, an overclocked Thuban (if you can get it to 4.2GHz) is a better chip than any current stock FX except maybe the 9590. Unfortunately I don't have either one, so I can't prove it, but that's just my opinion.

In the review you quoted, the Thuban is at 3.3GHz, which is IPC-equivalent to a Piledriver at 3.6GHz; clock it to 4.2GHz, which is equivalent to a 4.6GHz Piledriver, then come back and let's talk.
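The clock-equivalence arithmetic here checks out under the post's own premise. Note the ~9% per-clock edge (3.6/3.3) is the poster's claim, not a benchmarked figure:

```python
# Sanity check of the clock-equivalence claim: if a 3.3 GHz Thuban performs
# like a 3.6 GHz Piledriver, Thuban's per-clock advantage is 3.6 / 3.3
# (the poster's figure, not a measured one).
ipc_ratio = 3.6 / 3.3                 # Thuban IPC relative to Piledriver, ~1.09
equivalent_clock = 4.2 * ipc_ratio    # a 4.2 GHz Thuban, in "Piledriver GHz"
print(f"4.2 GHz Thuban ~ {equivalent_clock:.2f} GHz Piledriver")
# 4.2 GHz Thuban ~ 4.58 GHz Piledriver, i.e. roughly the 4.6 claimed
```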

Now, let's get back on topic.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Forceman*
> 
> I think Fury are pretty widely available, at least in the States, and Fury X go in and out. Both are available on Newegg right now (or they were, looks like the Sapphire isn't anymore? - maybe I needed a page refresh). Anyway, the big gain is casting doubt on Nvidia, and trying to stop the steamroller of sales they are on. I'm not saying AMD pressured Oxide to release the benchmark (but I wouldn't be surprised if they did), but AMD obviously knows it makes them look good and they need all the positive press they can get. And you've got Oxide in here telling people not to post screenshots or anything, so it doesn't sound like they are looking for free press.
> 
> I don't know, just imagining how this would all be portrayed if it was a Nvidia sponsored game that released an Nvidia-favorable benchmark months before release.


The gamble is that Nvidia does release a DX12 driver, and that the issue with async compute on Nvidia is resolved in this benchmark/preview. If that happens, and Nvidia sees any gain, they are once again ahead of AMD and have a bunch of fodder to throw back. If Nvidia can come back and outperform AMD in this specific scenario, people will feel validated in believing this was a ploy by AMD the whole time.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> I think Fury are pretty widely available, at least in the States, and Fury X go in and out. Both are available on Newegg right now (or they were, looks like the Sapphire isn't anymore? - maybe I needed a page refresh). Anyway, the big gain is casting doubt on Nvidia, and trying to stop the steamroller of sales they are on. I'm not saying AMD pressured Oxide to release the benchmark (but I wouldn't be surprised if they did), but AMD obviously knows it makes them look good and they need all the positive press they can get. And you've got Oxide in here telling people not to post screenshots or anything, so it doesn't sound like they are looking for free press.
> 
> I don't know, just imagining how this would all be portrayed if it was a Nvidia sponsored game that released an Nvidia-favorable benchmark months before release.


Wait till it happens... I'll be the first to call them on their BS too. It doesn't seem that way right now but it will be that way.


----------



## Forceman

Quote:


> Originally Posted by *PostalTwinkie*
> 
> The gamble is that Nvidia does release a DX12 driver, and that the issue with async compute on Nvidia is resolved in this benchmark/preview. If that happens, and Nvidia sees any gain, they are once again ahead of AMD and have a bunch of fodder to throw back. If Nvidia can come back and outperform AMD in this specific scenario, people will feel validated in believing this was a ploy by AMD the whole time.


Yeah, I don't know. You already have dozens of people running around shouting about how Nvidia can't do async, they are emulating it in software, their DX12 performance is going to be terrible, etc, etc. I'd say the damage is already done.


----------



## Clocknut

Quote:


> Originally Posted by *GorillaSceptre*
> 
> At this point i honestly think Nvidia was hardly involved in DX12 at all, despite what Nvidia claims.
> 
> It seems like Mantle did exactly what AMD wanted. MS and AMD were obviously working on the X1(probably in 2011) and AMD decided to invest in Mantle to benefit their PC side of things and play the long game.
> 
> It's not far fetched to think Nvidia were left out in the cold. Nearly everything they did with Maxwell( and a lot of what we've heard about Pascal) shows they were assuming a DX11 type API was going to be around for a long time. Heck, most of Maxwell's power improvements comes from them throwing out most of the scheduling hardware. That looks to have back fired on them.


I think it's more like they were involved in base DirectX 12 development, but the extensions of DirectX 12 are what they missed out on. What we have now is effectively DirectX 12 with Mantle features. AMD basically forced Microsoft to play its chess game. Microsoft had no choice but to make DirectX 12 even more GCN-compatible: being the underdog in consoles, Sony's PS4 is a serious threat to the Xbox One, and Sony's own in-house API may well support all kinds of GCN features. There is no way Microsoft was going to leave any GCN parts unused and make its Xbox less optimized, so it had to push DirectX 12 to be more GCN-compatible. It also helps make game development easier. Nvidia is the one getting screwed over by all this in the end. (Collateral damage? Oops, sorry Nvidia, my Xbox comes first.)
Quote:


> Originally Posted by *Tojara*
> 
> This is pretty much them getting caught with their pants down and unfortunately just before a node change, a memory change and architecture change, all well within the next year and a half. If they started designing Pascal early enough (which some of the tapeouts point towards) it might be a really rough few years ahead of them with massive backtracking to be done. As much as the console wins and change form VLIW to GCN seemed to be unremarkable in any way, shape or form they have turned the table around while Nvidia has tossed out every advantage they had, in the consumer markets, willingly out of the window. Intel eradicating dGPUs both in desktops and mobile isn't going to help.


I kind of think Kepler's gimped compute might be because Nvidia assumed AMD would stay on the VLIW4 path. VLIW4 was a rather new architecture and weak in compute; had AMD gone with VLIW4 at 28nm, Kepler would still be a winner in compute.


----------



## Paul17041993

Quote:


> Originally Posted by *diggiddi*
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Not trying to be rude, but let me repeat Once again, I don't care what that review or any other says, I tried it for myself,
> I disabled 2 cores on my 4.6ghz 8350 and that V6 could not max out my overclocked hd7950 @1165/1300, the V8 could
> as could my phenom II X4 at 4.2ghz.
> 
> IMO I actually think an Overclocked(if you can get it to 4.2ghz) thuban is a better chip than any current STOCK FX except maybe a 9590 Unfortunately I don't have either one so I cant prove it, but that's just my opinion
> 
> In the review you quoted the Thuban is at 3.3ghz which is at an ipc equivalent to 3.6 on the Piledriver, clock it to 4.2 which is 4.6 piledriver IPC, come back and lets talk
> 
> Now lets get back on topic


At the time the first line of FX chips was released, it was frequently established that Thuban was still just as good, especially in per-core IPC. The main intention of the module system was to allow heaps of high-efficiency integer cores that could handle more things at a time, which is ideal for Mantle, DX12, and Vulkan.

Basically, AMD had the idea for all this, along with GCN, many years back; however, everything got messed around, and we're only just seeing the FX series (and GCN, obviously) actually do what was intended of them.

All this debate over DX12 and async makes me want to code up a scaled torture test, though I really should get back to fixing Firefox's audio... P:


----------



## PostalTwinkie

Quote:


> Originally Posted by *Forceman*
> 
> Yeah, I don't know. You already have dozens of people running around shouting about how Nvidia can't do async, they are emulating it in software, their DX12 performance is going to be terrible, etc, etc. I'd say the damage is already done.


While I agree that some damage has been done to Nvidia over this, the damage to AMD, if Nvidia comes out ahead after this unknown issue is fixed, will be far greater.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> Yeah, I don't know. You already have dozens of people running around shouting about how Nvidia can't do async, they are emulating it in software, their DX12 performance is going to be terrible, etc, etc. I'd say the damage is already done.


I wouldn't call it damage. Once the big tech websites report on the whole thing, people will know exactly what's going on. I'm hoping nVIDIA just comes clean about exactly what the issue was or is. If they fix it in their driver, that's even better.

As for me, I'd like to launch that website I was talking about. It'll take a huge effort, but I'd like to have a site that covers developers as well as GPU/CPU architectures (anyone interested in helping is welcome), relaying information from both ends and getting a better idea of future trends. We have benchmark sites that focus on the performance of today, but that doesn't seem to help us make the best decisions for tomorrow.

One thing is for sure: I'm tired of bad practices from large firms. As long as the info is as objective and truthful as possible, I think it could lead to more informed consumers. I dunno... it's an idea, and I have many people willing to help me make it happen.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> While I agree that some damage has been done to Nvidia over this, the damage to AMD, if Nvidia comes out ahead after this unknown issue is fixed, will be far greater.


AMD was correct when they stated that nVIDIA relied on slow context switching, so I don't see how they could incur damage. AMD also has the majority of game developers lined up for 2015/2016. But AMD is no saint; we can all agree on that. I think some of the things their PR folks said only inflamed the situation. Maybe they'll eat flak for that.

If nVIDIA fix this issue, that's a win. But it still doesn't tell us what will happen when heavier async titles arrive. With the developers and titles AMD has lined up, DX12, at the onset, may provide a more even competitive market. That is, until Pascal and Greenland hit. Who knows then.


----------



## Remij

Quote:


> Originally Posted by *Mahigan*
> 
> AMD was correct when they stated that nVIDIA relied on slow context switching. So I don't see how they could incur damage. AMD also has the majority of game developers lined up for 2015/2016. But AMD is no saint. We all can agree on that. I think that some of the things their PR folks said only inflamed the situation. Maybe they'll eat flakk for that.
> 
> *If nVIDIA fix this issue... that's a win. But it still doesn't tell us about what will happen when heavier Async titles arrive.* With the developers and titles AMD has lined up, DX12, on the onset, may provide a more even competitive market. That is until Pascal and Greenland hit. Who knows then.


If Nvidia can at least mitigate the issue in light/moderate async compute titles until Pascal and subsequent GPUs launch, then Maxwell's deficiencies will be long swept under the rug by the time heavy async compute titles start coming out.

I mean, how likely is it that extremely heavy compute titles hit within the next year?

It'll be interesting to see how this plays out, tbh.


----------



## Shivansps

I did mention this earlier: why can't games just send compute tasks to a DX12 iGPU, or maybe a secondary DX12 dGPU? I'm not sure whether DX12 improves that, or whether huge amounts of memcpy are still needed.
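For what it's worth, DX12 does expose what Microsoft calls explicit multi-adapter, where each GPU is a separate device and data must be staged across adapters, so the memcpy concern is real. Whether offloading pays off then comes down to whether the job is big enough to amortize the copy. Here is a toy cost model of that trade-off; the timings are made up for illustration, and this is not the D3D12 API:

```python
# Toy offload decision (illustrative, not the D3D12 API): under DX12's
# explicit multi-adapter model each GPU is a separate device, so feeding a
# compute job to an idle iGPU means copying inputs and outputs across the
# bus. Offloading only wins when the job outweighs that copy cost.

def should_offload(job_ms_on_dgpu, job_ms_on_igpu, copy_ms):
    """Offload to the idle iGPU only if doing so beats running the job on
    the busy dGPU, copy overhead included."""
    return job_ms_on_igpu + copy_ms < job_ms_on_dgpu

# A big physics batch: the copy is amortized, offload wins.
print(should_offload(job_ms_on_dgpu=6.0, job_ms_on_igpu=4.0, copy_ms=1.0))  # True
# A tiny task: the copy dwarfs the work, keep it on the dGPU.
print(should_offload(job_ms_on_dgpu=0.5, job_ms_on_igpu=0.8, copy_ms=1.0))  # False
```

The same reasoning explains why engines that do use a second adapter tend to hand it large, self-contained work rather than many small tasks.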


----------



## Paul17041993

Quote:


> Originally Posted by *Remij*
> 
> If Nvidia can at least mitigate the issue in light/moderate async compute titles until Pascal and subsequent GPUs launch, then Maxwell's deficiencies will be long swept under the rug by the time heavy async compute titles start coming out.
> 
> I mean, how likely is it that extremely heavy compute titles hit within the next year?
> 
> It'll be interesting to see how this plays out, tbh.


weelll... we'll see, my plans are still very loose so there's not much point explaining anything just yet...


----------



## orlfman

Quote:


> Originally Posted by *Remij*
> 
> If Nvidia can at least mitigate the issue in light/moderate async compute titles until Pascal and subsequent GPUs launch, then Maxwell's deficiencies will be long swept under the rug by the time heavy async compute titles start coming out.
> 
> I mean, how likely is it that extremely heavy compute titles hit within the next year?
> 
> It'll be interesting to see how this plays out, tbh.


This is something I kinda figured Nvidia would do. With how popular GameWorks is, I can see them, for the time being, making sure any GameWorks DX12 title uses little, if any, async. Even with async disabled, DX12 does provide a performance boost for all Nvidia cards; not as great as AMD's, and definitely not as great as AMD with async, but an increase nevertheless.


----------



## Kand

Quote:


> Originally Posted by *orlfman*
> 
> this is something i kinda figured nvidia will do. with how popular their gameworks is, i can see them for the time being making sure any gameworks dx12 title uses little, if any, async. even with async disabled dx12 does provide a boost in performance for all nvidia cards, not as great as AMD and definitely not as great with async + AMD, but a increase nevertheless.


Can we all stop crediting AMD with getting a "boost" from enabling async?

Their DX11 numbers shouldn't be the comparison point, given how appalling DX11 performance is in this game.

Meaning, for all future titles, they expect you to be on Windows 10. If you own an AMD GPU and stick with 7, you are SOL.

There are plenty of reasons not to upgrade to 10 at this time, and DX12 is not appealing enough yet.


----------



## Clocknut

Quote:


> Originally Posted by *Kand*
> 
> Can we all stop crediting AMD with getting a "boost" from enabling async?
> 
> Their DX11 numbers shouldn't be the comparison point, given how appalling DX11 performance is in this game.
> 
> Meaning, for all future titles, they expect you to be on Windows 10. If you own an AMD GPU and stick with 7, you are SOL.
> 
> There are plenty of reasons not to upgrade to 10 at this time, and DX12 is not appealing enough yet.


The Windows 7 market is insignificant when you compare it to consoles.


----------



## Noufel

Quote:


> Originally Posted by *Clocknut*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Kand*
> 
> Can we all stop crediting AMD getting a "boost" with async enabled?
> 
> Their dx11 drivers should not be compared due to how appauling performance is for this game.
> 
> Meaning, for all future titles, they epxect you to be on Windows 10. If you own an AMD gpu and stick with 7, you are SOL.
> 
> There are plenty of reasons to not upgrade to 10 as of this time. DX12 is not a appealing enough as of yet.
> 
> 
> 
> windows 7 market is insignificant when u compared to consoles.

Euh, what??! MMO players on Windows 7 number in the millions; that isn't insignificant to me.


----------



## airfathaaaaa

Quote:


> Originally Posted by *Noufel*
> 
> they didn't know that it will be a major issue and thnx to people like you Mahigan they admited that


How could they not know? It's not like DX12 arrived in one year and surprised everyone.
They knew about it long ago; Microsoft already told us.
They just didn't want to do anything because they were milking the DX11 cow, and given the shady stuff this company has done, I wouldn't be surprised to learn at some point that they were pushing devs not to accept origin until they were ready for it...


----------



## Paul17041993

Quote:


> Originally Posted by *Kand*
> 
> Can we all stop crediting AMD getting a "boost" with async enabled?
> 
> Their dx11 drivers should not be compared due to how appauling performance is for this game.
> 
> Meaning, for all future titles, they epxect you to be on Windows 10. If you own an AMD gpu and stick with 7, you are SOL.
> 
> There are plenty of reasons to not upgrade to 10 as of this time. DX12 is not a appealing enough as of yet.


Because it's totally AMD's fault that Microsoft wants DX12 to be Windows 10 exclusive.


----------



## delboy67

Sorry I thought it was funny


----------



## airfathaaaaa

Quote:


> Originally Posted by *Paul17041993*
> 
> Because it's totally AMD's fault that microsoft want DX12 to be windows 10 exclusive.


Yeah, nothing to do with the fact that they want to force everyone who might own a DX12-capable card onto Windows 10...
No, it's AMD...
(Not to mention that Windows 10 has kernel changes that better support DX12, too.)


----------



## Clocknut

Quote:


> Originally Posted by *Noufel*
> 
> Euh what ??! Mmo players on windows 7 are in millions that isn't insignicant for me .


It is still a small thing compared to console gaming. Microsoft doesn't treat Windows gaming as its No. 1; Xbox is No. 1 to them.


----------



## Xuper

Perhaps DX12 games will have an async compute setting with options: off/low/medium/high. Just like tessellation.


----------



## SpeedyVT

Quote:


> Originally Posted by *airfathaaaaa*
> 
> how can they didnt know? its not like dx12 came in one year and suprised everyone..
> they knew it long ago microsoft already told us
> they just didnt want to do anything because they were milking the dx11 cow and i bet giving the shady stuff this company has done i will not be suprise to learn at some point that they were pushing devs not to accept origin till they were ready for it....


It would explain their rush to release a Ti variant if they had no confidence in their DX12 compatibility.

Make the quick buck, take the fallout, blame other issues, and then release a card with more capability the following year. AMD and NVidia are not stupid; they plan these things out and have many designs pre-planned and pre-configured. How many processor designs does AMD mention in a year? Usually more than the fingers on one hand, and the same goes for video card designs. They don't make all the tidbits public. NVidia knew the wall it was colliding with; it just hasn't had the time to shift physical production over to new fabrication, plus the additional labeling, marketing, and so on. The 980 Ti was already fabricated in the thousands, probably a year ago or longer.

They had choices: waste the cash holding out for a better design, or toy with your consumer base to recoup your losses. Recouping is a significantly better move to consider when you've got stockholders and market share control. AMD did this with the FX series release, but with no market share control. That processor was laid out years before release, with invested design and fabrication production. They chose the logical instant resolve with the stockholders, which lasts only a second, so they could bail with their shares up.

STOCKS KILL COMPANIES!

You're better off as a private company, as your revenue will remain stronger. NVidia has a tactical advantage, and that's market share; it'll almost certainly absorb the losses until it can produce a product with proper support for the newer API.

PS: This is not an NVidia-sucks or AMD-sucks thing. It's just how business works and how these businesses are sailing their ships. Technically all ships eventually sink, but these captains will always bail out their boat once they realise it.


----------



## spacin9

Quote:


> Originally Posted by *delboy67*
> 
> Sorry I thought it was funny


All-time classic, lol. You can't not laugh.

I don't know what all the fuss is about. An OC'd Maxwell makes up for the deficiencies. The 390X and Fury don't OC well. And NV hasn't put out an optimized driver yet.

I love gameworks. Witcher 3 works great for me.









Give me a stutter-free Fury X Crossfire and I'll defect when I dump my Titan Xs next year.


----------



## spacin9

And the doppelganger post... my bad, sorry.


----------



## HiTechPixel

Wow, I didn't realize how much controversy this benchmark has stirred up. Just sit tight and wait until there are more DX12 games out, and if it is as bad as people say, wait for Arctic Islands and/or Pascal. No need to sell your Titan X yet.


----------



## criminala

I think Nvidia never expected Microsoft to take Mantle seriously and implement its ideas in DX12.
Nvidia probably thought they could ignore Mantle and that it would go away over time (just like Glide, which once tried something similar and died a quiet death too).

AMD tries to think ahead and makes things for their users to benefit from, freely (Mantle, FreeSync, ...). That in my book is a great thing, and I hope they can continue that way. Hopefully more and more people see the healthy, great attitude of this company in the future.


----------



## Klocek001

Quote:


> Originally Posted by *HiTechPixel*
> 
> No need to sell your Titan X yet.


That's what I think too. I mean, if I sold my 980 Ti, then what? Buy a Fury X? It's noticeably slower in DX11 and it doesn't scale so well in DX12 either, certainly not well enough to pay the price of two 390s. Two 390s in turn will give you a headache in DX11 because of CFX issues and high temps/power consumption.


----------



## loveuguys

IMHO









Comparing this game's benchmark to a really big, massive multiplayer match in C&C 4, SC2, or RA3 on a five-year-old machine...
I just can't figure out which visual effects here are so demanding that a 2015 mid-high rig with DX12 has problems running it...

Watching the benchmark on YouTube, I really can't figure it out. I expected photorealistic graphics or textures or units or I don't know...
But in reality? Hmmm...
In reality I see a couple more mini units, zoomed out like in Civ 5, with an unrealistically awful lot of laser beams and (2D) smoke and very bad terrain... by 2015 DX12 "very demanding game" expectations...

_I don't care about Nvidia or AMD, I just get the feeling that all this hocus-pocus bogus around AotS and DX12 is just a very neat way to promote, publicise and attract a lot of attention to the game itself... i.e. free marketing, clickbaiting..._

The last two weeks of news have been about a thousand of the same garbage articles about Oxide, DX12, Nvidia, AMD... but nothing real about the game, gameplay, or anything in that direction... for example the high price (you said you chose the easy way to program it); even the benchmark is not free...

I hope that maybe someone else can also see through this battle of words and maybe focus more on the game and its price.

Oh, and all this started with a v0.49 pre-beta, pay-to-test build?
C'mon...

The final version will be v1.1 and $70, plus once again a thousand articles about the dev and DX12 and not the game?

That's my opinion on all this AotS hype, over and out.

I sure hope the devs (all game devs) start making games which work well on both PC markets, green and red, if they want to sell in big numbers.


----------



## HiTechPixel

Quote:


> Originally Posted by *Klocek001*
> 
> That's what I think too. I mean, if I sold my 980 Ti, then what? Buy a Fury X? It's noticeably slower in DX11 and it doesn't scale so well in DX12 either, certainly not well enough to pay the price of two 390s. Two 390s in turn will give you a headache in DX11 because of CFX issues and high temps/power consumption.


Fury X has 4GB HBM GEN1, not enough for high resolutions.

390X has 8GB GDDR5 but not enough horsepower.

And I'm also on Windows 7 until Microsoft removes telemetry and data collection in Windows 10.


----------



## Themisseble

Quote:


> Originally Posted by *HiTechPixel*
> 
> Fury X has 4GB HBM GEN1, not enough for high resolutions.
> 
> 390X has 8GB GDDR5 but not enough horsepower.
> 
> And I'm also on Windows 7 until Microsoft removes telemetry and data collection in Windows 10.


The R9 390X has as much horsepower as a GTX 980 Ti.

Hmm, and here is something too:

The Fury X has 4 GB of VRAM and 512 GB/s.
The GTX 980 Ti has 6 GB of VRAM and 337 GB/s.
So NVIDIA has 50% more VRAM, but AMD's VRAM is 52% faster.
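Those two ratios check out; as a quick sanity check of the arithmetic (using the spec-sheet numbers quoted above):

```python
# Spec-sheet numbers as quoted in the post above.
fury_x = {"vram_gb": 4, "bandwidth_gbps": 512}      # Fury X: 4 GB HBM1
gtx_980_ti = {"vram_gb": 6, "bandwidth_gbps": 337}  # 980 Ti: 6 GB GDDR5

# Percent advantage of one card over the other on each metric.
vram_advantage = (gtx_980_ti["vram_gb"] / fury_x["vram_gb"] - 1) * 100
bandwidth_advantage = (fury_x["bandwidth_gbps"] / gtx_980_ti["bandwidth_gbps"] - 1) * 100

print(f"980 Ti VRAM advantage: {vram_advantage:.0f}%")            # 50%
print(f"Fury X bandwidth advantage: {bandwidth_advantage:.0f}%")  # 52%
```

Of course, capacity and bandwidth help different workloads, so the two percentages aren't directly comparable.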


----------



## airfathaaaaa

Quote:


> Originally Posted by *HiTechPixel*
> 
> Fury X has 4GB HBM GEN1, not enough for high resolutions.
> 
> 390X has 8GB GDDR5 but not enough horsepower.
> 
> And I'm also on Windows 7 until Microsoft removes telemetry and data collection in Windows 10.


Yeah, because otherwise you're so safe from anyone spying on you... you're just a folder on a hard disk, like all of us.
Also, 4 GB is enough for now to play at mid-high settings; there isn't a single card from this generation that can actually play any game at 4K at a good FPS pace...


----------



## Hattifnatten

Quote:


> Originally Posted by *loveuguys*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> IMHO
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Comparing this game's benchmark to a really big, massive multiplayer match in C&C 4, SC2, or RA3 on a five-year-old machine...
> I just can't figure out which visual effects here are so demanding that a 2015 mid-high rig with DX12 has problems running it...
> 
> Watching the benchmark on YouTube, I really can't figure it out. I expected photorealistic graphics or textures or units or I don't know...
> But in reality? Hmmm...
> In reality I see a couple more mini units, zoomed out like in Civ 5, with an unrealistically awful lot of laser beams and (2D) smoke and very bad terrain... by 2015 DX12 "very demanding game" expectations...
> 
> _I don't care about Nvidia or AMD, I just get the feeling that all this hocus-pocus bogus around AotS and DX12 is just a very neat way to promote, publicise and attract a lot of attention to the game itself... i.e. free marketing, clickbaiting..._
> 
> The last two weeks of news have been about a thousand of the same garbage articles about Oxide, DX12, Nvidia, AMD... but nothing real about the game, gameplay, or anything in that direction... for example the high price (you said you chose the easy way to program it); even the benchmark is not free...
> 
> I hope that maybe someone else can also see through this battle of words and maybe focus more on the game and its price.
> 
> Oh, and all this started with a v0.49 pre-beta, pay-to-test build?
> C'mon...
> 
> The final version will be v1.1 and $70, plus once again a thousand articles about the dev and DX12 and not the game?
> 
> That's my opinion on all this AotS hype, over and out.
> 
> I sure hope the devs (all game devs) start making games which work well on both PC markets, green and red, if they want to sell in big numbers.


The thing about AotS is not photorealistic graphics, but thousands and thousands of units on screen, with each and every shot fired being an independent light source (I also believe the smoke is volumetric, as it is perfectly lit by those shots). And unlike most RTSs today, they don't "cheat" by making a squad of 10-15 people into a single unit. Even with those squad-style units, most RTSs crawl to a halt when there are about 100 of them on screen. In AotS this does not happen, thanks to low-level APIs and proper multi-threading.
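The multi-threading part of that can be sketched in miniature. This is not Oxide's code, just a toy illustration (in Python, with made-up names) of the submission pattern low-level APIs enable: each worker thread records its own command list for a slice of the units, so recording cost spreads across cores instead of serializing in a single driver thread.

```python
# Toy sketch of multi-threaded command recording; all names are illustrative.
from concurrent.futures import ThreadPoolExecutor

def record_command_list(units):
    # Stand-in for per-thread command-list recording: one draw per unit,
    # plus one point light per shot fired, as described above.
    cmds = []
    for u in units:
        cmds.append(("draw", u["id"]))
        cmds.extend(("light", shot) for shot in u["shots"])
    return cmds

# 1000 units, each currently firing 3 shots.
units = [{"id": i, "shots": [f"s{i}-{j}" for j in range(3)]} for i in range(1000)]
chunks = [units[i::4] for i in range(4)]  # partition the work across 4 threads

with ThreadPoolExecutor(max_workers=4) as pool:
    command_lists = list(pool.map(record_command_list, chunks))

total = sum(len(cl) for cl in command_lists)
print(total)  # 1000 draws + 3000 lights = 4000 commands
```

In D3D12 terms, each worker would record into its own command list and allocator, and the main thread would then submit all of them to the queue in one go.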


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Clocknut*
> 
> windows 7 market is insignificant when u compared to consoles.


Well, there we have it, guys. I guess Overclock.net should just shut down and we should all go home and stop caring about PCs because, you know, according to Clock here, the PC "is insignificant when u compared to consoles."


----------



## loveuguys

OK, finished downloading a copy of AotS and ran it five times, I think.
I don't know exactly what is going on, but the scores on normal/high varied from 53-60 FPS; three times the score was the same, and one time the test crashed.

I can't give credibility to this pre-beta something at the moment... I'm sorry, Oxide.
I wish there will be a C&C 5 or Generals 2 in the future to prove my point.
Quote:


> but thousands and thousands of units on screen


Where in the benchmark?

This is 1000 units:


This is 10,000 units:


Interesting how big 1,000 units would look on 2 megapixels.


----------



## KarathKasun

Unless you encrypt EVERYTHING going out over your net connection (technically impossible with the standardized encryption methods used by the industry, due to intentional weaknesses in the ciphers), it's easier to snoop there than at the OS. Spying works best when you look at the information at the hubs rather than at the individual; it's already gathered up for you there. It sure sucks that your "clean" Windows 7 ISO can't keep the NSA/CIA/FBI from packet-sniffing your net connection and running that data through a pattern recognition suite, which is what they do to pretty much 100% of net traffic in the States now.

Just like RFID tags. Why would someone want to track RFID tags? Most people already carry their tracking device without being asked to; it's called a cell phone.

As for the topic at hand, bring the DX12/Vulkan goodness. I have an old 8 core system that would reap the benefits of multi-threading. Seeing ~20% CPU usage in new games is getting old.


----------



## Digidi

OK, this thread is going crazy.


----------



## spacin9

Quote:


> Originally Posted by *Hattifnatten*
> 
> The thing about AotS is not photorealistic graphics, but thousands and thousands of units on screen, with each and every shot fired being an independent light source (I also believe the smoke is volumetric, as it is perfectly lit by those shots). *And unlike most RTSs today, they don't "cheat" by making a squad of 10-15 people into a single unit.* Even with those squad-style units, most RTSs crawl to a halt when there are about 100 of them on screen. In AotS this does not happen, thanks to low-level APIs and proper multi-threading.


Yes and no. In the default "Win" scenario... you have no choice but to make spam battle groups that operate pretty much on auto-attack. In different victory scenarios the game could change dramatically, enabling more of a Sup Com feel, if they choose to... then you would have the real time to split off different groups and tailor an attack or defense.

Where Sup Com and Sins battle groups are more for organizational purposes, the fast-paced "energy zone capture nodes" victory scenario requires spam battle groups. This in and of itself is very fun and fast-paced, but those of us who want to turtle and hurl nukes, heavy artillery and Titans/Dreadnaughts might be disappointed. The possibility of different victory scenarios gives hope of a more Sup Com-like game where you can control details in real time and strategize very intricately. At least that's what I'm hoping they have in mind.

The smoke effects are unfortunate insofar as tac missiles are very strategic in Sup Com, whereas with Ashes... it's just missile spam.

It does seem possible to micro-manage a battle, and to do it better than Sup Com or Sins in terms of FPS and game speed, should they choose to "allow" that.


----------



## Noufel

When is AotS supposed to launch?


----------



## Xuper

What's the latest version of AotS? 0.49?


----------



## Remij

Quote:


> Originally Posted by *Xuper*
> 
> What's the latest version of AotS? 0.49?


0.50.12113 for me.


----------



## black96ws6

If true, this again shows the importance of the old "buy more than you think you need" saying.
Quote:


> Originally Posted by *delboy67*
> 
> Sorry I thought it was funny


That is hilarious man, thanks for posting that!


----------



## provost

At least some tech journalism sites are asking the right questions for the benefit of consumers, their viewers, and other constituents:
http://forums.anandtech.com/showthread.php?t=2446656
Quote:


> AnandthenMan
> 
> The question is why. Maybe with it fully enabled performance will tank, or maybe Nvidia for some mysterious reason doesn't have full driver support. Which would be odd considering Nvidia has said they have been working on and contributing to DX12 for years. Either way the argument that DX12 doesn't matter at all on current hardware is nonsense.


I am assuming this is the site's namesake reviewer, but someone can correct me?

Let's see how far they and others can go in asking the right questions for all constituents involved.


----------



## Mahigan

Quote:


> Originally Posted by *HiTechPixel*
> 
> My Windows 7 ISO from 2009 is completely clean. Sucks to be stuck with a Windows botnet I guess.
> R9 390X has the same horsepower as a 980, not a 980 Ti. And VRAM Bandwidth has been completely irrelevant as shown by Fury X benchmarks.


I share your concerns, HiTechPixel, more than you can ever imagine. Some of my personal heroes are Glenn Greenwald, Edward Snowden, Chelsea Manning, Julian Assange, Jacob Appelbaum, Gandhi, MLK, Malcolm X, etc.
This is why I dual-boot my system with Linux, and why I use email encryption (PGP), etc. I'm not sure if you know this, but Microsoft is backporting the same shady spy software to both Windows 7 and 8/8.1; there was an article about this on Ars Technica not long ago. My only suggestion is: don't use Windows for any private conversations, and don't keep your own collection of private material on your Windows file system. Dual boot.

That's the only way around this issue.


----------



## Mahigan

Quote:


> Originally Posted by *provost*
> 
> At least some tech journalism sites are asking the right questions for the benefit of consumers, their viewers, and other constituents:
> http://forums.anandtech.com/showthread.php?t=2446656
> I am assuming this is the site's namesake reviewer, but someone can correct me?
> 
> Let's see how far they and others can go in asking the right questions for all constituents involved.


They are; many are supposedly pissed off at all of this. I think it sort of caught them off guard. I think we're going to get some really nice articles on the topic once it blows over. I'm hoping there are more articles which discuss DirectX 12 as well as tier support. There's a lot of confusion, with people claiming that GCN fully supports DX12 (not true) and that nVIDIA doesn't (not true).

The curiosity is there; those who write the articles should be able to attract a lot of viewers to their websites.


----------



## alancsalt

Please do not respond to rudeness or trolling. Just report it.
Last few pages cleaned.


----------



## Casey Ryback

Quote:


> Originally Posted by *delboy67*
> 
> Sorry I thought it was funny


I've only seen the Hitler Reacts vids (AMD Fury etc.).

The one you posted was gold. That guy's laugh is just crazy.


----------



## Themisseble

Quote:


> Originally Posted by *Noufel*
> 
> When is AotS supposed to launch?


next year


----------



## semitope

Quote:


> Originally Posted by *Mahigan*
> 
> They are; many are supposedly pissed off at all of this. I think it sort of caught them off guard. I think we're going to get some really nice articles on the topic once it blows over. I'm hoping there are more articles which discuss DirectX 12 as well as tier support. *There's a lot of confusion, with people claiming that GCN fully supports DX12 (not true)* and that nVIDIA doesn't (not true).
> 
> The curiosity is there; those who write the articles should be able to attract a lot of viewers to their websites.


I hadn't counted 12.1 as part of the base of dx12 but it seems people are claiming it necessary for "full" support now. If they keep adding point levels later on, being fully compliant might never happen.

Is full compliance really inclusive of 12.1?


----------



## Mahigan

Quote:


> Originally Posted by *delboy67*
> 
> Sorry I thought it was funny
> 
> 
> Spoiler: Warning: Spoiler!


Oh Internet... you so funny


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> I hadn't counted 12.1 as part of the base of dx12 but it seems people are claiming it necessary for "full" support now. If they keep adding point levels later on, being fully compliant might never happen.
> 
> Is full compliance really inclusive of 12.1?


It's really a mess... compliance, that is, because neither architecture is fully compliant. There are tier levels which GCN supports and Maxwell 2 does not, and others which Maxwell 2 supports and GCN does not. I think we need tech journalists to post articles which explain this. WCCFTech posted a good article on this topic here: http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/
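To make the point concrete, here is a rough "does it support everything?" sketch. The feature values are paraphrased from contemporary coverage such as the article linked above; treat them as illustrative, not an authoritative spec dump.

```python
# Illustrative DX12 optional-feature matrix (values paraphrased from period
# coverage, e.g. the WCCFTech article above; not authoritative).
FEATURES = ["resource_binding_tier_3", "conservative_rasterization",
            "rasterizer_ordered_views"]

gcn_1_2 = {"resource_binding_tier_3": True,       # GCN: binding tier 3...
           "conservative_rasterization": False,   # ...but no conservative raster
           "rasterizer_ordered_views": False}     # ...and no ROVs

maxwell_2 = {"resource_binding_tier_3": False,    # Maxwell 2: binding tier 2...
             "conservative_rasterization": True,  # ...but conservative raster
             "rasterizer_ordered_views": True}    # ...and ROVs

def supports_everything(gpu):
    """True only if the GPU exposes every optional feature in the list."""
    return all(gpu[f] for f in FEATURES)

print(supports_everything(gcn_1_2), supports_everything(maxwell_2))  # False False
```

On real hardware this kind of query goes through `ID3D12Device::CheckFeatureSupport`; the point of the sketch is simply that each architecture ticks boxes the other does not, so neither passes an all-features check.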


----------



## airfathaaaaa

Quote:


> Originally Posted by *Mahigan*
> 
> It's really a mess... compliance, that is, because neither architecture is fully compliant. There are tier levels which GCN supports and Maxwell 2 does not, and others which Maxwell 2 supports and GCN does not. I think we need tech journalists to post articles which explain this. WCCFTech posted a good article on this topic here: http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/


just a slight off topic question

What will the power consumption of a full DX12 card be? 300 W? 400 W?


----------



## Mahigan

Quote:


> Originally Posted by *airfathaaaaa*
> 
> just a slight off topic question
> 
> what will be the power consumption of a full dx12 card? 300? 400?


Good question...

I think we'd be looking at Greenland as well as Pascal. We don't know anything about them so far but we do know the fabrication processes they'll be using: 16nm FinFET

My guess is a similar power consumption to today's high end cards but packing a lot more punch. I expect Pascal will be re-introducing hardware based scheduling (I hope). If not then Pascal will likely consume less power than Greenland. Of course there's a price to pay, performance wise, under DX12 if that's the case.


----------



## Klocek001

Quote:


> Originally Posted by *airfathaaaaa*
> 
> just a slight off topic question
> 
> what will be the power consumption of a full dx12 card? 300? 400?


Are you genuinely asking? TDP stays the same.
Oh, sorry, you were probably asking about next-gen NV/AMD.


----------



## NuclearPeace

250 W is probably going to remain the reasonable limit for the highest end of GPUs, as it is now.

The 290 and 290X went over that because AMD ran the Hawaii GPU at rather high clocks for GCN, throwing it out of the range where it achieves its best efficiency. They could easily have backed the clocks off by a smidge and had a 250 W card, but they went all out to make the statement that they meant business about beating the 780. Besides, lowering it to a 250 W TDP wasn't going to make a difference with the terrible reference coolers they came with.

The 980 Ti is going over it because it's a massive 600 mm^2 die, and it's pretty much the best 28nm can do. Past 250 W, the Titan cooler they've been using since its namesake begins to thermally throttle whatever it's cooling, forcing you to set a more aggressive fan curve or just take it as is.

The next generation of cards are going to be on 16nm AND have HBM, both of which will save lots and lots of power.


----------



## CrazyElf

One big question that I have is: how much can Nvidia really optimize on the driver side for their existing Maxwell? The problem is hardware, not software, so the gains are probably far more limited. The other issue, of course, is that DX12's thin-driver design doesn't leave as much room for driver-side optimization.
Quote:


> Originally Posted by *Clocknut*
> 
> By the time Maxwell taped out, Nvidia didn't think async compute would get so much attention. With consoles having underpowered CPUs, async compute is pretty much guaranteed to be heavily used by game developers, because they need every ounce of compute they can squeeze out of that small console box.
> 
> Pascal was clearly a Maxwell 3.0 from the moment it was announced ahead of Volta. It seems to me that at the time Nvidia thought AMD would remain uncompetitive and that the 16nm + FinFET + HBM combo would bring enough performance gain to be worth one generation of milking, with no need to bring Volta to an immature node (Intel's tick-tock strategy).
> 
> I just hope Pascal had not completed its design phase before all this async compute attention. Nvidia either has to live with Pascal having no hardware async compute support, or push back the entire Pascal launch and go back to the drawing board to get at least some hardware support. Oh well, good luck Nvidia.


The rumor is that Pascal has already taped out. If so, then it may very well be Maxwell 3.0 as you note, but with HBM2. In that case, it may still be a relatively "serial" architecture only with massive bandwidth and NVLink. We will no doubt see power efficiency improvements though from the die shrink and perhaps from anything else Nvidia has done (SMX to SMM was impressive, so perhaps they have SMM to something new?).

In that case, we may have to wait for Volta. But when Nvidia goes full parallel, I expect them to do so in a big way and likely overtake AMD in that regard simply due to the sheer amount of funding they have. We don't have enough information as to what is on Pascal to know. All we know is that it will likely have HBM2.

Quote:


> Originally Posted by *Hattifnatten*
> 
> The thing about AotS is not photorealistic graphics, but thousands and thousands of units on screen, wiht each and every single shot being fired an independent lightsource (I also believe the smoke is volumetric, as it is perfectly lit by those shots). And unlike most RTSs today, they don't "cheat" by making a squad of 10-15 people into a single unit. And even with those kinds of units, they crawl to a halt when theres about 100 of them on screen. In AotS, this does not happen thanks to low-level APIs and proper multi-threading


The closest parallel to it I guess would be Supreme Commander and Forged Alliance.

But this game is better multithreaded, so it should use many cores better. That said, the 5960X benchmarks IIRC show that 8 cores isn't that much better.

The PC Perspective review didn't say if their 5960X was overclocked. Do we by any chance have a 5960X vs 4790K review of Ashes with both CPUs at the same clock (ideally say, 4.5 GHz)? Then we'd be able to see per core scaling.

I suspect that DX12 should be able to offer benefits to the HEDT, but I need further evidence. At this point, we're really limited by one game. In theory, an RTS or 4X ought to benefit the most from such cores, although many players on a large FPS or MMO could as well.
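If such a clock-matched comparison ever shows up, per-core scaling boils down to a simple efficiency ratio. The FPS numbers below are made up purely to illustrate the calculation:

```python
# Hypothetical sketch: comparing per-core scaling between a 4-core and an
# 8-core CPU benchmarked at the same clock. The FPS figures are invented.
def scaling_efficiency(fps_small, cores_small, fps_big, cores_big):
    # Perfect scaling would give fps_big/fps_small == cores_big/cores_small;
    # the ratio of the two tells you what fraction of that ideal was realized.
    return (fps_big / fps_small) / (cores_big / cores_small)

eff = scaling_efficiency(fps_small=60, cores_small=4, fps_big=75, cores_big=8)
print(f"{eff:.2%}")  # 62.50% of perfect 2x scaling
```

A value near 100% would mean the extra cores are being used almost perfectly; anything much lower suggests the game is still bottlenecked elsewhere.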

Quote:


> Originally Posted by *Mahigan*
> 
> Good question...
> 
> I think we'd be looking at Greenland as well as Pascal. We don't know anything about them so far but we do know the fabrication processes they'll be using: 16nm FinFET
> 
> My guess is a similar power consumption to today's high end cards but packing a lot more punch. I expect Pascal will be re-introducing hardware based scheduling (I hope). If not then Pascal will likely consume less power than Greenland. Of course there's a price to pay, performance wise, under DX12 if that's the case.


Technically it's a hybrid process, but yeah, I think they will end up with higher power consumption. I think a 250-400 mm^2 die is likely first, with a 600 mm^2 die only later. The early process probably cannot offer good enough yields.


Spoiler: Warning: Spoiler!







If you look at the gate and fin lengths, it's more like a 20nm process with 16nm parts in it.

Intel is the only one with a "true" 14nm, if you get what I am saying, which is partly why they had so many difficulties. We seem to be heading to the limits of silicon here, and each node is getting more difficult. It's also why Intel delayed 10nm. Whether this represents the end, or whether III-V materials will buy us a bit more time, remains to be seen.

I love the idea of a conceptual GPU though. It's interesting where the technology is going. The question I have is, if the node shrinks stop, we'll only have architectural gains - but for how long?

Quote:


> Originally Posted by *NuclearPeace*
> 
> 250 W is probably going to remain the reasonable limit for the highest end of GPUs, as it is now.
> 
> The 290 and 290X went over that because AMD ran the Hawaii GPU at rather high clocks for GCN, throwing it out of the range where it achieves its best efficiency. They could easily have backed the clocks off by a smidge and had a 250 W card, but they went all out to make the statement that they meant business about beating the 780. Besides, lowering it to a 250 W TDP wasn't going to make a difference with the terrible reference coolers they came with.
> 
> The 980 Ti is going over it because it's a massive 600 mm^2 die, and it's pretty much the best 28nm can do. Past 250 W, the Titan cooler they've been using since its namesake begins to thermally throttle whatever it's cooling, forcing you to set a more aggressive fan curve or just take it as is.
> 
> The next generation of cards are going to be on 16nm AND have HBM, both of which will save lots and lots of power.


Well, there was Fermi, but yeah, I think this represents the upper limit of what is possible. IMO, AMD should have shipped the 290 and 290X with better reference coolers; had they done so, they would not have been criticized as much.

But I agree that yeah, 250-300 W seems like the upper limit of what is possible. Perhaps with triple-slot coolers a bit more is possible, but even then it's a very aggressive target.

I'd love to see the overclocking headroom and voltage scaling that the 7970 or Kepler had, though: conservative power consumption on release with massive OC headroom. In the case of Kepler, thanks to Nvidia's Greenlight, we will need an unlocked BIOS. I really don't like the direction of Nvidia's business practices: no Titan X custom PCB, voltage locks, an overpriced Titan X, etc. I'm not fond of AMD's restricting of the Fury X either, and the Fury Nano seems overpriced too.


----------



## provost

Quote:


> Originally Posted by *CrazyElf*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> One big question that I have is: how much can Nvidia really optimize on the driver side for their existing Maxwell? The problem is hardware, not software, so the gains are probably far more limited. The other issue, of course, is that DX12's thin-driver design doesn't leave as much room for driver-side optimization.
> The rumor is that Pascal has already taped out. If so, then it may very well be Maxwell 3.0 as you note, but with HBM2. In that case, it may still be a relatively "serial" architecture only with massive bandwidth and NVLink. We will no doubt see power efficiency improvements though from the die shrink and perhaps from anything else Nvidia has done (SMX to SMM was impressive, so perhaps they have SMM to something new?).
> 
> In that case, we may have to wait for Volta. But when Nvidia goes full parallel, I expect them to do so in a big way and likely overtake AMD in that regard simply due to the sheer amount of funding they have. We don't have enough information as to what is on Pascal to know. All we know is that it will likely have HBM2.
> The closest parallel to it I guess would be Supreme Commander and Forged Alliance.
> 
> But this game is better multithreaded, so it should use many cores better. That said, the 5960X benchmarks IIRC show that 8 cores isn't that much better.
> 
> The PC Perspective review didn't say if their 5960X was overclocked. Do we by any chance have a 5960X vs 4790K review of Ashes with both CPUs at the same clock (ideally say, 4.5 GHz)? Then we'd be able to see per core scaling.
> 
> I suspect that DX12 should be able to offer benefits to the HEDT, but I need further evidence. At this point, we're really limited by one game. In theory, an RTS or 4X ought to benefit the most from such cores, although many players on a large FPS or MMO could as well.
> Technically it's a hybrid process, but yeah, I think they will end up with higher power consumption. I think a 250-400 mm^2 die is likely first, with a 600 mm^2 die only later. The early process probably cannot offer good enough yields.
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> 
> 
> 
> 
> If you look at the gate and fin lengths, it's more like a 20nm process with 16nm parts in it.
> 
> Intel is the only one with a "true" 14nm, if you get what I am saying, which is partly why they had so many difficulties. We seem to be heading to the limits of silicon here, and each node is getting more difficult. It's also why Intel delayed 10nm. Whether this represents the end, or whether III-V materials will buy us a bit more time, remains to be seen.
> Well, there was Fermi, but yeah, I think this represents the upper limit of what is possible. IMO, AMD should have shipped the 290 and 290X with better reference coolers; had they done so, they would not have been criticized as much.
> 
> But I agree that yeah, 250-300 W seems like the upper limit of what is possible. Perhaps with triple-slot coolers a bit more is possible, but even then it's a very aggressive target.


Thank you for an intelligent, insightful and unbiased commentary.









As a tongue-in-cheek comment: if all the educated, informed, critical thinkers are "users" and "members," who the heck is running the tech journalism sites? Is this a case of the tail wagging the dog... lol


----------



## PontiacGTX

Quote:


> Originally Posted by *Paul17041993*
> 
> At the time the first FX line was released, it was frequently established that Thuban was still just as good, especially in per-core IPC. The main intention of the module system was to allow heaps of high-efficiency integer cores that could handle more things at a time, which is ideal for Mantle, DX12 and Vulkan.


You don't really see this in Ashes of the Singularity... something in use on the AMD CPU/platform is causing bad performance even on DX12. Or maybe this game doesn't really use more than 4 cores, and the engine the game uses sees this CPU as a quad core.


----------



## Anna Torrent

Quote:


> Originally Posted by *loveuguys*
> 
> IMHO
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Comparing this game's benchmark to a really big, massive multiplayer match in C&C 4, SC2, or RA3 on a 5-year-old machine...
> I just can't figure out which visual effects here are so demanding that a 2015 mid-high rig with DX12 has problems running it...
> 
> Watching the benchmark on YouTube, I really can't figure it out. I expected photorealistic graphics or textures or units or I don't know...
> But in reality? Hmmm...
> In reality I see a couple more mini units, zoomed out like in Civ 5, with an unrealistically awful lot of laser beams and (2D) smoke and very bad terrain... judged against expectations for a very demanding 2015 DX12 game...
> 
> _I don't care about Nvidia or AMD, I just get the feeling that all this hocus-pocus bogus around AotS and DX12 is just a very neat way to promote, publicize, and attract a lot of attention to the game itself... i.e. free marketing - clickbaiting..._
> 
> The last two weeks of news were like a thousand of the same garbage articles about Oxide, DX12, Nvidia, AMD... but nothing real about the game, the gameplay, or anything along those lines... for example the high price (you said you chose the easy way to program), and even the benchmark isn't free...
> 
> I hope that maybe someone else can also see through this battle of words and maybe focus more on the game and its price.
> 
> Oh, and all this started with a v0.49 pre-beta - that you have to pay for - test build?
> C'mon...
> 
> The final version will be v1.1 and $70, plus once again a thousand articles about the dev and DX12 and not the game?
> 
> That's my opinion on all this AotS hype, over and out.
> 
> I sure hope the devs (all game devs) start making games that work on both PC markets, green and red, if they want to sell in big numbers.


That's a good one. Remember we are dealing with consumer companies - they'll tell you every card is the best card ever (even if it's a rebrand), they'll tell you for years that it's just like real life, and so on and so forth...

AotS does have some differences - it can handle a lot more "free" groups of units acting on their own, and generally more units, plus, as you've said, a lot of effects, lasers and so on.

These things partially come from lower API overhead leaving the CPU free to do more: more cores are effectively utilized, with less of a CPU bottleneck.
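As a rough illustration of that last point, here is a toy Python sketch of why DX12-style multi-threaded command-list recording shortens the CPU-side critical path compared with feeding one driver thread. All numbers are made up, and `serial_submit_us` / `parallel_submit_us` are hypothetical helpers, not real API calls:

```python
from math import ceil

DRAW_CALLS = 10_000       # draw calls submitted per frame (invented figure)
COST_US = 5               # hypothetical CPU cost per call, microseconds
THREADS = 4               # worker threads recording command lists

def serial_submit_us(calls: int, cost: int) -> int:
    """DX11-style: one thread feeds the driver, so costs add up linearly."""
    return calls * cost

def parallel_submit_us(calls: int, cost: int, threads: int) -> int:
    """DX12-style: each thread records its own command list; the frame's
    CPU-side critical path is the busiest thread, not the sum."""
    return ceil(calls / threads) * cost

serial = serial_submit_us(DRAW_CALLS, COST_US)
parallel = parallel_submit_us(DRAW_CALLS, COST_US, THREADS)
print(serial, parallel)   # 50000 vs 12500 microseconds
```

The absolute numbers mean nothing; the point is only that submission cost divides across cores instead of piling up on one thread, which is what "less CPU bottleneck" cashes out to.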


----------



## Falknir

Quote:


> Originally Posted by *Hattifnatten*
> 
> The thing about AotS is not photorealistic graphics, but thousands and thousands of units on screen, with each and every shot fired being an independent light source (I also believe the smoke is volumetric, as it is perfectly lit by those shots). And unlike most RTSs today, they don't "cheat" by making a squad of 10-15 people into a single unit. And even with those kinds of units, they crawl to a halt when there are about 100 of them on screen. In AotS, this does not happen, thanks to low-level APIs and proper multi-threading.


They have not really managed to get thousands of units on screen thus far; it's mostly in the hundreds (if you want, you can personally consider all the units sitting stationary off-screen as being to some extent culled in a benchmark). I also do not recall (obvious) light sources on most weapon FX (while some impact FX obviously have them), but there are a few particle FX with an exaggerated gradient that might make it appear that way. Most of the smoke is just a typical particle emitter with(out) particle animation. Most of this was already achieved in games with DX9, given adequate hardware and a decent engine. I have also seen far better execution of exhaust plumes in DX9 games and their mods.

Those other strategy games also have to consider: locomotion animation, transitioning, and syncing (terrestrial locomotion is mostly excluded and simplified for this game); far greater path-finding requirements (compared to this game being entirely hover and air craft it circumvents a lot of it); units being object/terrain aware (some games); on-death ragdoll (physics hit) and wreckage (path finding performance hit).


----------



## Xuper

I understand now why Maxwell has low power consumption: because Nvidia removed an important hardware block from the diagram in order to destroy AMD. I asked:

Quote:


> So it's not like GCN, which is fully hardware?


He replied:

Quote:


> Originally Posted by *Mahigan*
> 
> Nope. That's where you get the "Performance per Watt" figures of Kepler and Maxwell/2. Hardware schedulers take up a lot of power.
> 
> Things just got a whole lot more interesting


----------



## iLeakStuff

Quote:


> Originally Posted by *Mahigan*
> 
> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.


Translation:
We made a huge noise about nothing. Nvidia might fix Async Compute performance with new drivers.

Sorry AMD fans, looks like the DX12 advantage may be short-lived.


----------



## Mahigan

Wow,

I see now what TaintedSquirrel was talking about. So many people are spreading such weird FUD. I saw one AMD fan state that nVIDIA can't create ACEs in software (Maxwell 2 has Asynchronous Warp Schedulers)... I then had an nVIDIA fan claiming my original theory was wrong because nVIDIA can Async Compute (original theory was based on nVIDIA being capable of performing Asynchronous Compute with 1 Graphic and 31 Compute queues)...

*facepalm*

Tech Journalists... where are you?

Quote:


> Originally Posted by *iLeakStuff*
> 
> Translation:
> We made a huge noise about nothing. Nvidia might fix Async Compute performance with new drivers.
> 
> Sorry AMD fans, looks like the DX12 advantage may be short-lived


nVIDIA will try and get their Async Compute working. They borked the software side of their Scheduling in their drivers. That being said... we're back to the original theory which was corroborated by David Kanter last night on Tech Report.

If anyone was honest, it was the AMD PR guy when he stated that Maxwell 2 relied on "Slow context switching" for pre-emption see here: http://wccftech.com/preemption-context-switching-allegedly-best-amd-pretty-good-intel-catastrophic-nvidia/

So it all depends on how much use a developer makes out of Asynchronous Compute. GCN is far more robust at handling Asynchronous Compute tasks.


----------



## ku4eto

Quote:


> Originally Posted by *iLeakStuff*
> 
> Translation:
> We made a huge noise about nothing. Nvidia might fix Async Compute performance with new drivers.
> 
> Sorry AMD fans, looks like the DX12 advantage may be short-lived


No, actually. This would mean that said fix will actually put extra load on your CPU. And unless you've got your hands on some hefty high-end system, you will be getting pulled back by this.


----------



## iLeakStuff

Quote:


> Originally Posted by *ku4eto*
> 
> No, actually. This would mean that said fix will actually put extra load on your CPU. And unless you've got your hands on some hefty high-end system, you will be getting pulled back by this.


Not when DX12 improves CPU overhead greatly over DX11.

I'd also like to see some documentation that the AotS drivers from Nvidia (which haven't been tested yet) use the CPU so much that you will get a performance hit.


----------



## Mahigan

Quote:


> Originally Posted by *ku4eto*
> 
> No, actually. This would mean that said fix will actually put extra load on your CPU. And unless you've got your hands on some hefty high-end system, you will be getting pulled back by this.


That will be interesting to see. While Ashes of the Singularity makes mild use of Asynchronous Compute, which Maxwell 2 ought to be able to handle with this new driver enabling its 1 graphics + 31 compute queue async capabilities, it also maxes out the CPU cores, leaving little room for driver overhead. Oxide will have to work some magic with nVIDIA. Given that Ashes of the Singularity is GPU-bottlenecked, they just might be able to pull it off.
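To make the "mild use of async compute" point concrete, here is a deliberately simple Python model of what running compute work concurrently with graphics buys you. The millisecond figures and the overlap fraction are invented for illustration; this is not how any real driver or GPU accounts for time:

```python
# Hypothetical per-frame workloads, in milliseconds of GPU time.
GRAPHICS_MS = 12.0   # e.g. shadow and raster passes
COMPUTE_MS = 4.0     # e.g. post-processing or simulation compute

def serialized_frame_ms(gfx: float, comp: float) -> float:
    """Without async compute, the graphics and compute work run back to back."""
    return gfx + comp

def async_frame_ms(gfx: float, comp: float, overlap: float) -> float:
    """With async compute, a fraction of the compute work hides inside idle
    bubbles of the graphics queue. overlap=1.0 means perfect overlap."""
    hidden = comp * overlap
    return gfx + (comp - hidden)

print(serialized_frame_ms(GRAPHICS_MS, COMPUTE_MS))    # 16.0 ms
print(async_frame_ms(GRAPHICS_MS, COMPUTE_MS, 0.75))   # 13.0 ms
```

With "mild" usage, the compute share is small, so even perfect overlap only shaves a few milliseconds; that is why a working driver fix might still yield only modest gains.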


----------



## Mahigan

Quote:


> Originally Posted by *iLeakStuff*
> 
> Not when DX12 improves CPU overhead greatly over DX11.
> 
> I'd also like to see some documentation that the AotS drivers from Nvidia (which haven't been tested yet) use the CPU so much that you will get a performance hit.


Software scheduling since Kepler. A large part of the nVIDIA Kepler/Maxwell and Maxwell 2 architectures relies on Software Scheduling. Anything done in software uses up CPU cycles.

GCN does all of its scheduling in hardware as did Fermi.
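A toy accounting model of that claim, in Python. The unit costs are invented and this is not how any real driver is implemented; it only illustrates why scheduling done on the CPU scales with the amount of GPU work, while a hardware scheduler leaves the CPU with a roughly constant per-batch cost:

```python
BATCHES = 1_000        # command batches submitted per frame (invented)
OPS_PER_BATCH = 32     # ops the scheduler must order within each batch

def cpu_cost_software_scheduler(batches: int, ops: int) -> int:
    """The driver inspects and orders every op on the CPU before submission,
    so CPU work grows with the total op count."""
    return batches * ops          # one unit of CPU work per op

def cpu_cost_hardware_scheduler(batches: int) -> int:
    """The CPU only enqueues batches; ordering happens on the GPU."""
    return batches                # one unit per batch, regardless of ops

sw = cpu_cost_software_scheduler(BATCHES, OPS_PER_BATCH)  # 32000 units
hw = cpu_cost_hardware_scheduler(BATCHES)                 # 1000 units
print(sw, hw)
```

The trade-off cuts both ways: the software approach spends CPU cycles but saves the die area and power of a hardware scheduler, which is consistent with the perf-per-watt observation above.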


----------



## airfathaaaaa

Quote:


> Originally Posted by *iLeakStuff*
> 
> Translation:
> We made a huge noise about nothing. Nvidia might fix Async Compute performance with new drivers.
> 
> Sorry AMD fans, looks like the DX12 advantage may be short-lived


do you have insight info somehow?

i also wanna know about the older nvidia cards fermi kepler


----------



## iLeakStuff

Quote:


> Originally Posted by *Mahigan*
> 
> Software scheduling since Kepler. A large part of the nVIDIA Kepler/Maxwell and Maxwell 2 architectures relies on Software Scheduling. Anything done in software uses up CPU cycles.
> 
> GCN does all of its scheduling in hardware as did Fermi.


Yes, that is probably all true, but nobody can claim that the driver will harm system performance vs. GCN when nobody has tested said driver in DX12 yet?


----------



## Mahigan

Quote:


> Originally Posted by *iLeakStuff*
> 
> Yes, that is probably all true, but nobody can claim that the driver will harm system performance vs. GCN when nobody has tested said driver in DX12 yet?


At this point, we can only speculate, not outright claim. If it were a claim... I'd write a theory around the issue. I'm not doing that. I'm waiting to see what Oxide and nVIDIA end up with because, quite frankly, there are far too many variables at play which are not dependent on the GPU architecture. Since I am neither a developer nor a programmer... I'm not at all comfortable developing any concrete theories on this subject (the subject being performance under Ashes of the Singularity relative to the mild usage of Async Compute and the driver overhead from nVIDIA).

We got what we wanted, nVIDIA responded and is addressing the issue. Now it's wait and see imo.

As for preemption and slow context switching, this is now industry-wide knowledge; we can discuss it and feel quite confident that we're not spreading FUD. What I mean by this is the impact of heavy usage of Async Shading.

I mean we have to put this into perspective as well:


My 2 cents.


----------



## Shivansps

Quote:


> Originally Posted by *Mahigan*
> 
> Software scheduling since Kepler. A large part of the nVIDIA Kepler/Maxwell and Maxwell 2 architectures relies on Software Scheduling. Anything done in software uses up CPU cycles.
> 
> GCN does all of its scheduling in hardware as did Fermi.


Well, that would be easy to test, since Fermi seems to support DX12 as well; it just can't handle as many compute threads as Maxwell.

Still, this has been ignored many times, but what about just sending the compute tasks to a DX12 iGPU? We all have those iGPUs wasting space in there that we could use.


----------



## FastEddieNYC

Quote:


> Originally Posted by *iLeakStuff*
> 
> Yes, that is probably all true, but nobody can claim that the driver will harm system performance vs. GCN when nobody has tested said driver in DX12 yet?


List an example where a software solution performed as well as or better than hardware.
I'm sure Nvidia has their entire driver team working on this and will only comment once they can demonstrate improved performance and say, "Here, we can do async," sidestepping the fact that, while designing Maxwell to improve power efficiency, they chose not to incorporate a hardware scheduler.


----------



## semitope

Quote:


> Originally Posted by *Mahigan*
> 
> It's really a mess... compliance that is. Because neither architecture is fully compliant. There are Tier levels which GCN supports and Maxwell 2 does not. Neither of the two are fully compliant. I think that we need Tech journalists to post articles which explain this. WCCFTech posted a good article on this topic here: http://wccftech.com/nvidia-amd-directx-12-graphic-card-list-features-explained/


I think the point of differentiation is whether or not 12.1 is considered necessary for full DX12 support. Since those features can be done through software or other means, I would say not. For next-gen graphics, the addition would be more efficient support of these features and more power. The current cards offer support for the features as dictated by Microsoft:



https://msdn.microsoft.com/en-us/library/windows/desktop/mt186615(v=vs.85).aspx

We also need details on how well each architecture supports each feature. Sites have tried to write these articles, but the hardware details might be lacking - e.g. the current situation with async on Nvidia, and them not being forthcoming with information.


----------



## infranoia

Quote:


> Originally Posted by *spacin9*
> 
> The fast-paced "energy zone capture nodes" create a victory scenario that requires spamming battlegroups. This in and of itself is very fun and fast-paced, but those of us who want to turtle and hurl nukes, heavy artillery, and Titans/Dreadnaughts might be disappointed.


I hope they fix this. Hopefully it's not full-on generate-and-attack streams of units. Sins can get that way a bit when your early game homeworld is getting trounced, but I prefer turtling, it makes the end game far more epic if you're entrenched.


----------



## STEvil

Anyone remember Software T&L vs Hardware T&L?


----------



## Paul17041993

Quote:


> Originally Posted by *iLeakStuff*
> 
> Not when DX12 improve CPU overhead greatly over DX11.
> 
> I also like to see some documentation that the AOTS drivers from Nvidia (that havent been tested yet) use the CPU so much that you will get a performance hit.


manipulating buffers in CPU == latency before the actual call


----------



## Vesku

Software changes won't be able to reduce the latency involved in the actual hardware communication between the CPU and GPU, nor remove the hardware limitation revealed by Nvidia themselves: that they can't preempt a draw call, but must wait until it finishes.

From an Nvidia presentation on their VR Direct:
Quote:


> Fermi, Kepler, and Maxwell GPUs - basically GeForce GTX 500 series and forward - all manage multiple contexts by time-slicing, with draw-call preemption. This means the GPU can only switch contexts at draw call boundaries! Even with the high-priority context, it's possible for that high-priority work to get stuck behind a long-running draw call on a normal context. If you have a single draw call that takes 5 ms, then async timewarp can get stuck behind it and be delayed for 5 ms, which will mean it misses vsync, causing stuttering or tearing.
> 
> On future GPUs, we're working to enable finer-grained preemption, *but that's still a long way off.*


Bolding added by me.

Now I'm reasonably confident that Maxwell 2 won't take a nosedive on most DX12 and Vulkan games compared to their DX11 implementations. They just may not gain much benefit either, while it's looking more and more like AMD GCN GPUs will be reaping a lot more performance gains from DX12 and Vulkan.

The "that's still a long way off" has me thinking AMD will be better for VR, and to a lesser degree for DX12/Vulkan, at least until Volta. We should have a better indication after the first wave of DX12/Vulkan games.
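The behavior the Nvidia quote describes can be sketched in a few lines of Python. The draw-call durations and arrival time are hypothetical (the 5 ms call mirrors the example in the quote); the sketch only shows why preemption limited to draw-call boundaries delays high-priority work by the remainder of the in-flight call:

```python
# Draw calls on the normal context, in order, back to back: durations in ms.
draw_durations = [1.0, 5.0, 1.0]          # one long 5 ms call in the middle
HIGH_PRIORITY_ARRIVES_MS = 2.0            # e.g. an async timewarp request

def preemption_delay(durations, arrival):
    """With draw-call-boundary preemption, high-priority work must wait
    until the draw call that is in flight at `arrival` finishes."""
    t = 0.0
    for d in durations:
        if t <= arrival < t + d:          # arrival lands inside this call
            return (t + d) - arrival      # wait for the call to finish
        t += d
    return 0.0                            # GPU idle at arrival: no wait

print(preemption_delay(draw_durations, HIGH_PRIORITY_ARRIVES_MS))  # 4.0
```

A 4 ms stall is easily enough to miss a VR vsync deadline, which is exactly the stutter/tearing scenario the presentation warns about; finer-grained preemption would shrink that wait toward zero.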


----------



## DarkBlade6

Quote:


> Originally Posted by *provost*
> 
> Maybe I am missing something, but isn't the Guru3d article referencing OC.Net and this thread?
> I haven't read what this poster linked, but I found it ironic that you didn't take your own advice, as above...


The quoted OCN part just explains how Maxwell *will* handle ACE. But the upper part of the article is the most interesting: it states that Nvidia has *not* fully implemented ACE via drivers and that they are actively working on it with Oxide.
Quote:


> We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more.


----------



## provost

Quote:


> Originally Posted by *DarkBlade6*
> 
> The quoted OCN part just explains how Maxwell *will* handle ACE. But the upper part of the article is the most interesting: it states that Nvidia has *not* fully implemented ACE via drivers and that they are actively working on it with Oxide.


Maybe I was dreaming, but I believe the Oxide developer did make a detailed post here stating the same? Lol

Or perhaps they got their source from Oxide too.....


----------



## mav451

Guru3D having to reference _this very thread_ as a source is a reflection of how slow tech journalism is.









I think there is a bit of benefit in users being ahead of the game, but at the same time, people who only rely on tech sites for information will be days to weeks behind on developments. Very much like print vs online vs twitter in release immediacy.

I was right that they would just sit back and rely on users to do the heavy lifting for them.


----------



## Mahigan

Quote:


> Originally Posted by *charlievoviii*
> 
> GAME OVER AMD ONCE AGAIN. This is why you don't jump to conclusions. Waiting for AMD's excuses this time.
> 
> http://www.guru3d.com/news-story/nvidia-will-fully-implement-async-compute-via-driver-support.html


Uhhh,

We were the first ones to find out... It's funny when people post links whose source is us and they exclaim "Told ya!!! har har har". You're not the first to do this. Probably won't be the last. When the dust settles people will calm down and figure out that what's been discussed here, though at first on shaky grounds, was pretty damn accurate by the end of it.


----------



## Mahigan

Quote:


> Originally Posted by *mav451*
> 
> Guru3D having to reference _this very thread_ as a source is a reflection of how slow tech journalism is.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I think there is a bit of benefit in users being ahead of the game, but at the same time, people who only rely on tech sites for information will be days to weeks behind on developments. Very much like print vs online vs twitter in release immediacy.
> 
> I was right that they would just sit back and rely on users to do the heavy lifting for them


I was hoping that wouldn't be the case... :/ Where's Anand when you need him? Where's Thomas Pabst? I think Joel is working on something and I'm eager to see what that is. He always covers this stuff and does a great job.

Wait... they quoted me too... *facepalm*. I've got enough hate being leveled at me already. I was hoping for something which was well researched. I guess with nVIDIA being silent about their implementation... I can't blame the tech journalists at this point.

Where is nVIDIA?


----------



## SimBy

Quote:


> Originally Posted by *Mahigan*
> 
> Uhhh,
> 
> We were the first ones to find out... It's funny when people post links whose source is us and they exclaim "Told ya!!! har har har". You're not the first to do this. Probably won't be the last. When the dust settles people will calm down and figure out that what's been discussed here, though at first on shaky grounds, was pretty damn accurate by the end of it.


He probably never read nor understood this thread. And he still has no clue.


----------



## spacin9

Quote:


> Originally Posted by *Falknir*
> 
> Those other strategy games also have to consider: locomotion animation, transitioning, and syncing (terrestrial locomotion is mostly excluded and simplified for this game); far greater path-finding requirements (compared to this game being entirely hover and air craft it circumvents a lot of it); units being object/terrain aware (some games); on-death ragdoll (physics hit) and wreckage (path finding performance hit).


This is very interesting and a good observation. In Sup Com, destroyed units can also be reclaimed as resources, further complicating the calculations. The lack of naval units simplifies things as well.


----------



## Kpjoslee

I already made several posts saying that it is too early to make any conclusive judgement... yep. We did get really deep into AMD's and Nvidia's architectures, so it has been a good read nonetheless.
We will get a clearer picture in Q1 of next year as to whether AMD's hardware-based async will have the upper hand.


----------



## Stewox

Well, the bottom line is that the difference is huge with the new APIs. Just go back to my posts from a year ago - I understood the differences would be huge, not just 5-10% like some were saying. Mantle might not be a replacement for DX, but it wasn't even planned to be one; the noobs were spreading that, and a lot of trendy college-kid developers spread their BS opinions all over the web. I think Mantle might be perfect for advanced simulation, super-low latency, advanced developer editors, and other enterprise solutions.

Indeed, go look at those posts from almost 2 years ago, at how the stupid teenagers spammed everywhere that Mantle would only ever be max 10%, haha. I even talked to some developers, indie ones; they had no idea what GPU drivers actually do, and didn't understand that on the PC it's all hacks, where devs didn't have a lot of control over the hardware.

But I haven't been active around this topic since then. I did poke an eye in a few times, watched the AMD GPU stuff around E3, and noticed this "early win for AMD" article. I actually got a new GPU - not really up there, but it has DX12 - and I put Win10 on another HDD, so I might play with that when I find some time.


----------



## 47 Knucklehead

So bottom line is, with regards to Oxide, Ashes of Singularity, and nVidia Async Compute .... by the time that Oxide ACTUALLY releases the game, nVidia will have a driver that supports Async Compute.

Is that about right?


----------



## ku4eto

Quote:


> Originally Posted by *47 Knucklehead*
> 
> So bottom line is, with regards to Oxide, Ashes of Singularity, and nVidia Async Compute .... by the time that Oxide ACTUALLY releases the game, nVidia will have a driver that supports Async Compute.
> 
> Is that about right?


The level of support is questionable, though. And it will be for this game only, meaning every game will need a driver update if its Async Compute is built in a way that runs forced on nVidia (no option to run the game with Async Compute off) or the likes of it (a reduced amount of async). This should set nVidia back a bit. Also, I want to know how much a quad-core CPU will suffer from this "Async Compute" by nVidia. Even if DX12 lowers overhead, this is a somewhat different matter.


----------



## Vesku

Quote:


> Originally Posted by *47 Knucklehead*
> 
> So bottom line is, with regards to Oxide, Ashes of Singularity, and nVidia Async Compute .... by the time that Oxide ACTUALLY releases the game, nVidia will have a driver that supports Async Compute.
> 
> Is that about right?


That entirely depends on what Nvidia can manage to do via software.

Software changes won't be able to reduce the latency involved in the actual hardware communication between the CPU and GPU, nor remove the hardware limitation revealed by Nvidia themselves: that they can't preempt a draw call, but must wait until it finishes.


----------



## provost

Quote:


> Originally Posted by *sugarhell*
> 
> The only sure after this story:
> Nvidia will release pascal with a dedicated compute engines
> 
> The movement to a more compute based pipeline instead of pure rasterization is a good thing


In addition, assuming what you have stated is true, which I don't know, Nvidia should be providing some visibility and guidance on this. When I stated that I like that poster from Guru3d's sincerity and honesty, it was particularly in reference to his candid comments about Nvidia not providing a detailed and coherent response.

This issue potentially impacts a broader audience, and thus, lack of details from Nvidia is a bit curious.


----------



## GorillaSceptre

Quote:


> Originally Posted by *47 Knucklehead*
> 
> So bottom line is, with regards to Oxide, Ashes of Singularity, and nVidia Async Compute .... by the time that Oxide ACTUALLY releases the game, nVidia will have a driver that supports Async Compute.
> 
> Is that about right?


Maybe. That's what we're waiting to hear.

But how often does a software implementation beat a superior hardware one?

Post edited by Blitz6804 to remove response to flame bait.


----------



## Mahigan

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Also, since the benchmarks have shown that the GTX 980Ti and the Fury X are within 1 FPS (often only 0.7FPS) on the CURRENT benchmark, I take it this means that after this fix, this means that the GTX 980Ti will be BEATING the Fury X now too?
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> 
> 
> 
> 
> I mean at 1080p, the 980Ti already beats the Fury X by 0.6FPS, and at 4K, it only is 0.7FPS slower. So since this whole Async Compute thing is "so damn important", that means that once it is enabled for nVidia, the 980Ti should be stomping all over the Fury X, right?


Not necessarily; possible, but not necessarily. The Fury X is also being held back, and the Oxide developer mentioned that he is working on that bottleneck as well. As I've stated many times, Ashes of the Singularity makes mild usage of Asynchronous Compute, according to the Oxide developer.

We may see no performance gains from the GTX 980 Ti with the feature turned on, or we may see some mild gains. Who knows?

But this is what concerns me:


Spoiler: Warning: Spoiler!


----------



## Stewox

See, Nvidia was good these past years because they had their stuff optimized for the DX11 serialized pipeline, while AMD moved to asynchronous, multi-threaded hardware earlier; the APIs just weren't sufficiently modernized until now.

It's all an API and driver war; that's what it always was. It was never a real and fair GPU-hardware war.

2 very similarly skilled football teams
A team is one side of football field with a bit longer grass
B Team is on the other one with hard sand
the game starts and goes similarly
one team has to deal with lowered mobility from the longer grass
the other one has to deal with hard surface, falls are harder
so far so balanced
oops here comes the rain ...
the sand turns into mud and the B team players sink in 30 cm / a foot
but the grass is just wet, it's a bit slippery but still going okay

mtv zombie crowd with their google glasses drunk and high (so they don't really see the big picture): buuu B team sucks they're so bad aaabahawhhaaa
A team keeps bragging how good they are ...
But that's probably not the best comparison

The point is, drivers have to do a lot via hacks that developers should do themselves. Because of the API limitations, the driver has to instruct the GPU what to do and when, PER-SCENARIO... this means PER-OS, PER-GAME, PER-GPU, PER-CONFIG, PER-SETTINGS, PER-EVENT... it's a mess. Obviously one company can't manage to do hacks for so many companies and so many games; it's an incredibly inefficient way of doing things, especially when they aren't physically close and it all goes over email/phone.


----------



## Hattifnatten

Going a bit off to the side here, and maybe it has already been answered:
is it possible that driver/software scheduling is the source of Nvidia's DX11 performance?


----------



## sugarhell

Quote:


> Originally Posted by *provost*
> 
> In addition, assuming what you have stated is true, which I don't know, Nvidia should be providing some visibility and guidance on this. When I stated that I like that poster from Guru3d's sincerity and honesty, it was particularly in reference to his candid comments about Nvidia not providing a detailed and coherent response.
> 
> This issue potentially impacts a broader audience, and thus, lack of details from Nvidia is a bit curious.


Oh this happens for years.

With AMD, at least you can download their ISA documents, so you can get a bit of high-level info about their GPUs.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> Not necessarily, possible but not necessarily. Fury-X is also being held back and the Oxide developer mentioned that he is working on that bottleneck as well. As I've stated many times, Ashes of the Singularity makes mild usage of Asynchronous compute according to the Oxide developer.
> 
> We may not see performance gains from the GTX 980 Ti with the feature turned on, we may see some mild performance gains. Who knows?
> 
> But this is what concerns me:
> 
> 
> Spoiler: Warning: Spoiler!


I don't think that is a bad thing, especially as AMD is the more desperate side, trying to win back some market share. They seriously need better marketing and more sponsored titles this year and next.


----------



## Remij

Quote:


> Originally Posted by *Mahigan*
> 
> Not necessarily, possible but not necessarily. Fury-X is also being held back and the Oxide developer mentioned that he is working on that bottleneck as well. As I've stated many times, Ashes of the Singularity makes mild usage of Asynchronous compute according to the Oxide developer.
> 
> We may not see performance gains from the GTX 980 Ti with the feature turned on, we may see some mild performance gains. Who knows?
> 
> But this is what concerns me:
> 
> 
> Spoiler: Warning: Spoiler!


There's always something for you to latch on to, isn't there?









Anyway, it wouldn't surprise me, as DICE, Oxide, and whoever else already had AMD partnerships... it just makes sense that they, as AMD partners, continue to push AMD tech. Obviously AMD tech performs better under DX12, and of course AMD is gonna push that agenda with their partners, since they obviously have the most to gain from it... however, it actually doesn't mean anything about Nvidia's upcoming DX12 performance, which so far has proven to be on par despite the deficiencies.

Post edited by Blitz6804 to remove profanity.


----------



## Mahigan

Quote:


> Originally Posted by *Remij*
> 
> There's always something for you to latch on to, isn't there?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Anyway, it wouldn't surprise me, as DICE, OXiDE and whoever else, already had AMD partnerships... it just makes sense that they are AMD partners continuing to push AMD tech. Obviously AMD tech performs better under DX12, and of course AMD is gonna push that agenda with their partners since they obviously have the most to gain from it... however it actually doesn't mean anything about Nvidia's upcoming DX12 performance. Which so far has proven to be on par despite deficiencies.
> 
> The truth remains though, if Nvidia somehow come out of this ahead... AMD is pretty much F-ed.


It concerns me. Not as something which is positive... rather how this can be used...

If nVIDIA used GameWorks and Tessellation to cripple AMD performance (wasn't huge but it was noticeable) then these partnerships, with Async Compute, could be used for the same reason. If there's one thing I don't like it's this sort of conduct.


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> AMD was correct when they stated that nVIDIA relied on slow context switching. So I don't see how they could incur damage. AMD also has the majority of game developers lined up for 2015/2016. But AMD is no saint. We can all agree on that. I think that some of the things their PR folks said only inflamed the situation. Maybe they'll eat flak for that.
> 
> If nVIDIA fix this issue... that's a win. But it still doesn't tell us about what will happen when heavier Async titles arrive. With the developers and titles AMD has lined up, DX12, on the onset, may provide a more even competitive market. That is until Pascal and Greenland hit. Who knows then.


By the time heavier ASC titles come out, new hardware will be out. It has happened time and time again as new technology has arrived. I don't think anyone is worried about upcoming hardware; the entire discussion is around current hardware.

All Nvidia needs to do is show that they can not only keep up with AMD's newest hardware in the latest API (as they are now), but actually beat it. Even if Nvidia pulls ahead only marginally, they can still claim _"Our older hardware beats AMD's newest hardware, on the newest API."_ AMD can't market against that, especially when they are asking the same price for their hardware.

Then you have this entire debate we have been having on top of that. There will be a whole segment of people that will lash out at Oxide and AMD if Nvidia pulls it off, if only because there was early confusion around things working or not for Nvidia. Words were exchanged, and in the end it turned out things weren't working right for Nvidia. So now Nvidia is working on a fix, which may show them pulling back ahead.

It would be terrible for AMD and their image if that happens.


----------



## Remij

Quote:


> Originally Posted by *Mahigan*
> 
> It concerns me. Not as something which is positive... rather how this can be used...
> 
> If nVIDIA used GameWorks and Tessellation to cripple AMD performance (wasn't huge but it was noticeable) then these partnerships, with Async Compute, could be used for the same reason. If there's one thing I don't like it's this sort of conduct.


Of course AMD is going to be pushing for it. Unfortunately when Nvidia changes their architecture for better parallelism things will be back to square one, because Nvidia will always be pushing their proprietary libraries and technologies.

It makes sense for Nvidia to do so as they have a lot invested in their own technologies to keep their hardware relevant. And I have a hard time believing that AMD wouldn't love to be able to do what Nvidia does.


----------



## NoirWolf

Quote:


> Originally Posted by *Remij*
> 
> Of course AMD is going to be pushing for it. Unfortunately when Nvidia changes their architecture for better parallelism things will be back to square one, because Nvidia will always be pushing their proprietary libraries and technologies.
> 
> It makes sense for Nvidia to do so as they have a lot invested in their own technologies to keep their hardware relevant. And I have a hard time believing that AMD wouldn't love to be able to do what Nvidia does.


Assuming Pascal is more capable of it than Maxwell... The first prototypes for Pascal were being flaunted by Nvidia's CEO last year, and rumours in June said Pascal had taped out, which would be in line with their supposed launch window of a year from now at most. That means either Nvidia's more async-capable architecture is already barrelling down the pipe like a bat out of Hell, or we're getting Maxwell 3.0 with HBM tacked on. Considering the node shrink and the new memory, I doubt they'd rock the boat significantly from Maxwell 2.0 with Pascal. Think about it: this thing taped out before any game-related DX12 benchmark popped up (besides those last winter which showed the 290X throttling a Titan X, whoopsie, in certain respects), and it was being shown around last year, so the odds this thing has even GCN 1.0-level async support are low. Which means Pascal will need some special driver sauce to get over its inherent Achilles' heel. With Nvidia's money it is possible, but I really do hope they planned ahead, because I don't think Nvidia can take a year without a decent DX12 counter after their latest stunts... the Kepler users in particular will probably jump ship en masse.


----------



## provost

Quote:


> Originally Posted by *NoirWolf*
> 
> Assuming Pascal is more capable of it than Maxwell... The first prototypes for Pascal were being flaunted by Nvidia's CEO last year, and rumours in June said Pascal had taped out, which would be in line with their supposed launch window of a year from now at most. That means either Nvidia's more async-capable architecture is already barrelling down the pipe like a bat out of Hell, or we're getting Maxwell 3.0 with HBM tacked on. Considering the node shrink and the new memory, I doubt they'd rock the boat significantly from Maxwell 2.0 with Pascal. Think about it: this thing taped out before any game-related DX12 benchmark popped up (besides those last winter which showed the 290X throttling a Titan X, whoopsie, in certain respects), and it was being shown around last year, so the odds this thing has even GCN 1.0-level async support are low. Which means Pascal will need some special driver sauce to get over its inherent Achilles' heel. With Nvidia's money it is possible, but I really do hope they planned ahead, because I don't think Nvidia can take a year without a decent DX12 counter after their latest stunts... the Kepler users in particular will probably jump ship en masse.


Yes, it is about the future, not the current cards, which is what CrazyElf was alluding to earlier as well. Now all we need is confirmation from Nvidia, or to make an educated guess... lol

Maxwell is in the bag; what is coming down the pike is more relevant ...


----------



## semitope

Keep in mind, implementing it more completely in the driver doesn't mean it's going to perform well. It could just perform less terribly.


----------



## Remij

Just because people are still on Kepler doesn't mean that they aren't used to the fact that Nvidia introduces and phases out older tech quicker than AMD. Maybe those same people are interested in seeing how Nvidia responds instead of rushing out to buy (old 290X/390X) AMD GPUs.

But I think you're being crazy if you think Pascal isn't going to be amazing for DX12.

I'm not worried in the slightest since the best AMD has proven is that they can hang with the best of Nvidia in DX12.









We'll see shortly though.









Post edited by Blitz6804 to remove profanity.


----------



## Mahigan

Quote:


> Originally Posted by *NoirWolf*
> 
> If I remember right: Nvidia in those benchmark scores is basically running DX11, not DX12, even when you shift to DX12, because, at least as I understood it (I read the whole thread so I may be misremembering here, feel free to correct me): the game has Async disabled for Nvidia GPUs and gets its own special code path, while AMD is running bog standard with its drivers. Come to think of it: what is the difference in AotS for Nvidia between DX11 and DX12?


*Don't quote me on this, don't plaster this on some website or some forum. It's ALL speculation on my part, since that's what we appear to be doing. None of this is fact, just based on the industry's direction*

A few things we need to understand. Ashes of the Singularity currently runs 30% of its engine in compute. Oxide's developer mentioned that they're aiming for 50% of their engine to be running in compute. He has also hinted at moving the triangle operations into the compute realm. Many other developers (think Dreams) are heading in that direction. Compute appears to be where DX12 is headed. It's not just one title... many titles down the road have this in mind. Developers are talking about moving almost everything into the compute realm within a year or two (check out YouTube developer talks).

What this means is that compute just got a whole lot more important with DX12. *Parallelism* just got a whole lot more important as a result. We're not even talking about Asynchronous Compute; that's a bonus on top of it all, using breaks in between executions in order to throw even more eye candy onto the screen. We ought to keep this in mind (think Fury-X's theoretical compute figures compared to the competition).

Now back onto the topic at hand, nVIDIA was already able to tap into most of their compute capabilities, with Maxwell 2, under DX11. Therefore Async Compute may not afford them as large of a bump as it does to GCN. If nVIDIA can fix their driver and offload the Post Processing effects onto the Asynchronous Compute pipeline *they ought to get a boost*. How large of a boost? I don't know. That being said, the Oxide developer also hinted at Fury-X getting more optimizations (and I see it happening with his talk on moving 50% of the engine into compute). I firmly believe that by the end of this, at least in Ashes of the Singularity, the Fury-X will be far ahead of the other cards (GTX 980 Ti/390x/290x). Of course I can't prove this... but that's what I'm getting from what's been said so far.

Another bonus from moving various aspects into compute is that you save on memory bandwidth, you save on memory usage as well. If you were to ask my opinion, this is how it will play out. nVIDIA will get Asynchronous Compute working... small bump, engine will then throw more stuff into compute and Fury-X will get a, perhaps substantial, bump. By the end of this we ought to see competition between a 390x and a GTX 980 Ti (GTX 980 Ti will probably come out on top) with Fury-X beating them both under Ashes of the Singularity. That's my take on things as pure speculation.
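Mahigan's distinction above (compute as a share of the frame versus async compute filling the gaps) can be sketched with a toy timing model. The numbers and the function here are illustrative assumptions, not measurements from the benchmark: a frame that is 70% graphics and 30% compute only gains from async to the extent the two workloads actually overlap.

```python
# Toy frame-timing model (illustrative numbers, not measurements).
# Serial execution: graphics then compute, one after the other.
# Async compute: compute runs alongside graphics; with perfect
# overlap the frame costs max(gfx, compute) instead of gfx + compute.
def frame_time(gfx_ms, compute_ms, overlap):
    """overlap = fraction of the compute work hidden under graphics (0..1)."""
    hidden = min(compute_ms * overlap, gfx_ms)
    return gfx_ms + compute_ms - hidden

serial = frame_time(7.0, 3.0, overlap=0.0)  # no overlap: 10.0 ms
ideal = frame_time(7.0, 3.0, overlap=1.0)   # perfect overlap: 7.0 ms
print(serial, ideal)
```

On hardware that cannot run the two queues concurrently the model degenerates to the serial case, which is one reading of why async gains differ between GCN and Maxwell.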

This may not be the same for every DX12 title, but seeing as AMD has partnered with the majority of titles releasing in 2015/2016 under DX12... things could be very different, in a short while, than they are now under DX11. I'm expecting a whole new ball game for DirectX 12.

Dreams: http://www.theverge.com/2015/6/15/8786889/dreams-game-trailer-ps4-e3-2015


----------



## Clocknut

It won't make performance nosedive, but it will definitely affect Nvidia performance.

Probably we will see Maxwell become just like Kepler, dropping one performance tier across the board. For example, I won't be surprised to see a 390X sitting on top of a 980, while the 980 struggles to be on par with the non-X 390. The same scenario happened with the 290X/290 vs the 780 Ti.


----------



## pengs

Quote:


> Originally Posted by *Mahigan*
> 
> It concerns me. Not as something which is positive... rather how this can be used...
> 
> If nVIDIA used GameWorks and Tessellation to cripple AMD performance (wasn't huge but it was noticeable) then these partnerships, with Async Compute, could be used for the same reason. If there's one thing I don't like it's this sort of conduct.


AMD can thank NV for the lesson in proprietary technologies and marketing. It has apparently worked pretty well for NVIDIA, so obviously AMD is going to follow suit, and what better time to do it.

Given AMD's market share, they need the gain, so I'd let this be. Maybe in a year or two NVIDIA will be changing GameWorks' name to FreeWorks, following suit.


----------



## Mahigan

Quote:


> Originally Posted by *Clocknut*
> 
> It won't make performance nosedive, but it will definitely affect Nvidia performance.
> 
> Probably we will see Maxwell become just like Kepler, dropping one performance tier across the board. For example, I won't be surprised to see a 390X sitting on top of a 980, while the 980 struggles to be on par with the non-X 390. The same scenario happened with the 290X/290 vs the 780 Ti.


Don't underestimate nVIDIA's ability to perform driver voodoo as it pertains to their scheduler. I'm pretty sure they'll get it fixed.


----------



## infranoia

The biggest question for me at this point is whether async scheduler software improvements can improve their preemption latency for VR. It's a related issue I'm sure, as Nvidia improved their power efficiency over AMD by simply ripping out the hardware scheduler.

http://www.dsogaming.com/news/oculus-employees-preemption-for-context-switches-is-best-on-amd-nvidia-possibly-catastrophic/


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Don't underestimate nVIDIA's ability to perform driver voodoo as it pertains to their scheduler. I'm pretty sure they'll get it fixed.


In addition, they have the resources to send engineers to the developers and help them develop Nvidia-only code paths if they have to (as they've done with AotS). That may not help them with heavy DX12 titles that may come out late next year, but I'd imagine they could find a way to limit their downside in the near term. They did it with AotS, and they can most likely do it with other games, especially considering GameWorks titles will probably not use much async in the first place (and probably will use things that AMD struggles with).


----------



## SimBy

Quote:


> Originally Posted by *Mahigan*
> 
> It concerns me. Not as something which is positive... rather how this can be used...
> 
> If nVIDIA used GameWorks and Tessellation to cripple AMD performance (wasn't huge but it was noticeable) then these partnerships, with Async Compute, could be used for the same reason. If there's one thing I don't like it's this sort of conduct.


I doubt AMD has much say in this. It's not like GameWorks. It also never was the AMD way.

It's up to console devs, since most games are console ports anyway. Some games will probably rely on AC a lot, some less, but not because AMD says so or wants it so.

XB1 being weaker than PS4 kinda plays into AMD's hands. They will have to use every trick in the book to make games look as close to PS4 as possible. It's also DX12, so I expect ports should be trivial now.


----------



## FLaguy954

Quote:


> Originally Posted by *PostalTwinkie*
> 
> *By the time heavier ASC titles come out, new hardware will be out.* -snip-


This is not true. We have Gears of War - Ultimate Edition (this is DX12 but I don't think they'll be using async shaders), Mirror's Edge: Catalyst, Rise of the Tomb Raider, Fable Legends, and Deus Ex: Mankind Divided all coming out within the next six months (not to mention any random DX12 game in between). This is months before the new round of hardware releases from both AMD and Nvidia begins.

We will have an accurate picture of where both GPU vendors stand in regards to DX12 long before Pascal or Greenland are released.


----------



## CrazyElf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> So bottom line is, with regards to Oxide, Ashes of Singularity, and nVidia Async Compute .... by the time that Oxide ACTUALLY releases the game, nVidia will have a driver that supports Async Compute.
> 
> Is that about right?


Not necessarily.

IF Mahigan's hypothesis is correct (and it remains a big "if", but the evidence is supporting his hypothesis so far), then there's only so much that can be done with DX12 driver optimizations. The problem Nvidia faces is hardware. AMD's GCN was simply designed to be more parallel due to architectural reasons.

In other words, I expect Nvidia Maxwell GPU owners to benefit from a driver optimization. But because of the way GCN is designed, I expect AMD GPUs running GCN to benefit from any software optimizations even more.

That is likely to change in the coming generations. Whether it comes with Pascal or Volta I do not know. Pascal is, however, very heavily compute-oriented, so it's certainly possible that it is vastly more parallel. Nvidia still has huge dominance in market and mind share. Plus they have a lot more money to throw at the problem.

There is one more thing I am going to note, and I think it is worth drawing attention to. You speak as if an Nvidia monopoly is a good thing. Yes, AMD has made a lot of really bad business decisions. But no, we are not any better off if AMD goes bankrupt tomorrow. We could end up with a permanent Nvidia monopoly in the GPU space. Even if we do not, it could take years for a viable GPU competitor to get established. GPUs are very, very complex, and the amount of technical expertise needed to design them is overwhelming. Meanwhile, during the monopoly, Nvidia is basically free to charge whatever they wish or do whatever they wish. And that is assuming a competitor even arises - a big "if". You think that the Titan prices are bad? What will happen if Nvidia has a monopoly?

Most of us don't own Nvidia stock. I would urge everyone to recognize that we are best served by having at least two, and ideally more, GPU companies. I wish that AMD could challenge Intel on the CPU front as well. Again, it would take years for any competitor to rise to challenge Intel if AMD goes under.

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> 
> 
> 
> *Don't quote me on this, don't plaster this on some website or some forum. It's ALL speculation on my part, since that's what we appear to be doing. None of this is fact, just based on the industry's direction*
> 
> 
> 
> A few things we need to understand. Ashes of the Singularity, currently, runs 30% of its engine in compute. Oxide's developer mentioned that they're aiming for 50% of their engine to be running in compute. He has also hinted about moving the triangle operations into the compute realm. Many other developers, think Dreams, are heading in that direction. Compute appears to be where DX12 is headed. It's not just in one title... many titles down the road have this in mind. Developers are talking about moving almost everything into the compute realm by a year or two's time (check out youtube developer talks).
> 
> What this means is that compute just got a whole lot more important with DX12. *Parallelism* just got a whole lot more important as a result. We're not even talking about Asynchronous Compute; that's a bonus on top of it all, using breaks in between executions in order to throw even more eye candy onto the screen. We ought to keep this in mind (think Fury-X's theoretical compute figures compared to the competition).
> 
> 
> 
> 
> 
> Now back onto the topic at hand, nVIDIA was already able to tap into most of their compute capabilities, with Maxwell 2, under DX11. Therefore Async Compute may not afford them as large of a bump as it does to GCN. If nVIDIA can fix their driver and offload the Post Processing effects onto the Asynchronous Compute pipeline *they ought to get a boost*. How large of a boost? I don't know. That being said, the Oxide developer also hinted at Fury-X getting more optimizations (and I see it happening with his talk on moving 50% of the engine into compute). I firmly believe that by the end of this, at least in Ashes of the Singularity, the Fury-X will be far ahead of the other cards (GTX 980 Ti/390x/290x). Of course I can't prove this... but that's what I'm getting from what's been said so far.
> 
> Another bonus from moving various aspects into compute is that you save on memory bandwidth, you save on memory usage as well. If you were to ask my opinion, this is how it will play out. nVIDIA will get Asynchronous Compute working... small bump, engine will then throw more stuff into compute and Fury-X will get a, perhaps substantial, bump. By the end of this we ought to see competition between a 390x and a GTX 980 Ti (GTX 980 Ti will probably come out on top) with Fury-X beating them both under Ashes of the Singularity. That's my take on things as pure speculation.
> 
> This may not be the same for every DX12 title, but seeing as AMD has partnered with the majority of titles releasing in 2015/2016 under DX12... things could be very different, in a short while, than they are now under DX11. I'm expecting a whole new ball game for DirectX 12.
> 
> Dreams: http://www.theverge.com/2015/6/15/8786889/dreams-game-trailer-ps4-e3-2015


The question I have for you is, will this be an industry trend? Will most developers try to put as much of their engine onto the shaders? Or is it just Ashes of Singularity and a few titles?

The greatest advantage the Fury X has, of course, is its massive shader power - 4096 shaders. Right now we have concluded that poor triangle performance is bottlenecking the Fury X. For that reason, clock for clock, a Fury does almost as well, and we aren't seeing linear scaling in shaders relative to the 290X. This could be a huge boost to the Fury X. It also has implications for future GPU design - more shaders may be a good use of the increased transistor budget that a new process will give.

If so, then the Fury X suddenly becomes a much more attractive card. But if this is the only game that does so or if not many games do so, then it's not so good a value proposition. Either way, I think waiting for AMD's Greenland/Nvidia Pascal seems to be the best option for most.

Also consider this, the Oxide developer said this:
Quote:


> Originally Posted by *Kollock*
> 
> Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle limited, which is why you see greater performance at 4K, where the average triangle size is 4x the triangle size at 1080p.
> 
> I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post process. *Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders.* For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was the main motivating factor behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.


An entire graphics engine on compute. Will this be the future of the gaming industry? I ask this because we need to know, is Ashes a "one of a kind" or is this the future of the genre - to offload as much to compute as possible? And maybe someday, 100% compute game engines will be the norm?
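Kollock's "4x triangle size" figure quoted above is, for what it's worth, plain resolution arithmetic: at a fixed triangle count, a 4K frame has four times the pixels of a 1080p frame, so the average triangle covers four times the area and the fixed per-triangle (front-end) cost matters relatively less. A minimal sketch, with an assumed, purely illustrative triangle count:

```python
# At a fixed triangle count, average on-screen triangle area scales with
# resolution, so per-triangle (front-end) cost matters relatively less at 4K.
px_1080p = 1920 * 1080  # 2,073,600 pixels
px_4k = 3840 * 2160     # 8,294,400 pixels

triangles = 1_000_000   # assumed scene triangle count (illustrative)
px_per_tri_1080p = px_1080p / triangles
px_per_tri_4k = px_4k / triangles

print(px_per_tri_4k / px_per_tri_1080p)  # 4.0
```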

Suddenly, AMD's "High Density Libraries" decision - cramming in as many transistors as possible to get as many shaders and as much compute performance per mm^2 of die as possible - seems like a very good one, at least for DX12. In fact, I expect Nvidia to follow suit in that case.

As far as what will happen elsewhere:

- HBM2 will bring a lot more bandwidth. Whether or not a 100% compute game needs that is open to debate, but it's there if a game needs it. This will probably benefit Nvidia more than AMD, simply because right now they have been using available bandwidth more efficiently than AMD. Of course it isn't the big advantage, with 100% compute engines, that it once was, but it's still there.
- HBM2 will also allow for more than 1 GB of VRAM per stack, up to 32 GB in total. Again, not essential for 100% compute gaming engines, but it's there if you need it. I could see things like high-resolution texture mods, and perhaps large RTS maps, benefiting from this.
- Overall, I think compute will benefit more than gaming (since it can use the bandwidth better), especially once ECC HBM2 VRAM comes out. I imagine compute (and by compute I mean compute proper) cards will use the full 32 GB right off the bat, with ECC.
- It will also come down to the quality of the GPU memory controller on the die.

It's all dependent, though, on more games relying on compute for their engines. If it stays at 30-50%, then it will be more like what we have today. But if it goes 100% compute, we could be in for a revolution.
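To put rough numbers on the bandwidth point: using the commonly cited per-stack figures (about 128 GB/s per HBM1 stack, as on the Fury X, and about 256 GB/s per HBM2 stack), these figures are assumptions from public specs rather than anything in the thread, and the helper function is purely illustrative:

```python
# Back-of-the-envelope HBM bandwidth, assuming commonly cited per-stack
# figures: ~128 GB/s per HBM1 stack (Fury X) and ~256 GB/s per HBM2 stack.
def aggregate_bandwidth(stacks, gb_per_s_per_stack):
    """Total memory bandwidth across all stacks, in GB/s."""
    return stacks * gb_per_s_per_stack

fury_x = aggregate_bandwidth(4, 128)  # HBM1, 4 stacks: 512 GB/s
hbm2 = aggregate_bandwidth(4, 256)    # HBM2, 4 stacks: double that
print(fury_x, hbm2)
```

So a straight 4-stack HBM2 part would roughly double the Fury X's bandwidth before any controller or efficiency differences are considered.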

I'd be interested, Mahigan, in your thoughts on this.

Quote:


> Originally Posted by *provost*
> 
> In addition, assuming what you have stated is true, which I don't know, Nvidia should be providing some visibility and guidance on this. When I stated that I like that poster from Guru3d's sincerity and honesty, it was particularly in reference to his candid comments about Nvidia not providing a detailed and coherent response.
> 
> This issue potentially impacts a broader audience, and thus, lack of details from Nvidia is a bit curious.


I think that at this point, Nvidia has come to the conclusion that the damage they would do to themselves by saying something exceeds the damage by keeping quiet.

Admitting that their new architecture will do "x" is more or less an admission that their current architecture isn't doing it well, especially given the scrutiny this is now getting. They likely won't say it until after they release a GPU that is better at parallelism.

To use Mahigan's example, when the 5870 introduced tessellation, Nvidia didn't try to push it. But then later on, they vastly exceeded AMD in tessellation capabilities to the point where it's a big advantage they have over AMD (and still do to this very day).

I mean, sometimes Nvidia does come clean (to their credit, they did so during the 3.5 GB issue with the GTX 970), but only after the evidence becomes overwhelming. There have been cases where their business ethics have been highly questionable; the Shield Tablet exploding batteries is one example.

It's a good reason why we should fear a Nvidia monopoly.

Quote:


> Originally Posted by *Clocknut*
> 
> It won't make performance nosedive, but it will definitely affect Nvidia performance.
> 
> Probably we will see Maxwell become just like Kepler, dropping one performance tier across the board. For example, I won't be surprised to see a 390X sitting on top of a 980, while the 980 struggles to be on par with the non-X 390. The same scenario happened with the 290X/290 vs the 780 Ti.


I agree. Maxwell will be largely deprecated after Pascal comes out. Then Pascal is likely to be deprecated when Volta comes out. Part of this is because Nvidia's monetary resources enable frequent architecture updates, but it is also because of their "planned obsolescence" business practices. They certainly have the money that, if they wanted to, they could keep older architectures supported - look at their profit statements.

Although to be fair, we will see GCN abandoned someday as well, when AMD moves on from it. The same thing happened with VLIW. The question is when, though - we know they are making major changes in Greenland.

Those that bought the 7970 or the 290X (or any of the GPUs in those series) got a pretty good run (and still are getting one). When the GTX 680 came out, it was widely viewed by reviewers as the better card vs the 7970 - and now, not so much. Similarly, the 290X has made some pretty good relative gains. I suppose for those who don't upgrade often, the best time to buy is at, or the generation after, major architectural changes to GPUs.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Forceman*
> 
> In addition, they have the resources to send engineers to the developers and help them develop Nvidia-only code paths if they have to (as they've done with AotS). That may not help them with heavy DX12 titles that may come out late next year, but I'd imagine they could find a way to limit their downside in the near term. They did it with AotS, and they can most likely do it with other games, especially considering GameWorks titles will probably not use much async in the first place (and probably will use things that AMD struggles with).


The cost has got to go somewhere. Before, they had the advantage over AMD in sales, and in performance thanks to their DX11 drivers. With DX12, AMD does not need to work as hard, making Nvidia's driver effort seem less rewarding than in DX11.


----------



## Vesku

Quote:


> Originally Posted by *provost*
> 
> Yes, it is about the future, not the current cards, which is what CrazyElf was alluding to earlier as well. Now all we need is confirmation from Nvidia, or to make an educated guess... lol
> 
> Maxwell is in the bag; what is coming down the pike is more relevant ...


Quote:


> Originally Posted by *infranoia*
> 
> The biggest question for me at this point is whether async scheduler software improvements can improve their preemption latency for VR. It's a related issue I'm sure, as Nvidia improved their power efficiency over AMD by simply ripping out the hardware scheduler.
> 
> http://www.dsogaming.com/news/oculus-employees-preemption-for-context-switches-is-best-on-amd-nvidia-possibly-catastrophic/




"Still a long way off," at least for fine-grained preemption.


----------



## Forceman

Quote:


> Originally Posted by *ZealotKi11er*
> 
> The cost has got to go somewhere. Before, they had the advantage over AMD in sales, and in performance thanks to their DX11 drivers. With DX12, AMD does not need to work as hard, making Nvidia's driver effort seem less rewarding than in DX11.


Nvidia has money coming out of their ears (to the tune of $4 billion), and they still have a strong sales lead over AMD. That's not likely to change until at least Pascal/Arctic Islands.


----------



## Clocknut

Quote:


> Originally Posted by *Mahigan*
> 
> Don't underestimate nVIDIA's ability to perform driver voodoo as it pertains to their scheduler. I'm pretty sure they'll get it fixed.


I think they are going to tell you to buy Pascal instead. Maxwell is likely to be forgotten by then.


----------



## Kana-Maru

Quote:


> Originally Posted by *Forceman*
> 
> Then don't make prejudicial statements like "been crippled by driver updates".


Well, I did test with my old Keplers before recently replacing them. The games and benchmarks didn't change or get updated. I actually started to notice the degrading performance late last year and earlier this year. The drivers consistently degraded my performance and my stable overclock. One driver actually made me think my cards were defective [color bleeding all over the screen, random driver crashing, etc.]. The performance never really came back unless I used older drivers from last year. That's impossible to do, since you need the latest optimized drivers for the newest games. I was fed up and started looking at the 980 Ti and the Fury X. The so-called "Kepler Fix" drivers didn't really do anything for me, and I was stuck using older drivers for the best performance. I'd have to check my topic for my results.


----------



## SpeedyVT

Quote:


> Originally Posted by *Mahigan*
> 
> It's not that the GTX 780 Ti got slower with driver updates; it's that the R9 290x got much faster over time. GameWorks titles also hit the GTX 780 Ti hard. So if you play GameWorks titles, you might get the impression that your GTX 780 Ti has become slower.
> 
> Check this out: http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/70125-gtx-780-ti-vs-r9-290x-rematch.html


I read that a while back along with a few other reviews.

I think it's a bit of both.
Quote:


> Originally Posted by *Assirra*
> 
> Well of course, it's not like they can magically increase the hardware with driver updates.
> At a certain point you simply cannot improve it anymore, short of messing around with the hardware.
> 
> It is also 100% common sense that newer hardware will have more improvements over time than older, which already got their improvements.


And by both we see fewer fixes for newer titles, even when they employ newer engines.

Between the 290X getting better with drivers and Nvidia's drivers not sustaining fixes, the 780 Ti has been set back.

Quote:


> Originally Posted by *Kana-Maru*
> 
> Well, I did test with my old Keplers before recently replacing them. The games and benchmarks didn't change or get updated. I actually started to notice the degrading performance late last year and earlier this year. The drivers consistently degraded my performance and my stable overclock. One driver actually made me think my cards were defective (color bleeding all over the screen, random driver crashing, etc.). The performance never really came back unless I used older drivers from last year. That's impractical, since you need the latest optimized drivers for the newest games. I was fed up and started looking at the 980 Ti and the Fury X. The so-called "Kepler Fix" drivers didn't really do anything for me, and I was stuck using older drivers for the best performance. I'd have to check my topic for my results.


Exactly, same with a friend of mine and what he experienced. Another friend with an even older GPU, too. AMD has a few problems as well, though nothing a DDU can't fix on most occasions.

Perhaps nVidia released bad drivers in error, or something of that sort. Neither company is bad; neither company is better. There have just been recent circumstances that seem to have caused either failures or a lack of support. I imagine the majority of it is stuck because of the constant focus on DX12, with most workers pulled from legacy updates. Not that the 780 Ti already deserves a legacy title, too soon.


----------



## delboy67

Quote:


> Originally Posted by *Mahigan*
> 
> Don't underestimate nVIDIA's ability to perform driver voodoo as it pertains to their scheduler. I'm pretty sure they'll get it fixed.


Don't underestimate their marketing team either if they don't. I can't be the only one that's noticed the viral marketing on just about every tech forum and news comment page? I'll tell you this: I can't remember when I first started reading PC tech discussions on the internet, but it was on a Packard Bell running Win95, and I've never seen anything rubbished as much as AMD drivers in my life. It started just after your input in this thread as well. I'm vendor agnostic and recently switched 'teams' like I always do when the opportunity arrives, and sorry, but I just don't see the driver superiority of one side over the other any more.


----------



## Paul17041993

Quote:


> Originally Posted by *pengs*
> 
> AMD can thank NV for the lesson in proprietary technologies and marketing. It's apparently worked pretty well for NVIDIA, so obviously AMD is going to follow suit, and what better time to do it.
> 
> It's a wash with AMD's market share; they need the gain, so I'd let this be, and maybe in a year or two NVIDIA will be changing GameWorks' name to FreeWorks, following suit.


Except for FreeSync being adopted by Intel and G-Sync likely seeing retirement, unless they make it FreeSync compatible...
However, that tech is something AMD were working on for many years beforehand, so that's slightly different, I guess...


----------



## SRV

We all know by now that async compute is a way to gain performance in DX12. I wonder what other ways there are to gain performance with DX12, if any?

Because talk of multiple-times-smaller CPU overhead is the main topic of DX12. Now, if that is achieved using async compute...

How much of DX12 is actually new graphics features/possibilities for implementing even more realistic 3D games?


----------



## Cyro999

Quote:


> Originally Posted by *SRV*
> 
> We all know by now that async compute is a way to gain performance in DX12. I wonder what other ways there are to gain performance with DX12, if any?
> 
> Because talk of multiple-times-smaller CPU overhead is the main topic of DX12. Now, if that is achieved using async compute...
> 
> How much of DX12 is actually new graphics features/possibilities for implementing even more realistic 3D games?


Async compute seems to be for increasing GPU performance more so than reducing CPU overhead. These tests are GPU bound at the moment (aside from the AMD DX11 one, which was so far below everything else that it became CPU bound at very low FPS in a graphically intensive test).
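The CPU-bound vs GPU-bound distinction in this post can be sketched with a toy model (the millisecond figures below are hypothetical, not from the benchmark): each frame must pass through both the CPU submission stage and the GPU rendering stage, and the slower stage limits the frame rate.

```python
# Toy model of CPU-bound vs GPU-bound frame rates.
# Numbers are illustrative, not measured benchmark data.

def fps(cpu_ms_per_frame, gpu_ms_per_frame):
    """Frame rate is bounded by whichever stage takes longer per frame."""
    return 1000.0 / max(cpu_ms_per_frame, gpu_ms_per_frame)

# GPU bound: a faster CPU changes nothing.
print(fps(cpu_ms_per_frame=10, gpu_ms_per_frame=25))  # 40.0
# CPU bound (e.g. heavy DX11 driver overhead): the GPU sits partly idle.
print(fps(cpu_ms_per_frame=40, gpu_ms_per_frame=25))  # 25.0
```

This is why a low-overhead API helps most when the CPU stage is the long one, while async compute targets the GPU stage instead.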


----------



## bastian

Quote:


> Originally Posted by *Cyro999*
> 
> Async compute seems to be for increasing GPU performance more so than reducing CPU overhead.


There is a lot of misinformation going around. And people just seem to be assuming that if you throw ASynch compute into any game it provides magical performance increases. Not every DX12 game is going to use ASync compute, especially the way Ashes does.


----------



## NoirWolf

Quote:


> Originally Posted by *bastian*
> 
> There is a lot of misinformation going around. And people just seem to be assuming that if you throw ASynch compute into any game it provides magical performance increases. Not every DX12 game is going to use ASync compute, especially the way Ashes does.


The way I understand it, async compute makes more of what is already there: you can have more information processed, which results in many more objects (both AI- and physics-controlled) being on screen and active in-game. This means a potential revolution in single-player story-driven games (considering the physics angle) and open-world ones (many more NPCs kicking around), in addition to strategy games.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *bastian*
> 
> There is a lot of misinformation going around. And people just seem to be assuming that if you throw ASynch compute into any game it provides magical performance increases. Not every DX12 game is going to use ASync compute, especially the way Ashes does.


Bingo.

Not to mention, if game developers are going to have to sit there and optimize every game for best DirectX 12 performance, like they did with DirectX 11 or 10 or 9, let's remember who has the market share and money to entice developers to do that, and for which card. DirectX 12 was supposed to be a solution that was hardware agnostic. If developers have to spend time and effort "tweaking" each game, we all know who will win this war ... nVidia.

Developers won't gimp their games in favor of a company that only has 20% marketshare. They will go after the 80% so they can make more money for themselves. If the game is crap for 80% of the people, people won't buy it. Oxide is a business, they aren't a charity.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Bingo.
> 
> Not to mention, if game developers are going to have to sit there and optimize every game for best DirectX 12 performance, like they did with DirectX 11 or 10 or 9, let's remember who has the marketshare and money to entice developers to do that and for what card. DirectX 12 was supposed to be a solution that was hardware agnostic. If developers have to spend time and effort into "tweaking" for each game, we all know who will win this war ... nVidia.
> 
> Developers won't gimp their games in favor of a company that only has 20% marketshare. They will go after the 80% so they can make more money for themselves. If the game is crap for 80% of the people, people won't buy it. Oxide is a business, they aren't a charity.


20% of the dGPU market share (26% of the laptop/desktop GPU market) and 100% of the console market. Oh I wonder who they'll go with considering Nvidia can always tweak their drivers for better performance.


----------



## gamervivek

Quote:


> Originally Posted by *bastian*
> 
> There is a lot of misinformation going around. And people just seem to be assuming that if you throw ASynch compute into any game it provides magical performance increases. Not every DX12 game is going to use ASync compute, especially the way Ashes does.


Indeed, they're going to use way more than what Ashes has currently done.
Quote:


> Originally Posted by *Mahigan*
> 
> It's not that the GTX 780 Ti got slower with driver updates, it's that the R9 290X got much faster over time. GameWorks titles also hit the GTX 780 Ti hard. So if you play GameWorks titles, you might get the impression that your GTX 780 Ti has become slower.
> 
> Check this out: http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/70125-gtx-780-ti-vs-r9-290x-rematch.html


https://forum.beyond3d.com/threads/is-nvidia-deliberately-downgrading-kepler-performance-in-favour-of-maxwell.57111/page-4#post-1868787


----------



## drSeehas

Quote:


> Originally Posted by *Paul17041993*
> 
> ... Because it's totally AMD's fault that Microsoft wants DX12 to be Windows 10 exclusive.


There will be Vulkan.


----------



## EightDee8D

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Bingo.
> 
> Not to mention, if game developers are going to have to sit there and optimize every game for best DirectX 12 performance, like they did with DirectX 11 or 10 or 9, let's remember who has the marketshare and money to entice developers to do that and for what card. DirectX 12 was supposed to be a solution that was hardware agnostic. *If developers have to spend time and effort into "tweaking" for each game, we all know who will win this war ... nVidia.*
> 
> Developers won't gimp their games in favor of a company that only has 20% marketshare. They will go after the 80% so they can make more money for themselves. If the game is crap for 80% of the people, people won't buy it. Oxide is a business, they aren't a charity.


Ummm, no. Every console game is running on GCN, so obviously those games are optimized for AMD already. It's Nvidia who needs the additional work vs AMD. Without GameWorks, I think most games will favour GCN (in DX12).
Nvidia couldn't even manage to include hardware asynchronous shaders in Maxwell 2, even though they'd been working on DX12 for what, more than 2-3 years?

And there's less an IHV can do than a game dev in DX12, since it's closer to the metal. And it seems Nvidia has less metal.


----------



## airfathaaaaa

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Bingo.
> 
> Not to mention, if game developers are going to have to sit there and optimize every game for best DirectX 12 performance, like they did with DirectX 11 or 10 or 9, let's remember who has the marketshare and money to entice developers to do that and for what card. DirectX 12 was supposed to be a solution that was hardware agnostic. If developers have to spend time and effort into "tweaking" for each game, we all know who will win this war ... nVidia.
> 
> Developers won't gimp their games in favor of a company that only has 20% marketshare. They will go after the 80% so they can make more money for themselves. If the game is crap for 80% of the people, people won't buy it. Oxide is a business, they aren't a charity.


Developers were gimping games until the day Valve decided to have a refund system. Then the Batman game came out, buyers panned it, and the game was pulled off Steam to be fixed. Strangely, many "serious" sites are still using it as a benchmark for some reason (like we don't know the reason). And I'm going to guess that since Steam has a refund system, developers won't be able to just throw a horrible game at customers and expect good reviews backed by a certain company.

THIS is why the whole game scene is about to change and make devs be more careful with what they are doing..


----------



## MonarchX

Empire (NVidia) strikes back! I like it! I just hope the performance improvement is a considerable one!


----------



## Noufel

Quote:


> Originally Posted by *MonarchX*
> 
> Empire (NVidia) strikes back! I like it! I just hope the performance improvement is a considerable one!


I don't think it will be a huge gain, considering the scheduler will be software-based and not hardware like in the GCN arch, but who knows, perhaps Nvidia could do the trick.


----------



## TopicClocker

I've read pages and pages about the capabilities of asynchronous compute, before this whole DX12 thing blew up.
The capabilities of asynchronous compute are astounding!

The fact that AMD has implemented Asynchronous Compute Engines within the GCN architecture since its first conception has really paid off, some impressive foresight to say the least.


----------



## Stewox

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Bingo.
> 
> Not to mention, if game developers are going to have to sit there and optimize every game for best DirectX 12 performance, like they did with DirectX 11 or 10 or 9, let's remember who has the marketshare and money to entice developers to do that and for what card. DirectX 12 was supposed to be a solution that was hardware agnostic. If developers have to spend time and effort into "tweaking" for each game, we all know who will win this war ... nVidia.
> 
> Developers won't gimp their games in favor of a company that only has 20% marketshare. They will go after the 80% so they can make more money for themselves. If the game is crap for 80% of the people, people won't buy it. Oxide is a business, they aren't a charity.


Who gives a rat's about entitled developers who suck at optimization? If they don't tweak, that's their problem. If you can't do it to the level of standards I want, I won't buy your game, so it's not my problem; if you're so low, and such a loser that you'll keep avoiding the hardcore market, you'll end up in the awesome community of hordes of trendy smartphone script kiddies.

I've been reading a few of these pages, and you guys seem to be talking like you're market analysts for the companies. It feels like a group of people who talk about football scores, which is pretty much what gaming discussion is in the new young zombie generations. For example, on NeoGAF most people talk about games and companies like players and teams on a football field; they don't develop or even play games, they talk more about all the stats in a never-ending discussion, while not realizing the core limitations and getting to the bottom of those. The API and drivers are screwing developers over in the PC world; the comparison in football is corruption, bribery, and of course the rules of the game itself, since there are so many factors that cannot be controlled by skill. In their endless stats discussions they never get to the bottom of it: the game wasn't designed for the players, it was designed for the masses to be distracted by it (to dump their time, energy, and money into it). It is simulated warfare. Men are supposed to protect their families and be strong themselves, not eating fast food and watching other sweaty men spreading their legs throwing an inflated balloon around a green field with white lines.

If some vendor is given more optimization, it's just an artificial hack and/or a handicap of the opponent. That's called BS; there's no war being won, since the war is not fair, and it's not genuine anymore.

DX12 is here. It's a huge improvement, and it gives developers what they were asking for for years. If some entitled trendy college-kid developers complain, their opinions are totally irrelevant in the big picture; they can keep using DX11 if they're so smart, nobody's forcing them to use DX12.

Plus, only the parasites look at established market share in order to do something. The *real builders* are the people with the "if you build it they will come" attitude, and they are the *establishers* of market share; they are the real *risk-takers*, not some stupid Apple smartphone Google Glass college trendy kiddy.

So if you see any _Whining Anti-Optimization Entitled College-Kid Random Chat-Room Developers_, they are probably similar to those explained in this video:


----------



## Forceman

Quote:


> Originally Posted by *drSeehas*
> 
> There will be Vulkan.


There was also OpenGL, and no one used it. I'm not holding my breath for a wave of Vulkan games to free us from Microsoft.
Quote:


> Originally Posted by *Noufel*
> 
> I don't think it will be a huge gain, considering the scheduler will be software-based and not hardware like in the GCN arch, but who knows, perhaps Nvidia could do the trick.


Well, their DX11 scheduler is also software based, and that didn't stop them from crushing AMD in DX11 overhead, so I wouldn't write them off completely.


----------



## Stewox

Quote:


> Originally Posted by *Forceman*
> 
> There was also OpenGL, and no one used it. I'm not holding my breath for a wave of Vulkan games to free us from Microsoft.


Rage from id Software used it, and OpenGL was hardly a major difference from DX11; if it were, Rage would have run three times faster than it did on release.


----------



## NoirWolf

Quote:


> Originally Posted by *Forceman*
> 
> Well, their DX11 scheduler is also software based, and that didn't stop them from crushing AMD in DX11 overhead, so I wouldn't write them off completely.


Considering one was built for parallelization while the other is so far up its own ass in serial? No one is writing them off, but as I and others have said so far: if Pascal wasn't baked with a hardware scheduler from the get-go, or they haven't thought of some absurd way of optimising for their GPUs (which would require a driver for every single DX12 game with async), they're gonna be adrift for a year and a half in a market that's growing hostile to their GPUs. It will also place a card in AMD's marketing department's hand, considering even the 270X can get good async performance. (What was that "waste" the usual Nvidia fanboy peddled? The 370X, a thrice-rebranded card? That, in theory, can sucker punch a 960 and a 970 in async-heavy games... how'd that sit with the market and the die-hards? Because AotS is 20-30% async; imagine what could happen with more, if Nvidia don't have a counter.)


----------



## Blitz6804

Thread re-opened. In the future, please refrain from profanity or flaming. If you see something out of line, report it, do not reply to it.


----------



## Themisseble

I still don't get it. How can you support async shaders in software? ... so any arch could support it?


----------



## Klocek001

Quote:


> Originally Posted by *SRV*
> 
> That 80% market share for Nvidia is another story. Despite worse cards than AMD for a given price (it's only in power efficiency that they beat Hawaii, and only with Maxwell 2; I am not taking the Titan X, 980 Ti, and Fury X into account because of the insane prices), and even with the mining craze, they actually increased their market share. Either it is some kind of foul play (like Intel did long ago) or people are plain stupid.


as an Nvidia owner and an idiot, all I can say is

that is the least fancy thing I have ever heard.


----------



## NoirWolf

Quote:


> Originally Posted by *Themisseble*
> 
> I still don't get it. How can you support async shaders in software? ... so any arch could support it?


Basically, the way I understood what others said: the same way AMD cards are forced to support PhysX, through the CPU. Something will run on the CPU and something on the GPU, and they'll be mixed together by the driver (or something like that, I am more of a tech head). It will not do the GPU any favours, as it will increase latency significantly, but it is still possible. What I cannot imagine is how they'll do this across the board without Nvidia having to optimize for each individual game that uses async to the degree AotS does (or more), and keep tweaking it... and even so, their performance will be lower than AMD cards' by virtue of the latency and possible software gremlins that creep into the code. As I said: I hope Pascal was designed with DX12 async in mind.


----------



## SRV

Quote:


> Originally Posted by *Klocek001*
> 
> as an Nvidia owner and an idiot, all I can say is
> 
> that is the least fancy thing I have ever heard.


Nothing like that; that is mean of you. What I buy with my money is not your concern. I am not saying Nvidia buyers are idiots. I just don't get such a big difference in market share, and I know there are people who buy Nvidia just because of the drivers. I personally know a few people who buy only Nvidia because of the driver myth.

If you want the best graphics card, you buy Nvidia. I'm fine with that. They DO have the fastest cards on the market now. But that performance lead does not translate into whole-range sales, because it is just the high end.

It is questionable how one company has 80% market share when the other is just as good: realistically with a worse performance-to-power-consumption ratio but a better price-to-performance ratio.

Intel is dominant in the CPU field and that is justified; their CPUs are much better than AMD's, so no wonder they dominate. But it is a known fact that in the past Intel did some dishonest deals with companies such as Dell. I wonder if Nvidia did the same.

Finally, there was the mining craze, and not even that helped AMD's market share. Which Nvidia cards are so superior to AMD's? The Titan X, GTX 980/980 Ti? That is the high end. Most people don't buy high-end; they buy a $200 card, and I mean not just in Bosnia but worldwide.


----------



## Forceman

Quote:


> Originally Posted by *Themisseble*
> 
> I still don't get it. How can you support async shaders in software? ... so any arch could support it?


Quote:


> Originally Posted by *NoirWolf*
> 
> Basically, the way I understood what others said: the same way AMD cards are forced to support PhysX, through the CPU. Something will run on the CPU and something on the GPU, and they'll be mixed together by the driver (or something like that, I am more of a tech head). It will not do the GPU any favours, as it will increase latency significantly, but it is still possible. What I cannot imagine is how they'll do this across the board without Nvidia having to optimize for each individual game that uses async to the degree AotS does (or more), and keep tweaking it... and even so, their performance will be lower than AMD cards' by virtue of the latency and possible software gremlins that creep into the code. As I said: I hope Pascal was designed with DX12 async in mind.


It's not the actual compute task that would be software-side, it's the scheduling (assuming what people think about the architecture is actually true). That will have an impact, but an unknown one at this time. Anything more than that is just speculation at this point. But it's not the same as GPU PhysX at all.


----------



## NoirWolf

Quote:


> Originally Posted by *SRV*
> 
> Nothing like that; that is mean of you. What I buy with my money is not your concern. I am not saying Nvidia buyers are idiots. I just don't get such a big difference in market share, and I know there are people who buy Nvidia just because of the drivers. I personally know a few people who buy only Nvidia because of the driver myth.
> 
> If you want the best graphics card, you buy Nvidia. I'm fine with that. They DO have the fastest cards on the market now. But that performance lead does not translate into whole-range sales, because it is just the high end.
> 
> It is questionable how one company has 80% market share when the other is just as good: realistically with a worse performance-to-power-consumption ratio but a better price-to-performance ratio.
> 
> Intel is dominant in the CPU field and that is justified; their CPUs are much better than AMD's, so no wonder they dominate. But it is a known fact that in the past Intel did some dishonest deals with companies such as Dell. I wonder if Nvidia did the same.
> 
> Finally, there was the mining craze, and not even that helped AMD's market share. Which Nvidia cards are so superior to AMD's? The Titan X, GTX 980/980 Ti? That is the high end. Most people don't buy high-end; they buy a $200 card, and I mean not just in Bosnia but worldwide.


Some markets have few AMD GPUs on offer at a decent/competitive price. I helped quite a few people in the last few weeks locate deals on AMD GPUs and, to be blunt... Vietnam is out of luck, and several major markets like Japan and Australia have AMD GPUs overpriced vs Nvidia. I cannot imagine, to be honest, how Nvidia does this without a massive loss in profits.
Quote:


> Originally Posted by *Forceman*
> 
> It's not the actual compute task that would be software-side, it's the scheduling (assuming what people think about the architecture is actually true). That will have an impact, but an unknown one at this time. Anything more than that is just speculation at this point. But it's not the same as GPU PhysX at all.


That is also speculation. You don't know how they'll try to fix it.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *airfathaaaaa*
> 
> THIS is why the whole game scene is about to change and make devs be more careful with what they are doing..


I've been hearing this stuff since I started PC gaming back in 1983.

"THIS is what will make Developers do ..."

I'm sorry, I've heard it a million times before. After being burned so many times over the decades on empty promises about what developers are gonna do, they are going to have to PROVE IT FIRST. That's why I don't bite on "pre-orders" and other garbage anymore (and haven't for a long time). I'm sick of the hype, both from software AND hardware developers.


----------



## SRV

Quote:


> Originally Posted by *NoirWolf*
> 
> Some markets have little AMD GPUs on offer for a decent/competitive price. I helped quite a few people in the last few weeks locate deals for AMD GPUs and to be blunt... Vietnam is **** out of luck and several major markets like Japan and Australia have overpriced AMD vs Nvidia GPUs. I cannot imagine how Nvidia does this to be honest without a massive loss in profits.


Well, we have no proof of that, and I am not saying it has to be some kind of foul play, but the GPU market is nothing like the CPU market, where Intel is the clear choice. It is only for people with a fear of AMD drivers that Nvidia is the clear choice. I wonder how many there are.

I can understand there are availability differences across the globe...


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Stewox*
> 
> DX12 is here, it's a huge improvement, it gives developers what they were asking for years, if some entitled trendy college-kid developers complain, their opinions are totally irrelevant in the big picture, they can keep using DX11 if they're so smart, nobody's forcing them to use DX12.


And guess what: developers WILL keep using DirectX 10 and 11 too. Why? Because DirectX 12 is only for Windows 10.

No game developer is going to slash their own throat by making a Windows 10 ONLY game less than a year after the launch of Windows 10. There are still a ton of people, especially after the whole Windows 10 "snooping on users" thing, who will stay on Windows 7 (and a tiny fraction who will stay on Windows 8.1 ... no one will stay on Windows 8, and those that do, who cares, they are obviously really dumb not to go to at least 8.1 hehehe).


----------



## Blameless

Quote:


> Originally Posted by *STEvil*
> 
> Anyone remember Software T&L vs Hardware T&L?


Yes.

It's why I chose a GeForce 256 DDR over a Voodoo 3.

However, I did later have a Voodoo 4 4500 specifically for those last few Glide games.
Quote:


> Originally Posted by *Themisseble*
> 
> I still don't get it. How can you support async shaders in software? ... so any arch could support it?


You can't.

Maxwell 2, at the very least, has _hardware_ support for the queues required for async compute (meaning simultaneous graphics and compute without a mode switch). However, scheduling for these queues is apparently done at the driver level, and NVIDIA's driver lacked full support at the time of the original article.
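The queues Blameless mentions map onto D3D12's distinct command-queue types. A small lookup (sketched in Python for illustration, not real API code; the names mirror the `D3D12_COMMAND_LIST_TYPE_*` enum) shows which work each queue type can accept:

```python
# Which D3D12 command-list types each command-queue type accepts.
# Names mirror the D3D12_COMMAND_LIST_TYPE_* enum; this is a plain
# lookup table for illustration, not actual Direct3D API code.

QUEUE_ACCEPTS = {
    "DIRECT":  {"DIRECT", "COMPUTE", "COPY"},  # graphics queue: runs any work
    "COMPUTE": {"COMPUTE", "COPY"},            # async compute queue
    "COPY":    {"COPY"},                       # DMA/copy queue
}

def can_submit(queue_type, list_type):
    """True if a command list of list_type may execute on queue_type."""
    return list_type in QUEUE_ACCEPTS[queue_type]

print(can_submit("COMPUTE", "COMPUTE"))  # True
print(can_submit("COMPUTE", "DIRECT"))   # False: graphics needs a direct queue
```

"Async compute" in this thread means feeding a COMPUTE queue while the DIRECT queue renders; the open question is whether the hardware actually overlaps the two or the driver schedules them serially.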


----------



## Klocek001

Quote:


> Originally Posted by *47 Knucklehead*
> 
> And guess what: developers WILL keep using DirectX 10 and 11 too. Why? Because DirectX 12 is only for Windows 10.
> 
> No game developer is going to slash their own throat by making a Windows 10 ONLY game less than a year after the launch of Windows 10. There are still a ton of people, especially after the whole Windows 10 "snooping on users" thing, who will stay on Windows 7 (and a tiny fraction who will stay on Windows 8.1 ... no one will stay on Windows 8, and those that do, who cares, they are obviously really dumb not to go to at least 8.1 hehehe).


you got a point.


----------



## MonarchX

NVidia SHALL improve its AotS performance with new drivers, and game developers SHALL optimize games for NVidia because of their market share. None of this changes how incredible a job AMD did with their hardware. Game developers won't NEED to waste time optimizing for AMD hardware because of two things: AAA console ports are already semi-optimized for AMD hardware, and AMD hardware + drivers are, by default, excellent at DirectX 12! There is of course going to be GameWorks, but we all know how gimmicky and heavy on performance it turns out to be, unlike similar AMD technologies (Tomb Raider's hair technology vs. Witcher 3's HairWorks).

It is kind of interesting how OpenCL can be used to work with DirectX 12. I thought they were entirely different things... Does this mean that to optimize for NVidia's half-hardware / half-software async shaders, developers will have to let OpenCL do some rendering?

I am a bit confused about DirectX 12 coming to consoles. IS IT actually coming to BOTH consoles, or only Xbox One?


----------



## Forceman

Quote:


> Originally Posted by *NoirWolf*
> 
> That is also speculation. You don't know how they'll try to fix it.


True, but we do know how they fixed it in AotS - by disabling it. So I guess in that respect it is like Physx. However, doing compute tasks developed for a GPU on the CPU is pretty much a non-starter (again, look at Physx) so it seems pretty unlikely that they would even try.


----------



## Mahigan

Quote:


> Originally Posted by *MonarchX*
> 
> NVidia SHALL improve its AotS performance with new drivers, and game developers SHALL optimize games for NVidia because of their market share. None of this changes how incredible a job AMD did with their hardware. Game developers won't NEED to waste time optimizing for AMD hardware because of two things: AAA console ports are already semi-optimized for AMD hardware, and AMD hardware + drivers are, by default, excellent at DirectX 12! There is of course going to be GameWorks, but we all know how gimmicky and heavy on performance it turns out to be, unlike similar AMD technologies (Tomb Raider's hair technology vs. Witcher 3's HairWorks).
> 
> It is kind of interesting how OpenCL can be used to work with DirectX 12. I thought they were entirely different things... Does this mean that to optimize for NVidia's half-hardware / half-software async shaders, developers will have to let OpenCL do some rendering?
> 
> I am a bit confused about DirectX 12 coming to consoles. IS IT actually coming to BOTH consoles, or only Xbox One?


The PS4 uses its own low level API, the thing is... it's not much of an effort to port it over to DX12. It's about as much of an effort as porting Mantle over to DX12.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Forget DX11 from AMD. Yes, AMD does worse in DX11, but in actual games it's by far too little for them to even bother with at this point.


Forget DirectX 11 from AMD? Well, I'm sure all the people who are planning to stay on Windows 7, 8, and 8.1 because they have concerns about Windows 10 and privacy issues (you know, the only OS that runs DirectX 12) aren't too happy to hear that.

That is why having a card that supports not just DirectX 12 well, but DirectX 11, 10, and even 9, is important. Not everyone wants to be on Windows 10, and not everyone only plays games that haven't even been released yet.

Forget DirectX 11. HAHAHA that's funny.

The day that AMD "forgets DirectX 11" is the day they will need to shut their doors and sell to Samsung, Intel, or Bob's Discount Computer store, because you will see their graphics cards market share fall even more and become even less relevant than their CPU division.


----------



## Mahigan

Quote:


> Originally Posted by *Forceman*
> 
> True, but we do know how they fixed it in AotS - by disabling it. So I guess in that respect it is like Physx. However, doing compute tasks developed for a GPU on the CPU is pretty much a non-starter (again, look at Physx) so it seems pretty unlikely that they would even try.


There seems to be confusion as to what "software scheduling" means. It doesn't mean that they'll be doing compute tasks on the CPU. It means that there is a limitation in the way nVIDIA's solution can execute tasks in parallel. It's all about what David Kanter was saying on preemption.

Say you have a graphics shader in a frame that takes a particularly long time to complete. With Maxwell 2, you have to wait for that graphics shader to finish before you can execute another task. If a graphics shader takes 16ms, you have to wait until it completes before executing another graphics or compute command. This is what is called "slow context switching". Every millisecond brings down your FPS for that frame. So if you have 16ms for a graphics shader, 20ms for a compute task, and 5ms for a copy command, you end up with 41ms for that frame. This wasn't important for DX11, and nVIDIA designed their Kepler/Maxwell/Maxwell 2 architectures primarily with DX11 in mind.



With GCN, you can execute out of order, and the ACEs will check for errors and re-issue tasks, if needed, to correct them. Out of order means that you don't need to wait for one task to complete before you work on the next. So say, on GCN, that same graphics shader task takes 24ms; in that same 24ms you can do a bunch of other tasks in parallel (like the compute and copy commands above). So your frame ends up being only 24ms.
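The arithmetic above can be sketched in a few lines (the millisecond figures are the post's illustrative numbers, not real measurements):

```python
# Toy model of serial vs. overlapped task execution within one frame.
# Numbers are the illustrative figures from the post, not measurements.

# Serial execution ("slow context switching"): each task waits for the
# previous one to finish, so durations add up.
graphics_ms, compute_ms, copy_ms = 16, 20, 5
serial_frame_ms = graphics_ms + compute_ms + copy_ms

# Concurrent execution (async compute): tasks overlap, so the frame
# takes only as long as its longest task. The post uses a 24ms graphics
# shader for the GCN example.
gcn_graphics_ms = 24
parallel_frame_ms = max(gcn_graphics_ms, compute_ms, copy_ms)

print(serial_frame_ms)    # 41
print(parallel_frame_ms)  # 24
```

That is the whole argument in miniature: the serial frame costs the sum of its tasks, while the overlapped frame costs only the maximum.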



Developers need to be super careful about how they program for Maxwell 2; if they aren't, far too much latency will be added to the frame. If a particular frame is already high-latency, then you can't use Asynchronous Compute on it with Maxwell 2. This holds even once nVIDIA fixes their driver issue.

From all the sources I've seen, Pascal is set to fix this problem (but wait and see, as it could just be speculation). I just don't think nVIDIA thought the industry would jump on DX12 the way it is right now... pretty much every single title in 2016 will be built around the DX12 API. We'll even get a few titles in the closing months of 2015.

I wouldn't underestimate nVIDIA's capacity to fix their driver. But like I've told many people... wait and see on the performance boost from their Asynchronous Compute solution.


----------



## drSeehas

Quote:


> Originally Posted by *Forceman*
> 
> ... I'm not holding my breath for a wave of Vulkan games to free us from Microsoft. ...


With D3D12 you have only the Xbox One and Windows 10. (Think about the real market share of Windows 10.)
With Vulkan you will have almost all PCs with Windows (starting from Vista) and Linux (e.g. Steam), plus a lot of smartphones and tablets.


----------



## Forceman

Quote:


> Originally Posted by *drSeehas*
> 
> With D3D12 you have only Xbox One and Windows 10. (Think about the real market share of Windows 10)
> With Vulkan you will have almost all PCs with Windows (starting from Vista) and Linux (e. g. Steam) and a lot of smartphones and tablets.


Time will tell, but we've heard this all before with OpenGL (and Mantle). I think if developers don't want to jump to DX12 for some reason, they will stay with DX11 like they have so far and not go to Vulkan, but in a year or two we'll know. In either case, it's not going to matter this year. Who knows, though, maybe Half-Life 3 will come out on Vulkan to save us all.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> And guess what, Developers WILL keep using DirectX 10 and 11 too. Why? Because DirectX 12 is only for Windows 10.
> 
> No game developer is going to slash their own throats by making a Windows 10 ONLY game less than a year after the launch of Windows 10. There are still a ton of people, especially after the whole Windows 10 "snooping on users" thing, who will stay on Windows 7 (and a tiny fraction who will stay on Windows 8.1 ... no one will stay on Windows 8, and those that do, who cares, they are obviously really dumb to not go to at least 8.1 hehehe).


The adoption rate for Win 10 is absurd. It's gone from 0 to 17% of the Steam userbase (those who deem it fit to complete the survey) in under two months:
http://store.steampowered.com/hwsurvey
Does that sound like slashing their throats? I do think Nvidia used the same logic with Maxwell and possibly Pascal... but free is free, and if that rate continues for the next 2-3 months, the main OS on Steam will be Win 10. Feel free to disagree, but so far the adoption rate has been astronomical.

Quote:


> Originally Posted by *Forceman*
> 
> True, but we do know how they fixed it in AotS - by disabling it. So I guess in that respect it is like Physx. However, doing compute tasks developed for a GPU on the CPU is pretty much a non-starter (again, look at Physx) so it seems pretty unlikely that they would even try.


The Physx comparison, though, is under different circumstances... In DX11 and DX10 games, if they can swing it right (optimize which tasks need the CPU and which don't, and keep the load down), they can get a decent handle on this. By contrast, Physx won't ever work well on PCs with AMD GPUs, because Nvidia doesn't want it to, nothing more (to my knowledge, most AMD GPUs can run Physx better than their price-bracket Nvidia competition).


----------



## Xuper

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Forget DirectX 11 from AMD? Well, I'm sure that all the people who are planning to stay on Windows 7, 8, and 8.1 because of concerns about Windows 10 and privacy (Windows 10 being the only OS that runs DirectX 12) aren't too happy to hear that.
> 
> That is why having a card that supports not just DirectX 12 well, but DirectX 11, 10, and even 9, is important. Not everyone wants to be on Windows 10, and not everyone only plays games that haven't even been released yet.
> 
> Forget DirectX 11. HAHAHA that's funny.
> 
> The day that AMD "forgets DirectX 11" is the day they will need to shut their doors and sell to Samsung, Intel, or Bob's Discount Computer store, because you will see their graphics cards market share fall even more and become even less relevant than their CPU division.


Wrong. Nvidia has no plan for DX12; Maxwell isn't suitable for it. For DX12, AMD has the upper hand, and in VR and professional cards too. Every developer wants to get rid of DX11, and AMD is ready for that.


----------



## MonarchX

Quote:


> Originally Posted by *Xuper*
> 
> *Wrong. Nvidia has no plan for DX12*; Maxwell isn't suitable for it. For DX12, AMD has the upper hand, and in VR and professional cards too. Every developer wants to get rid of DX11, and AMD is ready for that.


Uh huh... They just forgot all about it...


----------



## snes

Hi, I have been partially following this thread; apologies in advance if this has already been shared, but it is of some interest.


https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-9#post-1869058

User-made benchmarks for async compute efficiency show Maxwell does indeed have a pretty decent level of async compute implemented already. Not sure how credible it is, but there are links to the sources and benchmarks for further investigation.


----------



## NoirWolf

Quote:


> Originally Posted by *snes*
> 
> Hi, i have been partially following this thread, and apologies in advance if this has already been shared but it is of some interest.
> 
> http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/2390
> 
> User made benchmarks for async compute efficiency shows Maxwell does indeed have pretty decent level of async compute implemented already. Not sure how credible it is but there are links to the sources and benchmarks for further investigation.


The discussion isn't whether Maxwell can do it at all, but whether it does it on the hardware side or the software side. The general consensus is the software side, which automatically means it won't run as well as its AMD counterparts, at least as far as latency is concerned. It being software-side also means that, a year or so after Pascal launches, Maxwell's performance in DX12 may begin sliding, depending on how much support it needs in newer and newer games (this happened on a less noticeable scale with Kepler).


----------



## PostalTwinkie

Quote:


> Originally Posted by *Vesku*
> 
> 
> 
> "still a long way off" for at least fine grained preemption.


Uh oh!

Long way off because they don't want to put resources on it to develop it? Or because of a technology limitation?!

Smoking gun?


----------



## Mahigan

Quote:


> Originally Posted by *snes*
> 
> Hi, i have been partially following this thread, and apologies in advance if this has already been shared but it is of some interest.
> 
> 
> https://www.reddit.com/r/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/
> 
> https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-9#post-1869058
> 
> User made benchmarks for async compute efficiency shows Maxwell does indeed have pretty decent level of async compute implemented already. Not sure how credible it is but there are links to the sources and benchmarks for further investigation.


Misinformation, for the most part. There were some issues with the way the test was programmed at first; they didn't optimize it for GCN, and now they're working on it.

Move to the last few pages of the Beyond3D thread: https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-32

That initial 50ms latency has now dropped to 28ms. The more threads they push per batch, the more it will drop. They're also concerned that they're only using a single ACE right now.

I also think that they didn't take into consideration that Maxwell 2 can only switch to a different task at a draw call boundary, but I may be wrong (the software scheduling issue, and the AWS lacking error-checking capabilities).

Either way, nVIDIA's driver apparently did not function... therefore the testing on Maxwell was also without merit.

Like I keep saying at this point... wait until nVIDIA fixes their drivers.


----------



## Xuper

Quote:


> Originally Posted by *MonarchX*
> 
> Uh huh... They just forgot all about it...


I know Nvidia worked on DX12, but not as effectively as AMD. Where are Nvidia's slides about DX12? All I see are AMD's slides. If you look at history, it looks like AMD was prepared for DX12.


----------



## Mahigan

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Uh oh!
> 
> Long way off because they don't want to put resources on it to develop it? Or because of a technology limitation?!
> 
> Smoking gun?


That could mean Pascal won't fix the issue. David Kanter believes it will... hard to tell.


----------



## Shivansps

So let me get this straight: worst case scenario, you can send the compute tasks to a CPU? If that can be done, I guess it's just as possible to send the compute task to a DX12 IGP or a secondary DX12 dGPU. If this is true, the lack of AC on the GTX 900 series is a minimal issue.


----------



## Forceman

Quote:


> Originally Posted by *Xuper*
> 
> I know Nvidia worked on DX12, but not as effectively as AMD. Where are Nvidia's slides about DX12? All I see are AMD's slides. If you look at history, it looks like AMD was prepared for DX12.


Nvidia may have just focused their efforts on a different DX12 feature set. They do support features that AMD cards don't; we just haven't seen benchmarks from titles that use those yet.


----------



## infranoia

Quote:


> Originally Posted by *Mahigan*
> 
> That could mean Pascal won't fix the issue. David Kanter believes it will... hard to tell.


"Now, you know, I'm *sure* they're going to improve that in Pascal... They'll *probably* fix it in Pascal."

It sounds like he doesn't want to put a Benjamin on the line for it though.


----------



## Klocek001

Quote:


> Originally Posted by *Shivansps*
> 
> So let me get this straight: worst case scenario, you can send the compute tasks to a CPU? If that can be done, I guess it's just as possible to send the compute task to a DX12 IGP or a secondary DX12 dGPU. If this is true, the lack of AC on the GTX 900 series is a minimal issue.


No, it isn't minimal. And who with a 980/980 Ti would want to send the task to an iGPU or another dGPU, when they paid more than people with a 390X, which has no problem processing it without the CPU involved?


----------



## GorillaSceptre

I think we need a new thread with updated and correct info. There's a lot of incorrect statements being thrown around.

Someone want to open a DX12 thread? @Mahigan?


----------



## Vesku

Quote:


> Originally Posted by *infranoia*
> 
> "Now, you know, I'm *sure* they're going to improve that in Pascal... They'll *probably* fix it in Pascal."
> 
> It sounds like he doesn't want to put a Benjamin on the line for it though.


My *guess* is that the rumored ARM cores that were speculated on just before Maxwell launch will actually appear on Pascal. Mainly because I can't imagine Nvidia wanting to throw away all that investment in their software scheduling and experienced software teams. With an ARM core or two on board sharing memory access with the rest of the GPU they can then run their software scheduling on die. This would result in a nice drop in latency when implementing DX 12/Vulkan features. However, I'm also guessing that Nvidia won't have finer grained preemption until at least Volta. Basically I'm thinking Pascal will be a bit of a hybrid DX11/DX12 design for Nvidia.


----------



## Shivansps

Quote:


> Originally Posted by *Klocek001*
> 
> No, it isn't minimal. And who with a 980/980 Ti would want to send the task to an iGPU or another dGPU, when they paid more than people with a 390X, which has no problem processing it without the CPU involved?


Nobody cares if it can be fixed that easily. It's an annoyance, yes, but it's a very simple solution; try to fix the lack of ROVs that way, for example. You just can't.

And personally, I have that big IGP lying there unused, and I did pay for it, like most do.


----------



## Mahigan

Quote:


> Originally Posted by *infranoia*
> 
> "Now, you know, I'm *sure* they're going to improve that in Pascal... They'll *probably* fix it in Pascal."
> 
> It sounds like he doesn't want to put a Benjamin on the line for it though.


I'm hopeful. I don't like it when one company controls too much of a given market.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*
> 
> I think we need a new thread with updated and correct info. There's a lot of incorrect statements being thrown around.
> 
> Someone want to open a DX12 thread? @Mahigan?


Alright... new thread it is


----------



## Vesku

Quote:


> Originally Posted by *Mahigan*
> 
> I'm hopeful. I don't like it when one company controls too much of a given market.


I think AMD needs Pascal to be only partially improved just to get back to a 40/60 or 50/50 AMD/Nvidia market share. That's my impression, since AMD hasn't even been able to sell off its 290(X) stock priced under $300 for new warrantied cards.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> Alright... new thread it is


----------



## MonarchX

Quote:


> Originally Posted by *Xuper*
> 
> I know Nvidia Worked on DX12 but Not Effective as AMD.Where is Nvidia's Slide About DX12? All I see is belong to AMD's slides.if you look at History , looks like AMD was prepared for DX12.


Microsoft are the ones who worked hard on DirectX 12 and AMD simply designed GCN to provide high parallelism performance, but I heavily doubt that they were planning SO far ahead that they decided to forget about DirectX 11 optimizations and let NVidia win that round. NVidia on the other hand planned ahead very well and decided to push DirectX 11 cards and performance until DirectX 12 really takes off, right about the time Pascal comes out.


----------



## Mahigan

Quote:


> Originally Posted by *GorillaSceptre*


It's up here: http://www.overclock.net/t/1572716/directx-12-asynchronous-compute-an-exercise-in-crowd-sourcing


----------



## NoirWolf

Quote:


> Originally Posted by *MonarchX*
> 
> Microsoft are the ones who worked hard on DirectX 12 and AMD simply designed GCN to provide high parallelism performance, but I heavily doubt that they were planning SO far ahead that they decided to forget about DirectX 11 optimizations and let NVidia win that round. NVidia on the other hand planned ahead very well and decided to push DirectX 11 cards and performance until DirectX 12 really takes off, right about the time Pascal comes out.


AMD pushed Mantle, which got Microsoft off its collective ass, because they also planned for the market not to be so ******ed as to keep using a limited and difficult API like DX11. They weren't thinking far ahead; they were just expecting developers not to settle. And considering Nvidia's current GPU development history and what little we know about Pascal, they do seem to have planned ahead quite a bit, but they hedged their bets on a lot of the wrong things. (Pascal is said to have 3x the bandwidth of a Maxwell 2.0 card. Does that sound like a good thing, considering AMD has twice that with an HBM1 card? Now consider which Nvidia cards don't take a dive head-first into a dry swimming pool *cough/Fermi/cough*, and you'll get the distinct notion that they likely don't have an adequate card ready for async in the next year; if they did, they wouldn't be working on a driver at the 11th hour.)


----------



## pengs

Quote:


> Originally Posted by *Mahigan*
> 
> That could mean Pascal won't fix the issue. David Kanter believes it will... hard to tell.


Quote:


> Originally Posted by *infranoia*
> 
> "Now, you know, I'm *sure* they're going to improve that in Pascal... They'll *probably* fix it in Pascal."
> 
> It sounds like he doesn't want to put a Benjamin on the line for it though.


Of course, if you look at how far back an architecture is blueprinted before it actually materializes, you could be looking at a 3-5 year span. I believe GCN was 'in the works', or probably on the planning board, in 2006-7 and realized in Q1 2012, which would be a solid 5-6 year span.

I guess the question is whether Pascal was finalized before GCN was released. Obviously NVIDIA would look at AMD's architecture, know exactly where they were going with it, and be able to counter it if the GPU hadn't been finalized. These are assumptions I'm making, really, because when you count the other factors like targeted TDP, power consumption, and process node, it could be very difficult to change a GPU whose foundation was engineered to accommodate those factors.

Microsoft's late 90-degree turn towards low-level APIs was very abrupt. GCN being planted into the next-gen consoles was probably a little road bump for NVIDIA, and then all of a sudden Microsoft develops the API that can utilize GCN's async abilities, which was only announced at the beginning of last year, when Pascal was probably already finalized.

Next gens being the cake, GCN being the icing and DX12 the cherry on top.

Do you have any thoughts Mahigan? I didn't work at ATI so I'm sure a lot of what I just said about hardware development was a crap shoot


----------



## MonarchX

Quote:


> Originally Posted by *NoirWolf*
> 
> AMD pushed Mantle, which got Microsoft off its collective ass, because they also planned for the market not to be so ******ed as to keep using a limited and difficult API like DX11. They weren't thinking far ahead; they were just expecting developers not to settle. And considering Nvidia's current GPU development history and what little we know about Pascal, they do seem to have planned ahead quite a bit, but they hedged their bets on a lot of the wrong things. (Pascal is said to have 3x the bandwidth of a Maxwell 2.0 card. Does that sound like a good thing, considering AMD has twice that with an HBM1 card? Now consider which Nvidia cards don't take a dive head-first into a dry swimming pool *cough/Fermi/cough*, and you'll get the distinct notion that they likely don't have an adequate card ready for async in the next year; if they did, they wouldn't be working on a driver at the 11th hour.)


Aside from memory bandwidth not improving performance much @ 1080p, only showing its glory @ 4K, AMD's implementation of HBM1 wasn't great at all due to the slow GPU... Pascal is to use HBM2, and I am sure the GPU will take advantage of it.


----------



## Mahigan

Quote:


> Originally Posted by *MonarchX*
> 
> Aside from memory bandwidth not improving performance much @ 1080p, only showing its glory @ 4K, AMD's implementation of HBM1 wasn't great at all due to the slow GPU... Pascal is to use HBM2, and I am sure the GPU will take advantage of it.


Under DX11, Fiji is like a 12 cylinder engine with only 1 or 2 cylinders being utilized with any degree of frequency. Under DX12, Fiji is like a 12 cylinder engine roaring at full speed.

I wouldn't call the Fiji GPU "Slow"... it only appears that way right now.


----------



## KarathKasun

Also, Fury's low memory clocks hurt its memory access latency a lot. This is partly why it does not do well at 1080p, and why overclocking the RAM helps so much. Access latency is still important.

A 1GHz two-stack (2048-bit) setup would be more ideal, but that would not give you enough memory for a high-end card with HBM1.


----------



## Anna Torrent

Quote:


> Originally Posted by *Mahigan*
> 
> Mis-information for the most part. There were some issues with the way the test was programmed at first. They didn't optimize it for GCN, now they're working on it.
> 
> Move to the last few pages of the Beyond3D thread: https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-32
> 
> That initial 50ms latency has now dropped to 28ms. The more threads they'll push, per batch, the more it will drop. They're also concerned that they're only using a single ACE right now.
> 
> I also think that they didn't take into consideration that Maxwell 2 can only execute a different task at the end of a draw call boundary, but I may be wrong (Software Scheduling issue and AWS lacking error checking capabilities).
> 
> Either way, nVIDIAs driver apparently did not function... therefore the testing on Maxwell was also without merit.
> 
> Like I keep saying at this point... wait until nVIDIA fix their drivers


How do we know that Maxwell 2 can execute a different task only at the end of a draw call boundary?
Is it a software issue? Sorry, I didn't get you there.
Quote:


> Originally Posted by *Mahigan*
> 
> I'm hopeful. I don't like it when one company controls too much of a given market.


And two companies is better? It's a bit of a game they are playing while we are paying the price (not that I'd compare it to people who really pay the price, like with their lives, but still).


----------



## Stewox

Quote:


> Originally Posted by *47 Knucklehead*
> 
> And guess what, Developers WILL keep using DirectX 10 and 11 too. Why? Because DirectX 12 is only for Windows 10.
> 
> No game developer is going to slash their own throats by making a Windows 10 ONLY game less than 1 year after the launch of Windows 10. There are still a ton of people, especially after the who Windows 10 "snooping on users" thing, that will stay on Windows 7 (and a tiny fraction who will stay on Windows 8.1 ... no one will stay on Windows 8, and those that do, who cares, they are obviously really dumb to not go to at least 8.1 hehehe).


That's why this was started > http://forums.mydigitallife.info/threads/62168-Windows-10-to-Windows-7-Behavior-GUI-plus-tweaks-registery-configs-documentation?s=fdb1a0300e0807b774df46217953d289 and it's not getting the interest I hoped for, but I wanted it started early. Maybe you can join the "team" (which doesn't exist yet).

EDIT: Sorry you have to login to view certain content.

Most people who got Win 10 started modding it on their own, applying the tweaks one by one as they discovered them. But I'm too paranoid to plug the ethernet cable in unless I first disable a few key connectivity things like telemetry and other Cortana BS; then I'll start modding all the rest. I already have Win 10 installed on another HDD, ready to start tweaking... it is a massive job for one guy, so I don't have the motivation even though I try. I got into this hugely with Win 7, but I'm kind of bored of doing it every time I reinstall an OS. That's why I've been avoiding reinstalls so much: since I don't use Windows Update, I only get the updated version when I reinstall a new OS, and that happens when I change some core hardware or move to a new PC. This is my second Win 7 install; it's on a modern PC with a UEFI BIOS, and I've been on it since early 2013.

You have a point there, but that's Microsoft's limitation, and it's something developers can't control; those are the circumstances of this stupid reality. Still, DX11 and DX10 wouldn't be those devs' main targets, just there for compatibility. I was talking more about those who exclusively complain that they have to do more tweaking to be competitive with other developers in the DX12 arena.


----------



## Mahigan

Quote:


> Originally Posted by *Anna Torrent*
> 
> How do we know that M2 can execute a different task only at the end of a draw call boundary?
> Is it a software issue? - sorry, I didn't get you there
> And two companies is better? it's a bit of game they are playing while we are paying the price (not that I compare it to people that really pay price like with their lives, but stll)


We know because nVIDIA mentioned it to developers (Oculus Rift):
http://www.overclock.net/t/1572716/directx-12-asynchronous-compute-an-exercise-in-crowd-sourcing#post_24385652

As for two companies, that creates an oligopoly instead of a monopoly. Generally speaking, oligopolies exhibit a sort of game-theory profit-sharing strategy. While this is also bad for consumers, it's a degree better than a monopoly.

I'd like to see more competition in the GPU market, but I just don't see any competitors who can tackle AMD and nVIDIA (except maybe Intel, but they're a long way off).
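The game-theory point about oligopolies can be sketched with a toy payoff matrix (the profit numbers below are made up purely for illustration):

```python
# Toy two-firm pricing game, illustrating why a duopoly can settle on
# high prices. Payoffs are (firm A profit, firm B profit); the numbers
# are invented for the example.
payoffs = {
    ("high", "high"): (10, 10),  # both keep prices high: shared profit
    ("high", "low"):  (2, 14),   # B undercuts A
    ("low", "high"):  (14, 2),   # A undercuts B
    ("low", "low"):   (5, 5),    # price war: both earn less
}

# The jointly most profitable outcome is both firms pricing high; in a
# repeated game it can be sustained because undercutting invites
# retaliation. With a monopoly there is no rival at all, which is why
# a duopoly is only "a degree better" for consumers.
best_joint = max(payoffs, key=lambda k: sum(payoffs[k]))
print(best_joint)  # ('high', 'high')
```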


----------



## Paul17041993

So many posts to quote, so little time. I think I remember why I retired from these forums last year...


----------



## PostalTwinkie

Quote:


> Originally Posted by *Vesku*
> 
> I think AMD needs Pascal to be only partially improved just to get back to a 40/60 or 50/50 AMD/Nvidia market share. That's my impression, since AMD hasn't even been able to sell off its 290(X) stock priced under $300 for new warrantied cards.


If Pascal doesn't handle compute right, AMD does, and the industry shifts, well, that could give AMD the traction they need to get back to a position like that. It could also mean we see rapid advancement in GPU technology, as Nvidia tries to respond!










We might have ourselves another real GPU fight!!


----------



## HeavyUser

Quote:


> Originally Posted by *Paul17041993*
> 
> So many posts to quote, such little time, I think I remember why I retired from these forums last year...


+Rep


----------



## Forceman

Quote:


> Originally Posted by *Mahigan*
> 
> Under DX11, Fiji is like a 12 cylinder engine with only 1 or 2 cylinders being utilized with any degree of frequency.


I'd say it's using more than 1 or 2 cylinders - I doubt it'll be 6 times faster in DX12. More like one of those engines that shuts a cylinder or two down at highway speed.


----------



## NoirWolf

Quote:


> Originally Posted by *Forceman*
> 
> I'd say it's using more than 1 or 2 cylinders - I doubt it'll be 6 times faster in DX12. More like one of those engines that shuts a cylinder or two down at highway speed.


I think you really need to learn how internal combustion engines work (hint: power doesn't depend linearly on cylinder count, so a more correct analogy, for you both, is a 12-cylinder engine that isn't using more than 6 to their fullest potential).


----------



## SpeedyVT

I think people fail to see how much marketing can maintain sales regardless of whether a company is terrible or good.

Color theory holds that a professional business is often seen as blue or a similar color, e.g. green. Most consumers would choose a blue brand name over any other color in most cases.

Now I want you to imagine nVidia's logo red and AMD's logo green. Forget AMD's and nVidia's previous colors. If you knew nothing about electronics, which would you buy?

And a tidbit as to why Google has been so successful: they used many colors, which signifies many points of interest, variety. A search engine with variety is always better than one without. This also highlights Windows' success when it had the variety of color on its symbol: an operating system for all sorts of users.


----------



## Anna Torrent

Just thinking about Glide.. 15 years before Mantle


----------



## Forceman

Quote:


> Originally Posted by *NoirWolf*
> 
> I think you really need to learn how internal combustion engines work (hint: power doesn't depend linearly on cylinder count, so a more correct analogy, for you both, is a 12-cylinder engine that isn't using more than 6 to their fullest potential).


Sorry, I'll make sure I thoroughly research my next tongue-in-cheek analogy.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *SpeedyVT*
> 
> I think people fail to see how much marketing can maintain sales regardless of whether a company is terrible or good.
> 
> Color theory holds that a professional business is often seen as blue or a similar color, e.g. green. Most consumers would choose a blue brand name over any other color in most cases.
> 
> Now I want you to imagine nVidia's logo red and AMD's logo green. Forget AMD's and nVidia's previous colors. If you knew nothing about electronics, which would you buy?
> 
> And a tidbit as to why Google has been so successful: they used many colors, which signifies many points of interest, variety. A search engine with variety is always better than one without. This also highlights Windows' success when it had the variety of color on its symbol: an operating system for all sorts of users.


Yes, color is powerful, but the reality is, Red is a stronger marketing color than blue, and especially green.

McDonald's, Coke, KFC, ESPN, Lego, Kellogg's, YouTube, Honda, Canon, Mitsubishi, Virgin, Red Bull, Toyota, Toshiba, Sharp, CNN, Life ... those are all either primarily or totally red logos, and arguably the most recognized and popular brands on the planet.

So AMD should be among them since they are red.

Compare that to Green, what do we have?

BP, John Deere, Merck, Land Rover, Starbucks, Heineken, Whole Foods, Hulu. All good companies, but nowhere near the power of "red".

I'm just saying.


----------



## FastEddieNYC

Looking at the big picture, it will benefit everyone if AMD does have superior DX12 performance with this generation of cards, and hopefully the next. Competition spurs innovation and keeps prices in line. At 82%, Nvidia's dominance is only great for the shareholders. Nvidia makes great products, but I'm hoping that AMD's asynchronous compute performance helps in real games so they regain market share. Then all enthusiasts benefit.


----------



## semitope

Quote:


> Originally Posted by *FastEddieNYC*
> 
> Looking at the big picture, it will benefit everyone if AMD does have superior DX12 performance with this generation of cards, and hopefully the next. Competition spurs innovation and keeps prices in line. At 82%, Nvidia's dominance is only great for the shareholders. Nvidia makes great products, but I'm hoping that AMD's asynchronous compute performance helps in real games so they regain market share. Then all enthusiasts benefit.


Nvidia is not at 82%. Those numbers are quarterly figures for quarters in which AMD had no new products. They do need to change the quarterly trend, though.


----------



## SpeedyVT

Quote:


> Originally Posted by *semitope*
> 
> Nvidia is not at 82%. Those numbers are quarterly figures for quarters in which AMD had no new products. They do need to change the quarterly trend, though.


If they want proper statistics they should go to Steam's hardware survey.


----------



## SpeedyVT

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Yes, color is powerful, but the reality is that red is a stronger marketing color than blue, and especially green.
> 
> McDonald's, Coke, KFC, ESPN, Lego, Kellogg, YouTube, Honda, Canon, Mitsubishi, Virgin, Red Bull, Toyota, Toshiba, Sharp, CNN, Life ... all of which have either primarily or totally red logos, and are arguably among the most recognized and popular brands on the planet.
> 
> So AMD should be among them since they are red.
> 
> Compare that to Green, what do we have?
> 
> BP, John Deere, Merck, Land Rover, Starbucks, Heineken, Whole Foods, Hulu. All good companies, but nowhere near the power of "red".
> 
> I'm just saying.


You're absolutely right, but in markets the user knows little about, the consumer often chooses aesthetically.

Having been a builder for most of my life, I can say a customer doesn't actually care what product is in their computer. Half aren't even certain of their processor's capabilities. They overspend expecting they've got a golden egg. GPU or CPU, same story. As long as it does what they want, they don't care.

So once they see a logo, they judge the product purely by that image.

Second, by what they've heard. If the name is hard to pronounce, it's impossible to secure product awareness. It doesn't help that AMD shares its acronym with a serious disease.

Thirdly, word of mouth.


----------



## Casey Ryback

Quote:


> Originally Posted by *SpeedyVT*
> 
> If they want proper statistics they should go to Steam's hardware survey.


But Nvidia buyers upgrade more often; people who buy AMD buy cards to last 2+ years.

A lot of Nvidia buyers just buy the latest benchmark winner for epeen on forums such as this one; they probably don't even play games, and just run benchmarks and look at their cards through the window on their case.


So even if one user on Steam has a 980 Ti, for example, chances are they bought a 670/680 and a 780 Ti before that.

Nvidia knows how to get repeat customers.

One hardware tactic is to make sure memory bandwidth and memory capacity are as low as possible while still being adequate for the current games being benchmarked.


----------



## provost

Quote:


> Originally Posted by *Casey Ryback*
> 
> But Nvidia buyers upgrade more often; people who buy AMD buy cards to last 2+ years.
> 
> A lot of Nvidia buyers just buy the latest benchmark winner for epeen on forums such as this one; they probably don't even play games, and just run benchmarks and look at their cards through the window on their case.
> 
> So even if one user on Steam has a 980 Ti, for example, chances are they bought a 670/680 and a 780 Ti before that.
> 
> Nvidia knows how to get repeat customers.
> 
> One hardware tactic is to make sure memory bandwidth and memory capacity are as low as possible while still being adequate for the current games being benchmarked.


So, what's wrong with any of the above? I rarely game, I occasionally benchmark, and I happen to like having a window on my PC cases.... Lol

But, I don't do it for epeen....


----------



## Casey Ryback

Quote:


> Originally Posted by *provost*
> 
> So, what's wrong with any of the above? I rarely game, I occasionally benchmark, and I happen to like having a window on my PC cases.... Lol
> 
> But, I don't do it for epeen....


But you don't buy the latest and greatest GPU do you?


----------



## provost

Quote:


> Originally Posted by *Casey Ryback*
> 
> But you don't buy the latest and greatest GPU do you?


Well, not counting the Maxwell latest greatest.... hmmmm... Lol

I think I am going to plead the fifth and sneak out the back door, before I dig a deeper hole for myself ....


----------



## Forceman

Quote:


> Originally Posted by *Casey Ryback*
> 
> Nvidia know how to get repeat customers.
> 
> One hardware tactic is to make sure memory bandwidth and memory amount is as low as possible whilst being adequate for the current games being benchmarked.


Or, you know, maybe people are just happy with their purchase and want to be repeat buyers. Repeat business doesn't have to be a trick, or a marketing scam.


----------



## Casey Ryback

Quote:


> Originally Posted by *Forceman*
> 
> Or, you know, maybe people are just happy with their purchase and want to be repeat buyers. Repeat business doesn't have to be a trick, or a marketing scam.


Well of course they make good products that perform; I thought that kinda goes without saying.

There are other tactics that keep buyers on an upgrade cycle too, though, like the one I mentioned.

I even think AMD has gone this way with the 380 card: it only has 2GB of VRAM, and if you want a 4GB version you pay more. They have also dumped the 3GB 280/280X from its rebrands.

The 980 Ti with 6GB of VRAM seems perfect when you consider SLI users and high resolutions; that kind of setup should last a long time when it comes to max settings, AA, etc.


----------



## raghu78

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Bingo.
> 
> Not to mention, if game developers are going to have to sit there and optimize every game for best DirectX 12 performance, like they did with DirectX 11 or 10 or 9, let's remember who has the marketshare and money to entice developers to do that and for what card. DirectX 12 was supposed to be a solution that was hardware agnostic. If developers have to spend time and effort into "tweaking" for each game, we all know who will win this war ... nVidia.
> 
> Developers won't gimp their games in favor of a company that only has 20% marketshare. They will go after the 80% so they can make more money for themselves. If the game is crap for 80% of the people, people won't buy it. Oxide is a business, they aren't a charity.


This 80% market share does not mean 80% of installed PC GPU base. In Q2 2014 AMD had 38% GPU market share. Traditionally the market share of AMD has been 35% and Nvidia 65%. In the last 4 quarters Nvidia have done very well, thanks to Maxwell, and have increased market share and now stand at 80%. We still don't know how AMD will fare in H2 2015 and H1 2016. I am not very optimistic of AMD gaining share but we will have to see. The most important thing you keep forgetting is game developers today develop for the consoles. Even the biggest AAA titles on PC are console ports. AMD has 100% of next gen consoles - Sony PS4 and Microsoft Xbox One (AMD GCN based) and Nintendo Wii U (AMD R600/R700 based). Even Nintendo NX is likely to sport a GCN based APU, though not yet confirmed. AMD already has confirmed a semi-custom gaming win which will start generating revenue from late 2016/2017 time frame and the Nintendo NX fits that timeframe very well.

http://www.extremetech.com/gaming/210243-is-amd-designing-nintendos-next-generation-nx-console

So your reasoning that developers cater to Nvidia because they hold 80% market share (which is not entirely true outside of a specific quarter) does not work at all.


----------



## SpeedyVT

Quote:


> Originally Posted by *raghu78*
> 
> This 80% market share does not mean 80% of installed PC GPU base. In Q2 2014 AMD had 38% GPU market share. Traditionally the market share of AMD has been 35% and Nvidia 65%. In the last 4 quarters Nvidia have done very well, thanks to Maxwell, and have increased market share and now stand at 80%. We still don't know how AMD will fare in H2 2015 and H1 2016. I am not very optimistic of AMD gaining share but we will have to see. The most important thing you keep forgetting is game developers today develop for the consoles. Even the biggest AAA titles on PC are console ports. AMD has 100% of next gen consoles - Sony PS4 and Microsoft Xbox One (AMD GCN based) and Nintendo Wii U (AMD R600/R700 based). Even Nintendo NX is likely to sport a GCN based APU, though not yet confirmed. AMD already has confirmed a semi-custom gaming win which will start generating revenue from late 2016/2017 time frame and the Nintendo NX fits that timeframe very well.
> 
> http://www.extremetech.com/gaming/210243-is-amd-designing-nintendos-next-generation-nx-console
> 
> So your reasoning that developers cater to Nvidia because they hold 80% market share (which is not entirely true outside of a specific quarter) does not work at all.


Between the 30 million GCN-based consoles in circulation and its 34-35% share in PC gaming, AMD holds the most in-demand architectural design of our time, whether or not Nvidia realizes it.

This only gets better as time progresses and AMD introduces HBM, which will enable even-smaller-than-slim consoles.


----------



## semitope

Quote:


> Originally Posted by *SpeedyVT*
> 
> Between the 30 million GCN-based consoles in circulation and its 34-35% share in PC gaming, AMD holds the most in-demand architectural design of our time, whether or not Nvidia realizes it.
> 
> This only gets better as time progresses and AMD introduces HBM, which will enable even-smaller-than-slim consoles.


The market share figures really are meaningless when this is considered. For PC ports, the path of least resistance is to optimize for consoles and bring the same GCN features to PC. If anything, Nvidia is going to have to do extra work to exploit their own hardware. Nobody is going to go out of their way to cater to them while leaving GCN behind; in fact, that would involve turning things off from the console version.

Their best hope would be PC exclusives.


----------



## Casey Ryback

Quote:


> Originally Posted by *semitope*
> 
> The market share figures really are meaningless when this is considered. For PC ports, the path of least resistance is to optimize for consoles and bring the same GCN features to PC. If anything, Nvidia is going to have to do extra work to exploit their own hardware. Nobody is going to go out of their way to cater to them while leaving GCN behind; in fact, that would involve turning things off from the console version.
> 
> Their best hope would be PC exclusives.


Plus many PC gamers wait for sales and pirate software.

The real money is made on consoles.


----------



## Anxifer

Quote:


> Originally Posted by *Casey Ryback*
> 
> Plus many PC gamers wait for sales and pirate software.
> 
> The real money is made on consoles.


That's a statement made by many publishers and developers which is actually not very valid. There have been studies on the spread of pirated software on the console and PC platforms which have led to the conclusion that many console owners actually do have pirated software; the count is much higher than officially stated.


----------



## provost

Quote:


> Originally Posted by *Casey Ryback*
> 
> Well of course they make good products that perform; I thought that kinda goes without saying.
> 
> There are other tactics that keep buyers on an upgrade cycle too, though, like the one I mentioned.
> 
> I even think AMD has gone this way with the 380 card: it only has 2GB of VRAM, and if you want a 4GB version you pay more. They have also dumped the 3GB 280/280X from its rebrands.
> 
> The 980 Ti with 6GB of VRAM seems perfect when you consider SLI users and high resolutions; that kind of setup should last a long time when it comes to max settings, AA, etc.


Well, part of me does wonder: if I had chosen AMD a while back, would I have to be thinking about "planned obsolescence" across three different generations of Nvidia cards on 28nm? I also don't believe that the perceived advantage of Nvidia drivers over AMD holds true any longer. In fact, the opposite may be more accurate.

So, quite frankly, if I had bought an AMD card, I probably would not need to come to this forum to check what the heck is going on with my NV cards.... maybe this is too extreme an example (probably it is), but the point being, I am not willing to put a bet on any Nvidia cards for any generation, as Nvidia is too well run as a company, if that makes any twisted sense...lol
And, if it is the betting thrill that I am looking for, there are other ways to satiate the appetite...

/end mindless post..


----------



## NoirWolf

Quote:


> Originally Posted by *Anxifer*
> 
> That's a statement made by many publishers and developers which is actually not very valid. There have been studies on the spread of pirated software on the console and PC platforms which have led to the conclusion that many console owners actually do have pirated software; the count is much higher than officially stated.


Depends on the studies and how they count things. These days it wouldn't be weird to see people with a PS4/Xbone and two hacked older-gen consoles, and if that's the case, do you count them as a pirate or a customer? Or both? It is only difficult to get hacked consoles in the first year or so, but even then the console market does better than PC, since a PC can have pirated and legit games on the same system (only Win 10 on the Xbone has issues with hacked games), and people with multiple machines (hacked or not) are more liable to buy a game they like (maybe not in launch week, but how many people have you heard of buying Skyrim on both older-gen consoles?).


----------



## Casey Ryback

Quote:


> Originally Posted by *Anxifer*
> 
> That's a statement made by many publishers and developers which is actually not very valid. There have been studies on the spread of pirated software on the console and PC platforms which have led to the conclusion that many console owners actually do have pirated software; the count is much higher than officially stated.


It may get exaggerated, but it's still valid.

A lot of console gamers are the kind of people who just buy things at full retail; they aren't aware of cheap game keys or of pirating software either.


----------



## SpeedyVT

Quote:


> Originally Posted by *Casey Ryback*
> 
> It may get exaggerated, but it's still valid.
> 
> A lot of console gamers are the kind of people who just buy things at full retail; they aren't aware of cheap game keys or of pirating software either.


More like they also enjoy the controlled platform. No dabbling with drivers, cleaning viruses out or even bothering to tweak settings. A console is a controlled environment that guarantees a unified player experience. I love PC gaming, but I'm not going to be stupid and deny everything I just said. Of course there are also some console viruses or even other stuff, but it doesn't change the fact it's controlled.


----------



## Casey Ryback

Quote:


> Originally Posted by *SpeedyVT*
> 
> More like they also enjoy the controlled platform. No dabbling with drivers, cleaning viruses out or even bothering to tweak settings. A console is a controlled environment that guarantees a unified player experience. I love PC gaming, but I'm not going to be stupid and deny everything I just said. Of course there are also some console viruses or even other stuff, but it doesn't change the fact it's controlled.


Consoles serve their purpose very well, so easy to use, can relax on the couch.

I own an Xbone, and its performance is pretty decent honestly.

Pity about the expensive software, the somewhat lacking selection of titles, and the Xbox Live costs, but then again it was pretty cheap to buy it with a bunch of games.


----------



## thomjak

Quote:


> Originally Posted by *Stewox*
> 
> That's why this was started > http://forums.mydigitallife.info/threads/62168-Windows-10-to-Windows-7-Behavior-GUI-plus-tweaks-registery-configs-documentation?s=fdb1a0300e0807b774df46217953d289 and it's not getting the interest as I hope for, but I wanted this started early, maybe you can join the "team" (doesn't exist yet)
> 
> EDIT: Sorry you have to login to view certain content.
> 
> most people who got Win10 started modding it on their own, tweak by tweak, through practical discovery, but I'm too paranoid to plug the ethernet cable in unless I first disable a few key connectivity things like telemetry and other Cortana BS; then I'll start modding all the rest. I already have Win10 installed on another HDD, ready to start tweaking... it is a massive job for one guy, so I don't have the motivation even though I try. I got into this hugely in Win7, but I'm kind of bored of doing it every time I reinstall an OS; that's why I've been avoiding reinstalls so much. Since I don't use Windows Update, I only get the updated version when I reinstall a new OS, and that happens when I change some core hardware or move to a new PC. This is my second Win7 install; it's on a modern PC with a UEFI BIOS, and I've been on it since early 2013.
> 
> You have a point there, but that's Microsoft's limitation, and it's also something developers can't control, so you can't use it; those are the circumstances of this stupid reality. Still, DX11 and DX10 would not be meant by those devs as their main DX, only as a compatibility fallback. I was talking more about those who exclusively complain that they have to do more tweaking to be competitive with other developers in the DX12 arena.


Why don't you just image a tweaked Windows 7 so you don't have to go through the same process again if you reinstall?


----------



## Stewox

Quote:


> Originally Posted by *thomjak*
> 
> Why don't you just image a tweaked Windows 7 so you don't have to go through the same process again if you reinstall?


I thought about that a day ago. It would involve preparing the image with the Windows Sysprep utility, a feature I only recently recalled, which might actually be way more practical and much less work if it proves to be a better alternative.

If it's going to work fine on almost any hardware, then it's golden.
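For anyone wanting to try it, the rough flow would be something like the commands below. This is only a sketch using the built-in Sysprep tool plus DISM image capture; the drive letters and image path here are placeholders, and on Windows 7 itself you'd capture with ImageX from the old WAIK rather than the newer DISM `/Capture-Image` switch, which came with the Windows 8+ ADK.

```shell
# 1. On the tweaked install: generalize it so it can boot on other hardware
#    (strips machine-specific drivers and the SID, then shuts down).
C:\Windows\System32\Sysprep\sysprep.exe /generalize /oobe /shutdown

# 2. Boot into WinPE (or attach the disk to another machine) and capture
#    the volume into a WIM. D: is the captured volume, E: a storage drive.
dism /Capture-Image /ImageFile:E:\tweaked-win7.wim /CaptureDir:D:\ /Name:"Tweaked Win7"

# 3. On the new machine, apply the image back to a formatted partition.
dism /Apply-Image /ImageFile:E:\tweaked-win7.wim /Index:1 /ApplyDir:D:\
```

The /generalize pass is what makes it portable across hardware; without it, the image carries the old machine's driver state along.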


----------



## Glottis

Quote:


> Originally Posted by *Themisseble*
> 
> With Dx12 AMD is gaining rep back..


Isn't it a bit early to make such outrageous claims? Maybe we should wait for about 10 major DX12 games and see how they run on AMD and Nvidia GPUs before deciding what performs better in DX12. Also, might I remind you that the 980 Ti is going toe to toe with the Fury X in the Ashes benchmark? So with the exception of the 290X, we don't really see any big gains on AMD cards vs Nvidia cards in the Ashes benchmark.


----------



## GorillaSceptre


Quote:


> Originally Posted by *Mahigan*
> 
> While it is too early for outrageous claims... take note of what Kollock stated:
> As it stands, only 20% (not 30% as I had stated previously) of the graphics pipeline occurs in compute shaders. They're projecting this to be more than 50% on the next iteration of their engine. This is on par with the various talks, I've seen, from developers. There appears to be a move towards Compute rather than Graphics. As it stands, in Ashes of the Singularity, I am quite certain that what is holding back the Fury-X has to do with the Gtris/s rate of that card (this is further compounded by the shading and terrain shading samples as explained in the R9 290x presentation below).
> 
> This iteration of the Nitrous engine has 20% of its graphics pipeline occurring in compute shaders, that's without the Post Processing effects being taken into account. My take is that the Fury-X will begin to shine in quite a few DX12 titles, where, the graphics pipeline will be occurring, to a larger degree, in compute shaders. Therefore, once again, we have a Graphics card which is likely to have a long life. Lasting for several years more than the competition. Therefore like a Hawaii based 290 series, we're likely to see Fury/X/Nano performing well for years to come. These cards have the available compute resources to be quite competitive down the road as well as quite competitive once the first DX12 titles begin to hit. That's my take at least.


Yup, the Fury X is an 8.6 TFLOP monster. Hopefully that is the way of the future.

There's been so much wasted potential on the PC side of things; it's about time some changes started happening.


----------



## Olivon

Quote:


> Originally Posted by *Casey Ryback*
> 
> The real money is made on consoles.


Is it a good choice, though? I mean, margins are very low in the console world, and AMD talked about double-digit percentages, but in the low range (10-20%).
And when you know that AMD is now earning more money from consoles (its semi-custom branch) than from anything else
(Source), it's easy to understand why times are difficult.
Dominated on the CPU and GPU fronts, the console card is a good one to play, but it's really difficult to turn a big profit on it, and it's not sufficient to feed the family.


----------



## 47 Knucklehead

When people say "the real money is on consoles," they mean the Xbox One. Even though the PS4 uses mostly the same hardware, it uses a totally different OS, and DirectX 12 is irrelevant for it. The only console that will benefit from DirectX 12 at all is Microsoft's, which is getting hammered in the market.


----------



## delboy67

Quote:


> Originally Posted by *Olivon*
> 
> Is it a good choice, though? I mean, margins are very low in the console world, and AMD talked about double-digit percentages, but in the low range (10-20%).
> And when you know that AMD is now earning more money from consoles (its semi-custom branch) than from anything else
> (Source), it's easy to understand why times are difficult.
> Dominated on the CPU and GPU fronts, the console card is a good one to play, but it's really difficult to turn a big profit on it, and it's not sufficient to feed the family.


Like I said years ago, this thread is the exact reason AMD undercut everyone for the consoles; it was never to make money on any scale selling APUs.


----------



## wierdo124

Thread cleaned. Keep it on topic.


----------



## Vesku

Quote:


> Originally Posted by *PostalTwinkie*
> 
> Yet now we have Oxide recanting that statement, and working with Nvidia to fix what is actually broken.


Can you cite this? AFAIK, Oxide developer mentions Maxwell 2 driver said it supported Async Compute but it did not work as expected so they disabled it. Then after this blew up on the internet Nvidia confirmed to Oxide, not to the public, it wasn't currently working on Maxwell 2 and that they will look into fixing it. If there is some statement from Oxide or Oxide dev recanting the initial claim please link to it.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Vesku*
> 
> Can you cite this? AFAIK, Oxide developer mentions Maxwell 2 driver said it supported Async Compute but it did not work as expected so they disabled it. Then after this blew up on the internet Nvidia confirmed it wasn't currently working on Maxwell 2 and that they will look into fixing it. If there is some statement from Oxide or Oxide dev recanting the initial claim please link to it.


NVIDIA Will Fully Implement Async Compute Via Driver Support
Quote:


> And they've got Oxide from Ashes of Singularity to confirm that. Oxide's developer "Kollock" wrote that NVIDIA has not fully implemented yet Async Compute in its driver, Oxide is working closely with them in order to achieve that.
> 
> "We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was. We are working closely with them as they fully implement Async Compute. We'll keep everyone posted as we learn more."


----------



## Vesku

Quote:


> Originally Posted by *47 Knucklehead*
> 
> NVIDIA Will Fully Implement Async Compute Via Driver Support


There is no recanting, though. Oxide said Async Compute didn't work right on Maxwell 2 even though it reported via driver that the feature was supported. Nvidia then confirms to Oxide that it's not currently working but says they will look into getting it to work.

Where is the basis for "Oxide recants" "Oxide was wrong"?


----------



## Serios

Quote:


> Originally Posted by *47 Knucklehead*
> 
> NVIDIA Will Fully Implement Async Compute Via Driver Support


First Post
http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1200#post_24356995
Quote:


> "There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because *Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally."*


Quote:


> "*Personally, I think one could just as easily make the claim that we were biased toward Nvidia* as the only 'vendor' specific code is for Nvidia where we had to shutdown async compute. By vendor specific, I mean a case where we look at the Vendor ID and make changes to our rendering path. *Curiously, their driver reported this feature was functional but attempting to use it was an unmitigated disaster* in terms of performance and conformance so we shut it down on their hardware. *As far as I know, Maxwell doesn't really have Async Compute so I don't know why their driver was trying to expose that*. The only other thing that is different between them is that Nvidia does fall into Tier 2 class binding hardware instead of Tier 3 like AMD which requires a little bit more CPU overhead in D3D12, but I don't think it ended up being very significant. This isn't a vendor specific path, as it's responding to capabilities the driver reports."


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Vesku*
> 
> There is no recanting, though. Oxide said Async Compute didn't work right on Maxwell 2 even though it reported via driver that the feature was supported. Nvidia then confirms to Oxide that it's not currently working but says they will look into getting it to work.
> 
> Where is the basis for "Oxide recants" "Oxide was wrong"?


I'm not going to speak for another user about his choice of words, in this case "recant", but let's get real here. Obviously Oxide and nVidia have talked, and nVidia has set Oxide straight on what their hardware is capable of. Given that Oxide now says "We are working closely with them as they fully implement Async Compute," that means, to just about anyone who uses English as their primary language, that the hardware is in place and nVidia just needs to work with Oxide on a driver so that it will be enabled.

Now if Oxide had said "We actually just chatted with Nvidia about Async Compute" and that there was no way it could be done, then you would have a point. I mean, if there is nothing to work with, what would Oxide be "working closely with them" on?

But the reality is that Oxide said nVidia couldn't do Async Compute, and now they are saying they are working with nVidia to fully implement it, so I don't see that the other poster's use of the term "recant" is all that far off.

Recant:
verb (used with object)
1. to withdraw or disavow (a statement, opinion, etc.), especially formally; retract.


----------



## Mahigan

Quote:


> Originally Posted by *47 Knucklehead*
> 
> NVIDIA Will Fully Implement Async Compute Via Driver Support


The pressure we put on nVIDIA is what allowed for a change. We kinda created a media storm, forcing nVIDIA to fix the issue. I'd say this is a win for nVIDIA customers

We, as in the PC Gaming community.


----------



## Vesku

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Obviously Oxide and nVidia have talked and it is obvious that nVidia set Oxide straight on what their hardware is capable of. Now given the fact that Oxide now said "We are working closely with them as they fully implement Async Compute." that means to just about anyone who uses English as their primary language, that the hardware is in place and nVidia just needs to work with Oxide on a driver the way they like it so that it will be enabled.


No, it doesn't mean the "fix" will be hardware-based. It simply means what it says: Nvidia has confirmed to the developer that the feature they advertised when selling their GPUs, and said was available via the driver, is not currently working, and they will now try to remedy that. Nvidia is not guaranteeing any fix publicly, and it is not saying it will work great if and when it is available. Nvidia has simply told a game developer, after some mild statements blew up on enthusiast sites, that they will work to get this advertised feature functional for this developer.

Let's be clear: even if Nvidia fails to enable Async Compute in any performance-meaningful manner, they will not have lied with regard to their discussion with Oxide, just as Oxide didn't lie when it said the feature wasn't currently working on Maxwell 2.


----------



## Mahigan

Why is everyone arguing?

Nobody recanted any statements. The statements made were based on the information as it stood at that point in time. For Kollock, the nVIDIA driver was reporting a feature he could not get to work. When he brought this to the attention of nVIDIA, they "pressured" him to disable the Asynchronous Compute feature (for both AMD and nVIDIA hardware), according to his words. He refused and instead worked with nVIDIA to implement a workaround using a Vendor ID-specific path. From Kollock's perspective, as far as he knew, Maxwell 2 couldn't do Async Compute. I mean, think about it... rather than work with Oxide to get the issue resolved, nVIDIA worked with Oxide to implement a workaround. It's quite reasonable to assume that, for Kollock, this appeared to indicate a lack of support for the feature.
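To make the "Vendor ID-specific path" concrete: in D3D12 an engine can read the adapter's PCI vendor ID (from DXGI_ADAPTER_DESC) and gate a feature on it, overriding whatever the driver's capability bits claim. Here's a toy sketch of that idea; this is not Oxide's actual code, the function and flag names are made up for illustration, and only the PCI vendor IDs are real.

```python
# Hypothetical sketch of a vendor-ID-specific feature path: the driver may
# report async compute as supported, but the engine forces it off for one
# vendor where using it caused performance/conformance problems.

VENDOR_NVIDIA = 0x10DE  # PCI vendor IDs, as reported in DXGI_ADAPTER_DESC
VENDOR_AMD = 0x1002

def use_async_compute(vendor_id: int, driver_reports_async: bool) -> bool:
    """Decide whether to submit work on a separate compute queue."""
    if vendor_id == VENDOR_NVIDIA:
        # Capability bit said yes, but the real behavior didn't hold up,
        # so shut the feature down on this vendor regardless.
        return False
    return driver_reports_async

print(use_async_compute(VENDOR_AMD, True))     # → True
print(use_async_compute(VENDOR_NVIDIA, True))  # → False
```

The point of the sketch is that this is the *only* vendor-specific branch involved: everything else simply responds to what the driver reports, which matches Kollock's description.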

Media storm was created... people began discussing this topic across the web.

nVIDIA re-approached Oxide and is now willing to get the issue resolved. This is a win for nVIDIA customers. Everyone ought to be happy about this turn of events. I don't see why this is leading to people getting angry towards one another. I'd say CONGRATS

You all did a fantastic job getting the message out.

As for the performance benefits... wait and see.

For now, we can at least be sure that nVIDIA will support Asynchronous Compute. While there are differences between their implementation, and that of AMDs, this will ensure that all other developers will be able to implement the feature without worrying about nVIDIAs "lack of driver support".

Celebrate

Don't hate.


----------



## Themisseble

Quote:


> Originally Posted by *Vesku*
> 
> No it doesn't mean the "fix" will be hardware based. It simply means what it says: Nvidia has confirmed with the developer that the feature they advertised when selling their GPUs and said was available via the driver is not currently working and they will now try to remedy that. Nvidia is not guaranteeing any fix publicly it is not saying it will work great, simply telling a game developer that they will work to get this advertised feature functional for this developer.


Sarcasm:
You just don't understand, do you? Nvidia will "magically" change hardware via software, so NVIDIA will fully support async shaders.

Anyway, this is good news for the Xbox and PS4.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Mahigan*
> 
> The pressure we put on nVIDIA is what allowed for a change. We kinda created a media storm, forcing nVIDIA to fix the issue. I'd say this is a win for nVIDIA customers
> 
> We, as in the PC Gaming community.


I agree. I was mainly addressing the people who seem to think that nVidia isn't going to be able to do this.

I mean seriously, if there is nothing in their cards for nVidia to use for Async Compute, then why would Oxide bother to work with them closely?

That's all I'm saying.

Async Compute, like all features, is a benefit, including the features that AMD/GCN doesn't have but nVidia does. The more options and features that can be used, the better. The larger the market (i.e., customers) that can use those features, the better.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Vesku*
> 
> No it doesn't mean the "fix" will be hardware based.


Did I say it was hardware? No, I didn't. Thank you very much. If you bothered to read what has already been posted, some parts are hardware and are already there.

Quote:


> Originally Posted by *Themisseble*
> 
> Sarcasm:
> You just don't understand, do you? Nvidia will "magically" change hardware via software. So NVIDIA will fully support async shaders.
> 
> Anyway, this is good news for Xbox and PS4.


This.

Apparently nVidia and Oxide are magical. They can wave their software wand and poof! Async hardware just appears!


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Mahigan*
> 
> Why is everyone arguing?
> 
> Nobody recanted any statements. The statements made were based on the information as it stood at that point in time. For Kollock, the nVIDIA driver was reporting a feature he could not get to work. When he brought this to nVIDIA's attention, they "pressured" him, in his words, to disable the Asynchronous Compute feature for both AMD and nVIDIA hardware. He refused, and instead worked with nVIDIA to implement a workaround using a Vendor ID specific path. So from Kollock's perspective, as far as he knew, Maxwell 2 couldn't do Async Compute. Think about it: rather than work with Oxide to get the issue resolved, nVIDIA worked with Oxide to implement a workaround. It's quite reasonable that, to Kollock, this appeared to indicate a lack of support for the feature.
> 
> A media storm was created, and people began discussing the topic across the web.


Maybe next time, instead of going off half-cocked and starting a media storm, Kollock could call up nVidia first and ask them a question, instead of (apparently wrongly) assuming something. As for Kollock's "quite reasonable to assume", well, we all know about making assumptions.

I'm just saying.

And yes, this is a good thing for everyone.


----------



## GorillaSceptre

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Did I say it was hardware? No, I didn't. Thank you very much. If you bothered to read what has already been posted, some parts are hardware and are already there.
> This.
> 
> *Apparently nVidia and Oxide are magical.* They can wave their software wand and poof! Async hardware just appears!


http://www.overclock.net/t/1572716/directx-12-asynchronous-compute-an-exercise-in-crowd-sourcing


----------



## Vesku

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Did I say it was hardware? No, I didn't. Thank you very much. If you bothered to read what has already been posted, some parts are hardware and are already there.
> This.
> 
> Apparently nVidia and Oxide are magical. They can wave their software wand and poof! Async hardware just appears!


Quote:


> Originally Posted by *47 Knucklehead*
> 
> that means to just about anyone who uses English as their primary language, *that the hardware is in place and nVidia just needs to work with Oxide* on a driver the way they like it so that it will be enabled.


The hardware can actually be fundamentally incapable of Async Compute, or have a bug that slipped through tape-out and makes the hardware bits unusable for Async Compute. That wouldn't invalidate Nvidia's statements to Oxide. Nvidia could technically implement a fix by running all Async Compute tasks on the CPU, so that their software scheduler isn't hindered by PCIe communication, and it would check the "Async Compute" feature box even if the feature's performance is abysmal compared to a GPU implementation. We just have to wait and see what the 'fix' ends up being.


----------



## Mahigan

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Maybe next time, instead of going off half-cocked and starting a media storm, Kollock could call up nVidia first and ask them a question, instead of (apparently wrongly) assuming something. As for Kollock's "quite reasonable to assume", well, we all know about making assumptions.
> 
> I'm just saying.
> 
> And yes, this is a good thing for everyone.


I'd say, don't concern yourself too much about how nVIDIA feels or about their reputation. I'd say the same for AMD. What ought to concern you is that because of his statements, not in spite of them, and because of the ensuing media storm, nVIDIA, which was reluctant to fix the issue prior to the statements and was remaining silent on the matter, is now working to fix it and implement Asynchronous Compute.

I'd argue that what ought to concern us, as customers, is our own interests. Since we're all PC Gamers, then we have shared interests. We're brothers in arms... not foes. There is no Green Team, there is no Red Team... there's the PC Gaming team... and even then, with the merger of Consoles, dare I say that the "PC Gaming Master Race" or whatever moniker will also fall shortly. We're all consumers and gamers.

What hurts nVIDIA and AMD ought to only concern us if it hurts "US"... collectively. As a community. But that's my perspective.


----------



## FastEddieNYC

Quote:


> Originally Posted by *Mahigan*
> 
> I'd say, don't concern yourself too much about how nVIDIA feels or about their reputation. I'd say the same for AMD. What ought to concern you is that because of his statements, not in spite of them, and because of the ensuing media storm, nVIDIA, which was reluctant to fix the issue prior to the statements and was remaining silent on the matter, is now working to fix it and implement Asynchronous Compute.
> 
> I'd argue that what ought to concern us, as customers, is our own interests. Since we're all PC Gamers, then we have shared interests. We're brothers in arms... not foes. There is no Green Team, there is no Red Team... there's the PC Gaming team... and even then, with the merger of Consoles, dare I say that the "PC Gaming Master Race" or whatever moniker will also fall shortly. We're all consumers and gamers.
> 
> What hurts nVIDIA and AMD ought to only concern us if it hurts "US"... collectively. As a community. But that's my perspective.


I totally agree. I own both brands and I trust the marketing info when deciding what to buy. What concerns me in all this is Nvidia not being completely truthful with us about DX12 support. Technically they didn't lie, but there is a difference between full hardware support and software. That information does affect my buying decisions.
We see this behavior with most companies that have a dominant market share. The pressure from investors to maintain and increase margins usually results in the loss of ethical standards.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Mahigan*
> 
> I'd say, don't concern yourself too much about how nVIDIA feels or about their reputation. I'd say the same for AMD. What ought to concern you is that because of his statements, not in spite of them, and because of the ensuing media storm, nVIDIA, which was reluctant to fix the issue prior to the statements and was remaining silent on the matter, is now working to fix it and implement Asynchronous Compute.
> 
> I'd argue that what ought to concern us, as customers, is our own interests. Since we're all PC Gamers, then we have shared interests. We're brothers in arms... not foes. *There is no Green Team, there is no Red Team*... there's the PC Gaming team... and even then, with the merger of Consoles, dare I say that the *"PC Gaming Master Race" or whatever moniker will also fall shortly.* We're all consumers and gamers.
> 
> What hurts nVIDIA and AMD ought to only concern us if it hurts "US"... collectively. As a community. But that's my perspective.


I get what you are saying, but seriously, you must be new to OCN and the internet.









(I'm just playing, nothing personal.)


----------



## Asmodian

Quote:


> Originally Posted by *FastEddieNYC*
> 
> I totally agree. I own both brands and I trust the marketing info when deciding what to buy. What concerns me in all this is Nvidia not being completely truthful with us about DX12 support. Technically they didn't lie, but there is a difference between full hardware support and software. That information does affect my buying decisions.


AMD also uses software emulation for some DX12 features, and Async Compute is not a requirement for any tier of DX12.

This is like getting mad at AMD for saying they fully support DX12 when they don't support Conservative Rasterization and Raster Order Views.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> I get what you are saying, but seriously, you must be new to OCN and the internet.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> (I'm just playing, nothing personal.)


Even hereteks can be useful to the Gaben Emperor. That said: If you ***** about anything north of the 7000 series (or 500 series for Nvidia) not having proper DX9 support on multi-GPU configurations... I see the issue with your machine is the CPU not the GPU.
Quote:


> Originally Posted by *Asmodian*
> 
> AMD also uses software emulation for some DX12 features, and Async Compute is not a requirement for any tier of DX12.
> 
> This is like getting mad at AMD for saying they fully support DX12 when they don't support Conservative Rasterization and Raster Order Views.


That falls under tier 12.1, while Async falls under 12.0. And just as a fun fact: they never said "fully support"; to my knowledge they only ever said "GCN is compatible with DX12".


----------



## Mahigan

Quote:


> Originally Posted by *47 Knucklehead*
> 
> I get what you are saying, but seriously, you must be new to OCN and the internet.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> (I'm just playing, nothing personal.)


LOL It's all good









Anyone remember this?
Quote:


> Nvidia's Sr. Vice President, Investor Relations, Mike Hara, has played down the significance of DirectX 11 at the Deutsche Bank Securities Technology Conference. Instead, Mr. Hara insists technologies like CUDA, PhysX and Stereo 3D Vision are the future.
> 
> "DirectX 11 by itself is not going be the defining reason to buy a new GPU. It will be one of the reasons. This is why Microsoft is in work with the industry to allow more freedom and more creativity in how you build content, which is always good, and the new features in DirectX 11 are going to allow people to do that. But that no longer is the only reason, we believe, consumers would want to invest in a GPU," explains Mr. Hara.
> 
> "Now, we know, people are doing a lot in the area of video, people are going to do more and more in the area of photography&#8230; I think that the things we are doing would allow the GPU to be a co-processor to the CPU and deliver better user experience, better battery life and make that computers little bit more optimized."
> 
> It is clear Mr. Hara is pushing CUDA and compute shader performance over gaming performance.
> 
> "Graphics industry, I think, is on the point that microprocessor industry was several years ago, when AMD made the public confession that frequency does not matter anymore and it is more about performance per watt. I think we are the same crossroad with the graphics world: framerate and resolution are nice, but today they are very high and going from 120fps to 125fps is not going to fundamentally change end-user experience. But I think the things that we are doing with Stereo 3D Vision, PhysX, about making the games more immersive, more playable is beyond framerates and resolutions. Nvidia will show with the next-generation GPUs that the compute side is now becoming more important that the graphics side," concluded Mr. Hara.


http://vr-zone.com/articles/nvidia-directx-11-is-not-important/7674.html?doc=7674

Back then ATi (AMD) was pushing their HD5000 series and extolling the virtues of DirectX 11. nVIDIA was looking towards a more compute-heavy future. Somewhere along the way (probably Kepler and GCN) the two companies switched roles.

It's when I look back at these kind of statements that I realize I was being played myself. I remember arguing in favor of the HD5000 series (Performance per Watt) only to switch arguments once the HD7900 series was released. It is clear to me that I was under the influence of marketing and PR rather than staying true to any principles. Was ATi right to pursue Graphic over Compute performance back then? Yes. Was nVIDIA right to do the same since Kepler? Yes.

With DX12, what is the correct strategy? It seems to me that nVIDIA's vision (in the comments above) is what is right for DX12. Compute performance overtaking Graphics performance in importance appears to be where we're headed with DX12. Probably not right from the get-go... but in a few years' time at least.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Asmodian*
> 
> AMD also uses software emulation for some DX12 features, and Async Compute is not a requirement for any tier of DX12.
> 
> This is like getting mad at AMD for saying they fully support DX12 when they don't support Conservative Rasterization and Raster Order Views.


I thought Async was a point update to DX12, whereas Conservative Rasterization etc. are features?


----------



## Asmodian

Quote:


> Originally Posted by *NoirWolf*
> 
> Even hereteks can be useful to the Gaben Emperor. That said: If you ***** about anything north of the 7000 series (or 500 series for Nvidia) not having proper DX9 support on multi-GPU configurations... I see the issue with your machine is the CPU not the GPU.
> That falls under tier 12.1, while Async falls under 12.0. And just as a fun fact: they never said "fully support"; to my knowledge they only ever said "GCN is compatible with DX12".


Async isn't in tier 12.0, Async is an optional feature not required for any tier of DX12.


----------



## NoirWolf

Quote:


> Originally Posted by *Asmodian*
> 
> Async isn't in tier 12.0, Async is an optional feature not required for any tier of DX12.


Re-read what I said. Maybe you'll understand when you stop putting words into my mouth.


----------



## PostalTwinkie

I find it slightly entertaining that people are complaining about one or the other not "truly" or "fully" supporting DX 12, while arguing about what in the Hell DX 12 support really is to begin with!


----------



## NoirWolf

Quote:


> Originally Posted by *PostalTwinkie*
> 
> I find it slightly entertaining that people are complaining about one or the other not "truly" or "fully" supporting DX 12, while arguing about what in the Hell DX 12 support really is to begin with!


If it cannot render a realistic twinkie and show how it melts in your mouth it ain't DX12 compatible.


----------



## Asmodian

Quote:


> Originally Posted by *NoirWolf*
> 
> Re-read what I said. Maybe you'll understand when you stop putting words into my mouth.


Maybe I missed something? I was only saying getting mad at Nvidia for misleading statements about fully supporting DX12 when they didn't support Async was as silly as getting mad at AMD for saying they support DX12 when they didn't support Conservative Rasterization and Raster Order Views. Neither is a legitimate complaint.
Quote:


> Originally Posted by *NoirWolf*
> 
> That falls under tier 12.1 while Async does under 12.0.


I don't think I put words in your mouth? Async isn't part of the 12.0 feature level, it isn't part of any feature level. You could implement Async support but only support feature level 11.1 if you wanted to, you could also implement Raster Order Views but only support feature level 11.1.

You can skip Async entirely and still have full support for DirectX 12 feature level 12_1.
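Asmodian's point, that feature levels are cumulative minimum sets and async compute sits outside all of them, can be sketched with a toy model. The feature names below are illustrative stand-ins, not the actual D3D12 cap bits:

```python
# Toy model of D3D12 feature levels as cumulative minimum feature sets.
# Note that "async_compute" deliberately appears in no level: it is optional.
FEATURE_LEVELS = {
    "11_0": {"compute_shaders", "tessellation"},
    "11_1": {"compute_shaders", "tessellation", "uav_all_stages"},
    "12_0": {"compute_shaders", "tessellation", "uav_all_stages",
             "resource_binding_tier2", "tiled_resources_tier2"},
    "12_1": {"compute_shaders", "tessellation", "uav_all_stages",
             "resource_binding_tier2", "tiled_resources_tier2",
             "conservative_rasterization", "raster_order_views"},
}

def max_feature_level(device_caps):
    """Highest level whose entire minimum set the device exposes."""
    best = None
    for level in ["11_0", "11_1", "12_0", "12_1"]:
        if FEATURE_LEVELS[level] <= device_caps:  # subset test
            best = level
    return best

# A card can expose async compute yet only reach FL 11_1...
card = FEATURE_LEVELS["11_1"] | {"async_compute"}
print(max_feature_level(card))       # 11_1

# ...and a card can reach FL 12_1 without async compute at all.
card = set(FEATURE_LEVELS["12_1"])
print(max_feature_level(card))       # 12_1
```

In the real API the equivalent query is `ID3D12Device::CheckFeatureSupport`; the point of the sketch is only that the two axes (feature level reached, async compute exposed) vary independently.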


----------



## Vesku

Does it matter what part of DX12 Async Compute is in when Nvidia has published things like this?

http://international.download.nvidia.com/geforce-com/international/images/nvidia-geforce-gtx-980-ti/nvidia-geforce-gtx-980-ti-directx-12-advanced-api-support.png


----------



## NoirWolf

Quote:


> Originally Posted by *Asmodian*
> 
> Maybe I missed something? I was only saying getting mad at Nvidia for misleading statements about fully supporting DX12 when they didn't support Async was as silly as getting mad at AMD for saying they support DX12 when they didn't support Conservative Rasterization and Raster Order Views. Neither is a legitimate complaint.
> I don't think I put words in your mouth? Async isn't part of the 12.0 feature level, it isn't part of any feature level. You could implement Async support but only support feature level 11.1 if you wanted to, you could also implement Raster Order Views but only support feature level 11.1.
> 
> You can skip Async entirely and still have full support for DirectX 12 feature level 12_1.


Again, there's a difference between claiming support and claiming compatibility (Nvidia did the former while AMD did the latter). And pray tell, how can you do parallelized computing in a serial API?


----------



## Asmodian

Quote:


> Originally Posted by *NoirWolf*
> 
> Again, there's a difference between claiming support and claiming compatibility (Nvidia did the former while AMD did the latter). And pray tell, how can you do parallelized computing in a serial API?


What? Nvidia did claim support and they currently do support everything in DX12 feature level 12_1, as far as we know at least. Async isn't required by any DX12 feature level so you cannot get mad at them for claiming support for DX12 while not supporting it.

I think you are confusing feature levels and DirectX versions? I also should have used an "_" instead of a "." to make this clearer. I assume you don't mean DX12 feature level 11_1 is a serial API? Let's say you have Async support but only Resource Binding tier 1; that means you do not support feature level 12_0, but you still support Async. I do not believe support for Async shaders requires Resource Binding tier 2.
Quote:


> Originally Posted by *PostalTwinkie*
> 
> I find it slightly entertaining that people are complaining about one or the other not "truly" or "fully" supporting DX 12, while arguing about what in the Hell DX 12 support really is to begin with!


We have to find something to do while there is no new information.


----------



## NoirWolf

Quote:


> Originally Posted by *Asmodian*
> 
> What? Nvidia did claim support and they currently do support everything in DX12 feature level 12_1, as far as we know at least. Async isn't required by any DX12 feature level so you cannot get mad at them for claiming support for DX12 while not supporting it.


Say hello to this guy's link.
Quote:


> Originally Posted by *Vesku*
> 
> Does it matter what part of DX12 Async Compute is in when Nvidia has published things like this?
> 
> http://international.download.nvidia.com/geforce-com/international/images/nvidia-geforce-gtx-980-ti/nvidia-geforce-gtx-980-ti-directx-12-advanced-api-support.png


Considering you don't see Async compute in DX11.1 compatible games... well yeah. Even a multithreaded monstrosity like Elite Dangerous can't do Async currently.


----------



## Asmodian

Quote:


> Originally Posted by *NoirWolf*
> 
> Say hello to this guy's link.
> Considering you don't see Async compute in DX11.1 compatible games... well yeah. Even a multithreaded monstrosity like Elite Dangerous can't do Async currently.


So Nvidia said the DX12 API allows Async compute. That doesn't say they support it.







edit: Ok, Ok, they did and they must fix it so it actually works.

Obviously they still need to get it fixed before a PC game comes out that uses it. Actually, I find it odd that Async support isn't a requirement of DX12 feature level 12_0.

Are you talking about DX11 again?


----------



## NoirWolf

Quote:


> Originally Posted by *Asmodian*
> 
> So Nvidia said the DX12 API allows Async compute. That doesn't say they support it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Obviously they still need to get it fixed before a PC game comes out that uses it. Actually, I find it odd that Async support isn't a requirement of DX12 feature level 12_0.


It isn't, but what it depends on is; see below.
Quote:


> Originally Posted by *Asmodian*
> 
> Are you talking about DX11 again?


Are you still trying to call a potato a spud and claim it isn't the same thing?
https://msdn.microsoft.com/en-us/library/windows/desktop/ff476876%28v=vs.85%29.aspx
Feel free to split hairs while being wrong again though.


----------



## Remij

I think it's reaching to say that Nvidia wasn't going to do anything to 'fix' the problem until the community kicked up a storm. I don't believe that at all. It's most likely that Nvidia was taking the time to implement things properly for their hardware and as I said a long time ago, was trying to keep it quiet until that happened.

It's obvious that if something isn't quite working properly they would ask for it to be disabled for the time being... and thus Nvidia and Oxide worked out another method.

People jumped the gun too early on this topic imo, but the end result will be the same regardless. Now we're likely just going to be stuck theorizing a bit longer than if we would have waited.


----------



## HalGameGuru

Async is not required to support DX12, but it is required to be considered for the Tier 2 or better level of resource binding in DX FL 12_0.

Certain levels of DX12 support DO require it, and it's on a pretty foundational level of support.

To be considered as supporting Feature Level DX12_0 you do need to have at least Tier 2 of resource binding, which DOES appear to require, at the least software emulation of, Async Compute.


----------



## NoirWolf

Quote:


> Originally Posted by *Remij*
> 
> I think it's reaching to say that Nvidia wasn't going to do anything to 'fix' the problem until the community kicked up a storm. I don't believe that at all. It's most likely that Nvidia was taking the time to implement things properly for their hardware and as I said a long time ago, was trying to keep it quiet until that happened.
> 
> It's obvious that if something isn't quite working properly they would ask for it to be disabled for the time being... and thus Nvidia and Oxide worked out another method.
> 
> People jumped the gun too early on this topic imo, but the end result will be the same regardless. Now we're likely just going to be stuck theorizing a bit longer than if we would have waited.


Like they shared the fact that the GTX 970's specifications were wrong? They could have ninja'd some Async support into their drivers before a DX12 Async game launches, or they could have stamped their feet demanding it be disabled for their GPUs like they did with AotS. Do you know which they would have done without this scandal?


----------



## Asmodian

Quote:


> Originally Posted by *NoirWolf*
> 
> It isn't, but what it depends on is; see below.
> Are you still trying to call a potato a spud and claim it isn't the same thing?
> https://msdn.microsoft.com/en-us/library/windows/desktop/ff476876%28v=vs.85%29.aspx
> Feel free to split hairs while being wrong again though.


I am quite happy to be wrong.









Are you saying DX12 feature level 11_1 is the same thing as DX11 feature level 11_1? A GCN 1.0 part in Windows 10 with DX12 cannot run async compute?

I was under the impression that DX12 feature level 11_0 already allowed the multi-threaded nature of the DX12 API and nothing was stopping anyone from implementing async compute on a 11_0 card (assuming the hardware allowed which I don't think any 11_0 only hardware does). The hardware with the lowest feature level that could run async compute, even with driver assistance, is GCN 1.0. At least that is how I understand it.

That link doesn't say anything about async compute at all, what am I supposed to learn from it? DX11 supports feature level 11_1 but, given that it also supports feature level 12_1 and async isn't part of a feature level, how is that relevant? Feature levels are a collection of the minimum features supported, not the maximum. A DX12 feature level 11_1 card can support more features than required by feature level 11_1 but not less.

So async requires DX12, yet every feature level is also available in DX11.3, where async wouldn't be supported. Maybe this is why async is not part of a feature set: MS wanted all the new feature sets compatible with DX11.3?


----------



## Asmodian

Quote:


> Originally Posted by *HalGameGuru*
> 
> Async is not required to support DX12, but it is required to be considered for the Tier 2 or better level of resource binding in DX FL 12_0.
> 
> Certain levels of DX12 support DO require it, and it's on a pretty foundational level of support.
> 
> To be considered as supporting Feature Level DX12_0 you do need to have at least Tier 2 of resource binding, which DOES appear to require, at the least software emulation of, Async Compute.


Do you have a source for this? I cannot find anything that mentions async compute connected to resource binding. They don't seem related at all.


----------



## SpeedyVT

Quote:


> Originally Posted by *Asmodian*
> 
> I am quite happy to be wrong.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Are you saying DX12 feature level 11_1 is the same thing as DX11 feature level 11_1? A GCN 1.0 part in Windows 10 with DX12 cannot run async compute?
> 
> I was under the impression that DX12 feature level 11_0 already allowed the multi-threaded nature of the DX12 API and nothing was stopping anyone from implementing async compute on a 11_0 card (assuming the hardware allowed which I don't think any 11_0 only hardware does). The hardware with the lowest feature level that could run async compute, even with driver assistance, is GCN 1.0. At least that is how I understand it.
> 
> That link doesn't say anything about async compute at all, what am I supposed to learn from it? DX11 supports feature level 11_1 but, given that it also supports feature level 12_1 and async isn't part of a feature level, how is that relevant? Feature levels are a collection of the minimum features supported, not the maximum. A DX12 feature level 11_1 card can support more features than required by feature level 11_1 but not less.
> 
> So async requires DX12 and feature level 12_1, which is also available in DX11.3, wouldn't support it when using DX11. Maybe this is why async is not part of a feature set: MS wanted all the new feature sets compatible with DX11.3?


Those are not the proper relationships between feature level and DX version. The feature level categorizes which features are accessible to a GPU. DX is no longer a renderer with strict specifications but a flexible allowance for different modular, standardized feature sets.

This allows GPUs to play to their strengths equally, including Intel (puny).


----------



## Asmodian

Quote:


> Originally Posted by *SpeedyVT*
> 
> Those are not the proper relationships between feature level and DX version. The feature level categorizes which features are accessible to a GPU. DX is no longer a renderer with strict specifications but a flexible allowance for different modular, standardized feature sets.
> 
> This allows GPUs to play to their strengths equally, including Intel (puny).


Sorry, I have no idea what you mean here.









What are not proper relationships between feature level and DX version?


----------



## HalGameGuru

I am pretty sure the slides have been shared more than once in this thread, let me find them.


It's a little inductive, but so far nothing without it has better than Tier 1, and everything with Tier 2 or better DOES support it. And Tier 2 is required to support Feature Level 12_0.


----------



## NoirWolf

Quote:


> Originally Posted by *Asmodian*
> 
> I am quite happy to be wrong.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Are you saying DX12 feature level 11_1 is the same thing as DX11 feature level 11_1? A GCN 1.0 part in Windows 10 with DX12 cannot run async compute?
> 
> I was under the impression that DX12 feature level 11_0 already allowed the multi-threaded nature of the DX12 API and nothing was stopping anyone from implementing async compute on a 11_0 card (assuming the hardware allowed which I don't think any 11_0 only hardware does). The hardware with the lowest feature level that could run async compute, even with driver assistance, is GCN 1.0. At least that is how I understand it.
> 
> That link doesn't say anything about async compute at all, what am I supposed to learn from it? DX11 supports feature level 11_1 but, given that it also supports feature level 12_1 and async isn't part of a feature level, how is that relevant? Feature levels are a collection of the minimum features supported, not the maximum. A DX12 feature level 11_1 card can support more features than required by feature level 11_1 but not less.
> 
> So async requires DX12 and feature level 12_1, which is also available in DX11.3, wouldn't support it when using DX11. Maybe this is why async is not part of a feature set: MS wanted all the new feature sets compatible with DX11.3?


Considering the DX11_1 feature level doesn't support "Volume Tiled Resources"? And a GCN 1.0 part supports DX12 Feature Level 12_0, so don't try to confuse the issue.
If the GPU can support DX12, it supports by default Feature Level 12_0 minimum (and every primary feature level under it, because they're foundational).
Feature levels are minimums, yes, but my stupid question for you is: does it not being a maximum mean that Feature Level 12_0 is also feature level 666_0? Ahh, there you have it, lad: the moment a new feature level is defined, the previous one has a defined maximum as well (i.e. when it meets all the feature requirements of the next one, you go to the next one; this is why you can have a GPU fully support Feature Level 12_0 but not fully support DX12).
DX11.3, from what I can find on it, seems to be DX11 trying to adapt as many parallelized features as it can to a serial system. You still cannot have Async native to the GPU in this API (you could in theory run it like Nvidia will, via the driver, but that would probably defeat the point of Async).

MS probably knew Nvidia's Achilles' heel before anyone else did (they may indeed have been working together and spotted the issue with Kepler and Maxwell GPUs' hardware) and opted not to poke a rather big bear by making FL 12_0 have a mandatory Async requirement, though all the elements it needs to work are a requirement for that feature level.


----------



## SpeedyVT

Quote:


> Originally Posted by *Asmodian*
> 
> Sorry, I have no idea what you mean here.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> What are not proper relationships between feature level and DX version?


https://msdn.microsoft.com/en-us/library/windows/desktop/ff476876%28v=vs.85%29.aspx

Feature levels =/= DX version.

TIER =/= DX version.

Tier determines which features of a feature level are emulated and which are native.


----------



## FastEddieNYC

Being able to do Async is part of the programming API of DX12; it's not part of a separate feature that defines a DX level. Being able to control the resources of the GPU directly, instead of going through another software layer, is where the gains in performance come from. With DX11 there was only basic multi-thread support, so it was mostly a serial pipeline. Developers can now program parallel workloads the same way they have been doing on the Xbox and PlayStation. This is why hardware Async now matters.
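The performance argument here can be put in back-of-the-envelope numbers. The model below assumes perfect overlap of compute work with idle graphics units and made-up frame timings; real gains depend on occupancy and on how the hardware schedules the queues:

```python
# Back-of-the-envelope frame-time model (milliseconds).
def serial_frame(graphics_ms, compute_ms):
    # DX11-style: compute waits for graphics in one serial pipeline.
    return graphics_ms + compute_ms

def async_frame(graphics_ms, compute_ms):
    # Idealized async compute: the two workloads run concurrently,
    # so the frame is bounded by the longer of the two.
    return max(graphics_ms, compute_ms)

g, c = 12.0, 4.0  # hypothetical per-frame graphics and compute costs
print(serial_frame(g, c))  # 16.0 ms per frame
print(async_frame(g, c))   # 12.0 ms: the compute hides under graphics
```

The takeaway is only directional: the closer the compute cost gets to the graphics cost, the more there is to gain from overlapping them rather than serializing them.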


----------



## Asmodian

Quote:


> Originally Posted by *NoirWolf*
> 
> Considering the DX11_1 feature level doesn't support "Volume Tiled Resources"? And a GCN 1.0 part supports DX12 feature level 12_0; don't try to confuse the issue.
> If the GPU can support DX12, it supports feature level 12_0 minimum by default (and every primary feature level under it, because they're foundational).
> Feature levels are minimums, yes, but my stupid question for you is: does it not being a maximum mean that feature level 12_0 is also feature level 666_0? Ahh, there you have it, lad: the moment a new feature level is defined, the previous one has a defined maximum as well (i.e. when a GPU meets all the feature requirements of the next level it moves up to it, which is why a GPU can fully support feature level 12_0 but not fully support DX12).
> From what I can find on it, DX 11.3 seems to be DX11 trying to adapt as many parallelized features as it can to a serial system. You still cannot have async native to the GPU in this API (you could in theory run it like Nvidia will, via the driver, but that would probably defeat the point of async).
> 
> MS probably knew Nvidia's Achilles' heel before anyone else did (they may indeed have been working together and spotted the issue with Kepler and Maxwell GPUs' hardware) and opted not to poke a rather big bear by making FL 12_0 carry a mandatory async requirement, even though all the elements async needs to work are requirements of that feature level.


GCN 1.0 only supports FL 11_1. You can "support" DX12 but only support feature level 11_0.

I don't understand the maximum comment. You can be only feature level 11_1 compliant but still support a feature from 12_1; you simply have to be missing something else required for FL 12_0.

Not all the elements required for async compute are required for FL 12_0 or 12_1; where did you get that?

DX11.3 can support FL 9_3, 10_0, 10_1, 11_0, 11_1, 12_0, 12_1, so only requires FL 9_3.
DX12 can support FL 11_0, 11_1, 12_0, 12_1, so only requires FL 11_0.
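Asmodian's mapping of API versions to the feature levels they can run on can be sketched as a toy lookup. This is illustrative only: real code would query the driver via `D3D12CreateDevice`/`CheckFeatureSupport`, and the `highest_supported` helper and the GPU capability sets below are hypothetical.

```python
# Toy model of Direct3D feature-level negotiation. Illustrative only:
# real code would call D3D12CreateDevice / CheckFeatureSupport instead.

# Feature levels each API can create a device against, highest first.
API_FEATURE_LEVELS = {
    "D3D11.3": ["12_1", "12_0", "11_1", "11_0", "10_1", "10_0", "9_3"],
    "D3D12":   ["12_1", "12_0", "11_1", "11_0"],
}

def highest_supported(api, gpu_levels):
    """Highest feature level both the API and the GPU support, or
    None when the GPU misses even the API's minimum level."""
    for level in API_FEATURE_LEVELS[api]:
        if level in gpu_levels:
            return level
    return None

# Hypothetical GCN 1.0-class GPU capped at FL 11_1: it still gets a
# D3D12 device, because "supports DX12" only requires FL 11_0.
gcn1 = {"9_3", "10_0", "10_1", "11_0", "11_1"}
```

So a card can "support DX12" while topping out at FL 11_1; the API version and the feature level vary independently.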
Quote:


> Originally Posted by *HalGameGuru*
> 
> It's a little inductive, but so far nothing without it has better than Tier 1, and everything with Tier 2 or better DOES support it. And Tier 2 is required to support Feature Level 12_0


The fact that hardware supporting tier 2 resource binding also happens to support async doesn't mean that tier 2 resource binding or FL 12_0 requires async support.


----------



## Kand

Mantle supports Async.

We did not see that big a difference using Mantle against DX11 in games like Battlefield 4, which did use async.

Why is this? AMD gave up on its DX11 driver for Ashes, hence the almost miraculous gain with DX12.

Asynchronous Compute is not the miracle that people are thinking it is.


----------



## Xuper

Quote:


> Originally Posted by *Kand*
> 
> Mantle supports Async.
> 
> We did not see that big a difference using Mantle against DX11 in games like Battlefield 4 which did use Async.
> 
> Why is this? AMD gave up coding a driver for DX11 in Ashes hence the almost miraculous gain with DX12.
> 
> Asynchronous Compute is not the miracle that people are thinking it is.


Are you sure BF4 used it to its full potential? You're forgetting what Oxide said: about 20% of their graphics pipeline is in compute shaders now, they want to move that past 50%, and eventually perhaps 100%.

Read this:

http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/2320_40#post_24383286


Quote:


> Originally Posted by *Kollock*
> 
> Regarding trying to figure out bottlenecks on GPUs, it's important to note that GPUs do not scale simply by adding more cores, especially for graphics tasks, which have a lot of serial points. My $.02 is that GCN is a bit triangle limited, which is why you see greater performance at 4K, where the average triangle size is 4x the triangle size of 1080p.
> 
> I think you're also being a bit short-sighted on the possible use of compute for general graphics. It is not limited to post process. *Right now, I estimate about 20% of our graphics pipeline occurs in compute shaders, and we are projecting this to be more than 50% on the next iteration of our engine. In fact, it is even conceivable to build a rendering pipeline entirely in compute shaders.* For example, there are alternative rendering primitives to triangles which are actually quite feasible in compute. There was a great talk at SIGGRAPH this year on this subject. If someone gave us a card with only a compute pipeline, I'd bet we could build an engine around it which would be plenty fast. In fact, this was one of the main motivating factors behind the Larrabee project. The main problem with Larrabee wasn't that it wasn't fast; it was that they failed to map DX9 games to it well enough to be a viable product. I'm not saying that the graphics pipeline will disappear anytime soon (or ever), but it's by no means certain that it's necessary. It's quite possible that in 5 years' time Nitrous's rendering pipeline is 100% implemented via compute shaders.


----------



## Forceman

Quote:


> Originally Posted by *HalGameGuru*
> 
> Async is not required to support DX12, but it is required to be considered for the Tier 2 or better level of resource binding in DX FL 12_0.


No, it isn't. The Oxide developer very specifically stated that async compute and Resource Binding Tier are two different things.
Quote:


> I think you are confusing a few issues. Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute. It has to do with the number of root-level descriptors we can pass. In tier 3, it turns out we can basically never update a descriptor during a frame, but in tier 2 we sometimes have to build a few. I don't think it's a significant performance issue though, just a technical detail.


http://www.overclock.net/t/1569897/various-ashes-of-the-singularity-dx12-benchmarks/1380#post_24360916


----------



## SpeedyVT

Quote:


> Originally Posted by *Kand*
> 
> Mantle supports Async.
> 
> We did not see that big a difference using Mantle against DX11 in games like Battlefield 4 which did use Async.
> 
> Why is this? AMD gave up coding a driver for DX11 in Ashes hence the almost miraculous gain with DX12.
> 
> Asynchronous Compute is not the miracle that people are thinking it is.


Nor did we get to see it properly utilized in any games. It was lightly used.


----------



## HalGameGuru

Hence why I said to treat that as inductive.

EDIT: Although seeing this: "Resource Binding tiers define maximum number of resources that can be addressed using CBV (constant buffer view), SRV (shader resource view) and UAV (unordered access view), as well as texture sampler units. Tier 3 hardware allows fully bindless resources only restricted by the size of the descriptor heap, while Tier 1 and Tier 2 hardware impose some limits on the number of descriptors ("views") that can be used simultaneously."

It could indicate that async may be necessary in order for there to be unlimited "unordered" accesses, although there are multiple asynchronous techs involved aside from just compute.


----------



## Forceman

Quote:


> Originally Posted by *HalGameGuru*
> 
> Hence why I said to assume that is inductive.
> 
> EDIT: Although seeing this: "Resource Binding tiers define maximum number of resources that can be addressed using CBV (constant buffer view), SRV (shader resource view) and UAV (unordered access view), as well as texture sampler units. Tier 3 hardware allows fully bindless resources only restricted by the size of the descriptor heap, while Tier 1 and Tier 2 hardware impose some limits on the number of descriptors ("views") that can be used simultaneously."
> 
> It would indicate Async is necessary in order for there to be unlimited "unordered" accesses.


But we know, from the developer, that it isn't.


----------



## Asmodian

Quote:


> Originally Posted by *HalGameGuru*
> 
> It would indicate Async is necessary in order for there to be unlimited "unordered" accesses.


Why? What does async compute have to do with the number of descriptors ("views") that can be used simultaneously? If you needed async for multiple unordered access views wouldn't you still need it for the 1M descriptors or the 8 UAVs allowed in Tier 1? Going to 64 UAVs (tier 2) doesn't seem like a big jump where we would suddenly require async compute.
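The tier limits being argued about here can be captured in a toy check. The UAV counts are the ones quoted in this thread (8 for Tier 1, 64 for Tier 2, full descriptor heap for Tier 3); `min_tier_for_uavs` is a hypothetical helper for illustration, not any real API.

```python
# Toy check of resource-binding tier UAV limits, using the figures
# discussed in this thread (illustrative only, not a real D3D12 API).

UAV_LIMITS = {1: 8, 2: 64, 3: float("inf")}  # UAVs across all stages

def min_tier_for_uavs(n):
    """Smallest resource-binding tier whose UAV limit covers n views."""
    for tier in (1, 2, 3):
        if n <= UAV_LIMITS[tier]:
            return tier
```

The jump from 8 to 64 simultaneous UAVs is a bump in binding limits, not in execution model, which is Asmodian's point: nothing in these numbers implies async compute.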


----------



## Mahigan

Extremetech has an article up here: http://www.extremetech.com/gaming/213202-ashes-dev-dishes-on-dx12-amd-vs-nvidia-and-asynchronous-compute

I hadn't seen it before. Probably because it uses the same title image as the last article on the topic.


----------



## GorillaSceptre

Async compute is a large part of DX12; people can make light of it all they want, but it is the future.

Have a look at what ND is achieving on the PS4 and tell me you aren't impressed. Not bad for a "weak APU", is it? I wonder how they're achieving the fidelity, lighting, and effects that put most modern PC titles to shame? I'll tell you: by actually using the potential of the chip, and not leaving the most impressive part of it just sitting there doing nothing.

In Fiji's case, we have an 8.6 TFLOP beast just being wasted. What's the point of buying $650 high-end hardware, just so it can be outperformed by a $400 console? Which is why I said it's about damn time our GPUs will actually be used properly.

I bet we'll see the same people saying async compute isn't a big deal change their tune very quickly if Pascal turns out to be a more parallel architecture.


----------



## HalGameGuru

What do we know isn't, from the developer?

The dev has no say in tier-level support. Async compute AND async DMA are available on Tier 2 or better GPUs, whereas async DMA is unavailable on the listed Tier 1 GPUs. The access levels are the driving factor of tier support for the feature level, and the amount of access needed is specified here:

https://en.wikipedia.org/wiki/Direct3D#Direct3D_12_levels

It indicates some pretty big leaps in both scope and saturation, where Tier 3 support requires unlimited access over the entire heap across all stages.

Although there is also conflicting info on Raster Ordered Views, an Intel tech that AMD purportedly doesn't support, yet ALSO purportedly supports via an OpenGL analog.

Unless and until MS comes out and says otherwise, or we see a GPU claiming a tier of support that contradicts it, as it sits Tier 3 appears to require async DMA and compute, and Tier 2 appears to require async DMA but MAYBE not compute; and possibly not those techs as such, but the limits of access their presence allows.

Async as a tech isn't what is required; the bumps in concurrent/parallel access that it allows are.


----------



## HalGameGuru

Quote:


> Originally Posted by *Asmodian*
> 
> Why? What does async compute have to do with the number of descriptors ("views") that can be used simultaneously? If you needed async for multiple unordered access views wouldn't you still need it for the 1M descriptors or the 8 UAVs allowed in Tier 1? Going to 64 UAVs (tier 2) doesn't seem like a big jump where we would suddenly require async compute.


I should have specified I was not speaking JUST of compute in that response.

Not just compute. According to that slide, async DMA is likely the more important one for the bump from Tier 1 to Tier 2, but both seem to have an impact going from Tier 2 to Tier 3, although in other areas the difference could very well be emulated vs. hardware-enacted features. I would wager async DMA is far more important than bespoke async compute for the feature-level support, although I would wager both are needed for Tier 2, and the features that are present in hardware, rather than missing or emulated, are what push the GCN design to Tier 3.

EDIT: As an aside, I am not arguing that async of any kind is necessary for DX12 support; there are GPUs with none of it that support DX12. This is SOLELY about the feature level 12_0 tier system and the required levels of support to fulfill it.


----------



## provost

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Async Compute is a large part of Dx12, people can make light of it all they want, but it is the future.
> 
> Have a look at what ND is achieving on the PS4 and tell me you aren't impressed. Not bad for a "weak APU", is it? I wonder how they're achieving the fidelity, lighting, and effects that put most modern PC titles to shame? I'll tell you: by actually using the potential of the chip, and not leaving the most impressive part of it just sitting there doing nothing.
> 
> In Fiji's case, we have a 8.6 Tflop beast just being wasted.. What's the point of buying $650 high-end hardware, just so it can be outperformed by a $400 console? Which is why i said it's about damn time our GPU's will actually be used properly.
> 
> I bet we'll see the same people saying Async Compute isn't a big deal, change their tune very quickly If Pascal turns out to be a more parallel architecture


This!


----------



## Forceman

Quote:


> Originally Posted by *HalGameGuru*
> 
> What do we know isn't, from the developer?
> 
> The dev has no say in tier level support, async compute AND Async DMA are available in Tier 2 or better supported GPUs where as Async DMA is unavailable in displayed Tier 1 GPUs, where the access levels are the driving factor of Tier support for the feature level and the amount of access needed as specificed here:
> 
> https://en.wikipedia.org/wiki/Direct3D#Direct3D_12_levels
> 
> Indicates some pretty big leaps both in scope and saturation. where Tier 3 support requires unlimited access over the entire heap across all stages.
> 
> Although there is also conflicting info on Raster Ordered Views, an intel tech, that AMD purportedly doesn't support, yet ALSO purportedly supports via an OGL analog.
> 
> Unless and until MS comes out and says, or we see a GPU claiming a certain tier level of support that contradicts it, as it sits Tier 3 appears to require Async DMA and Compute, Tier 2 appears to require Async DMA, but MAYBE not compute, and possibly not as such techs themselves but for the limits to access these techs' presence allow.
> 
> Async as a tech isn't what is required but the bumps in concurrent/parallel access that they allow are.


The assumption is that a developer, who is developing a game in DX12 (and actually the subject game of this thread) knows what he is talking about when he says async compute and Resource Binding Tier level are separate things in a discussion about whether Maxwell's lack of async compute meant that it is Tier 1. Mahigan assumed that async compute was required for Tier 2, and that meant Nvidia was Tier 1, and the developer responded:
Quote:


> Tier 2 vs Tier 3 binding is a completely separate issue from Async Compute.


Seems pretty straightforward.


----------



## HalGameGuru

Yes, 2 vs 3. I would presume, from what the slides have shown, that async DMA and compute are the bailiwick of Tier 2, and Tier 3 is more beholden to the features that are unrepresented or merely emulated on Tier 2 hardware vs. Tier 3 hardware.

And once again: not that async of either type is necessary in and of itself, but the increased accessibility it allows provides the bumps needed for the leap in tier. Do not forget the access levels are termed "across all stages", and the leaps in access are pretty hefty, especially from 2 to 3. Even a non-async system could provide UAVs across all stages of the graphics and compute pipelines at Tier 1 levels, although I would figure the move to Tier 2 has the bumps that require more asynchronous access support. And since this is dictated by levels of access, and not the techs themselves as implemented, it could very well also come down to software- vs. hardware-controlled async, or indirect implementations via emulation or software.

But, once again, this is all inductive over what disparate info we have been given. I hope eventually we will be provided full spec and support documents from MS about the involved accessibility and which features impact which ratings the most. This is one possibility out of many, and I fully expect any official info from MS about DX to sweep away most of what I have written here and provide a whole new set of things to discuss and debate.


----------



## 2010rig

Anyone willing to summarize this thread? I haven't been able to go through it fully.

I'll take a Team Red & Team Green recap.


----------



## Forceman

Quote:


> Originally Posted by *2010rig*
> 
> Anyone willing to summarize this thread, I haven't been able to go through it fully.
> 
> I'll take a Team Red & Team Green recap.


AMD is totally going to crush it in DX12, unless they don't. Nvidia is totally screwed, unless they aren't.

That's more or less it.


----------



## Mahigan

Quote:


> Originally Posted by *2010rig*
> 
> Anyone willing to summarize this thread, I haven't been able to go through it fully.
> 
> I'll take a Team Red & Team Green recap.


Recap: http://www.overclock.net/t/1572716/directx-12-asynchronous-compute-an-exercise-in-crowd-sourcing

As for upcoming games...

Here's a list someone made on Reddit:

AMD (Hardware Partner):
Deus Ex
Hitman
AotS
Tomb Raider 2016
Battlefront

AMD (Known Affiliation):
Mirror's Edge
Fable: Legends

AMD (assumed based on historical hardware partnerships):
Anything coming from DICE/EA

Nvidia (Hardware Partner):
Ark
King of Wushu

Nvidia (Known Affiliation):
Unreal Tournament 4

Nvidia (assumed based on historical hardware partnerships):
Gears of War (Microsoft), could be moving towards AMD with DX12 due to the XBox One

Neutral:
Arma 3 (coming with map pack iirc)
Dayz Standalone
Killer Instinct (Microsoft)
Halo Wars 2 (Microsoft)
Star Citizen

Anything from Microsoft will likely be built for the XBox One and thus be optimized for GCN by virtue of the APU. Of course, historically, AMD titles run just as well on nVIDIA cards. The one wild card, in all of this, is Asynchronous compute.


----------



## HalGameGuru

Oxide released a benchmark, there were some performance issues and PR releases that stepped on toes. nVidia and Oxide had some beef, most of it unfounded or rooted in miscommunication.

nVidia DID require a vendor-specific path be implemented in the interim to make up for a problem with async compute in-engine. nVidia is working on their drivers to fully support the tech Oxide is trying to utilize. Earlier commentary on MSAA was unfounded and can be ignored for our purposes.

Red team and green team both perform well in AotS: green better than red on DX11, red with a SLIGHT advantage on DX12 as it sits, and a large leap in performance from 11 to 12. Feathers were ruffled and jimmies were rustled; in the end both handle it well enough, and improvements should come across the board via drivers and in-game updates in the interim.

Mahigan did an amazing job corralling the info and digesting it for us; he has a companion thread here:

http://www.overclock.net/t/1572716/directx-12-asynchronous-compute-an-exercise-in-crowd-sourcing

going over async compute and DX12 in general, based on the efforts in this thread.


----------



## provost

So, I had a few spare minutes and I visited this site that has been quoted in this thread frequently:

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-33
Quote:


> Oh, and no, Maxwell V2 is actually capable of "parallel" execution. The hardware doesn't profit from it much though, since it has only little "gaps" in the shader utilization either way. So in the end, it's still just sequential execution for most workloads, even though if you did manage to stall the pipeline in some way by constructing an unfortunate workload, you could still profit from it. In general, you are only saving on idle time, as there is always at least one queue which contains any work at all. Unlike GCN, where you actually even NEED that additional workload to get full utilization.
> 
> Only Fermi, Kepler and Maxwell V1 have that issue, that they can't do ANY async compute shaders while in graphics context as they don't have any compute queues in that mode. But they aren't even feature level 12_0, so I wouldn't count them as DX12 "capable" either way.


Any insights into his post besides what he has obviously stated?


----------



## gamervivek

Quote:


> Originally Posted by *47 Knucklehead*
> 
> When people say "The real money is on Consoles", they mean the XBox One. Even though the PS4 uses mostly the same hardware, it uses a totally different OS and DirectX 12 is irrelevant for it. The only Console that will benefit from DirectX 12 at all is Microsofts, which is getting hammered in market.











Quote:


> So what does Cerny really think the console will gain from this design approach? Longevity.
> 
> Cerny is convinced that in the coming years, developers will want to use the GPU for more than pushing graphics -- and believes he has determined a flexible and powerful solution to giving that to them. "The vision is using the GPU for graphics and compute simultaneously," he said. *"Our belief is that by the middle of the PlayStation 4 console lifetime, asynchronous compute is a very large and important part of games technology."*
> 
> Cerny envisions "a dozen programs running simultaneously on that GPU" -- using it to "perform physics computations, to perform collision calculations, to do ray tracing for audio."


Quote:


> Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we've worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."


http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1


----------



## Mahigan

Quote:


> Originally Posted by *provost*
> 
> So, I had a few spare minutes and I visited this site that has been quoted in this thread frequently:
> 
> https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-33
> Any insights into his post besides what he has obviously stated?


What they're saying is pretty much what I summed up in the recap thread. Basically, asynchronous compute, or even more parallelism in general, probably won't give Maxwell 2 a boost in performance. We don't know yet; wait and see. But they're thinking the same thing I am.

GCN, on the other hand, needs to be fed more work in order to achieve its top performance. GCN thrives on parallelism. Asynchronous compute is like a shot of nitrous oxide for GCN. Of course there's a limit to anything, but we haven't yet seen what Fiji is capable of doing. It will take game engines whose graphics pipelines are pushed further into compute to tap into what Fiji can do. Ashes of the Singularity, as it stands right now, is based on an iteration of the Nitrous engine which underutilizes Fiji at the moment.
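The point made here and in the Beyond3D quote can be put in a toy scheduling model, with made-up millisecond figures: an architecture with idle gaps in its graphics queue (GCN-style) gains from overlapping compute, while one already near saturation (Maxwell-style, per the quote) gains almost nothing.

```python
# Illustrative frame-time model: async compute helps in proportion to
# how much idle time the graphics queue leaves. Numbers are invented.

def frame_time(graphics_busy, graphics_idle, compute, overlap):
    """Total time (ms) for one frame's graphics + compute work.
    With overlap, compute first soaks up the idle gaps in the
    graphics queue; only the remainder extends the frame."""
    if not overlap:
        return graphics_busy + graphics_idle + compute
    spill = max(0.0, compute - graphics_idle)
    return graphics_busy + graphics_idle + spill

# Hypothetical GCN-like GPU: 10 ms busy, 4 ms of gaps, 3 ms of compute.
serial = frame_time(10, 4, 3, overlap=False)
parallel = frame_time(10, 4, 3, overlap=True)
```

With `graphics_idle = 0` (a queue that is already saturated), overlapping changes nothing, which is the Beyond3D poster's description of Maxwell V2.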


----------



## Mahigan

Quote:


> Originally Posted by *Remij*
> 
> Always having to wait for AMD tech to be fully utilized... what a waste of $650 right now.


It's a different strategy, more of a long-term investment than a short-term one. I suppose people are realizing this now. By this I mean they're realizing that GCN isn't an architecture whose products you use and throw away during the next product cycle. They're products with a longer lifespan.

Whether that fits your personality or not is entirely subjective. One thing is for certain: we have an industry which focuses on present-day performance in benchmarks. This doesn't paint GCN in a good light. Whether this is the correct model or not isn't up to me to decide. I paid $350 each for my R9 290Xs when they launched (thanks to some points I had accumulated over at NCIX). That's $700, and that was a few years ago. I haven't bought a Fiji or Maxwell 2 card because I don't need to. By the time I need to upgrade I'll probably do the same again: I'll pick the architecture which appears to be the most forward-looking, meaning the most parallel architecture with the most strength in compute performance.

If that's not your thing then more power to you, bro.







That's why we've got choices. So we can choose what suits our needs best.


----------



## 2010rig

Thanks for the recaps +rep


----------



## Casey Ryback

Quote:


> Originally Posted by *Mahigan*
> 
> It's a different strategy, more of a long term investment than a short term investment. I suppose people are realizing this now. By this I mean they're realizing that GCN isn't an architecture whose products you use and throw away during the next product cycle. They're products with a longer lifespan.
> 
> Whether that fits your personality or not is entirely subjective. One thing is for certain, we have an industry which focuses on present day performance in benchmarks. This doesn't paint GCN in a good light. Whether this is the correct model or not isn't up to me to decide. I paid $350ea for my R9 290x's when they launched (thanks to some points I had accumulated over at NCIX). That's $700 and that was a few years ago. I haven't bought a Fiji or Maxwell 2 card because I don't need too. By the time I need to upgrade I'll probably do the same again, I'll pick the architecture which appears to be the most forward looking. That means the most Parallel architecture coupled with the architecture with the most strength in terms of Compute performance.
> 
> If that's not your thing then more power to you bro
> 
> 
> 
> 
> 
> 
> 
> That's why we've got choices. So we can choose what suits our needs best.


I bought a 7950 Boost probably 3 years ago for a decent price (about 660 Ti price), then during the mining craze sold it and upgraded to a 7970 at no extra cost.

The amount of life I got out of that initial investment is pretty amazing.

http://www.anandtech.com/bench/product/770?vs=860

I don't see anyone still using 660ti's, yet many are still on a 7950/7970/280X.


----------



## mav451

Speaking of choices, AMD has specifically excluded TPU, TR, and [H] from a Nano review sample.

Curious what we'll be learning come Thursday (Sept 10) this week. I am guessing AT and HWC both have one in their hands - or at least I'm hoping.


----------



## Casey Ryback

Quote:


> Originally Posted by *mav451*
> 
> Speaking of choices, AMD has specifically excluded TPU, TR, and [H] from a Nano review sample.
> 
> Curious what we'll be learning come Thursday (Sept 10) this week. I am guessing AT and HWC both have one in their hands - or at least I'm hoping.


Honestly, who even cares about the Nano by now lol; buy the Fury/Fury X if you want Fiji.


----------



## Mahigan

Quote:


> Originally Posted by *mav451*
> 
> Speaking of choices, AMD has specifically excluded TPU, TR, and [H] from a Nano review sample.
> 
> Curious what we'll be learning come Thursday (Sept 10) this week. I am guessing AT and HWC both have one in their hands - or at least I'm hoping.


I saw that, I've been participating in threads across the web about it as well. Some PR guy at AMD said something about fairness in reviews... kinda being unprofessional about it too. Looks like AMD is unhappy with the way their products are being shown in reviews, at the same time they're acting like a child throwing a temper tantrum. I heard Linus is going to be sending their Nano to TR after they're done with it. You'd think AMD would be on their best behavior right now but it seems that some folks over there just don't understand the amount of bad PR this is getting them. Like AMD needs bad PR right now.












----------



## CrazyElf

I feel like this thread has reached the point where we are reaching an impasse.

Quote:


> Originally Posted by *Mahigan*
> 
> I saw that, I've been participating in threads across the web about it as well. Some PR guy at AMD said something about fairness in reviews... kinda being unprofessional about it too. Looks like AMD is unhappy with the way their products are being shown in reviews, at the same time they're acting like a child throwing a temper tantrum. I heard Linus is going to be sending their Nano to TR after they're done with it. You'd think AMD would be on their best behavior right now but it seems that some folks over there just don't understand the amount of bad PR this is getting them. Like AMD needs bad PR right now.


Bad move AMD.

Their bad marketing and PR missteps have played a big role in their current situation. Marketing sells, not always the best product, as this thread reveals.









For those interested in the Nvidia's ideas on the future, they made a VR presentation in August:
https://developer.nvidia.com/virtual-reality-development

Anyways, here are the slides:

GameWorks_VR_2015_Final_handouts.pdf 3134k .pdf file


The VR SLI stuff is very interesting and very exciting, although there is a lack of parallelism. I think that, so far, there is increasing evidence that Nvidia was indeed caught off guard on preemption. Mahigan may be right: it may be Volta before this is fully rectified.

The question is, can AMD translate that into a lasting advantage? The last round that AMD "won" was the 5870 vs. the GTX 480, and even there AMD was unable to make a lasting dent in Nvidia's sales; Nvidia regained the lead and then some. Can AMD get a lasting advantage out of this?

It's becoming clear that a compute-heavy GPU is the future. There may no longer be a tradeoff between a "gaming" GPU and a compute-centric one (Maxwell, for example, was a very gaming-centric GPU).

Quote:


> Originally Posted by *Casey Ryback*
> 
> But you don't buy the latest and greatest GPU do you?


You shouldn't judge people by just what they put in their sig rigs.

A lot of people work with, and own, equipment other than what they show in their sig rigs. Or they are planning large future purchases.

Quote:


> Originally Posted by *Casey Ryback*
> 
> But Nvidia buyers upgrade more often, people that buy AMD buy cards to last 2 years +
> 
> A lot of nvidia buyers just buy the latest benchmark winner for epeen on forums such as this one, they probably don't even play games and just run benchmarks and look at their cards through the window on their case.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> So even if one user on steam has a 980ti for example, chances are they've bought a 670/680 and a 780ti before that.
> 
> Nvidia know how to get repeat customers.
> 
> One hardware tactic is to make sure memory bandwidth and memory amount is as low as possible whilst being adequate for the current games being benchmarked.


You have a very valid point. Nvidia does have a very loyal fanbase (as this thread can demonstrate), and even if AMD does succeed, it might not matter. AMD too has had marketing that has been pretty awful, almost to a comical extent. There's a reason why Nvidia has an almost Apple-like fan base and AMD doesn't.

That's why I wondered if this will turn out to be closer to the 5870 than a knockout blow for AMD. But given AMD's financial situation, even a moderate boost could be a huge gain. That and Zen must at least be decent.

If these fail, we could be looking at the end of AMD, unless they can get another source of cash.

Quote:


> Originally Posted by *47 Knucklehead*
> 
> When people say "The real money is on Consoles", they mean the XBox One. Even though the PS4 uses mostly the same hardware, it uses a totally different OS and DirectX 12 is irrelevant for it. The only Console that will benefit from DirectX 12 at all is Microsofts, which is getting hammered in market.


The problem is, the PS4 is also GCN, which means games made for it will be optimized for GCN too, and by extension so will console ports. Plus Mantle, Vulkan, and DX12 share a lot of similarities, which will also make console ports easier for developers.

People are not trying to play games from 2010 at 300 fps instead of 200 fps. They buy new GPUs to play the latest games at an acceptable FPS. That doesn't mean that AMD doesn't have immense challenges ahead - it does. In fact, if Zen and this don't work, we could be staring at the end of AMD.

But if AMD goes under, you and I are going to suffer the most (unless you're an Nvidia employee, though even then your pay cheque won't necessarily rise - all of the gains in our society go to the shareholders and the richest 1%). You might be unhappy that AMD disappeared if that were to happen, because even assuming a competitor emerges, it could take years. The other problem is that Nvidia would have less incentive to innovate. Things like DX12 and better parallelism would either be greatly slowed or wouldn't happen.

Quote:


> Originally Posted by *FastEddieNYC*
> 
> I totally agree. I own both brands and I trust the marketing info when making a decision what to buy. What concerns me in all this is Nvidia not being completely truthful with us about DX12 support. Technically they didn't lie but there is a difference between full hardware support and software. That information does affect my buying decisions.
> We see this behavior with most companies that have a dominant market share. The pressures to maintain and increase margins from investors usually results in the loss of ethical standards.


We'll have to wait and see. But for hardware reasons, you're right to be skeptical. So am I.

The way Nvidia's architecture has been built, as Mahigan has noted, we may not see a truly parallel architecture until Volta (because Pascal taped out before this information became available to Nvidia).

Quote:


> Originally Posted by *Noufel*
> 
> ok you won




The question is whether or not AMD can translate this into a lasting advantage. In the past they've been first to market and often unable to profit from it.

The other problem, as noted before, is whether they can make a noticeable dent in Nvidia's market share. Even during the 5870 vs. GTX 480 era, the dip in Nvidia's share was minor.

They need to succeed this time or they are in serious trouble.

Quote:


> Originally Posted by *2010rig*
> 
> Anyone willing to summarize this thread, I haven't been able to go through it fully.
> 
> I'll take a Team Red & Team Green recap.


Basically, AMD is about to make massive gains. They sacrificed short-term DX11 performance for greater DX12 performance.

Gaming is about to undergo a huge revolution as it becomes more "compute centric".

Also, don't buy a GPU right now - the 16nm ones are about to bring a big change. Oh, and with the CAD$ so weak, everything computer-related has become more expensive (I think you live in Canada?)


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> What they're saying is pretty much what I summed up in the recap thread. Basically... Asynchronous Compute, or even more parallelism in general, probably won't give Maxwell 2 a boost in performance. We don't know yet... wait and see... but they're thinking the same thing I am.
> 
> GCN, on the other hand, needs to be fed more work in order to achieve its top performance. GCN thrives on parallelism. Asynchronous Compute is like a shot of Nitrous Oxide for GCN. Of course there's a limit to anything but we haven't yet seen what Fiji is capable of doing. It will take game engines whose Graphic pipeline is pushed further into compute in order to tap into what Fiji is capable of doing. Ashes of the Singularity, as it stands right now, is based on an iteration of the Nitrous engine which under utilizes Fiji at the moment.


So, what the heck does the following mean? The Beyond3D forum guys are saying that Kepler/GK110 is not DX12 capable, yet according to Nvidia:

http://blogs.nvidia.com/blog/2015/05/15/dx-12-game-ready-drivers/

Quote:


> Plus, our Maxwell and Kepler GPU architectures already support DX12, with support for Fermi coming later.


----------



## Mahigan

Quote:


> Originally Posted by *provost*
> 
> So, what the heck does Nvidia mean, as Beyond3DForum guys are saying that Kepler/Gk110 doesn't support DX 12, however:
> 
> http://blogs.nvidia.com/blog/2015/05/15/dx-12-game-ready-drivers/


They "support" DX12, but they lack so many features that they likely won't see any real gains from this "support".


----------



## Casey Ryback

Quote:


> Originally Posted by *Mahigan*
> 
> I saw that, I've been participating in threads across the web about it as well. Some PR guy at AMD said something about fairness in reviews... kinda being unprofessional about it too. Looks like AMD is unhappy with the way their products are being shown in reviews, at the same time they're acting like a child throwing a temper tantrum. I heard Linus is going to be sending their Nano to TR after they're done with it. You'd think AMD would be on their best behavior right now but it seems that some folks over there just don't understand the amount of bad PR this is getting them. Like AMD needs bad PR right now.


What did TR actually do to cause this?

I know everyone kicked up a fuss when KitGuru didn't get a Fury card, but they didn't deserve one after their troll/rant video prior to the 300 series/Fury release.


----------



## Mahigan

Quote:


> Originally Posted by *Casey Ryback*
> 
> What did TR actually do to cause this?
> 
> I know everyone kicked up a fuss when kitguru didn't get a fury card, but they didn't deserve one after their troll/rant video prior to the 300series/fury release.


I have no idea. I also think there's no excusing AMD's behavior... rather than looking for reasons to blame the victim, I'll just blame the offender, which is AMD. I find this stunt pretty dumb. I think most tech journalists, excuse the term, suck; but that doesn't mean you go around alienating potential pools of customers (which their viewer base is made up of).


----------



## Kand

Quote:


> Originally Posted by *provost*
> 
> So, what the heck does the following mean, as Beyond3DForum guys are saying that Kepler/Gk110 are not DX12 capable, however according to Nvidia:
> 
> http://blogs.nvidia.com/blog/2015/05/15/dx-12-game-ready-drivers/


It's like connecting a SATA 3 hard drive to a SATA 2 interface. It will work, just not at 100%.


----------



## SpeedyVT

It honestly sounds like sour grapes to me. It's not uncommon for leaks to come from TPU; I've seen my fair share of leaks on their site.


----------



## Dudewitbow

Quote:


> Originally Posted by *provost*
> 
> So, what the heck does the following mean, as Beyond3DForum guys are saying that Kepler/Gk110 are not DX12 capable, however according to Nvidia:
> 
> http://blogs.nvidia.com/blog/2015/05/15/dx-12-game-ready-drivers/


It's more or less about what we users, on average, thought DX12 was. Most of us came to believe DX12 existed to minimize CPU reliance, akin to what Mantle was a prototype for. In reality, DX12 is just like any other version: to count as DX12-capable, hardware has to meet a minimum specification set by Microsoft, and passing that minimum does not guarantee all of the extra performance many users were expecting, which is slightly disappointing. Kepler and Maxwell are definitely DX12 capable; it's just that for this particular non-mandatory performance-enhancing feature (though I'm pretty sure many of us agree it should be recommended now), they lack the ability to do it efficiently due to a hardware decision, which will probably be mitigated somewhat by the upcoming software scheduler. It may not be the prettiest way to approach the problem, but any free performance boost is better for the end user as a whole.
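A rough way to picture this "meets the minimum spec" vs. "benefits from the optional features" distinction - purely a toy sketch; the feature-level numbers follow the usual D3D12 tiers, but the async-compute flags just encode what this thread has been claiming, not official vendor data:

```python
# Toy sketch: "DX12 capable" only means meeting the minimum feature
# level; optional capabilities like hardware async compute vary per
# architecture. Flags below reflect this thread's claims, not vendor data.

REQUIRED_LEVEL = 110  # D3D12 runs on feature level 11_0 hardware and up

def dx12_capable(feature_level):
    """A GPU counts as 'DX12 capable' if it meets the minimum level."""
    return feature_level >= REQUIRED_LEVEL

# name -> (feature level, hardware async compute?)
gpus = {
    "Kepler":    (110, False),  # capable, async handled in software
    "Maxwell 2": (121, False),  # highest feature level, still no hw async
    "GCN":       (120, True),   # capable, hardware async compute (ACEs)
}

for name, (level, hw_async) in gpus.items():
    print(f"{name}: capable={dx12_capable(level)}, hw_async={hw_async}")
```

Every row prints capable=True, which is all Nvidia's blog post actually claims; whether async compute is a win on a given card is a separate, optional question.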


----------



## Silent Scone

Quote:


> Originally Posted by *Mahigan*
> 
> I have no idea. I also think theirs no excusing AMDs behavior... rather than looking for reasons to blame the victim... I'll just blame the offender.. which is AMD. I find this stunt pretty dumb. I mean I think most Tech Journalists, excuse the term, suck; but that doesn't mean you go around alienating potential pools of customers (which their viewer base is made up of).


I thought a lot of the media was culled after a quiet spell, but it seems not. The usual suspects are back in the limelight. It reflects really badly on AMD as a whole.


----------



## NoirWolf

Quote:


> Originally Posted by *Silent Scone*
> 
> I thought a lot of the media was culled after a quiet spell, but it seems not. The usual suspects are back in the limelight . It reflects really badly on AMD as a whole.


I tend to lean more toward the idea of them really not having enough review copies to go around. The Fury X is suffering supply issues currently, so the Fury Nano (which is binned even higher) can't be in large supply at all... plus quite a few review sites make it a habit of asking for permanent units, not loaners.


----------



## provost

Quote:


> Originally Posted by *Dudewitbow*
> 
> its more or less of what we the users think on average of what DX12 was. In general, most of us went to believe DX12 existed to minimize CPU reliance, akin to what Mantle was a prototype for. In reality, DX12 is just like any other version. To have DX12 capable hardware there's a minimum specification that probably exist given by Microsoft. passing the minimum specifications does not include having all of the overhead performance which many users were expecting in the first place, which is basically slightly disappointing. Kepler and Maxwell are definitely DX12 capable, its just that in this particular non mandatory(though I'm pretty sure many of agree that it should be recommend now) instance of a performance enhancing feature, its lacking the power to efficiently do this due to a hardware decision, and will probably be mitigated somewhat by its upcoming software take on a scheduler. It may not be the prettiest way to approach the problem, but any free performance boost is better for the end user as a whole.


OK, so knowing what we know, and what has been confirmed by everyone except Nvidia (whose lack of direct response is a confirmation in itself), my question is simply this: does Nvidia have a plan, or was it caught by surprise?


----------



## 47 Knucklehead

Here is the real recap:

Oxide, a company that has worked very closely with AMD for years and helped develop Mantle with AMD, came up with a benchmark for their not-yet-released game that is still in "pre-beta beta". It uses a feature that isn't a required part of DX12, Async Compute. AMD had it on by default; nVidia didn't (they require a driver for it to work). Oxide said that nVidia couldn't do Async Compute, and Team Red - tired from disappointment after disappointment in the news department, from the rebranding of the entire 300 series, to their new Fury X card being 3-5% slower than the already-released 980Ti, to the obscene price of the Nano - jumped all over the great news. nVidia and Oxide talked and, lo and behold, nVidia CAN do Async Compute, but since it's partly hardware, partly software, Team Red is clinging to the hope that it will still be inferior to GCN's hardware Async Compute when implemented. But they seem to forget this: even with Async Compute ON for AMD and totally turned OFF for nVidia...

The cards are tied. So odds are, when nVidia and Oxide work out the driver and Async Compute is turned on, the 980Ti will once again beat the Fury X.


----------



## Kuivamaa

Quote:


> Originally Posted by *Kand*
> 
> Mantle supports Async.
> 
> We did not see that big a difference using Mantle against DX11 in games like Battlefield 4 which did use Async.
> 
> Why is this? AMD gave up coding a driver for DX11 in Ashes hence the almost miraculous gain with DX12.
> 
> Asynchronous Compute is not the miracle that people are thinking it is.


Common misconception. All systems got a good boost with Mantle in CPU-limited situations. It's just that reviewers liked to test scenarios with 4xMSAA etc., where the bottleneck would be memory bandwidth, and then call it a wash between AMD Mantle and Nvidia DX11. That's not the case at all in real situations: minimums and smoothness have been better with Mantle (BF4 being my most-played game of the last 2 years, across a variety of systems).


----------



## Silent Scone

Quote:


> Originally Posted by *Dudewitbow*
> 
> its more or less of what we the users think on average of what DX12 was. In general, most of us went to believe DX12 existed to minimize CPU reliance, akin to what Mantle was a prototype for. In reality, DX12 is just like any other version. To have DX12 capable hardware there's a minimum specification that probably exist given by Microsoft. passing the minimum specifications does not include having all of the overhead performance which many users were expecting in the first place, which is basically slightly disappointing. Kepler and Maxwell are definitely DX12 capable, its just that in this particular non mandatory(though I'm pretty sure many of agree that it should be recommend now) instance of a performance enhancing feature, its lacking the power to efficiently do this due to a hardware decision, and will probably be mitigated somewhat by its upcoming software take on a scheduler. It may not be the prettiest way to approach the problem, but any free performance boost is better for the end user as a whole.


Mantle was never designed with the sole intent of reducing overhead; it was always going to do that regardless. That became a convenient sell when it transpired that BF4 performance gains weren't drastic on certain tiers of hardware.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *provost*
> 
> Ok, so knowing what we know and what has been confirmed by everyone, except Nvidia (whose lack of direct response is a confirmation in itself), my question is simply this: Does Nvidia have a plan, or was it surprised itself ?


They have a plan, but apparently Oxide has forced them to push up their schedule to quell this issue. I mean it's not like nVidia has nothing else to do, this just wasn't one of their priorities. Now it will be.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *CrazyElf*
> 
> But if AMD goes under, you and I are going to suffer the most (unless you're an Nvidia employee, but that won't necessarily mean your pay cheque is going to rise - all of the gains in our society go to the shareholders and the richest 1% of society
> 
> 
> 
> 
> 
> 
> 
> ). You might unhappy that AMD disappeared if that were to happen, because even assuming a competitor comes, it could take years. The other problem is that Nvidia will have less incentive to innovate. Stuff like DX12, better parallelism would either be greatly slowed or wouldn't happen.+


WHEN AMD goes under (not if, when), they won't simply vanish and there will be no more "AMD CPUs and video cards". People always seem to think that, and they are wrong. When a company with considerable intellectual property goes out of business, its assets (and intellectual property is a major one) are sold to another company in a much better financial position. The US government will not allow Intel to buy the AMD CPU division, nor will they allow nVidia to buy the graphics card division. They will be sold to other companies - say, Samsung, which has buckets of cash, people, and superior marketing and management. This would allow BETTER COMPETITION than having AMD just plodding along on its own, doing nothing really to Intel or nVidia. To say that DX12 and parallelism wouldn't happen is laughable. Parallelism has been a part of computing for decades. And I'm sure Microsoft, Intel, and nVidia will have something to say about DirectX 12... they too had a hand in it, just like they did with DirectX 1 through 11.

So as you can see, AMD going out of business isn't a bad thing for the consumer ultimately. AMD staying in business as a gimped organization (aka what they are now) is MUCH WORSE.


----------



## Defoler

Quote:


> Originally Posted by *47 Knucklehead*
> 
> They have a plan, but apparently Oxide has forced them to push up their schedule to quell this issue. I mean it's not like nVidia has nothing else to do, this just wasn't one of their priorities. Now it will be.


Oxide didn't "force" them into it; the masses going into an uproar over a game in alpha, before the drivers were finished, did.
Which is frankly hilarious, because the drivers will be ready when Nvidia wants them to be. In the end, the statement "we are working on it" doesn't mean "this second".
By the time the game is out, the drivers would have been ready anyway, and things would have looked different.

The whole uproar over AMD vs Nvidia async calls is pretty stupid, and AMD poured some fuel on the flames because... well... they are AMD, and they like acting stupid and then eating their own words, paying the price in their share price.


----------



## Silent Scone

Not that it would happen, but if Samsung gained AMD they would absolutely rinse its assets to further markets they already compete in. Samsung has no soul; it's something AMD at least does have. Slightly OT, mind you.


----------



## 2010rig

How does a corporation have a soul?


----------



## provost

Quote:


> Originally Posted by *47 Knucklehead*
> 
> They have a plan, but apparently Oxide has forced them to push up their schedule to quell this issue. I mean it's not like nVidia has nothing else to do, this just wasn't one of their priorities. Now it will be.


I don't know if this precisely answers the fundamental question, but thanks for clarifying anyway.

Quote:


> Originally Posted by *47 Knucklehead*
> 
> WHEN AMD goes under (not if, when), they won't simply vanish and there will be no more "AMD CPU's and video cards". People seem to always think that, and they are wrong. When a company that has considerable intellectual properties goes out of business, their assets (and Intellectual Property is a major one) are sold to another company who is in a much better financial position. The US government will not allow Intel to buy the AMD CPU division, nor will they allow nVidia to buy the graphics card division. They will be sold to other companies, say like Samsung, who has buckets of cash, people, and superior marketing and management. This will allow BETTER COMPETITION than having AMD just plodding along on it's own and doing nothing really to Intel or nVidia. To say that DX12 and parallelism wouldn't happen is laughable. Parallelism has been a part of computing for nearly a hundred years now. And I'm sure Microsoft, Intel, and nVidia will have something to say about DirectX 12 ... they too had a hand in it, just like they did with DirectX 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11.
> 
> So as you can see, AMD going out of business isn't a bad thing for the consumer ultimately. AMD staying in business as a gimped organization (aka what they are now) is MUCH WORSE.


This is by far one of the worst arguments (no, the worst... lol) I have seen in favor of what's best for the consumer... lol

As I jokingly said in another thread: why even bother with Nvidia and AMD? Let's go straight to Intel, which has far more capital, fab resources, and industry clout than Nvidia and AMD combined. I don't see how having Nvidia as the sole monopolizer of GPUs benefits consumers, but if it has to be one due to industry pressures, I don't see any room for Nvidia either, frankly speaking. Intel could retool its fabs for GPUs should both Nvidia and AMD disappear; there would be no more reliance on, or begging from, TSMC, since Intel controls its own production, and Intel would staff up on GPU resources, which would be a drop in the bucket for it. As for pricing, I don't see it being any worse than what monopolistic pricing would look like under Nvidia (which is pretty much the case now), nor do I think the so-called "performance jump" from generation to generation would be any worse, if not better, if Intel wants to keep selling GPUs. Again, this is all theorized as a joke, so take it as such.


----------



## Silent Scone

Quote:


> Originally Posted by *2010rig*
> 
> How does a corporation have a soul?


Hi,

You've obviously not used many Samsung products.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Here is the real recap:
> 
> Oxide, a company that has worked very closely with AMD for years and helped to develop Mantle with AMD, came up with a benchmark for their not yet released game that is still in "pre-beta beta". It uses a feature that isn't an official part of DX12, Async Compute. AMD had it on by default, nVidia didn't (they require a driver for it to work). Oxide said that nVidia couldn't do Async Compute and Team Red, tired from disapointment after disapointment in the news department from a rebranding of the entire 300 series, their new Fury X card being 3-5% slower than the already released 980Ti, and the obscene price of the Nano, jumped all over the great news. nVidia and Oxide talked and low and behold, nVidia CAN do Async Compute, but since it's a partly hardware, partly software, Team Red is clinging to the hope that it will still be inferior to GCN's hardware Async Compute when implemented, but they seem to forget about this .. That even with Async Compute ON for AMD and totally turned OFF for nVidia ...
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The cards are tied. So odds are, when nVidia and Oxide work out the driver, and Async Compute is turned on, that means that the 980Ti will once again beat the Fury X.


Because I don't think you're grasping the fact that those benches are AMD with Async enabled (more or less ideal conditions for it) vs Nvidia pretty much running under ideal conditions for it (DX11). The only thing Async will do for your 980Ti is introduce latency, which will cut frames off DX11's FPS, just to allow it to run Async games at all. If you need an analogy: the Nvidia GPUs currently need to think about doing Async, while AMD GPUs do it instinctively; there will always be a latency gap between the two.

The Fury X? Roflstomps the 980Ti in multi-GPU setups ( http://wccftech.com/amd-radeon-r9-fury-quad-crossfire-nvidia-geforce-gtx-titan-quad-sli-uhd-benchmarks/ ).
The Fury Pro? Shoots the 980 behind the chemical sheds (the 980 is arguably a worse buy than the 390X, but that depends on your application).
The Fury Nano? Unless you can squeeze 2 SLI mini-ITX 970s into the same rig, the Nano beats anything Nvidia has in the small-form-factor range by a long way.

Feel free to complain more about the lack of Crossfire support for DX9 games on anything north of the 6970, though.


----------



## Defoler

Intel wanted to make GPUs in the past (Larrabee), but they scrapped it when they realised it was a lost cause for them to enter that market, as Nvidia and AMD between them hold the patents on the more important technology.
Intel instead went with a patent-sharing deal with Nvidia for the last several years for their integrated GPUs.

A single manufacturer is always a bad thing, whether it's Intel, Nvidia, or anyone else.
We can see it in the very little progress each new generation from Intel brings, in the GPU market where half the cards are just rebrands, or in Windows, where whatever Microsoft does there is no choice but to accept it, etc.

Look at the mobile market. Every year we see great leaps in processing, graphics, screens, cameras, battery life, size, OS, tools, and so on, because there are tons of manufacturers and companies in those markets - way more than we have in the PC market.

If we had a similar number of big and strong companies (or little ones as well) in the PC market, prices would be much lower than $1000 flagship GPUs with barely 5% better performance, and barely 10% more CPU performance every 1-1.5 years.


----------



## Silent Scone

Quote:


> Originally Posted by *NoirWolf*
> 
> How much did you get paid by Nvidia for that little lip service? Because I don't think you're grasping the fact that those benches are AMD Async enabled (more or less ideal for it) vs Nvidia pretty much running under ideal conditions for it (DX11).


This makes no sense given the context, it's a DX12 comparison.


----------



## Defoler

Quote:


> Originally Posted by *NoirWolf*
> 
> ( the 980 arguably is a worse buy than the 390x but that depends on your application ).


Actually, the 390X is considered the worst card AMD has put out in a long time, as it is basically just a rebranded 290X with 8GB of memory - a card that was already on the market for a lower price.
The 390X performs almost identically to the 290X in most games, and you can still find the 290X 8GB for $50 to $100 less.


----------



## Casey Ryback

Quote:


> Originally Posted by *Silent Scone*
> 
> Hi,
> 
> You've obviously not used many Samsung products.


You've got to be joking? After the whole Samsung smart TV voice-recording and data-collection thing.............

Or maybe you got confused with the actual Samsung Soul product and thought it means they have one?

http://www.cnet.com/products/samsung-soul-sgh-u900-unlocked/
Quote:


> Originally Posted by *Defoler*
> 
> Actually the 390x is considered the worst card AMD has put out in a long time,


According to who? I've never seen anyone say it's the worst card released in a long time.

Sure the price isn't great, but far from the worst card released.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Defoler*
> 
> By the time the game is out, the drivers would have been ready anyway, and things would have been different.


I agree; that is why I said this is largely just a bunch of hooey.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Silent Scone*
> 
> This makes no sense given the context, it's a DX12 comparison.


He's ranting, it's not supposed to make sense.


----------



## Serios

Trying to make Oxide look like it is favoring AMD over Nvidia is laughable after all that has been discussed in this thread.
The only GPU-specific path AotS runs is for Nvidia GPUs.
Nvidia has had the source code for the game for over a year and did more on-site visits than AMD; they also released a driver for this "pre-beta beta" just in time for the benchmarks.

All Knuckle and those like him did was wait for a bone from Nvidia; that is all they ever wanted.


----------



## Silent Scone

Quote:


> Originally Posted by *Casey Ryback*
> 
> You've got to be joking? After the whole samsung smart TV voice recording and data collection.............
> 
> Or maybe you got confused with the actual samsung soul product and thought it means they have one?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> http://www.cnet.com/products/samsung-soul-sgh-u900-unlocked/
> According to who? I've never seen anyone say it's the worst card released in a long time.
> 
> Sure the price isn't great, but far from the worst card released.


No, no confusion here. I likely just have a better perception of them, given the conglomerate's scale.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> He's ranting, it's not supposed to make sense.


Async is disabled for Nvidia GPUs in DX12 mode. It doesn't make sense to those who haven't read what has been said. For all intents and purposes, DX12 for Nvidia in AotS works pretty much like DX11, the only difference, I'd imagine, being that some extra features which Nvidia do support in hardware are still active. Now feel free to say I am ranting.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Serios*
> 
> Trying to make Oxide look like is favoring AMD in comparison to Nvidia is laughable after all that has been discussed on this thread.
> The only GPU specific path AotS is running is for Nvidia GPUs.
> Nvidia had the source code for the game for over a year and did more on-site visits than AMD, they also release a driver for this "pre-beta beta" just in time for benchmarks.
> 
> All Knuckle and those like him did was wait for a bone from nvidia, that is all they ever wanted.


You obviously missed the part where Oxide was a major developer of Mantle in the past.

You also obviously missed the part where Oxide has said that they hope to improve another aspect of performance on AMD systems by anywhere from 20% to 50% to 100%.

So don't try to pretend that Oxide doesn't have a long history with AMD, or that they are merely making simple DirectX 12 calls with no vendor-specific optimizations.


----------



## Mahigan

AMD hardware partnerships have always resulted in games where NVIDIA products perform just as well, if not better.

Since AMD has experience coding close to the metal (Mantle), it is only logical that game developers partner with AMD in order to make use of that expertise in this uncharted territory.

That's likely why we see AMD obtaining most of the partnerships for the upcoming DX12 titles.

The one wildcard in all of this is asynchronous compute, so making a big fuss about it is warranted. Why? Because asynchronous compute is a means by which a developer can optimize for GCN. It will be used in all the titles that have partnered or affiliated themselves with AMD. You can bet on it.

While a developer may have been limited in how they could optimize for GCN under DX11, that is not the case for DX12. DX12 exposes GCN's parallel architectural advantages to the developer in a way that was impossible under DX11.

Based on this, I expect some stiff competition between AMD and NVIDIA in a rather short time.
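To put a toy number on why asynchronous compute acts like "a shot of Nitrous" for a parallel architecture, here's a back-of-the-envelope model. All numbers (frame times, idle fraction, the 2 ms software-scheduling cost) are invented for illustration; real GPU scheduling is far more complex:

```python
# Toy model: async compute lets compute work fill idle "bubbles" in the
# graphics workload instead of running after it. Numbers are made up.

def frame_time_serial(graphics_ms, compute_ms):
    # One queue: compute runs after graphics, back to back.
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms, compute_ms, idle_fraction, switch_cost_ms=0.0):
    # Compute fills the idle fraction of the graphics workload for free;
    # only the overflow (plus any scheduling cost) extends the frame.
    idle_ms = graphics_ms * idle_fraction
    overflow_ms = max(0.0, compute_ms - idle_ms)
    return graphics_ms + overflow_ms + switch_cost_ms

# Hardware scheduling with plenty of idle ALU time hides the compute entirely:
print(frame_time_serial(20.0, 5.0))            # 25.0 ms
print(frame_time_async(20.0, 5.0, 0.30))       # 20.0 ms (5 ms hidden in 6 ms of bubbles)
# A hypothetical software scheduler paying 2 ms per frame keeps far less of the win:
print(frame_time_async(20.0, 5.0, 0.30, 2.0))  # 22.0 ms
```

The point isn't the exact numbers; it's that the gain scales with how much idle shader time the architecture exposes to the compute queue, which is exactly where GCN and Maxwell 2 are claimed to differ.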


----------



## Mahigan

Game engines are heading towards being more compute-oriented. That's not because of a bias towards AMD; that's what DX12 and Vulkan are all about.

It just so happens that GCN hardware is more compute-oriented than its NVIDIA competition. This will change - expect to see NVIDIA boost its compute performance going forward.


----------



## Klocek001

Quote:


> Originally Posted by *Silent Scone*
> 
> Samsung has no soul, it's something AMD does have at least.


this is so ridiculous I don't even know how to respond.


----------



## Silent Scone

Quote:


> Originally Posted by *Klocek001*
> 
> this is so ridiculous I don't even know how to respond.


That's simply because you failed to understand the context; if you read the rest of the post it should be fairly obvious. Look to your own faults before insinuating the comment is stupid.

Here is a breakdown:


Swallow assets
Not in the discrete GPU market's interests
Much less consumer interaction

Here's a pro tip: you did respond.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> Game engines are heading towards being more compute oriented. That's not because of a bias towards AMD, that's because that's what DX12 and Vulcan are all about.
> 
> It just so happens that GCN hardware is more compute oriented compared to its NVIDIA hardware competition. This will change. Expect to see NVIDIA boost its compute performance going forward.


It would be really nice to see some benchmarks between the old 7870/R9 270X and the GTX 960, and between the GTX 660 and the R9 380.


----------



## CrazyElf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> WHEN AMD goes under (not if, when), they won't simply vanish and there will be no more "AMD CPU's and video cards". People seem to always think that, and they are wrong. When a company that has considerable intellectual properties goes out of business, their assets (and Intellectual Property is a major one) are sold to another company who is in a much better financial position. The US government will not allow Intel to buy the AMD CPU division, nor will they allow nVidia to buy the graphics card division. They will be sold to other companies, say like Samsung, who has buckets of cash, people, and superior marketing and management. This will allow BETTER COMPETITION than having AMD just plodding along on it's own and doing nothing really to Intel or nVidia. To say that DX12 and parallelism wouldn't happen is laughable. Parallelism has been a part of computing for nearly a hundred years now. And I'm sure Microsoft, Intel, and nVidia will have something to say about DirectX 12 ... they too had a hand in it, just like they did with DirectX 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11.
> 
> So as you can see, AMD going out of business isn't a bad thing for the consumer ultimately. AMD staying in business as a gimped organization (aka what they are now) is MUCH WORSE.


You just lost all credibility with me right there. I hate to say this, but there are times you know you're arguing with someone who has completely lost objectivity. Mahigan has a point; consumerism does seem to do this.

You are describing what happens in a takeover, not a bankruptcy. So far, there have been no plans for an AMD takeover. Even during a takeover, there are a lot of layoffs usually because there are redundancies between two companies. Of course, in today's corporate culture, ironically, many of those laid off are often not "redundant" as the companies learn the hard way later, but many are obsessed with quarterly results ....

If you want to see a high-profile recent bankruptcy, look at what happened to Nortel Networks. That's a huge gamble you're taking. When a company goes bankrupt, it often doesn't just get absorbed into another company.

Overview of bankruptcies:

A board of trustees will be appointed to oversee the bankruptcy process.
The investors lose everything.
The creditors will get what's left, which is often pennies per dollar.
Individual operating divisions are often auctioned off.
Any remaining assets are usually liquidated at auction to many companies (this can include things like patents, PP&E not already auctioned off, etc.).
A fund is created (or they use the old company's fund) for overseeing the worker benefits (retirement, health, etc). Sometimes the taxpayer may be liable during shortfalls.

As for the engineers, they don't all suddenly go to a new company that continues operations as-is. Most will lose their jobs and have to find new ones. Some may be kept during the bankruptcy process to facilitate transitions, and others may be part of the operating divisions that get sold off, but there's no denying that many will lose their jobs.

For the record, the US government may very well allow a monopoly. They never broke up Microsoft. I know this is not a politics blog, but as of late, they've allowed more monopolies at the expense of consumers.

Monsanto is an example; they've been on a rapid buying spree as of late. They dropped their plans to buy Syngenta, but they've got their eye on new purchases, and whatever emerges will be a powerful biotech corporation, to the detriment of the end consumer. Also, I'll note that the majority of 3DFX's assets were acquired by Nvidia. What's stopping them from doing the same to AMD? The combined company may not combine the best of both worlds; it would surely be used to milk consumers. Nvidia's behaviour in the past has not been exemplary either, as many have noted (most notably Linus Torvalds).

The only other thing I will note is that they don't just appoint "anyone" as trustees; there are dedicated professionals who do this line of work. Typically an accounting firm is appointed for large bankruptcies (EY was for Nortel, for example).

In Canada (this will be very relevant for AMD Markham's assets should they ever go bankrupt):
http://www.cairp.ca/general-public/what-is-a-cirp/

And in the US:
https://www.aira.org/aira/about_us

As I said, you're gambling that there will be someone who wants to buy AMD. I'm not saying it cannot happen (it could). But for that to happen, they would have to see AMD as worth buying (and considering all you've said, if you were a company flush with cash, would AMD really be the best possible acquisition, even if very cheap for a company its size?), gamble that its patents will be worth something (AMD has x86 agreements with Intel which I am not sure would still be around after the purchase, along with cross-licensing agreements with dozens of other companies), and gamble that they keep the majority of the engineering talent. That's a pretty big gamble. Plus there would be a huge court battle over any licensing agreements. It would be a mess.

Even if there is a takeover, you still might be "wrong" and it still may be a "loss for the consumers". Remember that even in a takeover of a company that is ailing (look at Attachmate buying Novell, which in turn was later bought by Micro Focus International, for example), there will be layoffs and some divisions will be sold off. They may just take the parts they like and get rid of what is a money losing division (I'm sure AMD has lots of those). The problem is often the "money losers" are critical divisions that enabled the money winners to profit. You're gambling that the buyer will recognize that and plan accordingly. Not saying what you're proposing couldn't happen, but it's not probable that the buyer will keep everything intact and will just use their money to challenge Nvidia and Intel.

Quote:


> Originally Posted by *47 Knucklehead*
> 
> I agree, that is why I said that this is largely just a bunch of hooie.


For reasons Mahigan has already described, what you are saying is simply not possible. The 290X and Fury X are more parallel for architectural reasons.

It will have to wait for Pascal, or perhaps Volta. The leaked Nvidia slides show that they know preemption is the future and that they were working on it. I don't think they realized how fast it would be deployed, or the wording would have been quite different.

Quote:


> Originally Posted by *47 Knucklehead*
> 
> You obviously missed the part where Oxide was a major developer of Mantle in the past.
> 
> You also obviously missed the part where Oxide has said that they hope to improve another aspect of the AMD system from 20% to 50% to 100%.
> 
> So don't try to pretend that Oxide doesn't have long history with AMD, nor have they said that they are doing things other than just doing simple DirectX 12 calls and not doing any vendor specific optimizations.


They said they would move their engine toward compute. That's quite different from saying they are supporting purely AMD hardware at Nvidia's expense. They've even said that Nvidia has contributed code to their codebase, after having had access for over a year.

Nvidia will benefit too, once they make more compute-centric GPUs. In fact, even their existing GPUs with very strong shader performance may benefit right now.

I would not be surprised if, going forward, Nvidia matched or exceeded AMD's GPUs in compute. Judging by the number of games being released, they might have to; DX12 adoption may very well be much faster than expected. That's what happened with tessellation, as Mahigan has noted.

Given this, I would not be surprised if, like AMD, they adopted HDLs (high-density libraries) to maximize the number of shaders they can fit per mm^2 of GPU. They haven't been afraid to adopt competing strategies. AMD, for example, originally adopted the small-die, aggressive-process mentality; Nvidia copied this after the disastrous Fermi. AMD had initially launched a small lower-end GPU and appreciated the limitations of the next node, which is why their 5870 did better. Nvidia learned its lesson, so Kepler was of course much more efficient: Kepler launched on a small die, and only then was a full-fat version released. Maxwell followed this with a small die, a medium one, and only then a giant reticle-limit-sized one. Pascal and all future Nvidia GPUs will likely follow this as well.

Quote:


> Originally Posted by *Mahigan*
> 
> AMD hardware partnerships have always resulted in games where NVIDIA products perform just as well, if not better.
> 
> Since AMD have experience in coding closer to Metal, Mantle, then it is only logical that game developers partner with AMD in order to make use of that expertise in this uncharted territory.
> 
> That's likely why we see AMD obtaining most of the partnerships for the upcoming DX12 titles.
> 
> The one wildcard, in all of this, is asynchronous compute. So to make a big fuss about it is warranted. Why is it warranted? Because Asynchronous compute is a means by which a developer can optimize for GCN. It will be used, in all the titles, which have partnered or affiliated themselves with AMD. You can bet on it.
> 
> While a developer may have been limited in the way they could optimize for GCN under DX11, this is not the case for DX12. DX12 exposes GCN's parallel architectural advantages to the developer in a way which was impossible under DX11.
> 
> Based on this, I expect some stiff competition between AMD and NVIDIA in a rather short time.


We'll have to wait and see. I've already noted that even during the days of the 5870 vs. Fermi, Nvidia was able to hold its ground.

But you may be right. This time may be different, although I expect the fluctuation to be smaller. AMD may very well be so low that they have nowhere to go but up at this point.

Edit: Mahigan, Joel has been following the discussion and has written an article on Extreme Tech:
http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far


----------



## Xuper

Quote:


> Originally Posted by *47 Knucklehead*
> 
> You obviously missed the part where Oxide was a major developer of Mantle in the past.
> 
> You also obviously missed the part where Oxide has said that they hope to improve another *aspect of the AMD system* from *20%* to *50%* to *100%*.
> 
> So don't try to pretend that Oxide doesn't have long history with AMD, nor have they said that they are doing things other than just doing simple DirectX 12 calls and not doing any vendor specific optimizations.


Gee! He forgot this one! Nvidia said they support async compute. It's not an AMD thing; it's part of DX12.



Plus, our Maxwell and Kepler GPU architectures already support DX12, with support for Fermi coming later.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *CrazyElf*
> 
> You just lost all credibility with me right there. I hate to say this, but there are times you know you're arguing with someone who has lost complete objectivity. Mahigan has a point - consumerism does seem to do this.
> 
> You are describing what happens in a takeover, not a bankruptcy. So far, there have been no plans for an AMD takeover. Even during a takeover, there are a lot of layoffs usually because there are redundancies between two companies. Of course, in today's corporate culture, ironically, many of those laid off are often not "redundant" as the companies learn the hard way later, but many are obsessed with quarterly results ....


Exactly. I think that AMD will be split up and sold off, and that would be the best thing for everyone. They won't just go bankrupt; but even if they did, someone would buy up their assets and continue on.

I never said there wouldn't be layoffs. To think that there wouldn't be is silly. Much of management would be let go ... and in the case of AMD, that is a good thing; after all, it is management (or in their case, mismanagement) that got them into this problem. They have very technically smart engineers, who would surely be snapped up by any company taking over (or buying up the parts in a bankruptcy ... which will never happen ... AMD will make a deal first).

Quote:


> Originally Posted by *CrazyElf*
> 
> If you want to see a high-profile recent bankruptcy, look at what happened to Nortel Networks. That's a huge gamble you're taking. When a company goes bankrupt, it often doesn't just get absorbed into another company.
> 
> Overview of bankruptcies:
> 
> A board of trustees will be appointed to oversee the bankruptcy process.
> The investors lose everything.
> The creditors will get what's left, which is often pennies per dollar.
> Individual operating divisions are often auctioned off.
> What remaining assets are usually liquidated at auction to many companies (this can include things like patents, PPE not already auctioned off, etc).
> A fund is created (or they use the old company's fund) for overseeing the worker benefits (retirement, health, etc). Sometimes the taxpayer may be liable during shortfalls.
> 
> As for the engineers, they don't 100% suddenly go to a new company that will continue operations as is. Most will lose their jobs and have to find new jobs. Some may be kept during the bankruptcy process to facilitate any transitions and others may be part of the operating divisions bought off, but there's no denying that many will lose their jobs.
> 
> For the record, the US government may very well allow a monopoly. They never broke up Microsoft. I know this is not a politics blog, but as of late, they've allowed more monopolies at the expense of consumers.
> 
> 
> 
> 
> 
> 
> 
> Monsanto is an example and they've been on a rapid buying spree as of late. They dropped their plans to buy Syngenta, but they've got their eye on new purchases and whatever emerges will be a powerful biotech corporation - to the detriment of the end consumer. Also, I'll note that the majority of 3DFX's assets were acquired by Nvidia. What's stopping them from doing the same to AMD? The combined company may not combine the best of both worlds; they will surely use it to milk. Nvidia's behaviour too in the past has not been exemplary, as many have noted (most notably Linus Torvalds).
> 
> The only other thing I will note is that they don't just appoint "anyone" as trustees; there are dedicated professionals who do this line of work. Typically an accounting firm is appointed for large bankruptcies (EY was for Nortel, for example).
> 
> In Canada (this will be very relevant for AMD Markham's assets should they ever go bankrupt):
> http://www.cairp.ca/general-public/what-is-a-cirp/
> 
> And in the US:
> https://www.aira.org/aira/about_us


Again, I don't think that AMD will go bankrupt, so I'm sorry to say, everything you typed above is a moot point in this case.

Quote:


> Originally Posted by *CrazyElf*
> 
> As I said, you're gambling that there will be someone that wants to buy AMD. I'm not saying it cannot happen (it could). But in order for that to happen, they would have to see AMD being worth buying (and considering all you've said, if you were a company flush with cash, would AMD really be the best possible acquisition even if very cheap for a company its size?)), gamble that their patents will be worth something (they have x86 agreements with Intel which I am not sure would be still around after the purchase, along with other cross licensing agreements with dozens of other companies), and that they keep the majority of the engineering talent. That's a pretty big gamble. Plus there would be a huge court battle over any licensing agreements. It would be a mess.


There have been a ton of companies who have already expressed an interest in buying AMD/ATI. Samsung is one of them. They would love to add to their list of IP and make Apple pay even more.

The x86 agreement is a sticking point and, depending on which lawyer you talk to, may or may not transfer. But then again, we aren't talking about a mom-and-pop shop risking $70,000 to see if Intel balks. Whoever buys them for tens of millions of dollars or more will be talking to Intel first and cutting a side deal. Remember, it is in Intel's best interest to "play nice" with the new owner: while AMD licenses x86 FROM Intel, they license x64 TO Intel. So the new owner would work with Intel BEFORE a sale. Also, Intel wants someone else to hold that license. It makes the government "less nervous".


----------



## sugarhell

It's actually easy to search for architecture info. Instead of speculating, you can read articles.

http://www.realworldtech.com/kepler-brief/
You can read about Kepler and the mindset shift vs. Fermi. tl;dr: Kepler is a great graphics unit, but not that good for general compute workloads.

There's also some great info from Kanter at 1:21:00.

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/gr_proc_req_for_enabling_immer_VR.pdf Page 10+

http://www.gamedev.net/topic/671517-maxwells-asynchronous-compute-flaw/#entry5250608

tl;dr: Maxwell/Kepler is perfect for DX11 development; context switches at draw-call boundaries are the perfect model for DX11. Not that good for DX12 and VR. GCN is more like a CPU, but still a GPU.
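The "context switch vs. independent queues" distinction above is the heart of the async compute debate. As a purely illustrative toy model (all timings invented; nothing here reflects real driver or hardware behavior), here is a Python sketch of why an architecture that must switch contexts between graphics and compute ends up serializing the work, while one with independent hardware queues can overlap it:

```python
# Toy model of serialized vs. overlapped graphics + compute work per frame.
# All numbers are made up for illustration; real GPUs are far more complex.

def serialized_time(graphics_ms, compute_ms, switch_cost_ms, switches):
    """One engine: graphics and compute run back to back, paying a
    context-switch penalty every time the work type alternates."""
    return graphics_ms + compute_ms + switch_cost_ms * switches

def overlapped_time(graphics_ms, compute_ms):
    """Independent queues: compute runs concurrently with graphics,
    so the frame takes as long as the longer of the two workloads."""
    return max(graphics_ms, compute_ms)

if __name__ == "__main__":
    gfx, cmp = 10.0, 4.0  # hypothetical per-frame work in milliseconds
    serial = serialized_time(gfx, cmp, switch_cost_ms=0.25, switches=8)
    overlap = overlapped_time(gfx, cmp)
    print(f"serialized: {serial:.1f} ms, overlapped: {overlap:.1f} ms")
    # serialized: 16.0 ms, overlapped: 10.0 ms
```

Under these made-up numbers, the overlapped design hides the entire compute workload behind the graphics work, which is roughly the benefit the posts above claim for GCN's independent compute queues.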

@Mahigan A gift for you (actually, I will remove it and send it by PM).


----------



## provost

Quote:


> Originally Posted by *CrazyElf*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> You just lost all credibility with me right there. I hate to say this, but there are times you know you're arguing with someone who has lost complete objectivity. Mahigan has a point - consumerism does seem to do this.
> 
> You are describing what happens in a takeover, not a bankruptcy. So far, there have been no plans for an AMD takeover. Even during a takeover, there are a lot of layoffs usually because there are redundancies between two companies. Of course, in today's corporate culture, ironically, many of those laid off are often not "redundant" as the companies learn the hard way later, but many are obsessed with quarterly results ....
> 
> 
> 
> 
> 
> 
> 
> 
> 
> If you want to see a high-profile recent bankruptcy, look at what happened to Nortel Networks. That's a huge gamble you're taking. When a company goes bankrupt, it often doesn't just get absorbed into another company.
> 
> Overview of bankruptcies:
> 
> A board of trustees will be appointed to oversee the bankruptcy process.
> The investors lose everything.
> The creditors will get what's left, which is often pennies per dollar.
> Individual operating divisions are often auctioned off.
> What remaining assets are usually liquidated at auction to many companies (this can include things like patents, PPE not already auctioned off, etc).
> A fund is created (or they use the old company's fund) for overseeing the worker benefits (retirement, health, etc). Sometimes the taxpayer may be liable during shortfalls.
> 
> As for the engineers, they don't 100% suddenly go to a new company that will continue operations as is. Most will lose their jobs and have to find new jobs. Some may be kept during the bankruptcy process to facilitate any transitions and others may be part of the operating divisions bought off, but there's no denying that many will lose their jobs.
> 
> For the record, the US government may very well allow a monopoly. They never broke up Microsoft. I know this is not a politics blog, but as of late, they've allowed more monopolies at the expense of consumers.
> 
> 
> 
> 
> 
> 
> 
> Monsanto is an example and they've been on a rapid buying spree as of late. They dropped their plans to buy Syngenta, but they've got their eye on new purchases and whatever emerges will be a powerful biotech corporation - to the detriment of the end consumer. Also, I'll note that the majority of 3DFX's assets were acquired by Nvidia. What's stopping them from doing the same to AMD? The combined company may not combine the best of both worlds; they will surely use it to milk. Nvidia's behaviour too in the past has not been exemplary, as many have noted (most notably Linus Torvalds).
> 
> The only other thing I will note is that they don't just appoint "anyone" as trustees; there are dedicated professionals who do this line of work. Typically an accounting firm is appointed for large bankruptcies (EY was for Nortel, for example).
> 
> In Canada (this will be very relevant for AMD Markham's assets should they ever go bankrupt):
> http://www.cairp.ca/general-public/what-is-a-cirp/
> 
> And in the US:
> https://www.aira.org/aira/about_us
> 
> As I said, you're gambling that there will be someone that wants to buy AMD. I'm not saying it cannot happen (it could). But in order for that to happen, they would have to see AMD being worth buying (and considering all you've said, if you were a company flush with cash, would AMD really be the best possible acquisition even if very cheap for a company its size?)), gamble that their patents will be worth something (they have x86 agreements with Intel which I am not sure would be still around after the purchase, along with other cross licensing agreements with dozens of other companies), and that they keep the majority of the engineering talent. That's a pretty big gamble. Plus there would be a huge court battle over any licensing agreements. It would be a mess.
> 
> Even if there is a takeover, you still might be "wrong" and it still may be a "loss for the consumers". Remember that even in a takeover of a company that is ailing (look at Attachmate buying Novell, which in turn was later bought by Micro Focus International, for example), there will be layoffs and some divisions will be sold off. They may just take the parts they like and get rid of what is a money losing division (I'm sure AMD has lots of those). The problem is often the "money losers" are critical divisions that enabled the money winners to profit. You're gambling that the buyer will recognize that and plan accordingly. Not saying what you're proposing couldn't happen, but it's not probable that the buyer will keep everything intact and will just use their money to challenge Nvidia and Intel.
> For reasons Mahigan has already described, what you are saying is simply not possible. The 290X and Fury X are simply more parallel for architectural reasons.
> 
> It will have to wait for Pascal, or perhaps Volta. The leaked Nvidia slides show that they know that preemption is the future and they were working on it. I don't think they realized how fast it would be deployed or the wording would be quite different.
> They said they would move their engine into compute. That's quite different from saying they are supporting purely AMD hardware at Nvidia's expense. They've even said that Nvidia has contributed code to their base, after having access for over a year.
> 
> Actually Nvidia will benefit too - once they make more Compute centric GPUs. Actually, even their existing GPUs with very strong shader performance right now may benefit.
> 
> I would not be surprised if going forward, Nvidia matched or exceeded AMD's GPUs in Compute. Judging by the number of games that are being released, they might have to. DX12 adoption may very well be much faster. That's what happened in tessellation as Mahigan has noted.
> 
> Given this, I would not be surprised if like AMD, they adopted HDLs (High Density LIbraries) to maximize the amount of shaders they can put per mm^2 into a GPU. They haven't been afraid to adopt competing strategies. AMD for example originally adopted the small die, aggressive process mentality. Nvidia copied this after the disastrous Fermi. AMD had initially launched a small lower end GPU and appreciated the limitations of the next node, which is why their 5870 did better. Nvidia learned it's lesson, so Kepler was of course much more efficient. Kepler was launched on a small die and only then a full fat version released. Maxwell followed this with a small die, medium one, and only then a giant reticle limit sized one. Pascal and all future Nvidia GPUs will likely follow this as well.
> We'll have to wait and see. I've already noted that even during the days of 5870 vs Fermi, Nvidia was able to hold ground.
> 
> But you may be right. This time may be different, although I expect the fluctuation to be smaller. AMD may very well be so low that they have nowhere to go but up at this point.


CrazyElf, you continue to surprise me, and here I thought you were more on the technical side.

Yep, any re-org (other than of the balance sheet by existing stakeholders, or a consensual petition by existing stakeholders, including management), pre- or post-Chapter 11 restructuring of AMD would essentially mean that Nvidia would be the sole discrete GPU vendor.
I won't discuss the merits and opportunity of this scenario from any perspective other than the consumer's; it would be a lose-lose for consumers no matter how you cut it. ...


----------



## PostalTwinkie

Quote:


> Originally Posted by *Mahigan*
> 
> Neutral:
> Arma 3 (coming with map pack iirc)
> *Dayz Standalone*
> Killer Instinct (Microsoft)
> Halo Wars 2 (Microsoft)
> *Star Citizen*


Ha! Those are never coming out!

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Exactly, I think that AMD will be split up and sold off, and that would be the best thing for everyone. They won't just go bankrupt. But even if they did, someone would buy up their assets and continue on.
> 
> I never said there wouldn't be layoffs. To think that there wouldn't be is silly. Much of management would be let go ... and in the case of AMD, that is a good thing, after all, it is management (or in their case, mismanagement) that got them in this problem. They have very technically smart engineers, and any company taking over (or buying up the parts if a bankruptcy ... which will never happen ... AMD will make a deal first), will surely be snapped up by their new parent company.
> Again, I don't think that AMD will go bankrupt, so I'm sorry to say, everything you typed above is a moot point in this case.
> There have been a ton of companies who have already expressed an interest in buying AMD/ATI. Samsung is one of them. They would love to add to their list of IP and make Apple pay even more.
> 
> The x86 Agreement is a sticking point, and depending on what lawyer you talk to, may or may not transfer. But then again, we aren't talking about a mom and pop shop who are paying $70,000 in a risk to see if Intel balks. No matter who buys them for tens or more millions of dollars, will be talking to Intel first and cutting a side deal. Remember it is in Intel's best interest to "play nice" with the new owner. Because while AMD licenses x86 FROM Intel, they license x64 TO Intel. So the new owner would work Intel BEFORE a sale. Also, Intel wants someone else to hold that license. It makes the government "less nervous".


I think the fact that both Jims are back at AMD and working on Zen would be fairly enticing to a prospective buyer. When AMD poached Keller back from Apple, I imagine they paid a pretty penny; they are betting huge on the guy.

Hmm....

People underestimate the value in AMD and its people. AMD's issues have been one bad management decision after another, it seems. Maybe those were bad decisions simply forced by a lack of resources ($)? AMD right now has some of the best people in the world, but seems to be missing something!

If AMD went boobs up, I don't think we would see a fire sale like in other bankruptcy situations. Sure, massive cuts would be made, but I think AMD, for the consumer, would live on in some fashion, even if under a new name.


----------



## DeathMade

Hey guys, this is not related to the topic, but since there is a lot of AMD discussion and a lot of "future of AMD" talk, you might be interested in this:

http://semiaccurate.com/2015/09/08/behind-amds-new-funding/

But only for subscribers :/


----------



## provost

Except AMD won't... AND, I am not even sure why we are even talking about AMD's doom and gloom...

Here is a thread about AMD moving the industry forward (I would say the same if it were Nvidia doing it) for the benefit of all PC gamers, yet instead of focusing on the topic, we have somehow turned to doom-and-gloom predictions... lol

I guess, I am as guilty as any by participating in it - sorry to derail this otherwise generally positive thread


----------



## 47 Knucklehead

Quote:


> Originally Posted by *PostalTwinkie*
> 
> If AMD went boobs up, I don't think we would see a fire-sale like other Bankruptcy situations. Sure, massive cuts would be made, but I think AMD for the consumer would live on in some fashion. Even if it was under a new name.


Of course they would. But like I said before, AMD won't go bankrupt; they would be sold, just like ATI was sold and all their IP and the vast majority of the engineers went to work for AMD. Sure, many of the middle and upper managers were out of a job, but let's get serious here: does the end customer really give a damn who the CEO or a middle manager of a company is? No, they only care if the product is good, and that means they only really SLIGHTLY care about the engineers, who are making the products.

The sale of either the CPU -OR- the GPU side of AMD would be a big cash cow for the remaining side. That would then allow them to spend their limited budget more effectively (if you listen to Lisa Su, the majority of AMD's R&D budget went to Zen, which is why you only have Fury and HBM while the rest of the line is just a 200-series redo). Imagine if AMD made a deal to sell off the ATI portion of the company for a couple billion (and of course negotiated a side deal so they can still make the APUs); imagine what that money would mean for fighting Intel now that AMD is focused. Or the other way around: imagine if AMD sold off their CPU section (and worked out a deal for the APUs); imagine what that would mean for the graphics card side of things. They could seriously compete with Nvidia and their MASSIVE marketing budget.

You think AMD had the power to develop async compute and push DX12 development with Microsoft before? Imagine AMD with more focus and money. They would absolutely bring real competition to the market.


----------



## provost

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Of course they would. But like I said before, AMD won't go bankrupt, they would be sold. Just like ATI was sold and all their IPs and the vast majority of the engineers went to work for AMD. Sure many of the middle managers and upper managers were out of a job, but let's get serious here, does the end customer really give a damn who is the CEO or middle manager of a company? No, they only care if the product is good, and that means they only really SLIGHTLY care about the engineers, who are making the products.
> 
> The sale of either the CPU -OR- the GPU side of AMD would be a big cash cow for the remaining side. That would then allow them to spend their limited budget (which if you listen to Lisa Su, the majority of AMD's R&D budget was on Zen, and that is why you only have Fury and HBM and the rest of the line is just a 200 series redo. Imagine if AMD made deals to sell off the ATI portion of the company for a couple billion (and of course negotiate a side deal so they can still use the APU's) what that money would mean to fighting Intel now that AMD is focused. Or the other way, imagine if AMD sold off their CPU section (and worked a deal for the APU's) what that would mean to the graphics card side of thing. They could seriously compete with nVidia and their MASSIVE marketing budget.
> 
> You think that AMD had power to develop Async Compute and push development with DX12 with Microsoft before? Imagine AMD with more focus and money. They would absolutely make real competition in the market.



Hmmmm

I would be glad to hear a list of possible suitors/strategics who would have a bona fide strategic interest in acquiring and funding the discrete GPU division to effectively compete with Nvidia. And the emphasis here is on bona fide strategic interest, based on "fit"...

So, no wild guesses and pie in the sky scenarios... Lol


----------



## 47 Knucklehead

Quote:


> Originally Posted by *provost*
> 
> Hmmmm
> 
> I would be glad to hear a list of possible suitors/strategics who would have a bona fide strategic interest in acquiring and funding the discrete GPU division to effectively compete with Nvidia? And, the emphasis here is on bona fide strategic interest, based on "fit"......
> 
> So, no wild guesses and pie in the sky scenarios... Lol


How about Samsung, who already makes TVs, laptops, tablets, and phones?

How about Apple, who does the same?

How about Sony, who does the same ... and has a stake in the Console market as well?

The first two are flush with cash.


----------



## Mahigan

New article up on extremetech:

http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far


----------



## Silent Scone

Quote:


> Originally Posted by *Mahigan*
> 
> New article up on extremetech:
> 
> http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far


Ah yes, Joel.....

Thanks for the link


----------



## PostalTwinkie

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Of course they would. But like I said before, AMD won't go bankrupt, they would be sold.


I am comfortable fully agreeing with this. I guess I am using the word bankruptcy not in the literal sense, but in the sense of _"we sell ourselves, or go into administration (bankruptcy)"_. I wouldn't be surprised if AMD has already seriously discussed that option with one or more prospective buyers.

AMD has to know they are betting it all on Zen. If Zen fails, I don't see AMD surviving, unless DX12 pulls a miracle for them and causes a massive GPU market shift. A massive GPU market shift, with the consoles, might see them through a less-than-amazing Zen launch.

There are way too many damn _"what if"_s with AMD, and that is why their stock is in the toilet.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Mahigan*
> 
> New article up on extremetech:
> 
> http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far


And as I've said a dozen times before (and so have you) ...
Quote:


> Either way, leaping to conclusions about which company will "win" the DX12 era is extremely premature.


This is ONE benchmark from a "pre-beta beta" game that won't be ready for many months to come.


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> New article up on extremetech:
> 
> http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far


As you said before, Mahigan: wait and see.


----------



## Xuper

Quote:


> Originally Posted by *Mahigan*
> 
> New article up on extremetech:
> 
> http://www.extremetech.com/extreme/213519-asynchronous-shading-amd-nvidia-and-dx12-what-we-know-so-far


Quote:


> "Final result (for now): AMD GPUs are capable of handling a much higher load. About 10x what Nvidia GPUs can handle. *But they also need about 4x the pressure applied before they get to play out their capabilities*."


What does it mean?


----------



## provost

Quote:


> Originally Posted by *47 Knucklehead*
> 
> How about Samsung. Who already makes TV's, Laptops, Tablets, and Phones?
> 
> How about Apple, who does the same?
> 
> How about Sony, who does the same ... and has a stake in the Console market as well?
> 
> The first two are flush with cash.


I did say bona fide strategic interest, right... lol

There are a lot of consumer tech companies that are flush with cash, but it doesn't mean they give a hoot about the consumer discrete GPU market from a strategic perspective.

AMD is a public company, and its board has a fiduciary responsibility to maximize shareholder value (not consumer or employee value... lol). If there were bona fide interest from any of these names (which I highly doubt), it would have already happened.

I think you know AMD's financials better than I do... lol. You know its debt burden and working capital requirements, not to mention contingent obligations, so the premise that AMD going through a forced sale, BK, involuntary or otherwise, would somehow turn up a white knight in shining armor is about as ridiculous as hoping that a wolf can guard the henhouse better than, you know...

BK or "reluctant sale" processes only turn up bottom feeders that are looking to create an arbitrage or capture some strategic value. Furthermore, the PC industry is in a very different place (lower sales, declining estimates) than when ATI was sold to AMD. You have to look at the current market, and not compare it to the good old days of Big Blue (ok, I am exaggerating here...)


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> And as I've said a dozen times before (and so have you) ...
> This is ONE benchmark from a "pre-beta beta" game that won't be ready for many months to come.


*coughs coughs*
http://www.eteknix.com/amd-r9-290x-goes-head-to-head-with-titan-x-with-dx12/
http://www.guru3d.com/news-story/quick-test-directx-12-api-overhead-benchmark.html
http://www.gamersnexus.net/guides/1885-dx12-v-mantle-v-dx11-benchmark
You seem to forget things that go against your arguments. Granted, these aren't fully fledged benchmarks, but lo and behold, they're showing roughly the same traits this pre-beta game does.


----------



## Mahigan

@Xuper
Under low asynchronous loads, you don't get any real benefit from GCN. Once you up the load, you start to see GCN shine.

This is testing async compute alone. We have to keep in mind that compute loads are already present in DX12 titles. Asynchronous shading allows the developer to tap into underutilized resources (idling CUs).

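That crossover can be sketched as a toy latency model. All the constants below are hypothetical, chosen only to illustrate the shape of the behavior described above, not measured from any real GPU: a GCN-like part pays a higher fixed scheduling cost but overlaps many kernels per batch, while a Maxwell-like part switches quickly yet effectively serializes the work.

```python
import math

# Toy model -- 'base', 'batch', and 'step' are made-up constants,
# not measurements of real hardware.
def gcn_like_time(n_kernels, base=50, batch=128, step=10):
    # High fixed scheduling cost, but up to 'batch' kernels run concurrently.
    return base + math.ceil(n_kernels / batch) * step

def maxwell_like_time(n_kernels, step=2):
    # Low fixed cost, but kernels effectively run one after another.
    return n_kernels * step

# Low load: the fast-switching, serialized path wins.
print(gcn_like_time(8), maxwell_like_time(8))      # 60 vs 16
# High load: concurrent batching wins despite the fixed cost.
print(gcn_like_time(512), maxwell_like_time(512))  # 90 vs 1024
```

Under this model, the ExtremeTech quote reads naturally: the GCN-like curve only pulls ahead once enough "pressure" (kernels in flight) is applied to amortize its fixed cost.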

----------



## Silent Scone

Essentially it's a far cry from a real-world test, but insightful nonetheless.


----------



## 47 Knucklehead

I'm done with this.

Someone bump this thread for me when AotS is actually released and all this pre-beta beta garbage actually means something ... not that I'm really into third person RTS games anyway.

Have fun arguing guys.


----------



## Noufel

Quote:


> Originally Posted by *Mahigan*
> 
> @Xuper
> Under low asynchronous loads... You don't get any real benefit from GCN. Once you up the load, you start to see GCN shine.
> 
> This is testing async compute alone. We have to keep in mind that compute loads are already present in DX12 titles. Asynchronous shading allows the developer to tap into underutilized resources (idling CUs).


I hope that in a year or two, future engines will bring us cinematic-quality games with full control over the compute possibilities brought by async shading.


----------



## Mahigan

@Noufel

Same here bro, same here


----------



## GorillaSceptre

Quote:


> Originally Posted by *47 Knucklehead*
> 
> I'm done with this.
> 
> Someone bump this thread for me when AotS is actually released and all this pre-beta beta garbage actually means something ... not that I'm really into third person RTS games anyway.
> 
> Have fun arguing guys.


As per usual for you, you've come into a thread and kicked up a big fuss using convoluted arguments that have no bearing on the topic at hand.

Instead of proposing your own theory or adding constructive criticism, all you've done is misunderstand what async compute is, bring up AMD going bankrupt, and bring up their DX9 performance...

Then after all that you say "Have fun arguing"


----------



## semitope

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Exactly, I think that AMD will be split up and sold off, and that would be the best thing for everyone. They won't just go bankrupt. But even if they did, someone would buy up their assets and continue on.


If we are talking about what's good for consumers, it would be if nvidia went under and was replaced by a more forward-thinking company. Losing AMD would be terrible for PC gamers, considering it's because of them we can be happy about what DX12 is, and because of them we have HBM and not GDDR6 or some foolishness, etc.

Nvidia is a standard company, benefiting off other people's inventions and doing what they can to milk the market. If we are to talk about either of them going bankrupt for the sake of consumers, it should be nvidia.

That talk is pointless though. To me you just seem like you want AMD gone so there's no chance you are worse off for picking nvidia GPUs.


----------



## Klocek001

Quote:


> Originally Posted by *semitope*
> 
> if we are talking about whats good for consumers it would be if nvidia went and was replaced by a more forward thinking company. Losing AMD would be terrible for PC gamers considering its because of them we can be happy about what dx12 is, because of them we have HBM and not GDDR6 or some foolishness. etc.
> 
> Nvidia is a standard company, benefiting off other people's inventions and doing what they can to milk the market. If we are to talk about either of them going bankrupt for the sake of consumers, it should be nvidia.
> 
> That talk is pointless though. To me you just seem like you want AMD gone so there's no chance you are worse off for picking nvidia GPUs.


I think you should've used "progress-oriented" rather than "forward-thinking". They are thinking forward, my man, you'd better believe they are. They're already thinking of the money you want to spend on their GPUs every 12-18 months.

As long as those cards deliver, it's fine by me though.

How the hell do you stick with the same card longer than that anyway? I was bored with my 290 Trix after 10 months... It's nice to see your GPU take a breath of fresh air with a new API, but really I'd want to run that on a latest-gen card, which may or may not come in 2016. I think it's fair to say it may take as long as two years from now to really put DX12's pedal to the metal, with cards that are in their early planning stages now.


----------



## Redwoodz

Quote:


> Originally Posted by *47 Knucklehead*
> 
> How about Samsung. Who already makes TV's, Laptops, Tablets, and Phones?
> 
> How about Apple, who does the same?
> 
> How about Sony, who does the same ... and has a stake in the Console market as well?
> 
> The first two are flush with cash.


So you think AMD is ripe for takeover and has a lot of value, yet you fight anyone buying their products.

Maybe if a few more people invested in their products, they would have more money for R&D.


----------



## Klocek001

Quote:


> Originally Posted by *Redwoodz*
> 
> So you think AMD is ripe for takeover,and has a lot of value yet you fight anyone buying their products.
> 
> Maybe if a few more people invested in their products they would have more money for R&D.


Like what people?


----------



## Forceman

Quote:


> Originally Posted by *Redwoodz*
> 
> Maybe if a few more people invested in their products they would have more money for R&D.


Maybe if they had a more compelling product offering a few more people would invest in their products.


----------



## infranoia

Quote:


> Originally Posted by *Mahigan*
> 
> @Xuper
> Under low asynchronous loads... You don't get any real benefit from GCN. Once you up the load, you start to see GCN shine.
> 
> This is testing async compute alone. We have to keep in mind that compute loads are already present in DX12 titles. Asynchronous shading allows the developer to tap into underutilized resources (idling CUs).


The ExtremeTech article Mahigan links to above had a B3D post link that I had to share. Great analogy from "ToTTenTranz": https://forum.beyond3d.com/posts/1869621/
Quote:


> Imagine I have a wall-painting workforce of 2048 people.
> My 2048 people are versatile enough. They can all do one of two tasks: 1) wash the wall and 2) paint.
> Each of my 2048 workers can be assigned only one task (wash or paint) each day.
> For telling my 2048 workers what they should do each day, I have two additional employees:
> A) Wash-Teamer who organizes teams for washing the wall
> B) Paint-Teamer who organizes teams for painting the wall
> 
> Now for older generations of GPUs with Compute capabilities (Kepler, Maxwell 1, pre-GCN, etc.), Wash-Teamer really disliked Paint-Teamer. They couldn't even be in the same room without throwing insults towards each other's mother.
> I couldn't have that. I want a safe and friendly environment in my company.. so what happens is that Wash-Teamer never comes in the same days as Paint-Teamer.
> My company is still efficient enough. I can predict my paint jobs rather well so I know when I'll be needing more people to wash and tell Wash-Teamer to come, or more people to paint and tell Paint-Teamer to come instead.
> 
> But this of course isn't the most efficient way to distribute my 2048 workers. For example, some days I only have painting space for 1536 painters, so it would be great if the remaining 512 who are sitting there playing cards and drinking beer (I'm a cool boss. I allow drinking beer during work hours. Deal with it.) could move on to the next block and start washing another wall.
> Even worse: lots of times we have a very strict deadline to paint a wall, but there's only one little bit missing! I can only put something like 64 workers in that wall, which means that during that whole day, I have no less than 1984 workers sitting around, doing nothing. Drinking beer. Almost 2000 guys! Those are some troublesome days...
> 
> Come GCN (and supposedly Maxwell 2..), and Wash-Teamer has finally made peace with Paint-Teamer!
> Now they both come to work everyday, they compliment each others' shoes first thing in the morning and proceed to distribute my 2048 workers in the best possible way.
> What happens now is that I sometimes have 1664 workers (shaders) doing the painting (Rendering) and 384 workers doing the washing (Compute). Other days I need more Rendering, so I get 1920 workers doing Rendering, but I can still spare some 128 shaders doing the Compute.
> 
> And thanks to this, by the end of the month my company is painting walls a lot faster.
> Man, I wish I had listened to my friend Mark Cerny back in 2013. He insisted that Wash-Teamer and Paint-Teamer needed to be friends over two years ago!
> 
> And to summarize: GCN results are showing that Wash-Teamer and Paint-Teamer are indeed showing up to work at once.
> On nVidia GPUs, they're supposed to be friends with each other, yet they refuse to show up at the same time, instead having one show up after the other leaves the precinct, while pretending to be there at the same time. Perhaps they just told the press they were friends and would work together, but still can't stand one another.

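The analogy above can be turned into a tiny scheduling sketch. The workloads are hypothetical, picked to mirror the "64 painters, 1984 idle workers" day in the post: serially, washing (compute) days and painting (rendering) days never overlap; with async scheduling, workers who can't fit on a capped painting day wash concurrently instead of idling.

```python
import math

WORKERS = 2048  # total shader "workers", as in the analogy

def serial_days(paint_cap, paint_work, wash_work):
    # Painting (rendering) and washing (compute) never share a day.
    paint_days = math.ceil(paint_work / min(paint_cap, WORKERS))
    wash_days = math.ceil(wash_work / WORKERS)
    return paint_days + wash_days

def async_days(paint_cap, paint_work, wash_work):
    painters = min(paint_cap, WORKERS)
    paint_days = math.ceil(paint_work / painters)
    # Workers who can't fit on the wall wash concurrently instead of idling.
    wash_done_alongside = paint_days * (WORKERS - painters)
    leftover = max(0, wash_work - wash_done_alongside)
    return paint_days + math.ceil(leftover / WORKERS)

# A cramped wall: only 64 painters fit, so 1984 workers would otherwise idle.
print(serial_days(64, 640, 19840), async_days(64, 640, 19840))  # 20 vs 10
```

With these made-up numbers the async schedule halves the total days, which is exactly the "stop paying workers to drink beer" point of the analogy.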

----------



## Mahigan

Quote:


> Originally Posted by *infranoia*
> 
> The ExtremeTech article Mahigan links to above had a B3D post link that I had to share. Great analogy from "ToTTenTranz": https://forum.beyond3d.com/posts/1869621/


Another bonus is the shared toolbox the painters and washers use (shared cache). In essence, GCN has a larger shared toolbox, so when the painters and washers need to share tools, more tools can be placed in GCN's toolbox than in Kepler/Maxwell/Maxwell 2's. As an added bonus, GCN's toolbox is better organized, which makes it quicker to find the tools you're looking for (more bandwidth per FLOP). These characteristics of the toolbox (shared cache) allow GCN both to access more tool space and to retrieve tools more quickly than Kepler/Maxwell/Maxwell 2.

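The "bandwidth per FLOP" point is just a ratio. A quick sketch with entirely hypothetical figures (not real GCN or Maxwell specs) shows how one would compare it:

```python
# All figures hypothetical, purely to illustrate the metric.
def bytes_per_flop(cache_bw_gb_s, peak_gflops):
    """Shared-cache bandwidth available per unit of compute (bytes/FLOP)."""
    return cache_bw_gb_s / peak_gflops

gcn_like = bytes_per_flop(cache_bw_gb_s=2000.0, peak_gflops=8000.0)
maxwell_like = bytes_per_flop(cache_bw_gb_s=1000.0, peak_gflops=6000.0)
print(round(gcn_like, 3), round(maxwell_like, 3))  # 0.25 0.167
```

The higher the ratio, the less often shaders stall waiting on the shared cache when render and compute work run side by side.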
Sources:
http://www.realworldtech.com/kepler-brief/
http://www.fz-juelich.de/SharedDocs/Downloads/IAS/JSC/EN/slides/nvidia-ws-2014/02-adinets-maxwell.pdf


----------



## Defoler

Quote:


> Originally Posted by *semitope*
> 
> if we are talking about whats good for consumers it would be if nvidia went and was replaced by a more forward thinking company. Losing AMD would be terrible for PC gamers considering its because of them we can be happy about what dx12 is, because of them we have HBM and not GDDR6 or some foolishness. etc.
> 
> Nvidia is a standard company, benefiting off other people's inventions and doing what they can to milk the market. If we are to talk about either of them going bankrupt for the sake of consumers, it should be nvidia.
> 
> That talk is pointless though. To me you just seem like you want AMD gone so there's no chance you are worse off for picking nvidia GPUs.


Actually, nvidia have put out more tech in the last 10 years than AMD have.
G-Sync forced FreeSync.
Gameworks forced Mantle.
CUDA forced GPGPU on AMD.
APEX clothing forced AMD to try out TressFX as their own answer to "visual effects".
PhysX improvements, and pushing it into games, forced AMD to seek alternatives and to allow PhysX implementations once it was open sourced.

Nvidia have been ahead of the game compared to AMD for a long time now. They keep pushing new technology and new ideas. There is a very good reason why AMD are trailing so badly behind nvidia, and why they lose money consistently while nvidia are getting stronger.

People love to hate nvidia and Gameworks, but overall, Gameworks brought new visual effects to PC gaming far more than AMD did. AMD have been trying to make things look better by sheer force of their GPU, but nvidia are trying to find new things.

Anyway, no one wants to see AMD gone. They want to see AMD bought by someone who will push them forward (in both CPU and GPU).
Those are huge differences which people keep skipping.
I want Samsung to buy AMD's CPU division so they can use their own quality fabs and knowledge to push it forward. Or maybe Qualcomm, and see them making better SoCs for desktops. And maybe see someone else enter the GPU market with more determination than just using smear campaigns when they can't bring new tech (aka crying about Gameworks every other Tuesday).


----------



## mtcn77

You forgot Nvidia's Directx 12_1 Rasterizer Ordered Views triggering AMD's Dx11 Order Independent Transparencies. _Wait,_ that wasn't in the right order... Oh, the puns!

Quote:


> Originally Posted by *Defoler*
> 
> Actually nvidia are putting out more tech in the last 10 years than AMD did in the last 10 years.
> G-sync forced freesync.
> Gameworks forced Mantle.
> Cuda forced GPGPU on AMD.
> Apex clothing forced AMD to try out TressFX as their own answer to "visual effects".
> PhysX improvements and pushing it to games forced AMD to seek alternatives and allow physx implementations once it was open sourced.
> 
> Nvidia have been ahead of the game compared to AMD for a long time now. They keep pushing new technology, new ideas. There is a very good reason why AMD are trailing so bad behind nvidia, and they lose money consistently while nvidia are getting stronger.
> 
> People love to hate nvidia and gameworks, but overall, gameworks brought new visual effects to PC gaming way more than AMD brought out. AMD have been trying to make things look better by shear force of their GPU, but nvidia are trying to find new things.
> 
> Anyway, no one wants to see AMD gone. They want to see AMD bought by someone who will push them forward (in both CPU and GPU).
> Those are huge differences which people keep skipping.
> I want samsung to buy AMD's CPU division so they can use their own quality FABs and knowledge to push it forward. Or maybe qualcomm and seeing them making better SoC for desktops. And maybe see someone else enter the GPU market, with more determination than just using using smear campaigns when they can't bring new tech (aka crying about gameworks every other tuesday).


----------



## semitope

Quote:


> Originally Posted by *Defoler*
> 
> Actually nvidia are putting out more tech in the last 10 years than AMD did in the last 10 years.
> *G-sync forced freesync.
> Gameworks forced Mantle.
> Cuda forced GPGPU on AMD.
> Apex clothing forced AMD to try out TressFX as their own answer to "visual effects".
> PhysX improvements and pushing it to games forced AMD to seek alternatives and allow physx implementations once it was open sourced.*
> 
> Nvidia have been ahead of the game compared to AMD for a long time now. They keep pushing new technology, new ideas. There is a very good reason why AMD are trailing so bad behind nvidia, and they lose money consistently while nvidia are getting stronger.
> 
> People love to hate nvidia and gameworks, but overall, gameworks brought new visual effects to PC gaming way more than AMD brought out. AMD have been trying to make things look better by shear force of their GPU, but nvidia are trying to find new things.
> 
> Anyway, no one wants to see AMD gone. They want to see AMD bought by someone who will push them forward (in both CPU and GPU).
> Those are huge differences which people keep skipping.
> I want samsung to buy AMD's CPU division so they can use their own quality FABs and knowledge to push it forward. Or maybe qualcomm and seeing them making better SoC for desktops. And maybe see someone else enter the GPU market, with more determination than just using using smear campaigns when they can't bring new tech (aka crying about gameworks every other tuesday).


You have to really stretch to find things, hence your list looking bad. All nvidia has been doing is taking existing ideas and making them proprietary. GPGPU predates CUDA. I don't see the link between APEX and TressFX, or the basis for the general claims about PhysX ("improvements" and pushing, because we know that wasn't by nvidia). All just reaching.

Gameworks was redundant. It did not introduce new effects; it repackaged a specific implementation to plug into games for the sake of making nvidia look good (and supposedly making things easier to use).


----------



## Defoler

Quote:


> Originally Posted by *semitope*
> 
> Because all nvidia has been doing is taking existing ideas and making them proprietary. GPGPU predates cuda


Those are both lies, and they show your inability to check your own facts.

CUDA was fully released to the public, instead of being supplied through specialised cards, in 2007. AMD's GPGPU calculations came in 2011.
GPGPU is a general term, but AMD "took" that term to describe their capability in 2011, since they didn't really give it a name.

Also, FreeSync is eDP 1.4 applied to DP 1.2. G-Sync's hardware-controlled variable monitor Hz was completely new.

And with that, since this thread went from fact checking to complete fantasy by you, I'm also done. Keep dreaming etc etc. Must be good dreams.


----------



## Defoler

Quote:


> Originally Posted by *mtcn77*
> 
> You forgot Nvidia's Directx 12_1 Rasterizer Ordered Views triggering AMD's Dx11 Order Independent Transparencies. _Wait,_ that wasn't in the right order... Oh, the puns!


DX11 is software based; DX12_1 is hardware based.
AMD doesn't support DX12's hardware-based transparency reordering. Nvidia does.
So claiming that async calls done in software are super bad, while doing transparency in software instead... well....


----------



## mtcn77

Quote:


> Originally Posted by *Defoler*
> 
> DX11 is software based. DX12_1 is hardware based.
> AMD doesn't support DX12 hardware based transparent reorder. Nvidia does.
> So when claiming that async calls in software is super bad, but doing transparency in software instead... well....


I'll have to ask for verification, please.

PS: Not actually Rasteriser Ordered Views, but you get the point...


----------



## NoirWolf

Quote:


> Originally Posted by *Defoler*
> 
> Those are both lies and show your inability to check your own facts.
> 
> Cuda was fully released to the public instead of supplied through the specialised cards in 2007. AMD's GPGPU calculations came in 2011.
> GPGPU is a general term, but AMD "took" that term to describe their ability in 2011, since they didn't really give it a name.
> 
> Also, freesync is eDP 1.4 applied to DP 1.2. G-sync hardware controlled variant monitor hz was completely new.
> 
> And with that, since this thread went from fact checking to complete fantasy by you, I'm also done. Keep dreaming etc etc. Must be good dreams.


You can just smell the fanboy in this post.
https://www.olcf.ornl.gov/kb_articles/history-of-the-gpgpu/
http://www.techspot.com/article/659-history-of-the-gpu-part-4/
ATI launched their version in 2008, roughly one year after Nvidia, and both had it working in HPCs since the early 2000s.

This is before we get into the notion that AMD broke more technical ground than Nvidia. I am still rather curious whether they'll botch Volta as much as I suspect when they cram HBM2 and a hardware scheduler in there (I don't think they're suicidal enough to go with a die shrink, new memory, and a major revision of Maxwell in one gen).


----------



## semitope

Quote:


> Originally Posted by *Defoler*
> 
> Those are both lies and show your inability to check your own facts.
> 
> Cuda was fully released to the public instead of supplied through the specialised cards in 2007. AMD's GPGPU calculations came in 2011.
> GPGPU is a general term, but AMD "took" that term to describe their ability in 2011, since they didn't really give it a name.
> 
> Also, freesync is eDP 1.4 applied to DP 1.2. G-sync hardware controlled variant monitor hz was completely new.
> 
> And with that, since this thread went from fact checking to complete fantasy by you, I'm also done. Keep dreaming etc etc. Must be good dreams.


You know... actually... what you are calling a lie is exactly true in these cases. CUDA was a proprietary GPGPU framework. GPGPU predated it. G-sync also took the concept of adaptive sync. Their implementation never made sense to me but made them money.

*I am not thinking about this as AMD vs Nvidia*. It's just that these things were being done some other way by other groups before nvidia's proprietary implementation. If you just want to say "done before AMD", meh. We can't make claims about what forced this or that, but sure, companies do things before or after each other. There are many cases like that between AMD and nvidia. DX12 async support will be another. 12.1 features another (though intel apparently did that long ago).

But looking at bigger things like changing the PC gaming industry etc I would say we do not want AMD gone or under new management/philosophy.


----------



## Forceman

Quote:


> Originally Posted by *mtcn77*
> 
> I'll have to ask for verification, please.
> 
> PS: Not actually Rasteriser Ordered Views, but you get the point...


Robert Hallock good enough?
Quote:


> When a fan asked afterwards what are the aspects of DX12 that the FuryX is missing, Robert replied and listed them.
> 
> "Raster Ordered Views and Conservative Raster."


I think we can safely disregard that entire chart at this point.


----------



## DigiHound

Quote:


> Actually nvidia are putting out more tech in the last 10 years than AMD did in the last 10 years.
> G-sync forced freesync.
> Gameworks forced Mantle.
> Cuda forced GPGPU on AMD.
> Apex clothing forced AMD to try out TressFX as their own answer to "visual effects".
> PhysX improvements and pushing it to games forced AMD to seek alternatives and allow physx implementations once it was open sourced.
> 
> Nvidia have been ahead of the game compared to AMD for a long time now. They keep pushing new technology, new ideas. There is a very good reason why AMD are trailing so bad behind nvidia, and they lose money consistently while nvidia are getting stronger.


You've got your timelines and references more than a little scrambled.

"FreeSync" was possible on laptops from the beginning thanks to the eDP standard. Nvidia built the first custom ASICs to enable it on desktop displays, that's true, but it's not as if the functionality didn't exist at all before that. I'd ding AMD for missing a potential marketing opportunity, but not a fundamentally new technology.

GameWorks did not "force" Mantle. If anything, it went the other way. AMD was pushing for a low-latency API, and once it became clear that AMD would defy Microsoft's wishes and release its own, Nvidia took steps to ensure that it had a secondary approach to doing all of its performance optimization in-driver. GameWorks gives NV an opportunity to perform its current driver optimizations (which can't work in DX12) in a _library_.

This, however, is a competitive approach, not a technical one. On a technical level, GW and Mantle are nothing alike and accomplish entirely different goals.

You have a better point with CUDA and GPGPU. For a long time, CUDA was the only game in town, and I think it's fair to say that NV invested in this area far more than AMD or anyone else on the OpenCL team.

"Apex clothing forced AMD to try out TressFX as their own answer to "visual effects."

Again, I think you've overstated the case here. AMD has no equivalent to GW and no comprehensive package of visual effects that they "push" to developers for incorporation. AMD built TressFX, yes, and it's been used in what -- 2-3 games? The kind of comprehensive, library-oriented approach that NV takes just isn't the same as what AMD does as far as optimization or value-add, and while you can certainly view that as a weakness on AMD's part, they're fundamentally different strategies.

"PhysX improvements and pushing it to games forced AMD to seek alternatives and allow physx implementations once it was open sourced."

PhysX has, for the most part, withered and died on the vine. There were 10 PhysX titles launched in 2007 according to Wikipedia. By 2009, that had fallen to six. 2013 was a pretty good year, with eight titles, but there were just three in 2014 and three so far in 2015.

Don't get me wrong, I've always liked PhysX and often use it where available, but GPU-accelerated physics has not caught on in the market and PhysX found more success as a software solution like Havok than it did running on GeForce hardware. NV continues to back it, and a few games continue to ship, but this isn't a major feature. AMD talked up a few PhysX replacements on open source, like Bullet, but none of those really caught on, either.


----------



## Mahigan

Quote:


> Originally Posted by *Defoler*
> 
> Those are both lies and shows your in ability to check your own facts.
> 
> Cuda was fully released to the public instead of supplied through the specialised cards in 2007. AMD's GPGPU calculations came in 2011.
> GPGPU is a general term, but AMD "took" that term to describe their ability in 2011, since they didn't really give it a name.
> 
> Also, freesync is eDP 1.4 applied to DP 1.2. G-sync hardware controlled variant monitor hz was completely new.
> 
> And with that, since this thread went from fact checking to complete fantasy by you, I'm also done. Keep dreaming etc etc. Must be good dreams.


AMD doesn't have the resources, or the software arm, to push their own proprietary technology. AMD relies on open industry standards. The standards called the functionality "GPGPU", therefore AMD adopted the industry-standard nomenclature. Now, it is no secret that I am a big proponent of open standards and have been for many, many years. There's a reason why I support open standards and dislike proprietary implementations: it all comes down to having a choice. But in order to have healthy competition, simply protecting the weaker side doesn't always make sense, or the bigger player faces unfair competition. It's about making the industry healthier and giving consumers more options: high-end or entry-level technology, low cost or high cost.

A few points I'd like to bring up as it pertains to Proprietary vs Open Standards:

The 3dfx Glide API, where is it now? How well have Direct3D and OpenGL done comparatively?
The S3 MeTaL API, where is it now?
Adobe Photoshop and other software suites utilized CUDA at first for GPGPU programming. Intel, Apple, AMD, the Khronos Group, etc. adopted OpenCL. What do most software suites use now?
AMD worked on an open standard, FreeSync. Intel has recently adopted the technology, and an increasing number of display panels are now being released with it. What do you think will happen to G-Sync in a few years' time?
PhysX is a proprietary standard; TressFX (2.0) is a step towards a more open physics implementation. What do you think will happen to PhysX in a few more years?
Sony's Betamax? How well did it fare against VHS?
How well is ARM doing in the mobile segment? What happened to its mobile competitors?
What happened to ATi's TruForm?
What happened to ATi's 3Dc?

Now if we look towards more open standards:

GDDR3, GDDR4, GDDR5: who worked on their development, nVIDIA or AMD? (http://www.vrworld.com/2008/11/22/100th-story-gddr5-analysis-or-why-gddr5-will-rule-the-world/)
HBM: who worked on its development, nVIDIA or AMD?
The Vulkan API: who worked on its foundational development, nVIDIA or AMD?
Heterogeneous System Architecture (HSA)? nVIDIA or AMD?

Clearly, open standards tend to win over proprietary standards across various industries. Open standards are innovative; they drive the whole industry forward, creating new opportunities and new solutions to problems. What open standards has nVIDIA developed as of late?

All the Proprietary stuff you mentioned, from nVIDIA or anyone else, will die. It's only a matter of time.

PS. nVIDIA did not create PhysX, they bought AGEIA.


----------



## Xuper

Quote:


> Originally Posted by *Defoler*
> 
> Actually nvidia are putting out more tech in the last 10 years than AMD did in the last 10 years.
> G-sync forced freesync.
> *Gameworks forced mantle.*
> Cuda forced GPGPU on AMD.
> Apex clothing forced AMD to try out TressFX as their own answer to "visual effects".
> PhysX improvements and pushing it to games forced AMD to seek alternatives and allow physx implementations once it was open sourced.
> 
> Nvidia have been ahead of the game compared to AMD for a long time now. They keep pushing new technology, new ideas. There is a very good reason why AMD are trailing so bad behind nvidia, and they lose money consistently while nvidia are getting stronger.
> 
> People love to hate nvidia and gameworks, but overall, gameworks brought new visual effects to PC gaming way more than AMD brought out. AMD have been trying to make things look better by shear force of their GPU, but nvidia are trying to find new things.
> 
> Anyway, no one wants to see AMD gone. They want to see AMD bought by someone who will push them forward (in both CPU and GPU).
> Those are huge differences which people keep skipping.
> I want samsung to buy AMD's CPU division so they can use their own quality FABs and knowledge to push it forward. Or maybe qualcomm and seeing them making better SoC for desktops. And maybe see someone else enter the GPU market, with more determination than just using using smear campaigns when they can't bring new tech (aka crying about gameworks every other tuesday).


Gameworks vs. Mantle????

And this thread is not Nvidia vs. AMD. We don't care about what Nvidia or AMD did.


----------



## DigiHound

Quote:


> AMD's GPGPU calculations came in 2011.
> GPGPU is a general term, but AMD "took" that term to describe their ability in 2011, since they didn't really give it a name.


*eyebrow*

OpenCL was the GPGPU compute answer to CUDA, and it debuted in 2008, not 2011. AMD's specific heterogeneous compute capability is called HSA; it debuted as a concept in 2011 before launching on Kaveri APUs in 2014. For all Kaveri's weaknesses as a CPU (and the fact that HSA support is hard to find in shipping software), it offers a set of capabilities that no other integrated chip matches at this point. You can argue, fairly, that Intel hasn't *needed* it, but that doesn't make it any less unique.


----------



## NoirWolf

Quote:


> Originally Posted by *Forceman*
> 
> Robert Hallock good enough?
> I think we can safely disregard that entire chart at this point.


http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/07/AMD_GCN3_Instruction_Set_Architecture.pdf


So what we're missing out on, on the AMD side, without conservative rasterization is better physics (if the devs even deem it worth putting in) and better particle effects. *

On the other side



This, more or less, is what is missing for Team Green. *

* - requires software workarounds to compensate for the lack of hardware capability.

Between better-looking smoke and more realistic worlds driven by the sheer amount of stuff running around... I know which way I'm leaning. Odds are async will matter more, sooner, than conservative rasterization, simply because the former improves the game fundamentally while the latter is a cherry on top. It would be nice, don't get me wrong, but unless you're running hydrodynamic simulations on your AMD GPU, I doubt you'd notice it that much.


----------



## Mahigan

Conservative Rasterization and ROVs are great technological jumps; I wouldn't downplay them. They're also part of the DirectX 12 specification (feature level 12_1), and as such they're part of an open standard.

Here's hoping that AMD's next-gen cards adopt these two key technologies, and kudos to nVIDIA for supporting them.


----------



## mtcn77

Quote:


> Originally Posted by *Forceman*
> 
> Robert Hallock good enough?
> I think we can safely disregard that entire chart at this point.


Conservative rasterization will throttle some of the performance, I believe. I'm either making this up or I read it somewhere. Hold on to your straw hats, please, while I take a nose dive.








Gee, I'm like a computer virus hacking and slashing into arguments.
Quote:


> Last but certainly not least among Direct3D's new features will be conservative rasterization. Conservative rasterization is essentially a more accurate *but performance intensive solution* to figuring out whether a polygon covers part of a pixel.


[Necro'ed]
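The quoted definition can be illustrated with a small sketch (purely illustrative; real GPUs do this in fixed-function hardware, and the triangle and pixel coordinates below are made-up values): standard rasterization samples only the pixel center, while conservative rasterization counts a pixel as covered if the triangle overlaps any part of the pixel square, here tested exactly with a 2D separating-axis check.

```python
def point_in_triangle(p, tri):
    """Standard rasterization's coverage test: is the pixel-center sample p
    inside the triangle (edges inclusive)?"""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1 = cross(tri[0], tri[1], p)
    d2 = cross(tri[1], tri[2], p)
    d3 = cross(tri[2], tri[0], p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def tri_overlaps_box(tri, xmin, ymin, xmax, ymax):
    """Conservative coverage test: does the triangle overlap the pixel square
    at all? Uses the separating-axis theorem for two convex 2D shapes."""
    box = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    # Candidate separating axes: the box's x/y axes plus each triangle edge normal.
    axes = [(1, 0), (0, 1)]
    for i in range(3):
        ex = tri[(i + 1) % 3][0] - tri[i][0]
        ey = tri[(i + 1) % 3][1] - tri[i][1]
        axes.append((-ey, ex))
    for ax, ay in axes:
        t_proj = [vx * ax + vy * ay for vx, vy in tri]
        b_proj = [vx * ax + vy * ay for vx, vy in box]
        if max(t_proj) < min(b_proj) or max(b_proj) < min(t_proj):
            return False  # found a separating axis, so no overlap
    return True

# A thin sliver of triangle clips the corner of pixel (0,0)-(1,1)
# but misses its center sample at (0.5, 0.5):
sliver = [(-0.2, -0.2), (0.3, -0.2), (-0.2, 0.3)]
print(point_in_triangle((0.5, 0.5), sliver))  # False: standard raster misses it
print(tri_overlaps_box(sliver, 0, 0, 1, 1))   # True: conservative raster covers it
```

This is also where the quoted performance cost comes from: the conservative test can flag many more pixels as covered than center sampling would, so more pixel work gets dispatched.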


----------



## Myst-san

Adopting Conservative Rasterization and ROVs will be easier for AMD than remaking their architecture for async will be for Nvidia; at least it looks that way to me.


----------



## CrazyElf

One thing I am interested in seeing is how this will affect CrossFire. DX12 should, in theory, bring much better multi-GPU scaling.

Right now, multi-GPU setups are not worth it:
http://www.pcper.com/reviews/Graphics-Cards/AMD-Fury-X-vs-NVIDIA-GTX-980-Ti-2-and-3-Way-Multi-GPU-Performance
Quote:


> Originally Posted by *Ryan Shrout*
> This story that focuses on the performance scaling capability of the AMD Radeon R9 Fury X and the NVIDIA GeForce GTX 980 Ti has revealed some interesting information to us. Let's start with the easiest outcome to decipher: 3-Way SLI and 3-Way CrossFire just do not present a positive experience for gamers of either camp. It would seem that either due to neglect or complexity, drivers and profiles that center around more than 2 graphics processors have been left off to the side of either teams' roadmap, and maybe deservedly so. SLI/CF users are already a niche; and users that combine 3 or even 4 GPUs in a single system are even more rare than that. From what I am told by both NVIDIA and AMD, developing robust and reliable drivers for beyond two GPU configurations is incredibly time consuming, expensive and...under utilized.
> 
> Regardless of your thoughts on if either company should invest more time in these kinds of setups, it seems obvious to me that going into a build with the intent to run either three AMD Fury X cards or three GTX 980 Ti cards is a fruitless gesture. Just don't do it.


It should also improve the prospects for 3- and 4-way CrossFire/SLI.

Somewhat off topic, but Charlie over at SemiAccurate claims (behind a paywall) that AMD has a new source of cash. No idea about the accuracy of those claims, and Charlie has been wrong before, so I'm treating them with deep skepticism.
https://semiaccurate.com/2015/09/08/behind-amds-new-funding/

Quote:


> Originally Posted by *47 Knucklehead*
> 
> Exactly, I think that AMD will be split up and sold off, and that would be the best thing for everyone. They won't just go bankrupt. But even if they did, someone would buy up their assets and continue on.
> 
> I never said there wouldn't be layoffs. To think that there wouldn't be is silly. Much of management would be let go ... and in the case of AMD, that is a good thing, after all, it is management (or in their case, mismanagement) that got them in this problem. They have very technically smart engineers, and any company taking over (or buying up the parts if a bankruptcy ... which will never happen ... AMD will make a deal first), will surely be snapped up by their new parent company.
> Again, I don't think that AMD will go bankrupt, so I'm sorry to say, everything you typed above is a moot point in this case.
> There have been a ton of companies who have already expressed an interest in buying AMD/ATI. Samsung is one of them. They would love to add to their list of IP and make Apple pay even more.
> 
> The x86 Agreement is a sticking point, and depending on what lawyer you talk to, may or may not transfer. But then again, we aren't talking about a mom and pop shop who are paying $70,000 in a risk to see if Intel balks. No matter who buys them for tens or more millions of dollars, will be talking to Intel first and cutting a side deal. Remember it is in Intel's best interest to "play nice" with the new owner. Because while AMD licenses x86 FROM Intel, they license x64 TO Intel. So the new owner would work Intel BEFORE a sale. Also, Intel wants someone else to hold that license. It makes the government "less nervous".


The thing is:

The nature of the x86 agreement would be bitterly fought over in court.
You're operating under the assumption that someone will buy AMD as a whole, or at least the GPU division as a whole, rather than it being split into pieces and sold off.
Finally, even if someone does buy the GPU business, will they compete in this field given its future prospects?

That's a lot of "ifs". The answer is that we're far better off with AMD intact.

Quote:


> Originally Posted by *semitope*
> 
> if we are talking about whats good for consumers it would be if nvidia went and was replaced by a more forward thinking company. Losing AMD would be terrible for PC gamers considering its because of them we can be happy about what dx12 is, because of them we have HBM and not GDDR6 or some foolishness. etc.
> 
> Nvidia is a standard company, benefiting off other people's inventions and doing what they can to milk the market. If we are to talk about either of them going bankrupt for the sake of consumers, it should be nvidia.
> 
> That talk is pointless though. To me you just seem like you want AMD gone so there's no chance you are worse off for picking nvidia GPUs.


This.

Although AMD (and ATI before them) have been far from perfect, Nvidia has definitely been the less consumer-friendly of the two companies.

They've been pushing proprietary standards and technologies for a long time, and it's no secret that it's for one reason: money.
Both companies have done bad things to try to make their rivals look worse, but things like HairWorks are definitely more common with Nvidia.
They have a very strong desire to control: it's why we see things like Project Greenlight and no custom-PCB Titan Xs, for example. There is also the driver situation: they don't optimize as much once the next architecture comes out.
I would venture a guess that, behind closed doors, they must have been doing something to alienate the open-source community, or else Linus Torvalds would not have been so angry (although since then they've tried to clean up their act).
AMD is far from perfect (they haven't allowed custom Fury X/Nano PCBs either, for example), but they are trying harder.

They've pushed more open-standards technologies.
I would not call Nvidia non-innovative, but I will note that AMD has been more aggressive in its adoption of ideas like new memory (GDDR5 and HBM, for example), and it has historically pursued new nodes more quickly.
For us consumers, a win is that AMD cards are often cheaper in price-to-performance and aren't restricted as stringently (e.g., they allow for more overclocking).
Their GPUs also age better right now (witness the 7970 vs. the 680, on release day vs. today).
That is not to say AMD hasn't done quite a few bad things (their withholding of review samples smells like a really bad idea to me, as does their poor marketing), but overall they are the more ethical of the two companies, although, as I have noted, far, far from perfect. Both have also, at times, tried to deny their screw-ups unless the evidence was overwhelming.

Perhaps it is forced upon them by being the underdog. Any other examples? Off the top of my head, IBM has pushed OpenPOWER now that Xeon CPUs are basically dominant.

Both companies have common flaws - for example they need to both work on drivers. I wish that both companies would make a bigger push for open sourcing all of their drivers in full. All companies are out for money, but there are those that do a better job overall in terms of ethics and how they treat their people.

Either way, at this point, we're stuck with them, unless of course, AMD goes under, which is dangerously likely. I wish we had more players in this field. It may be though that the size of the graphics market only supports 2 players. If only hardcore PC gaming was a much bigger market ....

Finally, I'll note that, as Mahigan says, we are far, far better off with open standards. They're good for innovation and allow for more competition.

Quote:


> Originally Posted by *Noufel*
> 
> i hope that in one year or two the future engines will bring us cinematic quality games with the full controle on compute possibilities brought by Async shading


This brings up an interesting point.

Just how much "better" (and by better I mean more photorealistic) could graphics get with more compute power? I think it's an interesting question, because graphics have in some ways stagnated in the past few years. It may be that there is a tradeoff: better graphics, or the same graphics at higher frame rates. IMO, once we get 60 fps at 4K, which could happen next generation or the generation after, better graphics is the way to go.

I don't see as much point in pushing to, say, 6K or 8K; at some point resolution has diminishing returns.

So that leaves, how can a more compute-centric gaming engine combined with a powerful GPU improve graphics?


----------



## Serios

Quote:


> Originally Posted by *Defoler*
> 
> Actually nvidia are putting out more tech in the last 10 years than AMD did in the last 10 years.
> *G-sync forced freesync.
> Gameworks forced mantle.
> Cuda forced GPGPU on AMD.
> Apex clothing forced AMD to try out TressFX as their own answer to "visual effects".
> PhysX improvements and pushing it to games forced AMD to seek alternatives and allow physx implementations once it was open sourced.*
> 
> Nvidia have been ahead of the game compared to AMD for a long time now. They keep pushing new technology, new ideas. There is a very good reason why AMD are trailing so bad behind nvidia, and they lose money consistently while nvidia are getting stronger.
> 
> People love to hate nvidia and gameworks, but overall, gameworks brought new visual effects to PC gaming way more than AMD brought out. AMD have been trying to make things look better by shear force of their GPU, but nvidia are trying to find new things.
> 
> Anyway, no one wants to see AMD gone. They want to see AMD bought by someone who will push them forward (in both CPU and GPU).
> Those are huge differences which people keep skipping.
> I want samsung to buy AMD's CPU division so they can use their own quality FABs and knowledge to push it forward. Or maybe qualcomm and seeing them making better SoC for desktops. And maybe see someone else enter the GPU market, with more determination than just using using smear campaigns when they can't bring new tech (aka crying about gameworks every other tuesday).


None of those things is as important as GDDR5 or the new HBM, even if you combine them all and then compare.
Also, Nvidia did all those things for their own benefit. When was the last time, apart from CPU-based PhysX (which is still software), that Nvidia did something that benefited all gamers?


----------



## STEvil

Let alone that all but one of them are wrong.


----------



## Silent Scone

Quote:


> Originally Posted by *CrazyElf*
> 
> One thing I am interested in seeing is how this will affect Crossfire. DX12 is in theory able to bring much better multiGPU scaling.
> 
> Right now multiGPU setups are not worth it:
> http://www.pcper.com/reviews/Graphics-Cards/AMD-Fury-X-vs-NVIDIA-GTX-980-Ti-2-and-3-Way-Multi-GPU-Performance
> It should also make the possibility for 3 and 4 way Crossfire/SLI better.


I'm not convinced by D3D's native multi-GPU support; it remains to be seen which rendering method will be more widely adopted. Leaving these decisions in the hands of developers creates an even larger divide than before, and it will likely require heavier involvement (a push) from vendors than we see today. It will be a year or two before we see any of this anyway.


----------



## Olivon

Very interesting article from Hardware.fr on the subject.
Nothing is all black or all white, as usual.
A must-read article!

*Source (FR)*
*Translated*


----------



## SpeedyVT

Do you think all of these recent AMD strides have a lot to do with that ex-nVidia employee they scooped up a while back?


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Mahigan*
> 
> PS. nVIDIA did not create PhysX, they bought AGEIA.


To be fair then, AMD didn't invent half that stuff either; they bought ATI.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> To be fair then, AMD didn't invent half that stuff then, they bought ATI.


Fairly sure AMD's HBM, or heck GDDR5, wasn't based on anything ATI was working on, while Ageia pretty much made PhysX a thing on its own. But then again, you care about tech working with CrossFire and SLI, which hasn't been a selling point since the early days of the US invasion of Iraq, but eh... logical consistency.







.
Quote:


> Originally Posted by *SpeedyVT*
> 
> You think all of this recent AMD strides have a lot with that ex-nVidia employee they scooped up a while back?


GCN was around, in design at least, since what? 2008? When was the employee scooped up?


----------



## 47 Knucklehead

Quote:


> Originally Posted by *NoirWolf*
> 
> Fairly sure AMD's HBM, or heck GDDR5, wasn't based on anything ATI ...


GDDR5 was based on GDDR3, which was designed by ATI. Same for GDDR2. GDDR4 was largely worked on by Samsung, and was largely a DOA standard.
Crossfire was designed by ATI.
Don't even get me going on what AMD's entire processor line was based on.








In fact, most of the engineers still at AMD working on the graphics side of things come from the ATI buy out, and many of the new ones come from nVidia.

So if people are going to balk at nVidia for buying another company to get their technology, then they should do the same about AMD in all fairness.

That's all I'm saying.

AMD has squat for R&D budget so they embrace partnering with others to front the bill, and they embrace open standards, which take longer and are often a compromise, because of their lack of money, not because they WANT to. Despite what some have said, open standards are not always the best or the most popular. Take Linux vs Microsoft Windows for example. Linux is open standard, Microsoft Windows isn't. Which is more popular? Which is the gaming platform of choice? FreeSync isn't an open standard, AdaptiveSync is. That is what Intel has said they will embrace in the future, NOT FreeSync. GCN is not an open standard. Despite all the hype Mantle is not an open standard. Crossfire is not an open standard. AMD x64 is not an open standard, just like the Intel x86 is not. Hell, most things in this world are not open standard.

Yes, open standards have their place, but they aren't the be-all, end-all of ways to do anything.

P.S. And speaking of Linux, did you ever use it, say, five years ago and up until about a year ago? If you did, you will know that AMD cards were poorly supported in that open-source OS. Only when AMD smelled the possibility of getting some sales from SteamBox and such did AMD really give the Linux open-source community the time of day as far as drivers go. If you were doing more than just basic graphics work in Linux, you did it with an nVidia card.

Also, as far as Mantle goes, AMD locked both Intel and nVidia out (there is even a quote out there where they finally admitted that and claimed "we want to minimize things for now, but maybe later"). Basically the whole Mantle thing was just a ploy to build an API that GREATLY benefited their cards and CPUs and not the industry leaders. Once Mantle was largely done ... WITHOUT input from Intel and nVidia ... they stopped development on Mantle and basically turned it over to Khronos, and it is now called Vulkan. But let's face it, do you think that Khronos is going to suddenly throw everything away and go to Intel and nVidia and make things "fair and open" for all? Hell no, it will continue to be largely one-sided towards AMD, only now that someone else is heading up the project, AMD doesn't have to spend any money (other than a little consulting time) to get SOMEONE ELSE to do their work for them. As I said, for YEARS AMD didn't embrace open source. They only did when they had money issues; then they largely changed tactics, did a little work to build interest, and then let OTHER PEOPLE do their work for them.


----------



## SpeedyVT

Quote:


> Originally Posted by *NoirWolf*
> 
> GCN was around in design,at least, since what? 2008? When was the employee scooped up?


Over that time span they grabbed a few. Couldn't say who was which, or when.


----------



## EightDee8D

Quote:


> Originally Posted by *47 Knucklehead*
> 
> GDDR5 was based on GDDR3, which was designed by ATI. Same for GDDR2. GDDR4 was largely worked on by Samsung, and was largely a DOA standard.
> Crossfire was designed by ATI.
> Don't even get me going on what AMD's entire processor line was based on.
> 
> 
> 
> 
> 
> 
> 
> 
> In fact, most of the engineers still at AMD working on the graphics side of things come from the ATI buy out, and many of the new ones come from nVidia.
> 
> So if people are going to balk at nVidia for buying another company to get their technology, then they should do the same about AMD in all fairness.
> 
> That's all I'm saying.
> 
> AMD has squat for R&D budget so they embrace partnering with others to front the bill, and they embrace open standards, which take longer and are often a compromise, because of their lack of money, not because they WANT to. Despite what some have said, open standards are not always the best or the most popular. Take Linux vs Microsoft Windows for example. Linux is open standard, Microsoft Windows isn't. Which is more popular? Which is the gaming platform of choice? FreeSync isn't an open standard, AdaptiveSync is. That is what Intel has said they will embrace in the future, NOT FreeSync. *GCN is not an open standard. Despite all the hype Mantle is not an open standard. Crossfire is not an open standard. AMD x64 is not an open standard, just like the Intel x86 is not*. Hell, most things in this world are not open standard.
> 
> Yes open standards have their place, but they aren't the beat all, end all, of ways to do anything.


----------



## NoirWolf

Quote:


> Originally Posted by *EightDee8D*


Like talking to a wall, eh? He's amusing, considering he just said derivative work = not original, which means Nvidia has gone from sort-of-innovative to 1960s-era China by his own logic. Basically defending someone by setting them on fire.


----------



## CasualCat

Is there new info yet, or is it just red vs green ad nauseam at this point?


----------



## SpeedyVT

Quote:


> Originally Posted by *CasualCat*
> 
> Is there new info yet, or is it just red vs green ad nauseam at this point?


Microsoft recently released a document about Maxwell 2's support for async compute, supposedly confirming it was emulated. No specifics to back or validate it, though.


----------



## Forceman

Quote:


> Originally Posted by *SpeedyVT*
> 
> Microsoft released a document recently about Maxwell 2's support for async confirming it was emulated. No specifics to back or validate it.


Source?


----------



## gondReus

Related? No?
AMD Patent.
Issued: Aug 25, 2015
Control circuits for asynchronous circuits
http://patents.justia.com/patent/9117511

Load balancing for optimal tessellation performance
Patent number: 9105125
Issued: August 11, 2015
http://patents.justia.com/patent/9105125

and a lot of them are quite recent:
http://patents.justia.com/assignee/advanced-micro-devices-inc

Nvidia
http://patents.justia.com/assignee/nvidia-corporation


----------



## SpeedyVT

Quote:


> Originally Posted by *Forceman*
> 
> Source?


Posted on reddit earlier.



https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D#Direct3D_12


----------



## sugarhell

Quote:


> Originally Posted by *Olivon*
> 
> Very interesting article from Hardware.fr on the subject.
> Nothing is all black or all white, as usual.
> A must read article !
> 
> *Source (FR)*
> *Translated*


One of the best articles for clearing the fog around async compute (the wrong term; even a full page in, still no mention of it). Thank you for posting.

Damien seems to believe that Mantle was a big inspiration for DX12.


----------



## CasualCat

Quote:


> Originally Posted by *SpeedyVT*
> 
> Microsoft released a document recently about Maxwell 2's support for async confirming it was emulated. No specifics to back or validate it.


Quote:


> Originally Posted by *Forceman*
> 
> Source?


Quote:


> Originally Posted by *SpeedyVT*
> 
> Posted on reddit earlier.
> 
> 
> 
> https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D#Direct3D_12


I'd expect that when referencing a Microsoft document, and being asked for a source, you'd provide the document, not an uncited Wikipedia entry.

Anyone else have this document available?


----------



## Assirra

Quote:


> Originally Posted by *SpeedyVT*
> 
> Posted on reddit earlier.
> 
> 
> 
> https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D#Direct3D_12


That is not a source but a Wikipedia article that anyone can edit whenever they feel like it.


----------



## SpeedyVT

Quote:


> Originally Posted by *Assirra*
> 
> That is not a source but a wikipedia article anyone edit whenever they felt like.


Just me sleepily copypasta stuff.


----------



## FastEddieNYC

Quote:


> Originally Posted by *Olivon*
> 
> Very interesting article from Hardware.fr on the subject.
> Nothing is all black or all white, as usual.
> A must read article !
> 
> *Source (FR)*
> *Translated*


Thanks for the translated link. Great write-up on this subject. The date is now September 9th and there's still no response from Nvidia.


----------



## semitope

Quote:


> Originally Posted by *47 Knucklehead*
> 
> GDDR5 was based on GDDR3, which was designed by ATI. Same for GDDR2. GDDR4 was largely worked on by Samsung, and was largely a DOA standard.
> Crossfire was designed by ATI.
> Don't even get me going on what AMD's entire processor line was based on.
> 
> 
> 
> 
> 
> 
> 
> 
> In fact, most of the engineers still at AMD working on the graphics side of things come from the ATI buy out, and many of the new ones come from nVidia.


ATI is AMD; AMD is ATI. There's no point making the distinction when they gained their entire GPU segment from what is essentially a merger. It's the same company; call it what ATI evolved into.
Quote:


> So if people are going to balk at nVidia for buying another company to get their technology, then they should do the same about AMD in all fairness.
> 
> That's all I'm saying.


I really don't think these things are worth nitpicking. Ageia bought NovodeX, nvidia bought Ageia, meh. Oh, and it's not balking at nvidia. In terms of what you said about innovation, it becomes relevant if the tech was bought: it would mean what you said is wrong, but it's not necessarily a bad thing to buy tech. You could say Ageia is part of nvidia, so it's their work... ok.
Quote:


> *AMD has squat for R&D budget* so they embrace partnering with others to front the bill, and they embrace open standards, which take longer and are often a compromise, because of their lack of money, not because they WANT to.


To be fair, nvidia's budget would be squat as well if this were true. AMD actually has a higher average; they are not that far apart.

https://ycharts.com/companies/NVDA/r_and_d_expense
https://ycharts.com/companies/AMD/r_and_d_expense
Quote:


> Despite what some have said, open standards are not always the best or the most popular. Take Linux vs Microsoft Windows for example. Linux is open standard, Microsoft Windows isn't. Which is more popular? Which is the gaming platform of choice? FreeSync isn't an open standard, AdaptiveSync is. That is what Intel has said they will embrace in the future, NOT FreeSync. GCN is not an open standard. Despite all the hype Mantle is not an open standard. Crossfire is not an open standard. AMD x64 is not an open standard, just like the Intel x86 is not. Hell, most things in this world are not open standard.
> 
> Yes open standards have their place, but they aren't the beat all, end all, of ways to do anything.


"Open" standards are how the PC industry grows, whether truly open (free to use, change, etc.) or just open to use. *Windows is open* in the sense that anyone can use it: you do not pay MS money to create software for Windows. If Windows were otherwise, it would not be where it is. I don't share the eagerness some have for gaming on Linux; that just complicates things for devs who are already not particularly nice to the PC. Who controls the code limits the platform, and Windows is spreading to other platforms (processors, etc.).
Quote:


> Also, as far as Mantle goes, AMD locked both Intel and nVidia out (there is even a quote out there, where they finally admitted that and claimed "we want to minimize things for now, but maybe later"). Basically the whole Mantle thing was just a ploy to build an API that GREATLY benefited their cards and CPU's and not the Industry leaders. Once Mantle was largely done ... WITHOUT input from Intel and nVidia ... they stopped development on Mantle and basically turned it over to Kronos and it is now called Vulkan. But let's face it, do you think that Kronos is going to suddenly throw away everything and go to Intel and nVidia and make things "fair and open" for all? Hell no, it will continue to be largely one sided towards AMD, only now that someone else is heading up the project, AMD doesn't have to spend any money (other than a little consulting time) to get SOMEONE ELSE to do their work for them. As I said, for YEARS AMD didn't embrace open source. They only did when they had money issues, then they largely changed tactics and did a little work to get interest, and then let OTHER PEOPLE do their work for them.


It never left closed beta, did it? They can get it into Vulkan now.

The rest is just tin-foil shenanigans...

I think we should ignore this guy, actually. I can see some argument about AMD and Nvidia, but this is taking it too far. So many pointless things coming up.


----------



## Mahigan

Knucklehead has a point. Nvidia bought Ageia and AMD bought ATi. That being said... That was just the last line of my post.

As it pertains to open standards... NVIDIA lacks in that department. Even mighty Intel adopts and even leads the way on many open standards.

On another point. Open Standard does not mean Open Source or Freeware. Windows is closed source, for example, but you don't need a license or Microsoft's permission to develop for it.

I like Open Source too, and open source does often come out on top. Android is an open standard as well... How's that iOS market share doing compared to Android these days?

I think that the area of open standards is one in which NVIDIA could improve. That's all I'm saying.


----------



## 2010rig

Quote:


> Originally Posted by *Mahigan*
> 
> Knucklehead has a point. Nvidia bought Ageia and AMD bought ATi. That being said... That was just the last line of my post.
> 
> As it pertains to open standards... NVIDIA lacks in that department. Even mighty Intel adopts and even leads the way on many open standards.
> 
> On another point. Open Standard does not mean Open Source or Freeware. Windows is closed source, for example, but you don't need a license or Microsoft's permission to develop for it.
> 
> I like Open Source too, and open source does often come out on top. Android is an open standard as well... How's that iOS market share doing compared to Android these days?
> 
> I think that the area of open standards is one in which NVIDIA could improve. That's all I'm saying.


Which is more profitable, IOS or Android?

Or in other words, which OS makes its parent company the most money despite market share?

Hint


----------



## ku4eto

Quote:


> Originally Posted by *semitope*
> 
> To be fair, nvidias budget would be squat as well if this were true. AMD actually has a higher average. They are not that far apart.
> 
> https://ycharts.com/companies/NVDA/r_and_d_expense
> https://ycharts.com/companies/AMD/r_and_d_expense.


Those AMD R&D figures cover both the CPU and GPU segments. I lol'd when I saw them: with $100M less in TOTAL for both CPU and GPU, they are offering amazingly competitive GPUs compared to Nvidia. And AMD's APUs are really competitive with Intel's in terms of iGPU power.


----------



## Mahigan

Last night, I spent my time on reddit. I was trying to alleviate fears in the NVIDIA section.

It's like what Yoda said. Fear leads to anger, anger to hate, hate to suffering.

A lot of people are angry and acting out of hate. Some guy sold his GTX 980 Ti, bought a Fury-X and is playing Witcher 3. Why would you buy an AMD card if the main game you play is Witcher 3?

He did so out of fear over this asynchronous compute topic. He mentioned that the lack of response, from NVIDIA, compelled him to jump ship.

Crazy times.


----------



## semitope

Quote:


> Originally Posted by *2010rig*
> 
> Which is more profitable, IOS or Android?
> 
> Or in other words, which OS makes its parent company the most money despite market share?
> 
> Hint


What's the relevance? The reasons for profits there are complicated, e.g. Apple could take a bigger share of app sales.


----------



## mtcn77

Quote:


> Originally Posted by *ku4eto*
> 
> Those AMD R&D are for both CPU and GPU segment. I lol'd when i saw them, with 100M$ less TOTAL for both CPU and GPU, they are offering amazingly competetive GPUs when compared to nVidia. And AMD APU's are as well really competetive vs Intel ones, in terms of iGPU power.


They can launch a single APU and drop their entire range and I wouldn't mind. Integrated solutions are the future. I deeply root for them to launch another series and just drop it off on the doorstep of Qualcomm or some other firm. They don't particularly have to _sell_ unit volumes - look at ARM's model. They are doing pretty well and are making remarkable headway for themselves.


----------



## 2010rig

Quote:


> Originally Posted by *semitope*
> 
> whats the relevance. The reasons for profits there are complicated. eg. apple could take a bigger share of app sales


What's the point?
Quote:


> I like Open Source too, and open source does often come out on top. Android is an open standard as well... *How's that iOS market share doing compared to Android these days?*


Do you think Apple cares about its IOS market share, when they keep having record breaking profitable quarters?


----------



## semitope

Quote:


> Originally Posted by *2010rig*
> 
> What's the point?
> Do you think Apple cares about its IOS market share, when they keep having record breaking profitable quarters?


But then you are talking about different things: Apple's profitability vs. how fast Android spread. Android being open source allowed it to grow; Apple is profitable. Not mutually exclusive.


----------



## 2010rig

Quote:


> Originally Posted by *semitope*
> 
> but then you are talking about different things. Apples profitability vs how fast android spread. Android being open source allowed it to grow, apple is profitable. Not mutually exclusive


But see, that's the whole point: you can't really compare the Android ecosystem to Apple's. Apple is the sole maker of iOS devices...


----------



## DigiHound

As far as I am aware, the image below, which is from MS, is completely accurate. It disagrees with both the Wikipedia page and the emphatically wrong image being passed along earlier.

http://www.extremetech.com/wp-content/uploads/2015/06/DX12FeatureLEvels.png

This image: http://cdn.overclock.net/0/0d/0d6e7b6e_4714aef8_featuresetsdx12.png is incorrect in multiple ways. It's incorrect enough that I'm going to ignore it altogether.

Let's go over the differences between the Wikipedia chart and the MS image. Here's the Wikipedia chart image.

http://cdn.overclock.net/1/1a/1a0a8939_Rs6QPn8.png

"No difference" means no difference in the areas that the MS chart covers. The Wikipedia chart covers Skylake, for example, which isn't included in the document I have.

*Resource Binding*: No difference.
*Conservative Rasterization*: Chart claims only Maxwell supports Tier 1. MS document states Maxwell supports Tier 2.
*Tiled Resources*: Chart claims that F, K, and M1 are Tier 1, with M2 as Tier 3. MS document states that Fermi is Tier 1, Kepler and M1 are Tier 2, and M2 is Tier 3.
*UAV Loads*: Chart claims that UAV formats are supported on GCN and Maxwell (both types). MS agrees, but clarifies that GCN 1.0 has one level of support and later versions have another.
*Rasterizer-Ordered Views*: No difference.

Based on what I know today, the MS document is right, but that image is also at least several months old. If anyone has evidence that it is less than accurate, I'm happy to entertain it.


----------



## infranoia

I do wonder how Futuremark will deal with this. I can't imagine the back room deals going on right now over their DX12 3DMark iteration.

Will they have a dedicated async shader bench? Ethically, what do you do if you have a feature that potentially favors one architecture over another but is a significant part of the API? That hasn't stopped them from implementing tessellation benches, but it will be interesting to see whether they stay truly vendor-agnostic or become a religious tool.

3DMark certainly moves more hardware than any other bench.


----------



## Mahigan

Partisans will stoop to any level... Even editing a Wikipedia entry. I'd trust MS, they built DX12, over Wikipedia.


----------



## ivanp3000

Quote:


> Originally Posted by *DigiHound*
> 
> As far as I am aware, the image below, which is from MS, is completely accurate. It disagrees with both the Wikipedia page and the emphatically wrong image being passed along earlier.
> 
> http://www.extremetech.com/wp-content/uploads/2015/06/DX12FeatureLEvels.png
> 
> This image: http://cdn.overclock.net/0/0d/0d6e7b6e_4714aef8_featuresetsdx12.png is incorrect in multiple ways. It's incorrect enough that I'm going to ignore it altogether.
> 
> Let's go over the differences between the Wikipedia chart and the MS image. Here's the Wikipedia chart image.
> 
> http://cdn.overclock.net/1/1a/1a0a8939_Rs6QPn8.png
> 
> "No difference" means no difference in the areas that the MS chart covers. The Wikipedia chart covers Skylake, for example, which isn't included in the document I have.
> 
> *Resource Binding*: No difference.
> *Conservative Rasterization*: Chart claims only Maxwell supports Tier 1. MS document states Maxwell supports Tier 2.
> *Tiled Resources*: Chart claims that F, K, and M1 are Tier 1, with M2 as Tier 3. MS document states that Fermi is Tier 1, Kepler and M1 are Tier 2, and M2 is Tier 3.
> *UAV Loads*: Chart claims that UAV formats are supported on GCN and Maxwell (both types). MS agrees, but clarifies that GCN 1.0 has one level of support and later versions have another.
> *Rasterizer-Ordered Views*: No difference.
> 
> Based on what I know today, the MS document is right, but that image is also at least several months old. If anyone has evidence that it is less than accurate, I'm happy to entertain it.


I think the MS image is fake.

http://images.anandtech.com/reviews/video/dx12/fls/GTX680.png
http://images.anandtech.com/reviews/video/dx12/fls/GTX750Ti.png
http://images.anandtech.com/reviews/video/dx12/fls/GTX980.png

http://images.anandtech.com/reviews/video/dx12/fls/7970.png
http://images.anandtech.com/reviews/video/dx12/fls/290X.png
http://images.anandtech.com/reviews/video/dx12/fls/285.png

source:https://forum.x.com/threads/direct3d-feature-levels-discussion.56575/page-4#post-1833996
edit: x=beyond3d


----------



## ku4eto

Quote:


> Originally Posted by *ivanp3000*
> 
> I think the ms image is fake.
> 
> http://images.anandtech.com/reviews/video/dx12/fls/GTX680.png
> http://images.anandtech.com/reviews/video/dx12/fls/GTX750Ti.png
> http://images.anandtech.com/reviews/video/dx12/fls/GTX980.png
> 
> http://images.anandtech.com/reviews/video/dx12/fls/7970.png
> http://images.anandtech.com/reviews/video/dx12/fls/290X.png
> http://images.anandtech.com/reviews/video/dx12/fls/285.png
> 
> source:https://forum.x.com/threads/direct3d-feature-levels-discussion.56575/page-4#post-1833996
> edit: x=beyond3d


Ehh, dunno why you put this x.com instead of directly leaving a link?

Also, I didn't know about such a feature that lets you see the DX support a GPU reports.


----------



## ivanp3000

Quote:


> Originally Posted by *ku4eto*
> 
> Ehh, dunno why you put this x.com instead of directly leaving a link ???
> 
> Also, did not know for such feature, that allows you to see GPU BIOS saved DX support.


I hadn't read the forum rules, so I was playing it safe.

I just learned about dxcapsviewer recently when I was searching the net about dx12.


----------



## airfathaaaaa

Someone was saying that it would be good for AMD's GPU department to be separated from the CPU one... well:

AMD Forms Radeon Technologies Group, Taps Raja Koduri To Lead Team Dedicated To Graphics Growth
http://www.forbes.com/sites/jasonevangelho/2015/09/09/amd-forms-radeon-technologies-group-taps-raja-koduri-to-lead-team-dedicated-to-graphics-growth/


----------



## DigiHound

IvanP,

I am not sure what you are referring to. The B3D post states: "Whoever had GCN 1.0 at Tiled Resources Tier 1 and GCN 1.1 at Tier 2, please be sure to collect."

The links you provided show the following:

HD 7970 (GCN 1.0) = Tiled Resources Tier 1.
R9 290X (GCN 1.1) = Tiled Resources Tier 2
R9 285 (GCN 1.2) = Tiled Resources Tier 2.

That's what the MS slide states.

If you were referring to a different stat, please call out which one you think is incorrect.

Edit: NVM. I see it. The screenshots say Tier 1 for Fermi and Kepler, not Tier 2.

The source I got this slide from is incredibly unlikely to have faked it or have reason to do so -- but that doesn't mean it's 100% correct. I'll see what I can find out.


----------



## Mahigan

Quote:


> Originally Posted by *airfathaaaaa*
> 
> someone was saying that it would be good for amd gpu department to be separated from the cpu one..well
> 
> AMD Forms Radeon Technologies Group, Taps Raja Koduri To Lead Team Dedicated To Graphics Growth
> http://www.forbes.com/sites/jasonevangelho/2015/09/09/amd-forms-radeon-technologies-group-taps-raja-koduri-to-lead-team-dedicated-to-graphics-growth/


Now that's what I'm talking about.

I was just talking about this the other day here in the thread.

Raja was chief architect of the ATi R300, btw.

2016 just got a whole lot more exciting.


----------



## semitope

I actually don't get that. I would have expected their graphics to be set up like that already. Though I don't like the idea of a single person in charge. Was it that they had one person heading both businesses? How will this affect their APUs?


----------



## Mahigan

Quote:


> Originally Posted by *semitope*
> 
> I actually don't get that. Would have expected their graphics was setup like that already. though I don't like the idea of a single person in charge. Was it that they had one person heading both businesses? How will this affect their APUs?


Raja still reports to Dr Lisa Su, therefore the Radeon Graphics Division isn't a completely separate entity from AMD. This shouldn't change the APU strategies, in fact, it ought to make the graphics side of future APUs stronger. We also have to remember that Jim Keller (K7/K8) is the chief architect now working on Zen.

This sounds like AMD is preparing for a comeback, having achieved their goal with HSA. Now each division can focus on what it does best.

Everyone, including myself, who has complained about bad PR coming from AMD is likely to see some changes in that dept now that Raja is in charge.


----------



## DigiHound

Quote:


> though I don't like the idea of a single person in charge. Was it that they had one person heading both businesses? How will this affect their APUs?


I think you'll like it more than you think you will. Anandtech has a good summary: "Between the various groups, AMD has had departments reporting to CTO Mark Papermaster, CVP of Global Marketing John Taylor, CVP and GM of graphics Matt Skynner, VP of Visual Computing Raja Koduri, and other executives within the AMD structure. The end result is that graphics is truly everywhere within AMD, but at times it is also nowhere."

My own knowledge of AMD echoes this. APU, SoC, Catalyst, FirePro, Radeon, developer relations, marketing, PR -- each of these were a different group at AMD, and each group had its own interests and pursuits. I don't think this changed much strictly on the hardware side where dGPU was concerned, but it could definitely impact which features rolled out to which product families when.

Think about AMD's driver stack. For years, AMD's drivers have had a reputation for not being quite as polished as the NV equivalent. Yes, GameWorks made this problem worse (at least as far as AMD customers were concerned), but these issues predate GameWorks by a number of years. That's partly because you didn't have one leader calling the shots and making the decisions about which features the driver team should be pursuing and how those features might lay the groundwork for hardware capabilities in current or future drivers.

Think about Nvidia's GeForce Experience. I'm not claiming it's a perfect application, but it integrates quite well with the capabilities of NV GPUs and it offers some genuinely useful one-button toggles for quick optimization, driver updates, and things like Battery Boost. AMD's partnership with Raptr offers, at best, an imperfect echo of those capabilities. Look at the overall utility and function that NV drivers offer compared to Catalyst's rather more anemic profile options.

Up until now, one of the problems at AMD is that the guys building the hardware didn't have the ability to call up the driver team and say "We need to focus on implementing X, Y, and Z" or "We want to make sure consumers have an easy way to use the hardware features we've built into GCN." I'm not claiming that the two teams didn't work together, but they weren't working for the same bosses and they didn't always have the same priorities.

This reorganization should give AMD a better structure to respond to issues and give it a faster path to bringing new driver features to market or fixing bugs.


----------



## Mahigan

Quote:


> Originally Posted by *DigiHound*
> 
> Think about AMD's driver stack. For years, AMD's drivers have had a reputation for not being quite as polished as the NV equivalent. Yes, GameWorks made this problem worse (at least as far as AMD customers were concerned), but these issues predate GameWorks by a number of years. That's partly because you didn't have one leader calling the shots and making the decisions about which features the driver team should be pursuing and how those features might lay the groundwork for hardware capabilities in current or future drivers.
> 
> Think about Nvidia's GeForce Experience. I'm not claiming it's a perfect application, but it integrates quite well with the capabilities of NV GPUs and it offers some genuinely useful one-button toggles for quick optimization, driver updates, and things like Battery Boost. AMD's partnership with Raptr offers, at best, an imperfect echo of those capabilities. Look at the overall utility and function that NV drivers offer compared to Catalyst's rather more anemic profile options.
> 
> Up until now, one of the problems at AMD is that the guys building the hardware didn't have the ability to call up the driver team and say "We need to focus on implementing X, Y, and Z" or "We want to make sure consumers have an easy way to use the hardware features we've built into GCN." I'm not claiming that the two teams didn't work together, but they weren't working for the same bosses and they didn't always have the same priorities.
> 
> This reorganization should give AMD a better structure to respond to issues and give it a faster path to bringing new driver features to market or fixing bugs.


Great points.

Raja, being an engineer himself, ought to have quite an impact in charge of the driver team as well as the hardware engineering team(s). All of these changes, right before DX12 titles hit, point towards a new direction and a new AMD for a new generation of titles on a new API. I think that a year from now... we might not even recognize AMD.


----------



## SpeedyVT

Quote:


> Originally Posted by *airfathaaaaa*
> 
> someone was saying that it would be good for amd gpu department to be separated from the cpu one..well
> 
> AMD Forms Radeon Technologies Group, Taps Raja Koduri To Lead Team Dedicated To Graphics Growth
> http://www.forbes.com/sites/jasonevangelho/2015/09/09/amd-forms-radeon-technologies-group-taps-raja-koduri-to-lead-team-dedicated-to-graphics-growth/


This is just typical business restructuring within AMD; it's not a fork in the business, even though Forbes depicts it as a separate entity. It's just a development team focused on engineering GPUs that reports directly to Dr. Su.


----------



## DigiHound

Quote:


> This is just typical business restructuring within AMD it's not a fork in the business. The way forbes depicts it as a seperate entity. It's just a development team focused on engineering GPUs that report specifically to Dr. Sue [sic].


With respect, I wouldn't characterize this as "just" a business restructuring. I don't claim to have special insight into AMD's internal structures, but I've done a lot of talking to various AMD groups over the years and seen some of the inefficiencies this is meant to address.

After AMD bought ATI, it embarked on a specific mission to embed its GPU technology at every level of its CPU business. That didn't just mean baking GPUs into CPUs -- it divided the GPU teams responsible for developer relations, Catalyst, discrete GPU design, APU design, OEM partnerships, marketing, PR -- all of these various aspects of the GPU business were segmented and handled by different teams of people.

It was a laudable goal and an attempt to unify two different companies, but it didn't always pay great dividends in certain areas. Bringing everyone back together under a common banner and giving that assignment to Raja is, IMO, a very significant and important move.

No, it's not a magic bullet. It doesn't give AMD 20 points of market share. It doesn't give Raja $500M to invest in GPU tech. But it should allow for much more efficient centralization of resources than AMD's GPU divisions have had access to before, and give them a chance to create a more unified product vision.


----------



## SpeedyVT

Quote:


> Originally Posted by *DigiHound*
> 
> With respect, I wouldn't characterize this as "just" a business restructuring. I don't claim to have special insight of AMD's internal structures, but I've done a lot of talking to various AMD groups over the years and seen some of the inefficiencies this is meant to address.
> 
> After AMD bought ATI, it embarked on a specific mission to embed its GPU technology at every level of its CPU business. That didn't just mean baking GPUs into CPUs -- it divided the GPU teams responsible for developer relations, Catalyst, discrete GPU design, APU design, OEM partnerships, marketing, PR -- all of these various aspects of the GPU business were segmented and handled by different teams of people.
> 
> It was a laudable goal and an attempt to unify two different companies, but it didn't always pay great dividends in certain areas. Bringing everyone back together under a common banner and giving that assignment to Raja is, IMO, a very significant and important move.
> 
> No, it's not a magic bullet. It doesn't give AMD 20 points of market share. It doesn't give Raja $500M dollars to invest in GPU tech. But it should allow for much more efficient centralization of resources than AMD 's GPU divisions have had access to before, and give them a chance to create a more unified product vision.


Anything that is happening now was in the works long before this moment. A lot of people jumped ship, probably including manufacturers, with the DX12 issues not being addressed quickly by Nvidia. The Christmas season is approaching, and big retailers have to stock systems to meet consumer demand, so we'll probably see quite a few with AMD GPU chips for Christmas sales. Around this time every year, inventory is purchased by Dell, HP, and whoever else. Don't forget Nintendo's purchase of chips for their new system.

That's easily a huge 20-point gain if you want to talk about stock performance.


----------



## FastEddieNYC

Silver Lake is reported to be buying 20% of AMD. If true, the announced restructuring of the GPU division is part of a deal that AMD agreed to in return for the cash. Silver Lake is the same investment firm that took Dell private and they have very deep pockets. Big changes ahead for AMD.
http://www.bit-tech.net/news/hardware/2015/09/09/silver-lake-amd-deal/1


----------



## steadly2004

So.... Quick side question. If Nvidia unifies or fixes compute with their drivers, do you think it will work with SLI? Currently, when using CUDA and compute for BOINC, I have to disable SLI to get both GPUs computing. I would love to not have to enable and disable SLI when going from gaming to computing.

Or.... Is it a completely different issue? Does AMD have to disable crossfire to compute with programs like boinc or folding currently?

Sent from my trltetmo using Tapatalk


----------



## SpeedyVT

Quote:


> Originally Posted by *FastEddieNYC*
> 
> Silver Lake is reported to be buying 20% of AMD. If true, the announced restructuring of the GPU division is part of a deal that AMD agreed to in return for the cash. Silver Lake is the same investment firm that took Dell private and they have very deep pockets. Big changes ahead for AMD.
> http://www.bit-tech.net/news/hardware/2015/09/09/silver-lake-amd-deal/1


Can we stop posting this? First Fud posts it, and then hours later bit-tech. It's a completely unverified rumor. There is no proof of AMD selling 20%. AMD did, however, restructure for the long term.


----------



## mav451

Considering the speculation _rampant_ in this thread, discussing rumors about private equity firm involvement is along the very same lines.

I would simply suggest that FastEddieNYC revise his post to say "rumored" instead of "reported."


----------



## Dudewitbow

Quote:


> Originally Posted by *steadly2004*
> 
> So.... Quick side question. If the Nvidia side and drivers unifies or fixes compute with their drivers, do you think it will work with SLI? Like currently when using CUDA and compute for BOINC, I have to disable SLI to get both GPU's computing. I would love to not have to enable and disable SLI when going from gaming to computing.
> 
> Or.... Is it a completely different issue? Does AMD have to disable crossfire to compute with programs like boinc or folding currently?
> 
> Sent from my trltetmo using Tapatalk


The issue here isn't compute-based. It's not that Nvidia is lacking compute; it currently lacks the ability to run different kinds of compute in parallel, due to how its scheduler works. If Nvidia couldn't do compute at all, then CUDA, PhysX, and things like mining wouldn't be possible on an Nvidia card (and of course it can do all of these).
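To make the distinction concrete, here is a toy Python model of why running compute in parallel with graphics matters for frame time. The per-frame costs and the idealized overlap are assumptions for illustration only, not measurements of any real GPU or scheduler.

```python
# Toy model of async compute (illustrative only; real GPU scheduling
# is far more complex). Per frame, a graphics pass and a compute pass
# each take some amount of GPU time.

def frame_time_serialized(graphics_ms: float, compute_ms: float) -> float:
    """Scheduler runs the passes back to back (no parallel compute)."""
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms: float, compute_ms: float) -> float:
    """Idealized async compute: the passes overlap, so the frame is
    bound by the longer of the two (ignores contention for shared units)."""
    return max(graphics_ms, compute_ms)

if __name__ == "__main__":
    g, c = 10.0, 4.0  # hypothetical per-frame costs in milliseconds
    print(frame_time_serialized(g, c))  # 14.0
    print(frame_time_async(g, c))       # 10.0
```

Under these made-up numbers, overlapping the passes hides the compute cost entirely, which is the kind of gain the async discussion in this thread is about.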


----------



## steadly2004

Quote:


> Originally Posted by *Dudewitbow*
> 
> Quote:
> 
> 
> 
> Originally Posted by *steadly2004*
> 
> So.... Quick side question. If the Nvidia side and drivers unifies or fixes compute with their drivers, do you think it will work with SLI? Like currently when using CUDA and compute for BOINC, I have to disable SLI to get both GPU's computing. I would love to not have to enable and disable SLI when going from gaming to computing.
> 
> Or.... Is it a completely different issue? Does AMD have to disable crossfire to compute with programs like boinc or folding currently?
> 
> Sent from my trltetmo using Tapatalk
> 
> 
> 
> the issue here isn't compute based, its not that nvidia is lacking compute, but it currently lacks the ability to do different kinds of compute in parallel due to how its scheduler works. if nvidia coudlnt do compute completely, then it wouldn't be possible to do CUDA, Physx or things like mining on an nvidia card(which of course is completely the opposite it can do these)
Click to expand...

So if drivers fix this it won't fix my problem.... Damn.

Sent from my trltetmo using Tapatalk


----------



## DigiHound

AFAIK, this typically goes the same way for both companies. If a program supports running on both GPUs without disabling multi-GPU configurations, it probably can do so for both AMD and Nvidia. If it doesn't, you probably have to manually set it for both.

I *think* it's because the GPGPU workload usually wants you to tell it, explicitly, which GPU to execute on. When SLI or Crossfire is enabled, the driver is trying to treat the GPU as a unified pool for rendering purposes (albeit a unified pool with duplicated VRAM data), and this mucks with applications, which aren't expecting it.

DX12 *might* change this, but I'm assuming that the driver still sends a flag to a game to tell it that multiple GPUs are present in a system, even if the game engine handles more of the grunt work of deciding where various workloads ought to run.
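As a rough illustration of the explicit-device model described above, here is a Python sketch of how a BOINC-style client might hand independent work units to separately enumerated GPUs. The device names, work-unit labels, and round-robin policy are made up for the example; real GPGPU runtimes (CUDA, OpenCL) expose devices by index in a similar spirit.

```python
# Sketch: compute apps address devices explicitly, unlike SLI/CrossFire,
# which presents the GPUs to the renderer as a single pooled device.
from itertools import cycle

def assign_jobs(jobs, devices):
    """Round-robin independent compute jobs across explicitly
    enumerated devices, as a compute client might when the
    multi-GPU rendering link is disabled."""
    rr = cycle(devices)
    return {job: next(rr) for job in jobs}

if __name__ == "__main__":
    devices = ["gpu0", "gpu1"]            # explicit, separately visible
    jobs = ["wu_a", "wu_b", "wu_c", "wu_d"]
    print(assign_jobs(jobs, devices))
```

The point of the sketch is only that the application, not the driver, decides where each job runs, which is why a pooled SLI/Crossfire view gets in the way.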


----------



## drSeehas

Quote:


> Originally Posted by *HalGameGuru*
> 
> ...
> 
> ...


Which chips are Kepler(v1) and which are Kepler(v2)?


----------



## PirateZ

Should i buy a Nvidia or AMD card now?


----------



## NoirWolf

Quote:


> Originally Posted by *PirateZ*
> 
> Should i buy a Nvidia or AMD card now?


If you like games which may take advantage of Async sooner and much more heavily (open world games, strategy games, etc) and play at 1080p-1440p then AMD is good.
If you like games that may go heavily into Gameworks in the not so distant future or don't need to switch to Async for better performance and play at 1080p-1440p then Nvidia is good.
Above 1440p it's a crapshoot really depending on your own needs.


----------



## Defoler

Quote:


> Originally Posted by *mtcn77*
> 
> PS: Not actually Rasteriser Ordered Views, but you get the point...


When you post a chart that is this inaccurate, you've lost your point. So lost that it went right out the window, crashed on the pavement, and oozed itself into the sewers. Never to be seen again.
Quote:


> Originally Posted by *NoirWolf*
> 
> ATI launched their version in 2008, roughly one year after Nvidia and both had it working since the early 2000s in HPCs.


1. We are not talking about Quadro vs FirePro, unless you really want to go there, and then you will lose all ground about AMD doing anything groundbreaking in the last 5-6 years.
2. GPGPU really arrived on Radeon with the 7970. Until then it was software, and partially blocked on the cards before it to prevent a drop in sales of the FirePro cards. Nvidia, on the other hand, sold a crapload of GPUs because they supported CUDA for a long time.
Quote:


> Originally Posted by *semitope*
> 
> G-sync also took the concept of adaptive sync. Their implementation never made sense to me but made them money.


That is funny, because the "freesync" implementation is... exactly like G-Sync. Except instead of AMD making the hardware and software part, they make the monitor manufacturers do it. If you talk about making sense: don't forget that AMD claimed "we don't need special hardware!", but alas, yes they did.
Also, while both G-Sync and "freesync" have the same roots, and while eDP does not need any dedicated hardware, both of the desktop PC implementations do need it. Again, nvidia just brought it out quickly, and AMD followed by yelling, barking, and complaining (and lying) all the way.

And just to put the nails in the coffin, the Ashes alpha benchmark shows the 980 Ti and the Fury X almost neck and neck, before the 980 Ti even supported async calls. I wonder what will happen when it does...


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Defoler*
> 
> When you post a chart which is so far inaccurate, you lost your point. So lost it that it went right out the window, crashed on the pavement, and oozed itself into the sewers. Never to be seen again.
> 
> ...
> 
> That is funny, because "freesync" implementation is... exactly like g-sync. Except instead of AMD making the hardware and software part, they make the monitor manufacturers do it.


The FreeSync implementation is NOT "exactly like g-sync". You obviously know nothing about G-Sync, so let me educate people on one key difference. G-Sync, because it has hardware, has memory and the ability to do frame multiplication. Thus, when the FPS dips below the monitor's low-end range, it can send the same frame out again, doubling (or more) the number of frames and extending the effective VRR range below that of the monitor. That is why, unlike the LIES that AMD is telling, G-Sync has a range of 2-240Hz, not the 30-144 like AMD claims (oh, and since there is now a 165Hz G-Sync monitor, I guess that proves AMD was a liar as well). You can't do that with FreeSync, and no, monitor companies don't build a frame buffer into their monitors. So as you can see, FreeSync is NOT "exactly like g-sync".

nVidia looked at the eDP standard and elected NOT to go that route, because it was a worse solution and would take a lot longer to get implemented anyway (because of standards bodies... they never move fast, and as everyone sees, it is now only an OPTIONAL standard). The eDP implementation was not for gaming; it was for power savings. It COULD be modified into something else for gaming (and was, both by AMD and nVidia... aka 'Mobile G-Sync'), but G-Sync is superior to FreeSync on the low end because of the hardware. The low end is where things are important anyway. Once you are getting 120+ FPS, it really isn't going to matter much if it's on or off, but at 25 or 30 FPS, yeah, it REALLY makes a difference. Trust me, I know; I have 2 G-Sync monitors and use them every day, and have for a while. If you are like me and like "turning up your settings" at 1440p (and very soon 4K), well then, the low end is what you need. If you don't care about looks and run on a 1080p TN panel, well then, it isn't so much.

That's why, when AMD was caught flat-footed by nVidia releasing G-Sync, they had to scramble to counter it. The only way they could was to use what nVidia had thrown away: the eDP solution. That is why, when AMD first showed it, it was a cobbled-together laptop screen with lots of issues. Even then, it took another full year to get it somewhat working on the desktop, and even then there were problems; entire "released" monitors had to be sent back after being certified because of overdrive issues. It is only now, about 2 years later, that FreeSync is STARTING to take off. And if it does, nVidia will support it ... as their LOW END solution. But they will continue to offer G-Sync for those who want a superior solution that A-Sync/FreeSync can't match. Oh, and speaking of which, let's not forget that A-Sync (Adaptive-Sync) and FreeSync are NOT the same. FreeSync is a PROPRIETARY version of A-Sync. Intel is adopting A-Sync, not FreeSync. Odds are, when nVidia has to, they too will be like Intel and adopt A-Sync.

People love to say that AMD is all about "open standards", and yeah, Adaptive-Sync is an OPTIONAL open standard, which is basically just something that already existed in the eDP standard before AMD even touched it. But the fact of the matter is, FreeSync is NOT an open standard. It is a proprietary extension of Adaptive-Sync.

Quote:


> Originally Posted by *Defoler*
> 
> And just to put the nails in the coffins, the ashes alpha benchmark shows the 980 TI and the Fury X almost neck to neck, before the 980 TI even supported async calls. I wonder what will happen when they do...


Exactly.


----------



## sugarhell

Knuckle and an AMD thread? He picks one of these topics, no matter the thread:

Dx9 frame pacing
Dx11 overhead
Freesync


----------



## NoirWolf

Quote:


> Originally Posted by *Defoler*
> 
> When you post a chart which is so far inaccurate, you lost your point. So lost it that it went right out the window, crashed on the pavement, and oozed itself into the sewers. Never to be seen again.
> 1. We are not talking about Quadro vs FirePro, unless you really want to go there, and then you will lose all ground about AMD doing anything groundbreaking in the last 5-6 years.
> 2. GPGPU really arrived on the Radeon with the 7970. Until then it was done in software and partially blocked on earlier cards to prevent a drop in sales of the FirePro cards. Nvidia, on the other hand, sold a crapload of GPUs because they supported CUDA for a long time.
> 
> That is funny
> 
> 
> 
> 
> 
> 
> 
> because "freesync" implementation is... exactly like g-sync. Except instead of AMD making the hardware and software part, they make the monitor manufacturers do it. If you talk about making sense. Don't forget that AMD claimed "we don't need special hardware!" but alas, yes they did.
> Also, while both g-sync and "freesync" have the same roots, and while eDP does not need any dedicated hardware, both of the desktop implementations do need it, and again, nvidia just brought it out quickly, and AMD followed by yelling, barking and complaining (and lying) all the way there.
> 
> And just to put the nails in the coffins, the ashes alpha benchmark shows the 980 TI and the Fury X almost neck to neck, before the 980 TI even supported async calls. I wonder what will happen when they do...


So in order:
That chart is inaccurate, but so is the one from Wikipedia.

1 and 2: And Nvidia killed GPGPU on their consumer GPUs in similar fashion after Fermi to prevent Quadro sales from taking a dive. If you want to be pedantic about GPGPU not being supported at the hardware level by AMD around the same time, then that logic can be reversed for async, which shows Nvidia actually gimped their GPUs after Fermi on more than one front.

Freesync is easier to implement than G-sync simply because there's no proprietary chip requirement for the former. More profit for the monitor manufacturer = more incentive to go with Freesync.

Nvidia brought it in as another reason to sell their overpriced hardware to people who believe PhysX or G-sync is in any way remarkable (who did it first? Well, as tessellation shows, it doesn't really matter; who does it best does).

And as for the final nail in your attempt at a reasoned argument: it will tank in performance because the 980 Ti will have to wait for its software to run async, which means CPU involvement. The Ashes benchmark pits the Fury X in its best conditions against the 980 Ti in its best conditions, and as you can quite clearly see, one of them is gonna be pretty useless at async. I wonder which.


----------



## sugarhell

The only official chart is this:


----------



## 47 Knucklehead

Quote:


> Originally Posted by *sugarhell*
> 
> Knuckle and an AMD thread? Choose one of the topics no matter the thread
> 
> Dx9 frame pacing
> Dx11 overhead
> Freesync


It's called ... setting the record straight with FACTS.

I know that AMD people have issues with facts, but hey, it's a pet peeve of mine.


----------



## NoirWolf

Quote:


> Originally Posted by *47 Knucklehead*
> 
> It's called ... setting the record straight with FACTS.
> 
> I know that AMD people have issues with facts, but hey, it's a pet peeve of mine.


FACTS such as:
1. Did you know Windows 10 does not natively support floppy disks? I know, the audacity of Microsoft!
2. Did you know Windows 10 does not run multiple Windows 1.0 games? Yes they robbed you. You heard it here first.
3. Did you know AMD GPUs which are overkill for League of Legends at 4K ultra cannot be run in Crossfire on LoL?

Thanks, though, for letting us know that AMD's statements about G-sync's capabilities at the time make them liars based on what G-sync is capable of today. I guess there isn't a single hardware vendor in existence that isn't a liar.

And to be honest, fanboys have a love of cherry-picking for and against certain hardware vendors. Sometimes they pick cherries which are still valid (that Intel compiler that's still floating around, which kills the performance of anything that doesn't identify itself as an Intel chip), but most times they pick cherries which aren't so much rotten as fossilized.


----------



## SpeedyVT

Quote:


> Originally Posted by *NoirWolf*
> 
> FACTS such as:
> 1. Did you know Windows 10 does not natively support floppy disks? I know, the audacity of Microsoft!
> 2. Did you know Windows 10 does not run multiple Windows 1.0 games? Yes they robbed you. You heard it here first.
> 3. Did you know AMD GPUs which are overkill for League of Legends at 4K ultra cannot be run in Crossfire on LoL?
> 
> Thanks though for letting us know that statements by AMD on G-sync capabilities at that time make them liars for what G-sync is capable of today. Guess there isn't a single hardware vendor in existence that isn't a liar.
> 
> And to be honest fanboys have a love of cherry picking for and against certain hardware vendors. Sometimes they pick cherries which are still valid (that compiler that's still running around from Intel that kills the performance of anything that doesn't ID itself as an Intel chip) but most times they pick cherries which aren't so much rotten as fossilized.


Not quite sure why you'd run Crossfire for LoL anyway; I get some insane frames at 4K.


----------



## mcg75

And we're done here.

As more information comes available, new threads can be created.


----------

