# [computerbase.de] DOOM + Vulkan Benchmarked.



## dir_d

Very Nice


----------



## ku4eto

Gains for the green camp = 1-5%.

Anyway, no CPU benching? Also, where is the benching on the older GCN cards? They should be getting bumps as well.


----------



## OneB1t

There is a massive increase for the FX CPU lineup as well.


----------



## Randomdude

Fury X is pushing 45 FPS at 4K and 160 FPS at 1080p. 4K has four times as many pixels, so perfect scaling would give 4x45 = 180 FPS at 1080p. That's almost perfect FPS scaling as pixels per frame increase: 88%, actually, for the aforementioned card.

For comparison's sake.

980 Ti - 80%

1070 - 77%

Sadly the other cards aren't included in the 4k benchmark.
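For anyone who wants to check the math, the efficiency figure works out like this (a quick sketch; the FPS numbers are the ones quoted above, and `scaling_efficiency` is just a made-up helper name):

```python
def scaling_efficiency(fps_1080p: float, fps_4k: float) -> float:
    """Actual 1080p FPS as a percentage of what perfect scaling from
    the 4K result would predict (4K has 4x the pixels, so 4 * fps_4k)."""
    return fps_1080p / (4 * fps_4k) * 100

# Fury X: 160 FPS at 1080p, 45 FPS at 4K (numbers from the post above)
print(f"Fury X: {scaling_efficiency(160, 45):.1f}%")  # 88.9%, the ~88% above
```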


----------



## Themisseble

RX 480 looks great! 11% slower than GTX 1070 at 1080p... what a deal.


----------



## TopicClocker

The Fury X is destroying it, that's an amazing 50-60% performance boost at 1080p and 1440p!

I wonder when a new driver will be pushed out for the NVIDIA GPUs?


----------



## Ha-Nocri

Quote:


> Originally Posted by *TopicClocker*
> 
> The Fury X is destroying it, that's an amazing 50-60% performance boost at 1080p and 1440p!
> 
> I wonder when a new driver will be pushed out for the NVIDIA GPUs?


I wouldn't hold my breath for improvements on NV GPUs.

Btw, Fury X is limited by 4GB @4k maybe?!


----------



## Randomdude

Quote:


> Originally Posted by *TopicClocker*
> 
> The Fury X is destroying it, that's an amazing 50-60% performance boost at 1080p and 1440p!
> 
> *I wonder when a new driver will be pushed out for the NVIDIA GPUs?*


If it takes more than a year - never!


----------



## Klocek001

whoa fury x kicks ass and takes names


----------



## Kpjoslee

Quote:


> Originally Posted by *Ha-Nocri*
> 
> I wouldn't hold my breath for improvements on NV GPUs.
> 
> Btw, Fury X is limited by 4GB @4k maybe?!


ROP limited.


----------



## kingduqc

Holy Batman, the 480 is kicking some major ass. Can't wait to see more.


----------



## Serios

Battlefield is next.


----------



## venom55520

This has always been the case; I'm not sure why anyone's surprised. AMD has been consistently developing their cards towards Vulkan and DX12 since the R9 2xx series.


----------



## Ha-Nocri

Quote:


> Originally Posted by *Serios*
> 
> Battlefield is next.


Is it confirmed Battlefield 1 will (not) use async?


----------



## qlum

Man, that Fury X score just goes up so much on Vulkan. This does kind of show how much the OpenGL implementation is hindering the card's performance.


----------



## Serios

Quote:


> Originally Posted by *Ha-Nocri*
> 
> Is it confirmed Battlefield 1 will (not) use async?


I'm almost certain it will have async because of the Xbox and PlayStation versions; we are talking about DICE here.


----------



## magnek

RX 480 near 980 Ti performance finally confirmed.


----------



## pengs

Quote:


> Originally Posted by *Ha-Nocri*
> 
> Is it confirmed Battlefield 1 will (not) use async?


Battlefield 1 has Async Compute and DX12, expect 'DX12 fireworks' as of June 28th
sub article 6/28
TechFrag 6/28

All on about DX12 fireworks and async. The only downside is that it will not be a native DX12 game, but that's a given. DICE isn't going to cut off its nose to spite its face.


----------



## DrFPS

Wishful thinking. You're not even close.


----------



## mtcn77

The usual suspects' jimmies have been rustled by the results, as always. I bet AMD has cut deals to include these enhancements in all forthcoming titles.


----------



## michaelius

Quote:


> Originally Posted by *mtcn77*
> 
> The usual suspects' jimmies have been rustled by the results, as always. I bet AMD has cut deals to include these enhancements in all forthcoming titles.


Well, Nvidia performance hasn't been cut by 20% without adding a single new effect, so it looks like it wasn't an AMD-sponsored game.


----------



## jmcosta

Nice boost for AMD cards. Nvidia might not have the driver ready, because if you compare other games under Vulkan (Linux), they perform much better on their cards.

time will tell


----------



## LancerVI

LOL......

Talk about speaking out of both sides of your mouth. Jeeeeeesus. LOL


----------



## tp4tissue

Oh snap.. RX480, messing up Nvidia's face.. /scratch


----------



## sugarhell

So I tried both my 290 and 7970. Even with my 7970 and its 2 ACEs, when I enable TSSAA the performance is exactly the same as SMAA.

On my Xeon there is almost zero CPU bottleneck.


----------



## Kpjoslee

Still waiting for multi-gpu support.....


----------



## Basard

This is what I was expecting from the Fury X right off the bat on launch date.... Boy was I disappointed.


----------



## mtcn77

Quote:


> Originally Posted by *magnek*
> 
> finish the sentence dammit


When you see the frametimes, you won't deliver on that.


----------



## PontiacGTX

Quote:


> Originally Posted by *sugarhell*
> 
> So i tried both my 290 and 7970. Even with my 7970 and the 2 ACEs when i enable TSSAA the performance is exactly the same as SMAA.
> 
> On my xeon there is almost zero cpu bottleneck


Then this means GCN1 wasn't taken into account for asynchronous compute. Is it because of minimal gains, or do the few ACEs not make a difference?


----------



## provost

I think this is a good example of performance gains that can be achieved by the GPUs; a one trick pony, compared to, for example, a CPU. However, I wouldn't be surprised if Nvidia (and/or AMD?) start to "incentify" developers not to give away "free performance", since it inherently poses a threat to their business model of yesterday.... Lol


----------



## mejobloggs

Are there any Vulkan vs DX benchmarks? Not really interested in Vulkan vs OpenGL.


----------



## Clovertail100

Holy smokes.

Those are some _impressive_ gains for AMD. The 480 really shows its weakness here, next to past chips, but AMD is looking much better overall. I've been waiting a long time to see what kind of performance was sitting "unused" due to DirectX stagnation.

Fiji looks spectacular until you see its original OpenGL performance numbers. OpenGL doesn't like HBM, maybe. The gains certainly showcase DirectX limitations that have been plaguing AMD. I'm really looking forward to having an API that'll advance at the same pace AMD has been for the last few years; I hope Vulkan can keep pace. AMD won't be the only one introducing hardware in the hopes that the industry might support it, if this keeps up. NV might start playing ball too, and bringing things to the table that can benefit all consumers.

Things are getting exciting again. I'm stoked.


----------



## JackCY

Vulkan vs DX12? Sure, code your own graphics demo/benchmark in both.


----------



## FLCLimax

Quote:


> Originally Posted by *Mookster*
> 
> Holy smokes.
> 
> Those are some _impressive_ gains for AMD. *The 480 really shows it's weakness, here, next to past chips*.. but AMD is looking much better overall. I've been waiting a long time to see what kind of performance was sitting "unused" due to Direct X stagnation.
> 
> Fiji looks spectacular until you see it's original OpenGL performance numbers. OpenGL doesn't like HBM, maybe. The gains certainly showcase Direct X limitations that have been plaguing AMD. I'm really looking forward to having an API that'll advance at the same pace AMD has been for the last few years; I hope Vulkan does can keep pace. AMD won't be the only one introducing hardware in the hopes that the industry might support it, if this keeps up. NV might start playing ball too, and bringing things to the table that can benefit all consumers.
> 
> Things are getting exciting again. I'm stoked.


Lmao, it has half the specs of those past chips...


----------



## Clovertail100

Quote:


> Originally Posted by *FLCLimax*
> 
> Lmao, it has half the specs of those past chips...


It's also a double shrink, at 14nm FF. You can get an AIB 8GB R9 390 for $259 on Newegg right now.

Sure, the TDP. But die shrinks have always meant better TDP, price, _and_ performance in the past. At least, they did before we took a 5-year hiatus on shrinks. It's a weak release, especially considering it's a long overdue _double_ shrink.


----------



## pengs

Quote:


> Originally Posted by *sugarhell*
> 
> So i tried both my 290 and 7970. Even with my 7970 and the 2 ACEs when i enable TSSAA the performance is exactly the same as SMAA.
> 
> On my xeon there is almost zero cpu bottleneck


Quote:


> Originally Posted by *PontiacGTX*
> 
> Then this means GCN1 wasnt taken into account for asynchronous compute. because of minimal gains or the few ACEs dont make a difference?


Dunno.
DOOM 123% Vulkan Performance Increase on R9 280x. A lot of that is probably from Vulkan itself, but that's a massive improvement.

Is the compute option checked in the settings, sugar?


----------



## Gungnir

Quote:


> Originally Posted by *mejobloggs*
> 
> Is there any Vulkan vs DX bechmarks? Not really interested in Vulkan vs OpenGL


Vulkan and DX12 should be pretty much the same; they have pretty similar designs and goals, they're both inspired by/based on Mantle, etc. DX12 has the advantage of XB1 support and MS' blessing, Vulkan has the advantage of Windows 7-8.1, Linux, and Android (and possibly macOS/iOS at some point in the future, if Apple decides to start supporting it). There are some other technical details that might become important in the future, especially with regard to extensions and feature levels, but as far as right now, the performance should be close enough between the two that it doesn't really matter.


----------



## mtcn77

Quote:


> Originally Posted by *Mookster*
> 
> It's also a double shrink, at 14nm FF. You can get an AIB 8GB R9 390 for $259 on Newegg right now.
> 
> Sure, the TDP. But, die shrinks have always meant better TDP, price, _and_ performance in the past. At least, it did before we took a 5 year hiatus on shrinks. It's a weak release.. especially considering it's a long overdue _double_ shrink.


The 390 is always working too close to its power limit, so just like the Tonga-series GPUs, the dilemma of picking the lesser of two evils comes at a premium.


----------



## Clovertail100

Quote:


> Originally Posted by *mtcn77*
> 
> 390 is always working too close to its power limit, so just like the tonga series gpus, the dilemma of picking between the lesser of the two evils comes at a premium.


I don't recall any issues like that. I recall the reference 290 coming with new software to throttle clocks as it approached thermal limits, but never problems with power limits like the 480 appears to have.

I'd prefer the 480, of course, because of the TDP. Nevertheless, it's not hyper-critical to call the 480 a bit of a disappointment for not outperforming the 390 by a large margin considering it's roughly the same price. Die shrinks have always brought improvements to price, performance and TDP. This time around, we're only getting an improvement to TDP. No price improvement, no performance improvement.

I'm an optimist, but that's a downgrade from what we're used to seeing from die shrinks. I'm sorry, it just is.


----------



## boot318

Quote:


> Originally Posted by *Mookster*
> 
> *It's also a double shrink,* at 14nm FF. You can get an AIB 8GB R9 390 for $259 on Newegg right now.
> 
> Sure, the TDP. But, die shrinks have always meant better TDP, price, _and_ performance in the past. At least, it did before we took a 5 year hiatus on shrinks. It's a weak release.. especially considering it's a long overdue _double_ shrink.


GloFo/Samsung's 14nm is just 20nm with FinFET.


----------



## mtcn77

Quote:


> Originally Posted by *Mookster*
> 
> I don't recall any issues like that. I recall the reference 290 coming with new software to throttle clocks as it approached thermal limits, but never problems with power limits like the 480 appears to have.
> 
> I'd prefer the 480, of course, because of the TDP. Nevertheless, it's not hyper-critical to call the 480 a bit of a disappointment for not outperforming the 390 by a large margin considering it's roughly the same price. Die shrinks have always brought improvements to price, performance and TDP. This time around, we're only getting an improvement to TDP. No price improvement, no performance improvement.
> 
> I'm an optimist, but that's a downgrade from what we're used to seeing from die shrinks. I'm sorry, it just is.


I have been following the issue a lot. All "enthusiast" grade AMD GPUs have demonstrated this double-edged pattern so far. Thus, I don't object when people have trouble adopting the GPUs, because you either have to ignore all power restrictions and act as if you had unrestricted cooling potential to let Tahiti/Hawaii GPUs shine through, or rest on your laurels, restricted by the power threshold limitations all the time. No wonder they sold so badly, since they were never meant to be mainstream components in the first place. Pitcairn, Tonga and Polaris were, on the other hand.
For starters, you can easily hit 360 watts with a 390 (from Tom's review of the subject), but you cannot do the same with the 480: it has three times more power delivery than its overall power usage. You literally have to overclock it past its air cooler's potential to reach phase-limitation levels (1.35V of voltage potential isn't even officially available), and that still isn't the total of what the card can give.


----------



## Kastor16

I don't think this has been brought up yet, but this is from the FAQ on the Steam forums:
Quote:


> *Currently asynchronous compute is only supported on AMD GPUs* and requires DOOM Vulkan supported drivers to run. We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.


Hopefully we do get some new drivers from Nvidia soon. I also found on my 1080 that G-Sync is no longer changing my screen's refresh rate, so I'll just stick to OGL for now.


----------



## phenom01

In breaking news, AMD sees a performance increase going from an API they suck at to an API they paid to develop. Nothing more at 9pm.


----------



## Clovertail100

Quote:


> Originally Posted by *boot318*
> 
> GloFlo/Samsung's 14nm is just 20nm with finfet.


For all intents and purposes, a double shrink. It's not as if FinFET should be negated when you're comparing the last node to the new one.

It's kind of moot to treat this as anything other than a double shrink, which is why everyone is conceding to call these 14nm/16nm nodes instead of 20nm.
Quote:


> Originally Posted by *mtcn77*
> 
> I have been following the issue a lot. All "enthusiast" grade AMD gpus have demonstrated this double-edged pattern as of yet. Thus, I don't object when people have trouble adopting the gpus because you either have to ignore all power restrictions and act as if you had unrestricted cooling potential to let Tahiti/Hawaii gpus shine through, or you would rest on your laurels being restricted by the power threshold limitations all the time. No wonder they sold so bad since they never were meant as mainstream components in the first place. Pitcairn, Tonga and Polaris were, on the other hand.
> For starters, you can easily hit 360 watts with a 390(from "Tom's Review of the subject") however you cannot do the same with 480 - it has three folds more power delivery than its overall power usage. You literally have to overclock it past its air cooler's potential in order to reach phase limitation levels(1.35v voltage potential isn't even officially available) and that still isn't the total of what the card can give.


AMD prefers to pack more transistors into smaller dies than NV more often than not, so it's not surprising to see a steeper curve in power consumption/temperature occurring at the same time. People make the mistake of attributing this to the cooler or the power, but it's really just due to the transistor density. More power radiates out as heat instead of serving its desired purpose (computing), and that heat causes even more leakage because that's how it impacts the transistors. The function of the cooler is to prevent this runaway, but the closer you pack your transistors, the less it'll be possible for any conventional cooler to work. AMD is in the part of the spectrum where fine tuning is required, which is why you feel like you're always limited by either heat or power.

The impulse is to blame inadequate cooling or inadequate power for a lack of overclocking headroom, but the reality is that AMD is tailoring their coolers and power phase to match the _designed_ limits of their extra-dense chips.

They know what they're doing. They responded confidently that extra power does absolutely nothing to help achieve better clocks with P10 because of the steep temperature rise at higher clocks, and I'm sure they knew that would be the result of those tightly packed transistors long before they started determining how tightly they should pack them.

It creates a bit of a "meh" situation for overclockers, but it does seem to be a more intelligent way of designing chips overall.


----------



## Slomo4shO

Quote:


> Originally Posted by *rcfc89*
> 
> We need competition to bring these ridiculous prices down.


$400 Fury X?

Also, wonder how much of this performance would have been realized with better OpenGL drivers


----------



## rv8000

Quote:


> Originally Posted by *Mookster*
> 
> It's also a double shrink, at 14nm FF. You can get an AIB 8GB R9 390 for $259 on Newegg right now.
> 
> Sure, the TDP. But, die shrinks have always meant better TDP, price, _and_ performance in the past. At least, it did before we took a 5 year hiatus on shrinks. It's a weak release.. especially considering it's a long overdue *double shrink*.


AFAIK, both TSMC and GloFo/Samsung 16m, amd 14m, are 20nm with FF. Neither is true 14nm or 16nm, they're both larger, so you can't really call it a double shrink; there was a tech slide in another one of the news threads comparing Intel 14nm to 16nm TSMC and 14nm Glofo/Samsung showing the size differences.


----------



## raghu78

id Software has done a fantastic job with DOOM running Vulkan. The massive performance boost is due to 3 specific features running on top of Vulkan (which itself brings perf improvements by reducing CPU driver overhead):
1. Async compute
2. Shader intrinsics
3. Frame flip optimizations

http://radeon.com/doom-vulkan/
https://community.bethesda.net/thread/54585?start=0&tstart=0

The performance of the RX 480 is amazing at 90% of the GTX 1070. Kudos to id Software and AMD.


----------



## mtcn77

Quote:


> Originally Posted by *Mookster*
> 
> For all intents and purposes, a double shrink. It's not as if FinFet should be negated when you're comparing the last node to the new one.
> 
> It's kind of moot to treat this as anything other than a double shrink, which is why everyone is conceding to call these 14nm/16nm nodes instead of 20nm.
> AMD prefers to pack more transistors into smaller dies than NV more often than not, so it's not surprising to see a steeper curve in power consumption/temperature occuring at the same time. People make the mistake of attributing this to the cooler or the power, but it's really just due to the transistor density. More power radiates out as heat instead of it's desired purpose (computing), and that heat causes even more leakage because that's how it impacts the transistors. The function of the cooler is to prevent this runaway, but the closer you pack your transisters, the less it'll be possible for any conventional cooler to work. AMD is in the spectrum where fine tuning is required, which is why you feel like you're always limited by either heat or power.
> 
> The impulse is to blame inadequate cooling or inadequate power for a lack of overclocking headroom, but the reality is that AMD is tailoring their coolers and power phase to match the _designed_ limits of their extra-dense chips.
> 
> They know what they're doing. They responded confidently that extra power does absolutely nothing to help achieve better clocks with P10 because of the steep temperature rise at higher clocks, and I'm sure they knew that would be the result of those tightly packed transistors long before they started determining how tightly they should pack them.
> 
> It creates a bit of a "meh" situation for overclockers, but it does seem to be a more intelligent way of designing chips overall.


Being able to push the limits of the cooler is a good thing, in my opinion, since the TDP limit works against GPU clocks, so an absence of peripheral limits that hold back the GPU frequency sounds like an ideal solution.
For the 390, 360 watts was a hard limit, as per observations by credible members, so there just wasn't enough redundancy built into that card's power delivery. Risk taking isn't my strong suit, and I know how hot the HD 4890 got during sustained loads. Once the temperature becomes critical, the cooling deficit only grows, as you are removing the same heat at the same temperature gradient while the semiconductor components get more and more leaky (Poole-Frenkel effect). Suddenly removing twice the heat at twice the fan speed becomes a necessity, and the GPU's alter ego turns up like a bad penny. Vroom, vroom!
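For reference, the Poole-Frenkel effect mentioned above is field-assisted emission of trapped carriers, and its standard current-density form makes leakage grow with both field $E$ and temperature $T$:

```latex
J \;\propto\; E \,\exp\!\left(\frac{-q\left(\phi_B - \sqrt{qE/(\pi\varepsilon)}\right)}{k_B T}\right)
```

Since $T$ sits in the denominator of the exponent, a hotter chip leaks more at the same voltage, which is exactly the runaway loop described here.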


----------



## Kana Chan

The 290X/390X is 1.40x as fast as the 970, and the 290/390 is 1.275x the 970.
290X vs 780 Ti / 980?
They were finally able to utilize all the cores.


----------



## gamervivek

Quote:


> Originally Posted by *provost*
> 
> I think this is a good example of performance gains that can be achieved by the GPUs; a one trick pony, compared to, for example, a CPU. However, I wouldn't be surprised if Nvidia (and/or AMD?) start to "incentify" developers not to give away "free performance", since it inherently poses a threat to their business model of yesterday.... Lol


Well, it's not like it has happened before, oh wait.
Quote:


> We have been following a brewing controversy over the PC version of Assassin's Creed and its support for AMD Radeon graphics cards with DirectX 10.1 for some time now. The folks at Rage3D first broke this story by noting some major performance gains in the game on a Radeon HD 3870 X2 with antialiasing enabled after Vista Service Pack 1 is installed-gains of up to 20%. Vista SP1, of course, adds support for DirectX version 10.1, among other things.


http://techreport.com/news/14707/ubisoft-comments-on-assassin-creed-dx10-1-controversy-updated


----------



## rcfc89

Quote:


> Originally Posted by *Slomo4shO*
> 
> $400 Fury X?
> 
> Also, wonder how much of this performance would have been realized with better OpenGL drivers


Is this guy serious? Fury X is garbage outside of Doom. It gets destroyed in all other benchmarks. Suddenly AMD gets good performance in a 2-month-old game on the API they developed and red loyalists are going bananas. It's quite funny, to be honest. It's why I love this forum.


----------






## infranoia

Quote:


> Originally Posted by *rcfc89*
> 
> Is this guy serious? Fury X is garbage outside of Doom. Gets destroyed in all other benchmarks. Suddenly Amd gets good performance on a 2 month old game on the API they developed and red loyalist are going banana's. It's quite funny to be honest. It's why I love this forum.


It's a little something called a "big picture". It involves APIs, game developers, and GPU architectures, and not so much a single game.

Oh, and no surprise-- fortunes change over time. It's kind of how capitalism & competition should work.


----------



## flippin_waffles

An example of what's to come? It seems likely that AMD's architecture is just better suited and prepared for these new-gen games with more advanced APIs, considering how similar DX12 and Vulkan are to Mantle.


----------



## NuclearPeace

The Fury X and other Fiji GPUs gained so much because Fiji has a massive front-end bottleneck that was significantly ameliorated with Vulkan. Look at how much the Fury X gains compared to Polaris and Hawaii.


----------



## infranoia

Quote:


> Originally Posted by *NuclearPeace*
> 
> Fury X and other Fiji GPUs gained so much because it has a massive front end bottleneck. Look at how much the Fury X gains compared to Polaris and Hawaii.


Check the other thread. I went from 60-80 FPS in OpenGL to 120 - 140 FPS in Vulkan on this launch-day Hawaii at stock clocks. That went to 130 - 160 when I bumped it to 1100/1500. At 1080p mind you, but still.

If I look up at the skybox or at a wall it pegs at 200 frames per second, all on Ultra. Not bad for a 2013 chip.


----------



## magnek

Quote:


> Originally Posted by *rcfc89*
> 
> Is this guy serious? Fury X is garbage outside of Doom. Gets destroyed in all other benchmarks. Suddenly Amd gets good performance on a 2 month old game on the API they developed and red loyalist are going banana's. It's quite funny to be honest. It's why I love this forum.


So AMD developed Vulkan. Now where are all the guys that swore up and down that AMD had nothing to do with DX12 or Vulkan?

Also, I wouldn't claim _all_ benchmarks if I were you, since all it takes is one counterexample to refute the argument, and there are already quite a few.


----------



## mtcn77

Quote:


> Originally Posted by *infranoia*
> 
> Check the other thread. I went from 60-80 FPS in OpenGL to 120 - 140 FPS in Vulkan on this launch-day Hawaii at stock clocks. That went to *130* - *160* when I bumped it to 1100/1500. At 1080p mind you, but still.


Same settings as the Pascal launch demonstration. Somebody tell me what is wrong:


----------



## infranoia

Quote:


> Originally Posted by *mtcn77*
> 
> Same settings as the Pascal launch demonstration. Somebody tell me what is wrong:


No async shader render path is enabled on Pascal, that's what's wrong. I'm getting those frames easily on this OC 290x.


----------



## Slomo4shO

Quote:


> Originally Posted by *rcfc89*
> 
> Is this guy serious? Fury X is garbage outside of Doom. Gets destroyed in all other benchmarks. Suddenly Amd gets good performance on a 2 month old game on the API they developed and red loyalist are going banana's. It's quite funny to be honest. It's why I love this forum.










I am now a red loyalist for pointing out a $250 price drop...


----------



## formula m

You can keep telling us how cheap old Nvidia hardware is on eBay... and how the 980 Ti price has dropped, etc.

None of it matters!


----------



## Wishmaker

Funny how people think one size fits all. Enjoy the weather until Volta. NVIDIA will not let DX12 and "Live Long and Prosper" slip their milking machine.


----------



## Lass3

This is great; even though I got a 980 Ti, I still got very good improvements (mostly minimum fps).

Why did they not use Vulkan to begin with? Then the game would have run well on both Nvidia and AMD from the start. My friend had pretty bad performance with his 290, till now. And now we don't play it anymore









Looking forward to BF1 benchmarks







Will it support both Vulkan and DX12?


----------



## daviejams

Quote:


> Originally Posted by *Lass3*
> 
> This is great, even tho I got a 980 Ti, I still got very good improvements (mostly minimum fps).
> 
> Why did they not use Vulkan to begin with? Then the game would have run good on both Nvidia and AMD from the start. My friend had pretty bad performance with his 290, till now. And now we don't play it anymore
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Looking forward to BF1 benchmarks
> 
> 
> 
> 
> 
> 
> 
> Will it support both Vulkan and DX12?


I'd imagine it will use DX11 & DX12.

Don't think DICE will have anything to do with Vulkan.


----------



## scorch062

Quote:


> Originally Posted by *daviejams*
> 
> I'd imagine it will use dx11 & dx12
> 
> *Don't think dice will have anything to do with Vulcan*


They did implement Mantle very nicely in BF4, so it is possible, I think. However, even if Vulkan does appear alongside DX12, I doubt it will be anywhere near release date.


----------



## MuscleBound

yah yah Vulkan rocks- now what?


----------



## Klocek001

Quote:


> Originally Posted by *rcfc89*
> 
> Is this guy serious? Fury X is garbage outside of Doom. Gets destroyed in all other benchmarks. Suddenly Amd gets good performance on a 2 month old game on the API they developed and red loyalist are going banana's. It's quite funny to be honest. It's why I love this forum.


this guy again







Exaggerate much?

But I admit, once the novelty of HBM, small form factor, standard liquid cooling and a totally friggin' rad name has worn off, it's nothing more than a decent card. It gets stuck somewhere between the 980 and 980 Ti, due to its architecture and drivers having a hard time clashing with DX11, with very little OC room.


----------



## Wishmaker

Quote:


> Originally Posted by *scorch062*
> 
> They did implement Mantle very nicely in BF4, so it is possible i think. However, even if Vulkan will appear alongside DX12, doubt it will be anywhere near release date.


You mean that Mantle that used to crash on AMD cards? The same one where the fix was 'use DX 11' if you experience issues? Now I remember. Thanks!


----------



## ku4eto

Quote:


> Originally Posted by *Wishmaker*
> 
> You mean that Mantle that used to crash on AMD cards? The same one where the fix was 'use DX 11' if you experience issues? Now I remember. Thanks!


You know, the issue has been present for like... 1 month, and only with the latest beta drivers. No issues on the WHQL ones.


----------



## scorch062

Quote:


> Originally Posted by *Wishmaker*
> 
> You mean that Mantle that used to crash on AMD cards? The same one where the fix was 'use DX 11' if you experience issues? Now I remember. Thanks!


Hmm, never had this issue, though I did buy BF4 in summer 2014, so apparently I missed all the fun









The main point I want to make is that DICE experimented, with a rocky start, with a new API, which eventually led to the birth of DX12 and Vulkan. From this I would consider it possible for DICE to work with Vulkan in the future. The main obstacle I see is the EA and Microsoft partnership.


----------



## OneB1t

Microsoft is pushing DX12 hard.

They want to bring console games to PC.


----------



## ChevChelios

M$ will crush Vulkan, and only some Bethesda games will continue using it for AAA titles, as before.

But DX12 does about the same thing as Vulkan anyway.


----------



## provost

Quote:


> Originally Posted by *rcfc89*
> 
> Is this guy serious? Fury X is garbage outside of Doom. Gets destroyed in all other benchmarks. Suddenly Amd gets good performance on a 2 month old game on the API they developed and red loyalist are going banana's. It's quite funny to be honest. It's why I love this forum.


If you are expecting someone to make you feel better about doubling down on the now-obsolete 980 Ti, then you are out of luck. Nvidia cards become EOL, from a driver and software optimization support perspective, the moment Nvidia gets ready to release a new GPU. It's as simple as that. The old cards will continue to function well enough, but don't expect any additional feature support or driver improvements.

From my perspective, it's good to see Fury getting a performance boost in Vulkan, and hopefully in DX12 going forward. I will accept free performance gratefully... lol


----------



## Bytales

Any news on whether Crossfire works for the 480 in DOOM with Vulkan?
As an NVIDIA hater, I'm never going to buy their cards again; they're only after profit, and they're pretty greedy at it, stagnating progress if it benefits them financially.

Hence I'm getting two 480s and, when the time comes, two Vega cards, or whatever the most powerful single AMD card with the newest architecture is.


----------



## dieanotherday

can someone plz put DX12/Vulkan on StarCraft 2 and other RTSs? that's where most of the gains should be


----------



## Evil Penguin

Quote:


> Originally Posted by *daviejams*
> 
> I'd imagine it will use dx11 & dx12
> 
> *Don't think dice will have anything to do with Vulcan*


I'm pretty sure some of their developers helped develop the Vulkan spec.

They do have a WIP Vulkan render path but I'm just not sure if they will end up using it for BF1.


----------



## Newbie2009

Quote:


> Originally Posted by *Bytales*
> 
> Any news if Crossfire works for the RX 480 in Doom with Vulkan?
> As an NVIDIA hater, I'm never going to buy their cards again; they're only after profit, and they're pretty greedy at it, stagnating progress if it benefits them financially.
> 
> Hence I'm getting two RX 480s, and when the time comes two Vega cards, or whatever single card AMD has as the most powerful with the newest architecture.


It doesn't work with my 290x


----------



## JackCY

Quote:


> Originally Posted by *dieanotherday*
> 
> can someone plz put dx12 vulcan on starcraft 2 and other RTSs, thats where most of the gains should be at.


Ask Blizzard to support Vulkan. It works on all OSes; I think SC etc. run on more than just Windows, so it makes more sense to use Vulkan and have it run on any of their releases for all the OSes they support.


----------



## scorch062

Quote:


> Originally Posted by *JackCY*
> 
> Ask Blizzard to support Vulkan. It works on all OSes, I think SC etc. runs not just on Win so it makes more sense to use Vulkan and have it run on any of their releases for all the OS they support.


I think Blizzard has a team that is working with Vulkan/Directx 12. Starcraft 2, WoW and even Diablo 3 would greatly benefit from these APIs, not sure if Overwatch would need one ASAP.


----------



## OneB1t

i hope that blizzard will bring their engines from 1990 tech level to 2016


----------



## Bytales

Isn't Crossfire supposed to work with Vulkan?
It would sure be nice if Blizzard games got Vulkan support; they've always run slower on AMD hardware.


----------



## Glottis

Quote:


> Originally Posted by *OneB1t*
> 
> i hope that blizzard will bring their engines from 1990 tech level to 2016


stop trolling. Overwatch uses pretty much every recent graphical technology and effect available. WoW, which is more than a decade old, is using DX11 with most DX11 features. Blizzard games are some of the best supported and updated ever. If you want Blizzard games to switch to a photorealistic look like Crysis etc., that is never ever happening.


----------



## ChevChelios

Blizzard has good support in general, but hardly the best graphics or newest APIs

Overwatch doesn't do DX12, right?

They also want their games to run on hardware as old/slow as possible

wouldn't bet on Blizzard having Vulkan any time soon

then again it will probably be a while before the next new Blizzard game even comes out


----------



## ZealotKi11er

Quote:


> Originally Posted by *ChevChelios*
> 
> Blizzard has good support in general, but hardly the best graphics or newest APIs
> 
> Overwatch doesnt do DX12, right ?
> 
> They also want their games to run on hardware as old/slow as possible
> 
> wouldnt bet on Blizzard having Vulkan any time soon
> 
> then again it will pobably be awhile before the next new Blizzard game even comes out


Their games are played by millions of players. They do not want to use a "beta" API. Overwatch does not really need Vulkan.


----------



## Ha-Nocri

I don't play WoW, but I do play Civilization games. And Civilization 6 will have DX12 and Async, released in October.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Ha-Nocri*
> 
> I don't play WoW, but I do play Civilization games. And Civilization 6 will have DX12 and Async, released in October.


Yeah, because all companies that work with AMD will have the latest API.


----------



## Evil Penguin

Quote:


> Originally Posted by *Bytales*
> 
> Isn't Crossfire supposed to work with Vulkan?
> It would sure be nice if Blizzard games got Vulkan support; they've always run slower on AMD hardware.


CF/SLI doesn't work with Vulkan/DX12.

Explicit multi-adapter support on the other hand should be supported in a future version of Vulkan.


----------



## Ha-Nocri

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Yeah because all companies what work with AMD will have the latest API.


Huh? It's been officially confirmed. Both DX12 and Async. They didn't say whether those will be available from day 1, though.

http://radeon.com/radeon-civ6-dx12/?sf30852881=1


----------



## OneB1t

civ5 had Mantle, which is nearly the same thing


----------



## ZealotKi11er

What kind of CFX mode did BF4 use under Mantle? AFR?


----------



## kaosstar

Very impressive Fury X showing. This is not an isolated incident, as we've seen impressive DX12 performance as well.

I very recently bought an Nvidia card for the first time in a decade. It makes me think I may have switched to the green team at a bad time.


----------



## scorch062

DF did a video on the topic, but because of Vulkan they could not capture the numbers as they usually do.


----------



## doritos93

the 7870 i'm using right now performs worse using vulkan compared to opengl, any ideas why?

couldve been a bug, but even the menus slowed down to like 7fps


----------



## Potatolisk

Quote:


> Originally Posted by *ZealotKi11er*
> 
> What kind of CFX mode did BF4 use under Mantle? AFR?


Afaik AFR. SFR was only in Civ with mantle.


----------



## Evil Penguin

Quote:


> Originally Posted by *doritos93*
> 
> the 7870 i'm using right now performs worse using vulkan compared to opengl, any ideas why?
> 
> couldve been a bug, but even the menus slowed down to like 7fps


Latest drivers installed (beta)?


----------



## 364901

Quote:


> Originally Posted by *OneB1t*
> 
> civ5 had mantle which is nearly same thing


That was Beyond Earth, actually. Civ V was D3D11, but had early implementations of something approaching Async compute thanks to the use of compute shaders for decompressing textures. GCN was particularly good at that.

Quote:


> Originally Posted by *ZealotKi11er*
> 
> What kind of CFX mode did BF4 use under Mantle? AFR?


Explicit multi-adapter, more or less. It wasn't called that specifically, and Johan Andersson was very cagey about how exactly they were doing it, but the descriptions used back then match what we've come to know as EMA. AFR was also the default rendering mode. The only Mantle title that didn't use AFR was Civ: Beyond Earth, which made use of split-frame rendering to reduce latency.
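As a toy illustration of why a title might pick SFR over AFR (the frame times below are invented numbers, not measurements): AFR scales throughput with GPU count but leaves per-frame latency untouched, while an idealized SFR split cuts both.

```python
# Toy model of AFR vs SFR scaling across two GPUs.
# Frame times and GPU counts are illustrative assumptions, not benchmarks.

def afr(frame_time_ms, num_gpus):
    """Alternate-frame rendering: GPUs render whole frames in turn.
    Throughput scales with GPU count; each frame still takes the full time."""
    throughput_fps = num_gpus * 1000 / frame_time_ms
    latency_ms = frame_time_ms
    return throughput_fps, latency_ms

def sfr(frame_time_ms, num_gpus):
    """Split-frame rendering (idealized): each GPU renders part of one frame.
    Both throughput and per-frame latency improve, ignoring split overhead."""
    latency_ms = frame_time_ms / num_gpus
    throughput_fps = 1000 / latency_ms
    return throughput_fps, latency_ms

print(afr(20, 2))  # 100 fps, but still 20 ms from submit to finished frame
print(sfr(20, 2))  # 100 fps with only 10 ms per frame
```

In practice SFR rarely hits this ideal split, which is part of why AFR stayed the default mode.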


----------



## doritos93

Quote:


> Originally Posted by *Evil Penguin*
> 
> Latest drivers installed (beta)?


Ah.. didn't check which version I'm at.. thanks


----------



## st0necold

Quote:


> Originally Posted by *provost*
> 
> If you are expecting someone to make you feel better about doubling down on the now-obsolete 980 Ti, then you are out of luck. Nvidia cards become EOL, from a driver and software optimization support perspective, the moment Nvidia gets ready to release a new GPU. It's as simple as that. The old cards will continue to function well enough, but don't expect any additional feature support or driver improvements.
> 
> From my perspective, it's good to see Fury getting a performance boost in Vulkan, and hopefully in DX12 going forward. I will accept free performance gratefully....lol


The 980 Ti's are far from "obsolete"..

The replacement for them has not even dropped yet. The 1080 is the 980's upgrade.. the 1070 is the 970's upgrade.. The Ti and Titan have yet to even be revealed. 2 1080's right now compared to 2 980 Ti's might give 10-20 fps in Fraps, but in reality you're not going to see a single thing different unless you're running Fraps.

When the Titan or 1080 Ti comes out, then the 980 Ti will be obsolete.. The 1070 barely does better... c'mon man.


----------



## st0necold

dx12 blows


----------



## formula m

Quote:


> Originally Posted by *st0necold*
> 
> The 980ti's are far from "obsolete"..
> 
> The replacement for them has not even dropped yet. The 1080 is the 980's upgrade.. 1070 is the 970 upgrade.. The Ti, and Titan have yet to even be revealed. 2 1080's right now compared to 2 980ti's might give 10-20fps on fraps but in reality your not going to see a single thing different unless your running fraps.
> 
> When the Titan or 1080ti comes out then the 980ti will be obsolete.. The 1070 barely does better... c'mon man.


He meant the 980ti's tech is obsolete..


----------



## L36

Quote:


> Originally Posted by *st0necold*
> 
> The 980ti's are far from "obsolete"..
> 
> The replacement for them has not even dropped yet. The 1080 is the 980's upgrade.. 1070 is the 970 upgrade.. The Ti, and Titan have yet to even be revealed. 2 1080's right now compared to 2 980ti's might give 10-20fps on fraps but in reality your not going to see a single thing different unless your running fraps.
> 
> When the Titan or 1080ti comes out then the 980ti will be obsolete.. The 1070 barely does better... c'mon man.


The architecture it is based on is obsolete. Its only strength over the 1070 is that it's a massive chip with way more resources.

As someone pointed out in this thread, once Nvidia releases a GPU based on a new architecture, GPUs based on older architectures become EOL in terms of driver optimizations and features.


----------



## ChevChelios

Quote:


> Originally Posted by *Evil Penguin*
> 
> CF/*SLI* doesn't work with Vulkan/*DX12.*


i don't know about all DX12 titles, but SLI certainly works in RotR DX12 as of the latest patch


----------



## sinholueiro

Quote:


> Originally Posted by *st0necold*
> 
> The 980ti's are far from "obsolete"..
> 
> The replacement for them has not even dropped yet. *The 1080 is the 980's upgrade.. 1070 is the 970 upgrade..* The Ti, and Titan have yet to even be revealed. 2 1080's right now compared to 2 980ti's might give 10-20fps on fraps but in reality your not going to see a single thing different unless your running fraps.
> 
> When the Titan or 1080ti comes out then the 980ti will be obsolete.. The 1070 barely does better... c'mon man.


Well, that's what everyone was thinking, but the thing is that, as prices go, a 1070 is around 500€-525€ and the 970 was 330€, so when I go to the store I see the 1070 as the replacement for the 980 (550€ at release). The 980 Ti was 650-675€ at release, so if I see the 1080 at 750€, I see it as the replacement....


----------



## Yvese

AMD is killing it. This is what happens when software can take advantage of the hardware, though AMD is at fault for DX11 performance compared to Nvidia. DX11 is the past though. We're getting nothing but DX12/Vulkan titles from here on out which is a huge boon for AMD.
Quote:


> Originally Posted by *TopicClocker*
> 
> The Fury X is destroying it, that's an amazing 50-60% performance boost at 1080p and 1440p!
> 
> I wonder when a new driver will be pushed out for the NVIDIA GPUs?


Nvidia promised async drivers like a year ago. If they're not out by now, they never will be. It was obviously just a PR move that people forgot about.


----------



## ChevChelios

Quote:


> We're getting nothing but DX12/Vulkan titles from here on ou


probably 1-2 years until the next AAA Vulkan title

DX12 patches/modes will slowly improve and increase in amount, and turn into more built from the ground up DX12 games eventually

until then though still plenty of DX11 to go around


----------



## ZealotKi11er

Quote:


> Originally Posted by *Yvese*
> 
> AMD is killing it. This is what happens when software can take advantage of the hardware, though AMD is at fault for DX11 performance compared to Nvidia. DX11 is the past though. We're getting nothing but DX12/Vulkan titles from here on out which is a huge boon for AMD.
> Nvidia promised Async drivers like a year ago. If they're not out now it never will be. It was obviously just a PR move that people forgot about.


I do not think async is as big of a deal as people make it out to be. Also, Nvidia is not doing anything wrong for DX12/Vulkan. The only thing that has changed is that DX12/Vulkan basically unlock the GCN architecture more. Same thing with async. Nvidia's architecture already goes full tilt from day 1 with DX11 and OpenGL. The only bad thing here for customers is that we mostly pay for performance when we buy GPUs, and we pay for the performance at the time the cards come out. Most titles that tested the 980 Ti 1 year ago were DX11. Even now, for the 1080 and 1070, they are DX11. This might make people pay more relative to AMD, which could potentially offer that performance for cheaper, but since AMD does not have a faster card, Nvidia can still sell the new Pascal GPUs at the given prices.


----------



## formula m

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I do not think ASync is as big of a deal as people make it out to be. Also Nvidia is not doing anything wrong fro DX12/Vulkan. The only thing that has changes is DX12/Vulkan basically unlock GCN architecture more. Same thing with ASync. Nvidia Architecture already goes full tilt since day 1 with DX11 and OpenGL. The only bad thing here for customers is that we mostly pay for performance when we buy GPUs. We pay for the performance at the time the cards come out. Most Tittles that tested 980 Ti 1 year ago where DX11. Even now for 1080 and 1070 are DX11. This might make people pay more in relation to AMD which could potentially offer that performance for cheaper but since they do not have a faster card Nvidia can still sell the new Pascal GPUs at the given prices.


Correct^

Vulkan made more use of AMD's async abilities than it did of Nvidia's pseudo-async.

And again, DX11 titles must die already... honestly, who is still designing for Windows 8? DX11 is a pointless discussion; nobody is buying a card today worried about yesterday's games.


----------



## ChevChelios

Quote:


> And again, DX11 titles must die already... honestly who is still designing for windows 8? DX11 is a pointless discussion, nobody is buying a card today, worried about yesterday's games.


they are worried about present games, which are still mostly DX11

and they have to design for Win 8 & 7 since there are still more people on those than on Win 10


----------



## kyrie74

Quote:


> Originally Posted by *ChevChelios*
> 
> they are worried about present games which are still majorly DX11
> 
> and they have to design for Win 8 & 7 since there are still more ppl on those than on Win10


According to this, Windows 10 64-bit sits at 42.94% of Steam systems surveyed.


----------



## ZealotKi11er

Quote:


> Originally Posted by *kyrie74*
> 
> According to this Windows 10 64bit sits at 42.94% of Steam systems surveyed.


I do not trust it.

AMD Radeon HD 8800 Series 1.38%

What the hell is even that card lol. I know it's OEM, but it's almost as much as the 7900 series. Steam users do not represent OCN at all.


----------



## ChevChelios

Quote:


> Originally Posted by *kyrie74*
> 
> According to this Windows 10 64bit sits at 42.94% of Steam systems surveyed.


but if you put all the non-Win 10 OSes together (aka all non-DX12 OSes) then they are still the majority

and the Win 10 free upgrade will end in a few weeks too


----------



## Kpjoslee

Quote:


> Originally Posted by *ZealotKi11er*
> 
> I do not trust it.
> 
> AMD Radeon HD 8800 Series 1.38%
> 
> What the hell is even that card lol. I know it OEM but it's almost us much as 7900 Series. Steam users do not represent OCN at all.


The Steam survey is much better than any other survey for figuring out who runs what on rigs that play games.


----------



## criminal

Good showing for the Fury X.


----------



## pengs

Quote:


> Originally Posted by *kyrie74*
> 
> According to this Windows 10 64bit sits at 42.94% of Steam systems surveyed.


I'm seeing 45%

There goes 7 and everything under it...


----------



## ToTheSun!

Quote:


> Originally Posted by *criminal*
> 
> Good showing for the Fury X.


What the Fury X did to that poor 1070 was criminal.


----------



## magnek

Quote:


> Originally Posted by *ToTheSun!*
> 
> What the Fury X did to that poor 1070 was criminal.


But the real criminal did the 1070 instead of the Fury X.


----------



## ZealotKi11er

Quote:


> Originally Posted by *magnek*
> 
> But the real criminal did the 1070 instead of the Fury X.


The 1070 is $380 MSRP, uses less power, and has 8GB. I think it's a better card than the Fury X. If the Fury X had 8GB of HBM1 it would have been a 290X-like card, but because it's 4GB it can never take that spot.


----------



## criminal

Quote:


> Originally Posted by *ToTheSun!*
> 
> What the Fury X did to that poor 1070 was criminal.











Quote:


> Originally Posted by *magnek*
> 
> But the real criminal did the 1070 instead of the Fury X.











Quote:


> Originally Posted by *ZealotKi11er*
> 
> The 1070 is $380 MSRP, uses less power, and has 8GB. I think it's a better card than the Fury X. If the Fury X had 8GB of HBM1 it would have been a 290X-like card, but because it's 4GB it can never take that spot.


Not to mention it was still $549+ when I got my 1070.

Plus I don't plan on playing Doom. I buy the card that's best for the games I currently play. Buying a Fury X after selling my 980 would have gained me maybe 5-10% performance improvement (if any) in those games. Glad people who bought the Fury X for $650 are starting to get some great results. Too bad for them the card can be had for $399 now, though.


----------



## infranoia

Quote:


> Originally Posted by *ZealotKi11er*
> 
> The 1070 is $380 MSRP, uses less power, and has 8GB. I think it's a better card than the Fury X. If the Fury X had 8GB of HBM1 it would have been a 290X-like card, but because it's 4GB it can never take that spot.


And some of us are still, after two years, having to use our 4K sets as 1080p sets, thanks to HDMI 1.4. The so-called adapters are garbage. So yeah, I'm still miffed that AMD didn't consider Fury to be a 2.0 part.

Water under the bridge now, I guess. But it still keeps me away from cheap Furies.


----------



## pengs

Here comes another one...




A look at the efficiency difference at 7 minutes in: 7-8% more efficient than the 970, and 33% more efficient when run with Vulkan/async. That means if you vsync or lock your frame rate you should (crudely) see roughly a 25% reduction in power (1/1.33 ≈ 0.75); otherwise, the higher frame rate.
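For the record, the arithmetic behind that claim, sketched with a made-up baseline wattage: a 33% performance-per-watt gain at a fixed frame rate cuts power by about 25%, not the full 33%, because power scales as 1/(1 + gain).

```python
# Rough perf-per-watt arithmetic behind "lock your frame rate, save power".
# The 33% efficiency gain is the figure claimed in the video; the 180 W
# baseline is an invented number for illustration.

def power_at_locked_fps(baseline_watts, efficiency_gain):
    """If frames-per-watt improves by `efficiency_gain`, hitting the same
    frame rate needs proportionally less power: P_new = P_old / (1 + gain)."""
    return baseline_watts / (1 + efficiency_gain)

new_power = power_at_locked_fps(180.0, 0.33)
reduction = 1 - new_power / 180.0
print(f"{new_power:.0f} W, a {reduction:.0%} reduction")  # ~135 W, ~25%
```

So a "33% more efficient" card locked to the same frame rate draws about three quarters of the power, not two thirds.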


----------



## Levys

Quote:


> Originally Posted by *magnek*
> 
> So AMD developed Vulkan. Now where are all the guys that swore up and down that AMD had nothing to do with DX12 or Vulkan.


Was thinking just the same... all those posts swearing it was Khronos Group tech sponsored by Nvidia.

I don't get why people have been bashing the new API direction AMD has gone in since Mantle....
AMD got it right by investing in GCN and new APIs.
DX12 has taken way too long to slip through the cracks.
Vulkan should be the future; who can really be against a better-optimized API that works on all platforms?
It benefits us all.
You can make GPUs as powerful as you want, but without an efficient way (an API) to channel that power,
it's just a very inefficient process.

I'm just happy that because of these technologies my R9 390X will last me a while longer.


----------



## kyrie74

Quote:


> Originally Posted by *pengs*
> 
> I'm seeing 45%
> 
> There goes 7 and everything under it...


Nice catch, I was only going by the front page stats at the bottom where it shows the top OS.


----------



## OneB1t

from that ~30% of Win 7 PCs, probably only 10-15% will be capable of running DX12 games


----------



## Glottis

Eurogamer Digital Foundry's results are a bit more down to earth. No questionable, surreal Fury X lead. Once Vulkan is enabled, the Fury X reaches 980 Ti performance levels, as it should.


----------



## michaelius

Quote:


> Originally Posted by *formula m*
> 
> Correct^
> 
> Vulkan made more use of AMD's async abilities, than did with NVidia's pseudo-async.
> 
> And again, DX11 titles must die already... honestly who is still designing for windows 8? DX11 is a pointless discussion, nobody is buying a card today, worried about yesterday's games.


I'm going to buy big Pascal for dx11 performance on windows 8 so just because you are drinking dx12 kool-aid it doesn't mean everyone else is.


----------



## EightDee8D

Quote:


> Originally Posted by *Glottis*
> 
> Eurogamer Digital Foundry's results are a bit more down to earth. No questionable surreal Fury X lead. Once Vulkan is enabled, Fury X *beats* the 980 Ti/1070 by 7.5%.


FTFY


----------



## Glottis

Quote:


> Originally Posted by *EightDee8D*
> 
> FTFY


yes, 7.5%, and what's your point? that's down from the outrageous 35% Fury X lead over the 980 Ti that computerbase.de claimed and that all websites then regurgitated like sheeple without doing any research whatsoever. now that more and more sources have tested it themselves, no one can reproduce computerbase's results.

here's another one.


----------



## OneB1t

so the 780 Ti at half of 290X performance even in this worst-case scenario?







nicely done NGREEDIA


----------



## EightDee8D

Quote:


> Originally Posted by *Glottis*
> 
> yes 7.5% and what's you point?


That 7.5% isn't the same as having the same performance, which is what you claimed about the benchmark you posted before, and which is false.
Different systems, different gains. But huge gains nonetheless.

now go find millions of other benches lol, funny to see this







.

pro tip - this is what going to happen with future games. unless nvidia sponsors something.


----------



## FLCLimax

Quote:


> Originally Posted by *Glottis*
> 
> Eurogamer Digital Foundry's results are a bit more down to earth. No questionable, surreal Fury X lead. Once Vulkan is enabled, the Fury X reaches 980 Ti performance levels, as it should.


He took minimums at a single point, not averages. Best case for Nvidia.


----------



## FLCLimax

Quote:


> Originally Posted by *Glottis*
> 
> Quote:
> 
> 
> 
> Originally Posted by *EightDee8D*
> 
> FTFY
> 
> 
> 
> yes 7.5% and what's you point? that's down from outrageous 35% fury x lead over 980ti that computerbase.de claimed and then all websites regurgitated like sheeple without doing any research whatsoever. now that more and more sources tested themselves no one can reproduce computerbase results.
> 
> here's another one.

No async probably.


----------



## Glottis

Quote:


> Originally Posted by *EightDee8D*
> 
> That 7.5% isn't same as having same performance. which is what you said on the benchmark you posted before, which is false.
> different systems different gains. but huge gains nonetheless.
> 
> now go find millions of other benches lol, funny to see this
> 
> 
> 
> 
> 
> 
> 
> .
> 
> pro tip - this is what going to happen with future games. unless nvidia sponsors something.


haha, now you are worried and have started to scapegoat as clear-as-day evidence starts to pour in that something is very fishy with computerbase.de. i won't be looking for any more benchmarks tonight as i have to go to sleep and rest up before work tomorrow, but you are free to find at least one benchmark that corroborates computerbase.de's findings.


----------



## flippin_waffles

these latest results are without async, so they may seem down to earth, but the amazing results in the OP are accurate due to the additional use of async.


----------



## EightDee8D

Quote:


> Originally Posted by *Glottis*
> 
> haha now you are worried and started to scapegoat when more clear as day evidence start to pour in that something is very fishy with computerbase.de. i won't be looking for any more benchmarks tonight as i have to go to sleep and rest up before work tomorrow, but you are free to find at least one benchmark that collaborates with computerbase.de findings.


Find benchmarks with TSSAA like DF and computerbase.de did and you will see those gains, because no AA/TSSAA = async enabled, which means higher gains. GameGPU uses SMAA, which doesn't benefit from async; that's why their FPS are lower. There, I explained it. now go and hide lol

Source - https://twitter.com/idSoftwareTiago/status/752590016988082180


----------



## infranoia

Quote:


> Originally Posted by *Glottis*
> 
> ...you are free to find at least one benchmark that corroborates computerbase.de's findings.


I have to think their Fiji average of 160 FPS is about right, since Hawaii isn't far behind.




----------



## OneB1t

"@idSoftwareTiago we will support async for other AA modes also on a later update"


----------



## Glottis

Quote:


> Originally Posted by *flippin_waffles*
> 
> these latest results are without Async, so it may seem down to earth but the amazing results in the OP are accurate due to the additional use ot Async.


not true. Digital Foundry's results use async (TSSAA enabled, as they clearly state)



digital foundry difference between 980ti and fury x is 7.5%

computerbase.de difference between 980ti and fury x is ~35%

that was my point. i'm wondering which publication is lying?
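For what it's worth, the leads being argued over here are just ratios of average FPS; a quick sketch with invented round numbers (not either site's actual data) shows how such percentages are derived.

```python
# How a "% lead" is computed from two average-FPS figures.
# The FPS values below are made-up round numbers for illustration,
# not computerbase.de or Digital Foundry results.

def lead_percent(card_a_fps, card_b_fps):
    """Relative lead of card A over card B, in percent."""
    return (card_a_fps / card_b_fps - 1) * 100

print(round(lead_percent(86, 80), 1))   # 7.5  -> a DF-sized lead
print(round(lead_percent(108, 80), 1))  # 35.0 -> a computerbase-sized lead
```

The gap between a 7.5% and a 35% lead is large enough that test scene, AA mode, and driver version would all have to differ substantially between the two benchmarks.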


----------



## OneB1t

Test scene maybe?


----------



## EightDee8D

Quote:


> Originally Posted by *Glottis*
> 
> not true. digital foundry results use async (tssaa enabled as they clearly state)
> 
> 
> 
> digital foundry difference between 980ti and fury x is 7.5%
> 
> computerbase.de difference between 980ti and fury x is ~35%
> 
> that was my point. i'm wondering which publication is lying?


Different maps, different amounts of AA, maybe? funny to see a website getting blamed for something fishy when at other times they are cited by nvidia fans. loool


----------



## flippin_waffles

Quote:


> Originally Posted by *Glottis*
> 
> not true. digital foundry results use async (tssaa enabled as they clearly state)
> 
> 
> 
> digital foundry difference between 980ti and fury x is 7.5%
> 
> computerbase.de difference between 980ti and fury x is ~35%
> 
> that was my point. i'm wondering which publication is lying?


GameGPU is not. So add the performance boost from async to those scores and you have scores that look like computerbase's. It looks like the results you are favoring are the odd ones out.

Not that it matters much, as Vulkan has just shot AMD's already world-leading perf/$ through the roof with any of these results.


----------



## Remij

Quote:


> Originally Posted by *EightDee8D*
> 
> Different maps, different amount of aa maybe ? funny to see a website getting a blame for something fishy where other times they are used by nvidia fans. loool


The guy is making a fair assumption basing it off the fact that other review sites don't show the same discrepancy in their results. It's the AMD fans running up yelling that Async isn't active.. or they used these settings and not these ones.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Glottis*
> 
> not true. digital foundry results use async (tssaa enabled as they clearly state)
> 
> 
> 
> digital foundry difference between 980ti and fury x is 7.5%
> 
> computerbase.de difference between 980ti and fury x is ~35%
> 
> that was my point. i'm wondering which publication is lying?


Before you try to push the DF results as absolute fact in order to find a conspiracy, here's some quotes from their article.
Quote:


> Our initial tests suggest anything from a 30 to 40 per cent increase in gaming performance for Radeon users but these are rough, initial numbers. *It could actually be higher*.


Quote:


> There's just one problem here - there is no support for FCAT right now in Doom itself or via Vulkan in general, while the game's OSD cumulative GPU render time average didn't seem to work for us on AMD hardware. To get some numbers together, *we used a very simple approach* - to visit three very different scenes and to measure the performance differential across a range of GPUs.


Quote:


> *It can only be considered as a very basic way to judge the potential differential*


Quote:


> *And we should stress again* that we've only tested here on a small selection of relatively light scenes. What's clear is that AMD's CPU utilisation has dropped significantly, *so there may be even bigger gains in more action-packed scenes*.


Their wording implies they're not 100% confident in their results.


----------



## EightDee8D

Quote:


> Originally Posted by *Remij*
> 
> The guy is making a fair assumption basing it off the fact that other review sites don't show the same discrepancy in their results. It's the AMD fans running up yelling that Async isn't active.. or they used these settings and not these ones.


And both are right: his assumption, and that those websites are using different settings. but what i find funny here is that this kind of assumption doesn't show up every time, only when nvidia loses.


----------



## Glottis

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Before you try to push the DF results as absolute fact in order to find a conspiracy, here's some quotes from their article.
> 
> Their wording implies they're not 100% confident in their results.


and let me guess, computerbase.de's results of course are 100% confident. if you bothered to read it, what they aren't confident about is that they couldn't use the FCAT measure, because it's not yet available for Vulkan. but you know what, neither could computerbase.de


----------



## GorillaSceptre

Quote:


> Originally Posted by *Glottis*
> 
> and let me guess, computerbase.de results ofcourse are 100% confident


No, but infranoia's results and tons of others all over the web are saying they are getting gains closer to theirs/exceeding them.


----------



## Remij

Quote:


> Originally Posted by *EightDee8D*
> 
> And both are right. his assumption and that those websites are using different settings. but what i find funny here is this kind of assumptions doesn't show every time. but only when nvidia looses.


I hear ya.

But you and I have differing opinions of Nvidia losing. That's not what I see in that graph.


----------



## magnek

I like how GameGPU results either get promoted or summarily dismissed depending on how convenient their results are to the intended narrative.


----------



## infranoia

Quote:


> Originally Posted by *magnek*
> 
> I like how GameGPU results either get promoted or summarily dismissed depending on how convenient their results are to the intended narrative.


If the point of Doom Vulkan is Async Compute, then they're not valid to the test.

If the point of Doom Vulkan is to test Vulkan baseline without Async Compute, perhaps to 'level the field' against Nvidia, then sure, they're perfectly valid.

That is, until id 'fixes' the error and enables Async Compute on the other AA modes, as they said they would. Then what? Wait for Nvidia to come up with that mythical AC driver before declaring any AC benchmark valid? Gameworks indeed.

I am really very, very curious how 3dmark is going to handle this.


----------



## scorch062

BTW, did no one pick up from this chart that the 370, even without async, beats the 780 Ti? That is pretty embarrassing.


----------



## EightDee8D

Quote:


> Originally Posted by *scorch062*
> 
> 
> 
> BTW, did not one pick up in this chart that 370, even without async, beats 780ti? That is pretty embarrassing.


That's premium novidya for you. lol


----------



## FlawleZ

I gained almost 30 FPS on average with my overclocked 7950. At 1080p ultra settings with 8x AA I'm averaging 90 FPS.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> and let me guess, computerbase.de's results of course are 100% confident. if you bothered to read it, what they aren't confident about is that they couldn't use the FCAT measure, because it's not yet available for Vulkan. but you know what, neither could computerbase.de


One is using TSSAA and the other is not.

TSSAA enables asynchronous compute. Oddly enough... you get better performance once TSSAA is enabled.

Basically... all those games coming out with async compute enabled are going to be quite interesting. Then again... I cannot say that I am surprised, seeing as I have been banging on this drum for some time.

Now imagine an aftermarket, slightly overclocked RX 480? It would match a GTX 1070 in this title, and for half the price.



As I had stated a few times before... one of the advantages that Vulkan has over DX12 is that Vulkan does not rely on API-side synching between contexts. The hardware handles the synching if it supports multi-threaded rendering.

As for nVIDIA's mythical driver... where is the Maxwell driver that nVIDIA said would enable Async Compute? Are we to believe that Pascal will get such a driver now as well?


----------



## jtom320

Quote:


> Originally Posted by *FlawleZ*
> 
> I gained almost 30 FPS on average with my overclocked 7950. At 1080p ultra settings with 8x AA I'm averaging 90 FPS.


The benchmark above you directly contradicts this claim.

Not saying you are lying, it's just funny how inconsistent the claims made by people here and by benchmarkers themselves are on this.

Anyway. Glad I have a 1080.


----------



## scorch062

Quote:


> Originally Posted by *Mahigan*
> 
> As for nVIDIAs mythical driver... where is the Maxwell driver that nVIDIA said would enabled Async Compute? We are to believe that Pascal will get such a driver now as well?


Did Nvidia even make a statement that they are working to resolve this issue? I only remember id Software saying that they asked Nvidia to enable Async.


----------



## daviejams

Quote:


> Originally Posted by *jtom320*
> 
> The benchmark above you directly contradicts this claim.
> 
> Not saying you are lying it's just funny how widely inconsistent the claims made by people here and benchmarkers themselves are here with this.
> 
> Anyway. Glad I have a 1080.


It depends on where people are measuring the frame rate. If you're looking at a wall it's going to be a really large number compared to the frame rate when it's all kicking off.

The Eurogamer article on the patch was pretty good and really does show the gains for AMD. I tried the game myself after the patch with my 290x and it really is night and day. Massive difference


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> One is using TSAA and the other is not.
> 
> TSSAA enables Asynchronous Compute. Oddly enough... you get better performance once TSSAA is enabled.


nope, Eurogamer DF benchmark is using TSSAA with Async.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> nope, Eurogamer DF benchmark is using TSSAA with Async.


Yes... and their benchmark run was a tiny one. They stated that their run could not be considered definitive, seeing as they did not have the proper tools to really capture the performance. Worth mentioning that the scene used for the benchmark run would also make a difference.

There might also be a CPU difference here as well.

Either way... what we do know is that AMD gain a heck of a lot of performance under Vulkan with Async Compute.


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> Yes... and their benchmark run was a tiny run. They stated that their benchmark run could not be considered as being the definitive performance seeing as they did not have the proper tools to really capture the performance.
> 
> There might also be a CPU difference here as well.
> 
> Either way... what we do know is that AMD gain a heck of a lot of performance under Vulkan with Async Compute.


how long was computerbase's benchmark run, and why can it be considered? because it's convenient for your argument?

BY THE WAY, using TSSAA only enables Async for AMD but not yet for Nvidia.

"*Currently asynchronous compute is only supported on AMD GPUs and requires DOOM Vulkan supported drivers to run. We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.*"


----------



## CriticalHit

Quote:


> Originally Posted by *Glottis*
> 
> yes 7.5% and what's your point? that's down from the outrageous 35% fury x lead over the 980ti that computerbase.de claimed and then all websites regurgitated like sheeple without doing any research whatsoever. now that more and more sources have tested it themselves, no one can reproduce computerbase's results.
> 
> here's another one.


to think i almost went 780 instead of 290x when upgrading my PC a few years ago (though i didn't think about it long).
went with 3x 290x. Bring on more Vulkan, i say!







... after 3-4 years the single card is still relevant for high-end gaming, and there are still 2 more sitting next to it


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> how long was computerbase benchmark run, and why can it be considered, because it's convenient for your argument?
> 
> BY THE WAY, using TSSAA only enables Async for AMD but not yet for Nvidia.
> 
> "*Currently asynchronous compute is only supported on AMD GPUs and requires DOOM Vulkan supported drivers to run. We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.*"


"As for nVIDIA's mythical driver... where is the Maxwell driver that nVIDIA said would enable Async Compute? Are we to believe that Pascal will get such a driver now as well?"

Oh and Pascal does not support Asynchronous Compute + Graphics. What Pascal supports is improved pre-emption coupled with Dynamic Load Balancing. Basically... attempting to run async compute + graphics will cut the Pascal GPU into little pieces (figuratively). What I mean is that the Pascal GPU is separated into GPC clusters. One cluster will handle the Graphics and an adjacent cluster will handle the compute. While one cluster is handling graphics... it cannot handle compute in parallel and vice versa. So if the GPCs are already busy then you will not get a performance boost (as seen under AotS at higher resolutions). At lower resolutions you might get a slight boost (tiny).

I will not hold my breath waiting on an nVIDIA driver to enable this feature (because you need a driver and code to enable it).

By this time next year... the FuryX may in fact be faster than a GTX 1070 in most of the newer titles... as has been the case for AMD GPUs for quite some time now. Even an R9 390x will be giving the GTX 1070 a run for its money as more and more titles optimize for these new APIs (DX12 and Vulkan).

This is sort of the trend since Tahiti.


----------



## Potatolisk

Is there a hardware reason why Kepler is doing so poorly? Or is it just that nVidia doesn't care anymore?


----------



## OneB1t

both


----------



## Glottis

this forum is funny. when some benchmarks didn't use the Async-enabling TSSAA setting for AMD, people cried foul. but apparently Async doesn't work yet for Nvidia in DOOM, and no one cared to even mention that until I just found it out. it doesn't matter if it only gives nvidia a little performance, benchmarks should still be invalidated and recalculated when GPUs from both brands run under the same settings!


----------



## OneB1t

problem is that nvidia cant run async







because their hardware lacks parts that AMD has had as standard for 4 years


----------



## Glottis

Quote:


> Originally Posted by *OneB1t*
> 
> problem is that nvidia cant run async
> 
> 
> 
> 
> 
> 
> 
> because their hardware lacks parts that AMD has had as standard for 4 years


except where Pascal has pretty good gains in DX12 with Async. funny how you accept that benchmarks not run under identical circumstances are OK. if it were the other way around I bet you would have a different stance.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> this forum is funny. when some benchmarks didn't use the Async-enabling TSSAA setting for AMD, people cried foul. but apparently Async doesn't work yet for Nvidia in DOOM, and no one cared to even mention that until I just found it out. it doesn't matter if it only gives nvidia a little performance, benchmarks should still be invalidated and recalculated when GPUs from both brands run under the same settings!


I disagree... because if the driver is never released (as was the case with Maxwell) then people go on faith alone thinking that a driver release from nVIDIA is imminent when in reality... it will never happen.

The benchmarks are not invalidated... were they invalidated when the AMD cards were running an older version/path of OpenGL than the nVIDIA cards? I did not see you making that argument then.

For now... this is the reality under Doom. AMD Radeons perform as shown here. If nVIDIA releases a driver and the Doom developers release a patch on their end then we can re-evaluate, but until then... this is where we are at... just as, before this, the Radeons' poor showing was what every review site reported under Doom, and nobody was complaining then.

As for "good gains" under AotS for Pascal using Async-compute + graphics...

Not really...






What we are mostly seeing is the pre-emption and load balancing being used to negate the performance losses we saw with Maxwell. What we also see is that under Extreme or High preset there can be a slight performance boost due to the fact that the GPCs are not overly busy but once we move to the crazy preset (or towards 4K) the GPCs are overloaded and thus we do not get a boost.
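A toy utilization model (my own illustration with made-up numbers, not real GPU scheduling) shows why the gain vanishes once the execution units saturate:

```python
# Toy model: why async compute only helps while graphics leaves units idle.
# All figures are hypothetical; this is not how a real GPU schedules work.

def frame_time(units, gfx_busy_units, gfx_time, compute_work):
    """Return (serial_time, async_time) for one frame.

    units           -- total execution units on the GPU
    gfx_busy_units  -- units kept busy by graphics during gfx_time
    gfx_time        -- time the graphics workload takes
    compute_work    -- compute shader work, measured in unit-time
    """
    # Serial: compute runs after graphics, using every unit.
    serial = gfx_time + compute_work / units
    # Async: compute fills whatever capacity graphics leaves idle.
    idle_capacity = (units - gfx_busy_units) * gfx_time
    overlapped = min(compute_work, idle_capacity)
    async_time = gfx_time + (compute_work - overlapped) / units
    return serial, async_time

# Half-loaded GPU (lower preset/resolution): async hides all the compute.
print(frame_time(64, 32, 10.0, 320))   # (15.0, 10.0) -- big gain
# Saturated GPU ("crazy" preset / 4K): nothing left to overlap into.
print(frame_time(64, 64, 10.0, 320))   # (15.0, 15.0) -- no gain
```

On the half-loaded run the compute work hides entirely inside the graphics pass; on the saturated run there is no idle capacity to overlap into, so async buys nothing, which matches the AotS crazy-preset/4K pattern described above.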


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> I disagree... because if the driver is never released (as was the case with Maxwell) then people go on faith alone thinking that a driver release from nVIDIA is imminent when in reality... it will never happen.
> 
> The benchmarks are not invalidated... were they invalidated when the AMD cards were running an older version/path of OpenGL than the nVIDIA cards? I did not see you making that argument then.
> 
> For now... this is the reality under Doom. AMD Radeons perform as shown here. If nVIDIA release a driver and the Doom developers release a patch on their end then we can re-evaluate but until then... this is where we are at... just as prior the Radeon poor showings is what every review site showed under Doom and nobody was complaining then.


so this is about some childish revenge for you. you don't actually care about fair results. and by the way, plenty of people complained about AMD's OpenGL performance, it's just that there was nothing to be done about it, as OpenGL 4.5 wasn't happening for AMD. but Bethesda and Nvidia ARE working on Async for Vulkan. maybe it will only give a 1-2% boost, but that's still important, and would mean that Digital Foundry's Fury X 7.5% lead would shrink to 5% or even less!
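For what it's worth, the arithmetic behind that shrinking-lead estimate (taking the 7.5% and 1-2% figures at face value; this is just a back-of-envelope sketch) works out like this:

```python
# If card A leads card B by 7.5%, and B then gains a few percent from async,
# the remaining lead is (1 + old_lead) / (1 + b_gain) - 1.

def shrunken_lead(current_lead, rival_gain):
    """Relative lead left after the trailing card speeds up by rival_gain."""
    return (1 + current_lead) / (1 + rival_gain) - 1

for gain in (0.01, 0.02):
    print(f"{gain:.0%} gain for the 980 Ti -> Fury X lead becomes "
          f"{shrunken_lead(0.075, gain):.1%}")
```

A 1% gain leaves roughly a 6.4% lead and a 2% gain roughly 5.4%, so "5% or even less" would need the async driver to deliver a bit more than 2%.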


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> so this is about some childish revenge for you. you don't actually care about fair results. and by the way, plenty of people complained about AMD's OpenGL performance, it's just that there was nothing to be done about it, as OpenGL 4.5 wasn't happening for AMD. but Bethesda and Nvidia ARE working on Async for Vulkan. maybe it will only give a 1-2% boost, but that's still important, and would mean that Digital Foundry's Fury X 7.5% lead would shrink to 5% or even less!


Nothing I said could be construed as being about "childish revenge". I accepted the Doom figures back then and I accept them now.

As for nVIDIA working with Bethesda to implement Async Compute + Graphics (or in their case, pre-emption + Dynamic Load Balancing), I will believe it when I see it. I wish I could say otherwise, but I am still waiting on the nVIDIA Maxwell drivers that enable Async Compute.


----------



## OneB1t

1. digital foundry's results are not correct; their fury x card is probably not working as it should for some reason (thermal throttling, power limit, etc.)
2. a 1-2% boost can be achieved just by optimizing the current build, so that's probably what nvidia is doing now







no async for them as they're missing the HW for that


----------



## provost

I haven't been keeping up with this thread, but just wanted to share that I ran Doom briefly on Fury and the in-game Vulkan fps counter, or whatever it's called, shows results similar to the one in the OP. It's also very smooth. If this trend continues, we will finally have something to cheer about vis-à-vis PC ports and general PC gaming, regardless of the brand. I am sure Nvidia will catch up sometime with their next arch as it relates to async compute, etc.


----------



## GoLDii3

Quote:


> Originally Posted by *Glottis*
> 
> except where Pascal has pretty good gains in DX12 with Async. funny how you accept that benchmarks not run under same identical circumstances are OK. if it was other way around I bet you would have different stance.


Pascal will never support async compute the way AMD does, it has been said millions of times.

Whatever gains it had on DX12 certainly have nothing to do with Async, and there's plenty of proof.

You sound like some kid crying because his brother got ice cream and he didn't.


----------



## infranoia

Way too much babywhining. If DX12 / Vulkan gives AMD an advantage by supporting hardware that Nvidia does not have, it's no more a crime than DX11 and OpenGL playing to Nvidia's serial architectural strengths and punishing AMD's.

Where was your righteous indignation then? Instead all I've ever heard on OCN is how AMD dropped the ball.

This industry needs more parity. Sounds like that's starting to happen, and I'm glad for it. Architectures can now be judged on hardware capability, not software limitations.


----------



## Noobism

Quote:


> Originally Posted by *Glottis*
> 
> this forum is funny. when some benchmarks didn't use the Async-enabling TSSAA setting for AMD, people cried foul. but apparently Async doesn't work yet for Nvidia in DOOM, and no one cared to even mention that until I just found it out. it doesn't matter if it only gives nvidia a little performance, benchmarks should still be invalidated and recalculated when GPUs from both brands run under the same settings!


So because one card does not have async and the other does, it should be called invalid? This is a *HARDWARE* limitation, get it? Nvidia made the choice to not include this in their own card, where AMD did. So your point doesn't hold water.


----------



## magnek

Quote:


> Originally Posted by *Glottis*
> 
> except where Pascal has pretty good gains in DX12 with Async. funny how you accept that benchmarks not run under same identical circumstances are OK. if it was other way around I bet you would have different stance.


*2.8%* _in the best case scenario_ is "pretty good gains" now.















Quote:


> Originally Posted by *Glottis*
> 
> so this is about some childish revenge for you. you don't actually care about fair results. and by the way, plenty of people complained about AMD's OpenGL performance, it's just that there was nothing to be done about it, as OpenGL 4.5 wasn't happening for AMD. but Bethesda and Nvidia ARE working on Async for Vulkan. maybe it will only give a 1-2% boost, but that's still important, and would mean that Digital Foundry's Fury X 7.5% lead would shrink to 5% or even less!


You keep harping on other people for using Computerbase.de's result because "it's convenient for [their] argument". Well, you're doing the exact same thing by only pointing to Eurogamer's result. The irony is you don't even seem to realize you're doing it.


----------



## Glottis

Quote:


> Originally Posted by *magnek*
> 
> *2.8%* _in the best case scenario_ is "pretty good gains" now.


so nvidia should gimp OGL performance so that the Vulkan gains percentage appears higher?







Quote:


> Originally Posted by *magnek*
> 
> You keep harping on other people for using Computerbase.de's result because "it's convenient for [their] argument". Well you're doing the exact same thing by only pointing to Eurogamer's result. Irony is you don't even seem to realize you're doing it.


i'm pointing to Eurogamer's results because i have a GTX 980 Ti and my avg fps is in line with their findings, not computerbase's, which are weirdly very low for all geforce cards.


----------



## poii

Quote:


> Originally Posted by *Glottis*
> 
> so nvidia should gimp OGL performance so that the Vulkan gains percentage appears higher?


He's talking about Async compute gains in AotS (i.e. the difference between DX12 with and without Async compute). Both results are better than DX11, just in case you try to make an argument about that too.


----------



## magnek

Quote:


> Originally Posted by *Glottis*
> 
> so nvidia should gimp OGL performance so that the Vulkan gains percentage appears higher?
> 
> 
> 
> 
> 
> 
> 
> 
> i'm pointing to Eurogamer's results because i have a GTX 980 Ti and my avg fps is in line with their findings, not computerbase's, which are weirdly very low for all geforce cards.


You have GOT TO BE kidding me.
Quote:


> Originally Posted by *Glottis*
> 
> except where Pascal has *pretty good gains in DX12 with Async*. funny how you accept that benchmarks not run under same identical circumstances are OK. if it was other way around I bet you would have different stance.


Async = no OpenGL gimping needed. Now please stop moving the goalposts.

As for Computerbase.de's results, if you're not happy with them, why don't you shoot an email to Wolfgang and Jan-Frederik and ask them for some testing details?


----------



## Glottis

i'll get on that as soon as you prove to me why Eurogamer's results aren't valid


----------



## magnek

i'll get on that as soon as you prove to me why Computerbase.de's results aren't valid









And no, "because I got different results" without knowing if you even properly repeated what Computerbase.de did doesn't count


----------



## Glottis

you ask me to prove it but my results don't count? troll harder please


----------



## flippin_waffles

Quote:


> Originally Posted by *Glottis*
> 
> you ask me to prove it but my results don't count? troll harder please


I think the majority would suggest that he is not the one trolling.


----------



## magnek

Quote:


> Originally Posted by *Glottis*
> 
> you ask me to prove it but my results don't count? troll harder please


EL OH EL

You cannot be serious.

Setting aside the fact that different spots in the game get wildly different framerates, you're also saying we should invalidate a review site's result because YOU (singular) got different results, *without knowing any details of how Computerbase.de tested and got their results*.

If you still think I'm trolling after this, consider this conversation done.


----------



## Glottis

Quote:


> Originally Posted by *flippin_waffles*
> 
> I think the majority would suggest that he is not the one trolling.


half of the posts in his almost 4000-post history are one-line replies bashing Nvidia every chance he gets. yet he owns a 980ti. he seems like a masochistic troll, owning nvidia yet hating it with a passion. man, the internet is a strange place


----------



## infranoia

Quote:


> Originally Posted by *Glottis*
> 
> half of the posts in his almost 4000-post history are one-line replies bashing Nvidia every chance he gets. yet he owns a 980ti. he seems like a masochistic troll, owning nvidia yet hating it with a passion. man, the internet is a strange place


It's a product in a duopoly, and it was the fastest, most efficient part in its time. Fanboy status is not a prerequisite for ownership. You can also buy and use an Intel processor without having to give daily reach-arounds to Intel.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> half of posts in his almost 4000 post history are one line replies bashing Nvidia every chance he gets. yet he owns a 980ti. he seems like a masochistic troll, owning nvidia yet hating it with passion. man, internet is a strange place


I do not think he is bashing nVIDIA. He is admitting to what his eyes are seeing rather than denying what he is seeing. That is an admirable trait.

As for me... I am not invalidating either result. I think they are likely both valid yet tested in different areas of the game and likely using different tools to capture the framerates.

We know two things...

1. computerbase tested the lava opening scene area which is quite demanding while the other result from Digital foundry was not in the same area. Performance varies wildly based on the area due to CPU usage and compute shader usage. Volcano is very hard on the GPU due to the compute shader effects whereas the indoor scenes tend to be hard on the CPU (especially when blowing up oil drums).
2. Both used the same CPU.

There is also this segment added by computerbase.de:
Quote:


> Other systems show different results
> 
> As already shown in the past when other games switched to the low-level API DirectX 12, the gains are highly dependent on the requirements of the selected scene and the system used. Examples include Tomb Raider, Hitman and Ashes of the Singularity.
> 
> Also in Doom, other testers can certainly achieve considerable gains on graphics cards from Nvidia. Golem.de, with a non-overclocked Core i7-6700K in another scene, saw a gain of 25 percent with the GeForce GTX 1080 in Full HD, while a user in the 3DCenter.de forum with an AMD FX-8320e saw a gain of nearly 50 percent on the GeForce GTX 970.


So yeah... both results are valid. Volcano was just one of the most demanding areas in the game and AMD seems to be able to eat it up when using Vulkan.


----------



## Imouto

The point of Vulkan and DX12 is removing the need for bloated drivers and day-one patches to run games as intended. Both Nvidia and AMD complained about this in the past because developers kept up their awful practices to meet deadlines or achieve some effects.

This only shows how much Nvidia relies on its prowess at delivering such fixes, and AMD's utter incompetence at doing the same.

Nvidia is fine. By the time Vulkan/DX12 matters they will have a card able to fully take advantage of it, plus some gimmicks making theirs the better cards.

I'm sorry for AMD, but these benches don't matter at all. A handful of games in the next two years doesn't make up for worse performance in all the other games.


----------



## caswow

Quote:


> Originally Posted by *Imouto*
> 
> The point of Vulkan and DX12 is removing the need for bloated drivers and day-one patches to run games as intended. Both Nvidia and AMD complained about this in the past because developers kept up their awful practices to meet deadlines or achieve some effects.
> 
> This only shows how much Nvidia relies on its prowess at delivering such fixes, and AMD's utter incompetence at doing the same.
> 
> Nvidia is fine. By the time Vulkan/DX12 matters they will have a card able to fully take advantage of it, plus some gimmicks making theirs the better cards.
> 
> I'm sorry for AMD, but these benches don't matter at all. A handful of games in the next two years doesn't make up for worse performance in all the other games.


you make it sound like amd cards are not competitive in other games


----------



## infranoia

Quote:


> Originally Posted by *Imouto*
> 
> I'm sorry for AMD but these benches don't matter at all. A handful of games in the next two years doesn't make up for worse performance in all the other games.


So your argument boils down to "AMD should just stop trying." Yeah, listen-- that's not a rational argument, it's just a sour partisan rant.

Or perhaps you mean to say, "AMD should improve, but on Nvidia's own terms and stop trying to change industry APIs." Again, you're wearing your colors.


----------



## Noufel

At least now with pascal nvidia won't lose perf with dx12/vulkan (my 1080 is safe for now till i get a Vega), not like maxwell.


----------



## Imouto

Quote:


> Originally Posted by *infranoia*
> 
> So your argument boils down to "AMD should just stop trying." Yeah, listen-- that's not a rational argument, it's just a sour partisan rant.


I've had this conversation several times before and every time it came down to:

"Thank you so much for bringing the industry forward AMD but please make some money to stay alive and relevant"


----------



## Mahigan

Quote:


> Originally Posted by *Noufel*
> 
> At least now with pascal nvidia won't lose perf with dx12/vulkan (my 1080 is safe for now till i get a Vega), not like maxwell.


Unless the load is too demanding. With AotS under the crazy preset... there is no gain with Async turned on at 1080p on a GTX 1080. The same trend appears with the Extreme preset at 4K on the GTX 1080, where a performance loss occurs.

It all depends on how overloaded the GPCs get.


----------



## infranoia

Quote:


> Originally Posted by *Imouto*
> 
> I've had this conversation several times before and every time it came down to:
> 
> "Thank you so much for bringing the industry forward AMD but please make some money to stay alive and relevant"


Well then we agree on that point, for sure. You have to see though that they're moving mountains (APIs) to try to make that happen. Driver updates won't cut it at this point.


----------



## Mahigan

Quote:


> Originally Posted by *Imouto*
> 
> I've had this conversation several times before and every time it came down to:
> 
> "Thank you so much for bringing the industry forward AMD but please make some money to stay alive and relevant"


Relevant....

Considering AMD is gaining market share in the desktop PC segment while dominating the consoles, has both DX12 and Vulkan based on its own Mantle API, and is leading the way in developer relations while now arguably having the best drivers in the industry, I would say they are pretty relevant.

If you mean making profits... well, why is AMD's stock value on the rise despite their lack of profitability?


I think that profitability will come... for now investors are pushing the stock up due to the relevancy they see in AMD.


----------



## kaosstar

So I keep hearing whispers of a magical async compute driver on the way from Nvidia any day now. Is that just a rumor, or something that's actually happening?


----------



## magnek

Quote:


> Originally Posted by *Glottis*
> 
> half of the posts in his almost 4000-post history are one-line replies bashing Nvidia every chance he gets. yet he owns a 980ti. he seems like a masochistic troll, owning nvidia yet hating it with a passion. man, the internet is a strange place


You just can't win with these people: if you own AMD and criticize nVidia you're obviously a salty, jealous AMD owner, but if you own nVidia and criticize nVidia you're a masochistic troll.









First of all, you committed an ad hominem, so thank you for conceding the argument.

Second, just because I own an nVidia card doesn't mean I have to fawn over them incessantly or kiss the ground that Jen-Hsun walks on. In fact the most loyal fans also tend to be the most vocal critics, because they want to push their company to do better and not become complacent. I simply call a spade a spade, find me just one post where I _*baselessly*_ "bashed" nVidia. But of course for those with pre-conceived notions even the tiniest bit of criticism is perceived as "bashing", so I'm not surprised.

Also, there's a fine distinction between hating the hardware and hating their way of doing business, and I don't usually do one liners unless it's for comedic effect or what I'm trying to say is plain obvious.


----------



## Console-hater

Quote:


> Originally Posted by *kaosstar*
> 
> So I keep hearing whispers of a magical async compute driver on the way from Nvidia any day now. Is that just a rumor, or something that's actually happening?


Async with Nvidia is software-based since their hardware can't handle proper async. Their driver is likely to improve performance by a single-digit percentage. Or it'll never happen; they said the same thing about the AotS benchmark.


----------



## Mahigan

Quote:


> Originally Posted by *kaosstar*
> 
> So I keep hearing whispers of a magical async compute driver on the way from Nvidia any day now. Is that just a rumor, or something that's actually happening?


I am still waiting on the Maxwell driver from nVIDIA.


As for what Kollock stated about Ashes of the Singularity:


Sound familiar? It is the same thing Bethesda are saying.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> I am still waiting on the Maxwell driver from nVIDIA.


Now let's sue them for false advertising.


----------



## SuperZan

Quote:


> Originally Posted by *Mahigan*
> 
> I am still waiting on the Maxwell driver from nVIDIA.
> 
> 
> Spoiler: Warning: Spoiler!


If Roy had made that tweet it would have been repost fodder to this day.


----------



## infranoia

Quote:


> Originally Posted by *Mahigan*
> 
> I am still waiting on the Maxwell driver from nVIDIA.
> 
> 
> As for what Kollock stated about Ashes of the Singularity:
> 
> 
> Sound familiar? It is the same thing Bethesda are saying.


To be fair, that statement could just as easily have been tweeted by Hallock and AMD.

There is literally nothing in there about a promise.

/ah, you ninja'd the Oxide quote. Yep, that's the same thing that id was saying. I wonder if it was just a simple email exchange:

"Hey, you guys are working on Async Compute right?"

"yep."
Quote:


> Originally Posted by *SuperZan*
> 
> If Roy had made that tweet it would have been repost fodder to this day.


Jeez, I'm just getting Ur-Ninja'd today. I'll see myself out.


----------



## Imouto

You don't bring stock to a conversation unless you want to make a fool of yourself.


----------



## Mahigan

Quote:


> Originally Posted by *Imouto*
> 
> You don't bring stock to a conversation unless you want to make a fool of yourself.


I see... yet for the past 3 years everyone has been bringing the stock price into various topics whilst predicting AMD's doom. Now that the price is up... "You cannot bring up the stock price unless you want to make a fool out of yourself"....

I see...


----------



## FLCLimax

Two years of sweet green goblin tears. Gonna be good.


----------



## Themisseble

Quote:


> Originally Posted by *Mahigan*
> 
> I see... yet for the past 3 years everyone has been bringing the stock price into various topics whilst predicting AMDs doom. Now that the price is up.... "You cannot bring up the stock price unless you want to make a fool out of yourself"....
> 
> I see...


Nvidia's stock went from $20 to $50+... NVIDIA made out great too.


----------



## infranoia

Quote:


> Originally Posted by *Mahigan*
> 
> I see... yet for the past 3 years everyone has been bringing the stock price into various topics whilst predicting AMDs doom. Now that the price is up.... "You cannot bring up the stock price unless you want to make a fool out of yourself"....
> 
> I see...


Well, stock price was irrelevant then, and it's irrelevant now. Investors are opportunistic morons. One look at SeekingAlpha will tell you that.

Wait... what? Do my eyes deceive me? A *positive* AMD article on SA? Unbelievable, the end is truly nigh.

http://seekingalpha.com/article/3988436-amd-critics-pick-negative-nits-ignoring-positive-elephants


----------



## Imouto

Quote:


> Originally Posted by *Mahigan*
> 
> I see... yet for the past 3 years everyone has been bringing the stock price into various topics whilst predicting AMDs doom. Now that the price is up.... "You cannot bring up the stock price unless you want to make a fool out of yourself"....
> 
> I see...


And every time their stock rose, AMD loyalists forecast the second coming of Jesus. Guess what happened.

You don't bring stock to a conversation. Full stop.


----------



## Glottis

but but nvidia doesn't have async compute gains, they said, DOOM not yet supporting async compute for nvidia doesn't matter, they said.


----------



## FLCLimax

All the green kids gonna be playing 3DMark while we play games. And they call the actual games AMD excels in "just benchmarks", lmao.


----------



## magnek

Quote:


> Originally Posted by *Glottis*
> 
> but but nvidia doesn't have async compute gains, they said, DOOM not yet supporting async compute for nvidia doesn't matter, they said.


When AMD fans point to AotS nVidia fans say "it's just a benchmark". But because Pascal is finally showing a small amount of gains with async *in an actual 3DMark bench* all of a sudden it matters 100%.

LMAO


----------



## SuperZan

Quote:


> Originally Posted by *magnek*
> 
> When AMD fans point to AotS nVidia fans say "it's just a benchmark". But because Pascal is finally showing a small amount of gains with async *in an actual 3DMark bench* all of a sudden it matters 100%.
> 
> LMAO


How dare you, sir. Time Spy is even better than FireStrike, coz you don't get confused about which guy you're playing. HINT: You're always the Time Spy!


----------



## criminal

Quote:


> Originally Posted by *magnek*
> 
> When AMD fans point to AotS nVidia fans say "it's just a benchmark". But because Pascal is finally showing a small amount of gains with async *in an actual 3DMark bench* all of a sudden it matters 100%.
> 
> LMAO


Rabid fans are bad aren't they? lol


----------



## aberrero

I pronounced it "wulkan". Then I became sad :/


----------



## iRUSH

Quote:


> Originally Posted by *aberrero*
> 
> I pronounced it "wulkan". Then I became sad :/


What a horrible and unfortunate loss. Such a strange way to go.


----------



## magnek

Quote:


> Originally Posted by *SuperZan*
> 
> How dare you, sir. Time Spy is even better than FireStrike, coz you don't get confused about which guy you're playing. HINT: You're _always_ the Time Spy!


True true. Plus you just have to press a few buttons and then sit back and enjoy the show. None of this real time strategic thinking nonsense.
Quote:


> Originally Posted by *criminal*
> 
> Rabid fans are bad aren't they? lol


Not as bad as squeaky or dead fans.

..
..

Oh you weren't talking about _that_ kind of fan, my bad.


----------



## junkman

Well... that was entertaining.

In other news, I tried TSAA/Vulkan on my RX 480. At 4k, I was getting ~40 FPS. Pleasantly surprised.

I got about ~55 FPS on my overclocked 980 ti, and ~65FPS on my 1080.

Edit: Mind you, I sold the NV cards before the patch, so add about ~2-5 FPS to those scores with their software-level async adaptation.

I'm pretty pleased with these results.


----------



## Klocek001

Quote:


> Originally Posted by *FLCLimax*
> 
> All the green kids gonna be playing 3DMark while we play games. And they call the actual games AMD excels in "just benchmarks", lmao.


lol so now it's the other way round all of a sudden, just because the RX 480 lost a DX12 async benchmark to the 980? somebody's in a fussy mood today








realistically speaking it's actually better than a 7% gain on the 1080, since I was sure async on was gonna be up to 10% slower than async off, so make that 7% gain a 17% one








but enough about that 3dmark already

by the way, I can't help but crack up when I see another thread like this that's 99.9% pure sarcasm with 0.1% data.


----------



## FLCLimax

Quote:


> Originally Posted by *magnek*
> 
> When AMD fans point to AotS nVidia fans say "it's just a benchmark". But because Pascal is finally showing a small amount of gains with async *in an actual 3DMark bench* all of a sudden it matters 100%.
> 
> LMAO


----------



## Killacaps

lol, flood threads with a benchmark. I can't believe people pay for that stuff, go play a game.

I downloaded the new 3DMark for giggles. I got the wrong version first, then got the right one on another mirror, guess they didn't update them all.

Come to find out it's only for DX12, yeah, I forgot about that.

Also it has been reading my 8350 wrong for a while, says 3.2GHz when mine is at 4.5GHz.

And it has yelled at me the last couple of versions for having admin privileges, although I have run it many times before with no issues. Weird.

Give me Vulkan.

Also, the reason I came into this thread: I bought Doom the other day. Fun game so far, I just got outside so I'll see more.

My 7970 runs it real good maxed out on Vulkan. I tried that first, then OpenGL 4.3/4.5, whatever it is. I'm at 720p right now.

So far I have a lot fewer FPS drops from anything from explosions to turning around real fast. 1.2GHz on the 7970 gives me some more boost, and just to see how it scales, it scales pretty well.

I love the built-in FPS counter, since nothing else works on Vulkan, just like nothing worked on Mantle.

Now it'd be nice if they turned that FPS counter into a tool like Dxtory and Fraps but for Vulkan/Mantle, and a nice benchmark would be superb.


----------



## FLCLimax

Quote:


> Originally Posted by *junkman*
> 
> Well... that was entertaining.
> 
> In other news, I tried TSAA/Vulkan on my RX 480. At 4k, I was getting ~40 FPS. Pleasantly surprised.
> 
> I got about ~55 FPS on my overclocked 980 ti, and ~65FPS on my 1080.
> 
> Edit: Mind you, I sold the NV cards before the patch, so add about ~2-5 FPS to those scores with their software-level async adaptation.
> 
> I'm pretty pleased with these results.


Idk what's up with the Fury X getting 45fps in the test at 4K, I'm getting 60 on a Fury. I will upload a short video in 20 hours when the effects of tonight wear off.


----------



## bmgjet

Decrease in average performance on my 980 Ti OC.
OpenGL averages 140fps, with drops to 90fps.
Vulkan averages 130fps, with drops to 110fps.
Crazy coil whine on Vulkan as well; on OpenGL there was no coil whine at all, and only very slight noise in other DX11 games.
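Converting those fps figures to frame times (a rough sketch, with a hypothetical helper function) shows why a lower average with higher minimums can still feel smoother:

```python
# Rough sketch: convert the fps figures above into frame times.
# The worst-case (minimum fps) frame time is what reads as stutter.
def frame_time_ms(fps: float) -> float:
    return 1000.0 / fps

opengl = {"avg": 140, "min": 90}   # OpenGL: higher average, lower minimum
vulkan = {"avg": 130, "min": 110}  # Vulkan: lower average, higher minimum

print(f"OpenGL worst frame: {frame_time_ms(opengl['min']):.1f} ms")
print(f"Vulkan worst frame: {frame_time_ms(vulkan['min']):.1f} ms")
```

With these numbers the Vulkan worst case is roughly 9.1 ms versus 11.1 ms on OpenGL, so the dips are shallower even though the average dropped.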


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> but but nvidia doesn't have async compute gains, they said, DOOM not yet supporting async compute for nvidia doesn't matter, they said.


I am not disputing that Pascal can get a boost under light loads... never have. As for 3DMark Time Spy... see concurrent vs parallel execution. All of the current games supporting Asynchronous Compute make use of parallel execution of compute and graphics tasks. 3DMark Time Spy supports concurrent execution. It is not the same Asynchronous Compute.

Concurrency fills in gaps which are in the execution pipeline. Parallelism executes two tasks at the same time.


Notice the context switch involved?

If 3DMark Time Spy were using parallel execution then there would be synchronization points between the two contexts (Graphics and Compute). There would also be pipeline stalls on Maxwell GPUs. Both the pipeline stalls and the flush required for a synchronization point would add latency, thus leading to Maxwell losing performance when running this variant of Asynchronous Compute.

We do not see Maxwell losing performance under 3DMark Time Spy. We see a tiny performance boost. Thus 3DMark Time Spy is not running Asynchronous Compute + Graphics. You see, parallel execution = Asynchronous Compute + Graphics. Concurrent execution = Asynchronous Compute. They are not the same thing.
Quote:


> With DirectX 12, GPUs that support *asynchronous compute can process work from multiple queues in parallel*.


They can but that is not what 3D Mark is doing.
Quote:


> In Time Spy, *asynchronous compute is used heavily to overlap rendering passes* to maximize GPU utilization. The asynchronous compute workload per frame varies between 10-20%. To observe the benefit on your own hardware, you can optionally choose to disable async compute using the Custom run settings in 3DMark Advanced and Professional Editions.


That is from 3DMark and can be found in the PC Per review. http://www.pcper.com/reviews/Graphics-Cards/3DMark-Time-Spy-Looking-DX12-Asynchronous-Compute-Performance

That is from 3DMark and can be found in the PCPer review. Yeah... PCPer even went a step further and attacked "AMD Fanboys" when in reality... PCPer does not even know the difference. Tech journalism....









What is concurrency?
https://en.wikipedia.org/wiki/Concurrent_computing
Quote:


> *Concurrent computing* is a form of computing in which several computations are *executed during overlapping time periods* -concurrently- instead of sequentially (one completing before the next starts).


So yeah... 3DMark does not use the same type of Asynchronous Compute found in all of the recent game titles. Instead... 3DMark appears to be specifically tailored to show nVIDIA GPUs in the best light possible. It makes use of Context Switches (good, because Pascal has improved pre-emption) as well as the Dynamic Load Balancing on Maxwell, through the use of concurrent rather than parallel Asynchronous Compute tasks. If parallelism were used then we would see Maxwell taking a performance hit under Time Spy, as admitted by nVIDIA in their GTX 1080 white paper and as we have seen in AotS.


GCN can handle these tasks but performs even better when parallelism is thrown in, as seen in the Doom Vulkan results. How? By reducing the per-frame latency through the parallel execution of graphics and compute tasks. A reduction in per-frame latency means that each frame takes less time to execute and process. The net result is a higher frame rate. 3DMark lacks this. AotS makes use of both parallelism and concurrency... as does Doom with the new Vulkan patch. See below...
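The distinction drawn here can be illustrated with a toy timing model (pure illustration, not GPU code, with made-up pass durations): run serially, graphics and compute passes add up; run in parallel, the frame takes roughly as long as the longer pass.

```python
import threading
import time

def graphics_pass():
    time.sleep(0.010)  # stand-in for a 10 ms graphics workload

def compute_pass():
    time.sleep(0.005)  # stand-in for a 5 ms compute workload

# Serial execution: compute waits for graphics (~15 ms per frame)
start = time.perf_counter()
graphics_pass()
compute_pass()
serial_frame = time.perf_counter() - start

# Parallel execution: both passes overlap (~10 ms per frame),
# which is the per-frame latency reduction described above
start = time.perf_counter()
threads = [threading.Thread(target=graphics_pass),
           threading.Thread(target=compute_pass)]
for t in threads:
    t.start()
for t in threads:
    t.join()
parallel_frame = time.perf_counter() - start

print(parallel_frame < serial_frame)  # overlapping work shortens the frame
```

Concurrency in the gap-filling sense would instead slot the compute work into idle bubbles of a single pipeline, which helps less than true overlap but never stalls it.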


If 3DMark Time Spy had implemented a separate path and enabled both concurrency and parallelism for the Fury X... it would have caught up to the GTX 1070. No joke.

If both AMD and nVIDIA are running the same code then Pascal would either gain a tiny bit or even lose performance. This is why Bethesda did not enable the Asynchronous Compute + Graphics from the AMD path for Pascal. Instead... Pascal will get its own optimized path. They will also call it Asynchronous Compute... people will think it is the same thing when in reality... two completely different things are happening behind the scene.

See why understanding what is actually happening behind the scenes is important rather than just looking at numbers? Not all Asynchronous Compute implementations are equal. You would do well to take note of this.

Where are the tech journalists these days?


----------



## FLCLimax

NvidiaMark once again.


----------



## EightDee8D

Quote:


> Originally Posted by *Mahigan*
> 
> Where are the tech journalists these days?


Bought out by the big bad boyz (intl/nvda) to slack off, so they can con people.


----------



## Randomdude

I am very underwhelmed by this GPU cycle. Honestly, new memory, new node, new architecture: these things had my hopes up for something different.

What it turns out is that nVidia is yet again standing in the way of innovation (to make more money, who can blame them) with their Pascal shenanigans, especially after seeing Mahigan's Time Spy posts. And on the other hand is AMD, which might have the better hardware, but you'd be a fool to buy them when the market is controlled by nV.

I really want to buy a Vega, but I know that the Titan will likely be faster. This is like an exact repeat of the Maxwell/Fiji generation. I am not going to buy a Pascal (Maxwell v3) regardless of its performance when I know that it will be far more obsolete than Maxwell is now once Volta hits. If I go with a Vega then I'll have to deal with nVidia strong-arming the market, and it would be just as obsolete (having features it can't make use of) as Fiji was... Really sad, imo.


----------



## scorch062

DF did another video, now mostly about Fury X at 4k with some discussion:



Pretty impressive for the Fury X to sustain 50 to 60 FPS at 4K on Ultra. The 1070, from what little is shown in the video, gains nothing so far, and under Vulkan V-sync cannot be disabled for it.


----------



## FLCLimax

No need to upload my video then.


----------



## Remij

Quote:


> Originally Posted by *Randomdude*
> 
> I am very underwhelmed by this GPU cycle. Honestly, new memory,new node,new architecture - these things had my hopes up for something different. What it turns out is that nVidia is yet again staying in the way of innovation (to make more money, who can blame them) with their Pascal shenanigans, especially after seeing Mahigan's Time Spy posts. And on the other hand is AMD that might have the better hardware, but you'd be a fool to buy them when the market is controlled by nV. I really want to buy a Vega, but I know that the Titan will be likely faster. This is like an exact repeat of the Maxwell/Fiji generation. I am not going to buy a Pascal (Maxwell v3) regardless of its performance when I know that it will be times more obsolete than Maxwell is now when Volta hits. If I go with a Vega then I'll have to deal with nVidia strong arming the market and it would be just as obsolete (having features it can't make use of) as Fiji was... Really sad, imo.


Oh lord... if you like AMD and what they are doing.. support them. They need it.

But it's funny to me that people like you claim Nvidia is holding technology back while debating which GPU vendor to buy from, because on one hand AMD has the innovative hardware, yet on the other Nvidia has the best performance... That doesn't sound like holding technology back to me...

Blame AMD for not yet having an ultra high-end enthusiast card out there to battle the 1080.

If you like that AMD cards hold their performance thanks to more forward-thinking technologies, then support AMD for doing that. But I think it's time we stop blaming Nvidia for holding things back when right now AMD isn't even giving them any challenge at the high end.

Mid-range, I think it's fairly clear which company provides the best performance/value ratio. AMD has always been good in that regard, and always been the more value-oriented purchase for consumers in that range.

But all this strong-arming that people say Nvidia is doing with their marketshare influence is no worse than the advantage AMD has with the GCN architecture being in all three current-gen consoles and those coming in the foreseeable future. Nvidia has the right and the duty to push its platform and make it THE hardware to play games on. AMD has the same duty. They have the hardware; it's their job to make sure developers use it to its fullest extent. There was a wait, but now things are starting to come together for them. They've got much better APIs for their hardware, they have (seemingly) better drivers than before, they have their hardware in all the game consoles, and now they have a newer architecture on the horizon.

However, if after all that you're still not convinced to buy from them (especially because the competition still has a faster GPU out), then you need to just drop the moral dilemma from the equation, buy what suits your needs, and be happy with it.


----------



## ChevChelios

Quote:


> 1070, as little has been in the video, gains nothing so far and under Vulkan the V-sync cannot be disabled for it.


dunno about the 1070, but neither of those is true for my G1 1080

there is a slight increase in avg fps, and a decent one in min fps

Vsync can absolutely be disabled with Vulkan; the Vsync line in Options says Off and my fps counter isn't stuck at 60, it goes up to 200 fps, as it should


----------



## daviejams

Fury X is 27% faster than the 1070 in the first scene of the Eurogamer video above using Vulkan

Impressive


----------



## ZealotKi11er

Quote:


> Originally Posted by *FLCLimax*
> 
> NvidiaMark once again.


3DMark is 3DMark and nothing more. FireStrike did not even expose AMD's problem with CPU overhead, even at 1080p. This means 3DMark is at a point where it's too far removed from the actual game experience. I only use it as a tool to test my own cards and overclocking.


----------



## Clocknut

What would have happened if the 290X had been paired with 6GHz GDDR5 and non-crap drivers from AMD, all at launch day back then against the 780 Ti? That 780 Ti would have been completely destroyed.


----------



## ZealotKi11er

Quote:


> Originally Posted by *Clocknut*
> 
> what would have been if 290X is paired with 6Ghz GDDR5, and No crap driver from AMD, all at launch day back then with 780ti. that 780Ti would completely destroyed


The 290X was paired with 6GHz GDDR5. The reason it was run at 5GHz was to keep the internal memory controller at a lower voltage to reduce power. The 290X would never have won against the 780 Ti. The reason it does now is all because of Nvidia's lacking support, not AMD doing better.


----------



## Mahigan

Quote:


> Originally Posted by *ZealotKi11er*
> 
> 290X was paired with 6GHz GDDR5. Reason it was run at 5GHz was to keep internal memory controller at lower voltage to reduce power. 290X would have never won again 780 Ti. Reason it does now it's all because of Nvidia support lacking and not AMD doing better.


Benchmarks and reviews have shown that the support for the GTX 780 Ti is still there. nVIDIA are not as evil as some people think. They are still actively fixing issues with the GTX 780 Ti.

The issue is with the games. The games are now tailored for GCN (most of them at least) so nVIDIA is being pulled towards producing more GCN-like architectures. Not only are the new APIs better suited towards GCN (being Mantle derived) but so are the game titles (being produced to run on GCN based consoles).

It used to be that a majority of titles partnered with nVIDIA and added nVIDIA centric Gameworks optimizations throughout the rendering pipeline. Now... even the titles partnering with nVIDIA were first conceived to run on GCN based consoles.

The end result is a much better showing for GCN based architectures relative to older Kepler and now Maxwell based architectures. This trend does not appear like it is going to change. nVIDIA lack the x86 license required to change this trend.

In the meantime... developers are learning how to fully optimize games for GCN, forcing nVIDIA to brute-force their way to a win. Like a Pentium 4, however... nVIDIA will hit a ceiling (run a GTX 1080 and a GTX 980 Ti at the same clocks and you will see that the per-clock performance benefits are not as huge). Therefore Pascal is more of a stop-gap measure (and an impressive one as well). In my opinion Volta will incorporate many GCN-like characteristics.
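The same-clocks comparison suggested here is just a normalization. A minimal sketch, using purely made-up placeholder scores and clocks (not measured results):

```python
# Hypothetical normalization: divide a benchmark score by core clock
# to compare per-clock (IPC-style) throughput across architectures.
# The scores and clocks below are illustrative placeholders only.
def perf_per_mhz(score: float, clock_mhz: float) -> float:
    return score / clock_mhz

pascal  = perf_per_mhz(score=100.0, clock_mhz=1733.0)  # GTX 1080-like boost clock
maxwell = perf_per_mhz(score=75.0, clock_mhz=1076.0)   # GTX 980 Ti-like base clock

# With these placeholder numbers, most of the lead comes from clock speed,
# not from per-clock gains
print(pascal, maxwell)
```

The point of the exercise is only that dividing out clock speed separates architectural improvement from frequency headroom.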

My 2 cents.


----------



## Evil Penguin

Quote:


> Originally Posted by *ZealotKi11er*
> 
> 290X was paired with 6GHz GDDR5. Reason it was run at 5GHz was to keep internal memory controller at lower voltage to reduce power. 290X would have never won again 780 Ti. *Reason it does now it's all because of Nvidia support lacking* and not AMD doing better.


At this point what are the odds of NVIDIA addressing that?

You'd think a 561 mm^2 GPU from the previous node generation would be doing better than it currently is against similar GPUs from the competition.


----------



## iRUSH

Quote:


> Originally Posted by *ZealotKi11er*
> 
> 3DMark is 3DMark and nothing more. FireStrike did not even bother to expose AMDs problem with CPU overhead even at 1080p. This means 3DMark is at a point where its too far off actual game experience. I only use it as a tool to test my own cards and overclocking.


This is where I stand on it too, 100%. Especially regarding the physics score.

Sure, the FX 8-core scores higher than a Skylake i3. That doesn't mean it's going to outperform it in gaming.


----------



## Glottis

yet another website that completely contradicts computerbase's findings. massive Fury X advantage over 980Ti nowhere to be seen. awkward.


----------



## sugarhell

Quote:


> Originally Posted by *Glottis*
> 
> yet another website that completely contradicts computerbase's findings. massive Fury X advantage over 980Ti nowhere to be seen. awkward.







Why do you even compare benchmarks that use different scenes? You know you can't compare them, right?


----------



## Glottis

Async isn't yet working in Vulkan DOOM for GeForce graphics cards, and Pascal does gain from async. That's why I was more interested in 980 Ti vs Fury X. Too bad all tech sites seem to ignore or be unaware of this fact. We need to wait for async to be enabled for GeForce before we can properly compare the Fury X to Pascal (1070/1080).


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> yet another website that completely contradicts computerbase's findings. massive Fury X advantage over 980Ti nowhere to be seen. awkward.


Not really... it depends on the area of the game being tested. Nobody is testing the same area; everyone is running tests in different spots of the game. The compute-heavy spots will benefit the Fury X (and hurt the nVIDIA cards due to their lack of Async Compute + Graphics support). The graphics-heavy areas will benefit the nVIDIA cards due to their more robust graphics rendering pipeline.

Doom does not have a timedemo (as previous Doom and Quake titles did).


----------



## scorch062

Quote:


> Originally Posted by *Glottis*
> 
> Too bad all tech sites seem to ignore or aren't aware of this fact. We need to wait for Async to be enabled for Geforce before we can properly compare Fury X to Pascal (1070/1080).


Are you serious? By this logic, tech sites should not have benchmarked Fallout 4, because AMD did not have drivers ready back then.

Benchmarks will be renewed *IF* Nvidia releases Async drivers.


----------



## Mahigan

Quote:


> Originally Posted by *scorch062*
> 
> Are you serious? With this logic, for instance, tech sites should have not benchmark Fallout 4 because AMD did not have drivers ready back then.
> 
> Benchmarks will be renewed *IF* Nvidia releases Async drivers.


I guess this invalidates 3DMark Time Spy due to its lack of Asynchronous Compute + Graphics. We will have to wait until 3DMark incorporates this feature (never).


----------



## FLCLimax

Quote:


> Originally Posted by *scorch062*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Glottis*
> 
> Too bad all tech sites seem to ignore or aren't aware of this fact. We need to wait for Async to be enabled for Geforce before we can properly compare Fury X to Pascal (1070/1080).
> 
> 
> 
> Are you serious? With this logic, for instance, tech sites should have not benchmark Fallout 4 because AMD did not have drivers ready back then.
> 
> Benchmarks will be renewed *IF* Nvidia releases Async drivers.
Click to expand...

Yup. And from the looks of it they'll have to release async drivers for every game that supports it.


----------



## Mahigan

Quote:


> Originally Posted by *FLCLimax*
> 
> Yup. And from the looks of it they'll have to release async drivers for every game that supports it.


Just like the initial Doom results were invalidated because AMD was running on a different version of OpenGL and thus lacking features... oh wait... Glottis never mentioned that we should invalidate those results.

Does this mean we have to still keep on waiting for nVIDIA to release their AotS Async Drivers before we draw any conclusions on a test done back in 2015?


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> Just like the initial Doom results were invalidated due to AMD running on a different version of OpenGL thus lacking features... oh wait... Glottis never mentioned how we should invalidate those results.
> 
> Does this mean we have to still keep on waiting for nVIDIA to release their AotS Async Drivers before we draw any conclusions on a test done back in 2015?


I hope you are joking. Pretty much every time new AMD drivers are released, all the big sites re-run benchmarks and publish the gains. Nvidia hardly ever gets the same treatment. I don't know why I even bother replying to you. When I posted a benchmark which forgot to enable TSSAA, which enables async for AMD, you went mental, instantly crying foul. But when Nvidia doesn't get to use async at all, it's all good in your eyes. There is no reasoning with someone like you.


----------



## FLCLimax

Quote:


> "DOOM ALPHA BENCHMARK: AMD DOMINATES OVER NVIDIA"
> 
> http://vrworld.com/2016/02/29/doom-alpha-benchmark-amd-dominates-over-nvidia/


Quote:


> "AMD beats NVIDIA in early Doom benchmarks, with AMD dominating at 4K"
> 
> http://www.tweaktown.com/news/50737/amd-beats-nvidia-early-doom-benchmarks-dominating-4k/index.html


----------



## Yttrium

When will people realise that there is no "half implementation of async"?

Nvidia cards gain a few percent from Vulkan because it has more than just async to offer.

As for 3DMark, it's just a test of how well a card can run said test. For those who think it's biased towards Nvidia due to the lack of proper async (see Mahigan's excellent post somewhere above), I can agree it looks biased, but never exclude incompetence as a cause.


----------



## Mahigan

What we do know about Doom Vulkan at 1080p is that a
Fury X can range between around 154-161 FPS depending on the scene being rendered, a
GTX 980 Ti between around 135-148 FPS, and a
GTX 1070 between around 136-143 FPS.

What we do know about Doom Vulkan at 1440p is that a
Fury X can range between around 97-111 FPS, a
GTX 980 Ti between around 85-101 FPS, and a
GTX 1070 between around 88-98 FPS.

So yeah... the Fury X beats out the GTX 980 Ti and GTX 1070. If Bethesda does add nVIDIA's support for concurrent execution of tasks (Asynchronous Compute, but not Asynchronous Compute + Graphics) then we may see the GTX 1070 gaining ground on the Fury X.
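Taking the midpoints of those ranges, the Fury X's lead over the GTX 1070 works out to roughly 12% at both resolutions. A quick sketch using only the figures quoted above:

```python
# Midpoint comparison of the fps ranges quoted above
fury_x   = {"1080p": (154, 161), "1440p": (97, 111)}
gtx_1070 = {"1080p": (136, 143), "1440p": (88, 98)}

def midpoint(fps_range):
    low, high = fps_range
    return (low + high) / 2

for res in ("1080p", "1440p"):
    lead = midpoint(fury_x[res]) / midpoint(gtx_1070[res]) - 1
    print(f"{res}: Fury X ahead by ~{lead:.1%}")
```

Midpoints smooth over the scene-to-scene variance, so treat the percentages as ballpark figures only.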


----------



## ChevChelios

Quote:


> Originally Posted by *Glottis*
> 
> yet another website that completely contradicts computerbase's findings. massive Fury X advantage over 980Ti nowhere to be seen. awkward.


just shows how bad OpenGL on AMD was in Doom

the 480 was well behind a 970 (!) .. hell, the 390X was behind the 970 .. Fury X at 980 level

as with other titles - a big reason for such a large relative gain is bringing up the poor DX11/OpenGL baseline (the 480 should have been between the 970 & 980 on OpenGL, and then gained a bit more to land at 980 level on Vulkan)

the more I see things like this, the more I am convinced that Pascal is the golden middle for the lengthy DX11 --> DX12 transition period .. it keeps the good Maxwell DX11/OpenGL performance, but also gains more than Maxwell from DX12/Vulkan


----------



## BrightCandle

Quote:


> Originally Posted by *Yttrium*
> 
> When people dont realise that there is no "half implementation of async"


Developers code games towards hardware that exists, or that they believe will exist in volume in a few years' time. They have decent predictions of what will be available and hence target that hardware. In practice what happens is they either make separate paths for certain features, to account for the differences in performance between the cards, or alternatively reduce the quality of some effects. Nvidia has better tessellation performance and fill rate but clearly worse theoretical compute performance. So in the end they "optimise" their scenes to meet the minimum requirements of both cards.

Async compute is definitely going to be one of those features where different paths for different vendors are required, and if all they do is lazily port from the console, it's going to be a bad time for Nvidia, because that is not the optimal way to do it on their architecture. But there are always these differences in performance between particular parts of cards; it's been this way since the beginning of video cards, and it's something we as developers just deal with. We are required to ensure it works well enough on both companies' hardware.

It might give an advantage to AMD at a given die size in some circumstances, but it's not going to be the only dominating factor in performance in the future; there are lots of areas where these cards vastly differ in performance.


----------



## Mahigan

Quote:


> Originally Posted by *Glottis*
> 
> I hope you are joking. Pretty much every time new AMD drivers is released all big sites re-run benchmarks and publish gains. Nvidia hardly ever get same treatment. I don't know why do I even bother replying to you. When I posted some benchmark which forgot to enabled TSSAA which enables Async for AMD you gone mental instantly screaming fault. But when Nvidia doesn't even get to use Async at all it's all good in your eyes. There is no reasoning with someone like you.


I am being sarcastic. Dark humor.

As for me going mental... do you have a quote which indicates me going mental? Or is it more like my pointing out that Async was not enabled in a test which supported Async and was meant to showcase that ability? nVIDIA cannot do Async in this particular test, therefore we have to go with what is available, just as we did with the initial Doom results. At first AMD was in the lead during the Alpha stages... then nVIDIA, as they gained better OpenGL optimizations while AMD concentrated on Vulkan optimizations. Now that AMD's results are out... we go with those, and if nVIDIA and Bethesda release the nVIDIA Vulkan optimizations we will have to consider those as well.

I am not the one who has issues reasoning, bud. All I do is reason and post reasoned explanations and tech observations. I have to deal with people like you... and I am not talking about nVIDIA fans, just FANS. I am either insulted by AMD fans (over at Anandtech, from my Pascal coverage, for daring to suggest that Pascal could do Asynchronous Compute but not Asynchronous Compute + Graphics) or I have to deal with NV fans who claim everything I say is BS when for the most part it is true.

I was right about the Maxwell caching issues (now fixed with Pascal) and I was right about Maxwell lacking Async support as well. At some point I hope I will not have to deal with such hostilities when commenting on the inner workings of GPU architectures.

I have been annoyed at NV at times, and recently AMD annoyed me with their 150W TDP claim for Polaris (110W for the GPU is not a defense). That being said... I am still waiting for the Maxwell Async drivers folks shoved in my face the last time around. I still have not gotten over that one. That was a mighty big lie on nVIDIA's part.


----------



## BrightCandle

Quote:


> Originally Posted by *ChevChelios*
> 
> just shows how bad OpenGL on AMD was in Doom


AMDs drivers in openGL have been laughably bad for as long as they have been making GPUs. The sad part is they never bothered to improve them for their customers sake.


----------



## Mahigan

Quote:


> Originally Posted by *BrightCandle*
> 
> AMDs drivers in openGL have been laughably bad for as long as they have been making GPUs. The sad part is they never bothered to improve them for their customers sake.


100% true. They likely did not see the value in updating those drivers, seeing as few games utilize the OpenGL API. In a way... AMD was also late to the game with Vulkan drivers, as they concentrated on DX12. Now AMD seems to have pretty robust DX12 and Vulkan support, so those past issues are likely just that... the past.


----------



## Glottis

Quote:


> Originally Posted by *Mahigan*
> 
> I am being sarcastic. Dark humor.
> 
> As for me going mental... do you have a quote which indicates me going mental? Or is it more like my pointing out that the Async was not enabled in a test which supported Async and was meant to showcase that ability? nVIDIA cannot do Async in this particular test therefore we have to go with what is available just as we did with the initial Doom results. At first AMD was in the lead during the Alpha stages... then nVIDIA as they gained better OpenGL optimizations while AMD concentrated on Vulkan optimizations. Now that AMDs results are out... we go with those and if nVIDIA and Bethesda release the nVIDIA Vulkan optimizations we will have to also consider those.
> 
> I am not the one who has issues reasoning bud. All I do is reason and post reasoned explanations and tech observations. I have to deal with people like you... I am not talking about nVIDIA fans but just FANS. I am either insulted by AMD fans (over at Anandtech from my Pascal coverage for daring to suggest that Pascal could do Asynchronous Compute but not Asynchronous Compute + Graphics) or I have to deal with NV fans who claim everything I say is BS when for the most part it is true.
> 
> I was right about the Maxwell caching issues (now fixed with Pascal) and I was right about Maxwell lacking Async support as well. At some point in time I hope that I will not have to deal with such hostilities when commenting on the inner happenings/behind the scenes happenings of GPU architectures.
> 
> I have been pissed off at NV at times and recently AMD annoyed me with their 150W TDP claim for Polaris (110W for the GPU is not a defense). That being said... I am still waiting for the Maxwell Async drivers folks shoved in my face the last time around. I still have not gotten over that one. That was a mighty big lie on nVIDIA's part.


Didn't mean to offend you; I merely used the word "mental" to say that you are passionate in your posts. I agree you were right on a lot of things, but there's a lot of inconsistency in DOOM Vulkan benchmarks and there's no denying that. I don't know why that is, but, for example, the Time Spy benchmark is very consistent across all websites.


----------



## FLCLimax

Couldn't use a video capture program... it didn't work. Crappy hand-held camera time, sorry. So nice on the Fury. It does dip into the 50s when doing melee finishers and when the room is full of enemies and explosions, but I believe this is higher than all Nvidia cards @ 4K, or equal to the 1080?





----------



## airfathaaaaa

Quote:


> Originally Posted by *Mahigan*
> 
> What we do know about Doom Vulkan at 1080p is that a
> FuryX can range between around 154 - 161 FPS varying on the scene being rendered.
> GTX 980 Ti can range between around 135 - 148 FPS varying on the scene being rendered.
> GTX 1070 can range between around 136 - 143 FPS varying on the scene being rendered.
> 
> What we do know about Doom Vulkan at 1440p is that a
> FuryX can range between around 97 - 111 FPS varying on the scene being rendered.
> GTX 980 Ti can range between around 85 - 101 FPS varying on the scene being rendered.
> GTX 1070 can range between around 88 - 98 FPS varying on the scene being rendered.
> 
> So yeah... the FuryX beats out the GTX 980 Ti and GTX 1070. If Bethesda does include nVIDIA's support for concurrent execution of tasks (Asynchronous Compute but not Asynchronous Compute + Graphics) then we may see the GTX 1070 gaining ground on the FuryX.


someone took your posts and made a reddit post for it
https://www.reddit.com/r/Amd/comments/4t5ckj/apparently_3dmark_doesnt_really_use_any/


----------



## kfxsti

Woohoo the high end cards run it above 60fps. Sweet lol


----------



## DaaQ

Quote:


> Originally Posted by *Glottis*
> 
> Async isn't yet working in Vulkan DOOM for Geforce graphics cards, and Pascal does have gains from Async. That's why I was more interested in 980Ti vs Fury X. Too bad all tech sites seem to ignore or aren't aware of this fact. *We need to wait for Async to be enabled for Geforce before we can properly compare Fury X to Pascal* (1070/1080).


Oh the irony.
The waiting game ensues.


----------



## FLCLimax

Haha, yeah. And what's worse is that NVIDIA can't just put some software out and say "Hey, come get your Async Compute for GeForce!". They have to make a driver on their end (and the devs have to make special optimizations) for each and every individual game that uses the feature.

*NVIDIA, soon­™.*


----------



## Pereb

Quote:


> Originally Posted by *Mahigan*
> 
> We do not see Maxwell losing performance under 3D Mark Time Spy. We see a tiny performance boost. Thus 3D Mark Time Spy is not running Asynchronous Compute + graphics


This seems wrong to me... async is supposed to be disabled for Maxwell at the driver level. It should be normal for the performance to be the same with async on and off, since it's never actually enabled. The 0.1% difference is just margin of error.


----------






## flopper

Quote:


> Originally Posted by *Mahigan*
> 
> I am not disputing that Pascal can get a boost under light loads... never have. As for 3D Mark Time Spy... See concurrent vs parallel execution. All of the current games supporting Asynchronous Compute make use of parallel execution of compute and graphics tasks. 3D Mark Time Spy supports concurrent. It is not the same Asynchronous Compute.
> 
> Concurrency fills in gaps which are in the execution pipeline. Parallelism executes two tasks at the same time.
> 
> 
> Notice the context switch involved?
> 
> If 3D Mark Time Spy were using Parallel executions then there would be synchronization points between the two contexts (Graphics and Compute). There would also be pipeline stalls on Maxwell GPUs. Both the pipeline stalls and the flush required for a synchronization point would add latency thus leading to Maxwell losing performance when running this variant of Asynchronous compute. We do not see Maxwell losing performance under 3D Mark Time Spy. We see a tiny performance boost. Thus 3D Mark Time Spy is not running Asynchronous Compute + graphics. You see parallel executions = Asynchronous Compute + Graphics. Concurrent execution = Asynchronous Compute. They are not the same thing.
> They can but that is not what 3D Mark is doing.
> That is from 3DMark and can be found in the PC Per review. http://www.pcper.com/reviews/Graphics-Cards/3DMark-Time-Spy-Looking-DX12-Asynchronous-Compute-Performance
> 
> Yeah... even PCPer went a step further and attacked "AMD Fanboys" when in reality... PC Per do not even know the difference. Tech journalism....
> 
> 
> 
> 
> 
> 
> 
> 
> 
> What is concurrency?
> https://en.wikipedia.org/wiki/Concurrent_computing
> So yeah... 3D Mark does not use the same type of Asynchronous compute found in all of the recent game titles. Instead... 3D Mark appears to be specifically tailored so as to show nVIDIA GPUs in the best light possible. It makes use of Context Switches (good because Pascal has that improved pre-emption) as well as the Dynamic Load Balancing on Maxwell through the use of concurrent rather than parallel Asynchronous compute tasks. If parallelism were used then we would see Maxwell taking a performance hit under Time Spy as admitted by nVIDIA in their GTX 1080 white paper and as we have seen from AotS.
> 
> 
> GCN can handle these tasks but performs even better when Parallelism is thrown in as seen in the Doom Vulkan results. How? By reducing the per Frame latency through the parallel executions of Graphics and Compute Tasks. A reduction in the per-frame latency means that each frame takes less time to execute and process. The net result is a higher frame rate. 3DMark lacks this. AotS makes use of both parallelism and concurrency... as does Doom with the new Vulkan patch. See below...
> 
> 
> If 3D Mark Time Spy had implemented a separate path and enabled both concurrency and parallelism for the FuryX... it would have caught up to the GTX 1070. No joke.
> 
> If both AMD and nVIDIA are running the same code then Pascal would either gain a tiny bit or even lose performance. This is why Bethesda did not enable the Asynchronous Compute + Graphics from the AMD path for Pascal. Instead... Pascal will get its own optimized path. They will also call it Asynchronous Compute... people will think it is the same thing when in reality... two completely different things are happening behind the scene.
> 
> See why understanding what is actually happening behind the scenes is important rather than just looking at numbers? Not all Asynchronous Compute implementations are equal. You would do well to take note of this.
> 
> Where are the tech journalists these days?


The question to ask, then, seems to be this: *did Nvidia pay off Futuremark?*
Or is there another reason for it to be implemented this way and not show graphics + compute?


----------



## ivymaxwell

This seems like too much of a coincidence for Futuremark and Nvidia to not have colluded.


----------



## airfathaaaaa

I wonder who is able to run CodeXL on that test? It will be very interesting to see what is actually going on on Nvidia...


----------



## ChevChelios

Quote:


> We do not see Maxwell losing performance under 3D Mark Time Spy. We see a tiny performance boost. Thus 3D Mark Time Spy is not running Asynchronous Compute + graphics


Cause you say so? Nope.


----------



## airfathaaaaa

Quote:


> Originally Posted by *ChevChelios*
> 
> cause you say so ? nope


Hmm, let me guess:

DX12/Vulkan games come out, and all show regression on Maxwell cards when async is on.

A random "async" test comes out and actually shows something different: "it's all good, Nvidia can do async."


----------



## FLCLimax

Quote:


> Originally Posted by *magnek*
> 
> When AMD fans point to AotS nVidia fans say "it's just a benchmark". But because Pascal is finally showing a small amount of gains with async *in an actual 3DMark bench* all of a sudden it matters 100%.
> 
> LMAO


This about sums it up.


----------



## ku4eto

Quote:


> Originally Posted by *ChevChelios*
> 
> cause you say so ? nope


Oh my god, 2k posts and you go out and call Mahigan's post BS. I am about to block you, because you are just not contributing absolutely ANYTHING (instead bashing, lying, and calling other people liars).


----------



## Pereb

Quote:


> Originally Posted by *airfathaaaaa*
> 
> dx12/vulkan games comes all show regression on maxwell cards when async is on


Source? Again, async should be disabled on the driver level for Maxwell.


----------



## Kpjoslee

Concurrency =/= asynchronous. Perhaps we should call Time Spy a concurrency test.


----------



## Dudewitbow

Quote:


> Originally Posted by *flopper*
> 
> question then seeem to be to ask this, *did Nvidia pay off Futuremark then?*
> or is there another reason for this to be implemented this way and not to show graphics compute?


I don't think any money was involved with anything; 3DMark just took async compute more or less by definition, closer to how Nvidia has defined it, rather than the parallel nature in which AMD has defined it. It's basically under-using AMD hardware. IIRC, Nvidia says it can do async compute, but it recommends developers utilize switches sparingly (because it cannot do them in a parallel manner).

edit: found that page
Quote:


> Don't toggle between compute and graphics on the same command queue more than absolutely necessary
> 
> This is still a heavyweight switch to make


https://developer.nvidia.com/dx12-dos-and-donts
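NVIDIA's advice above (don't toggle between compute and graphics on the same command queue) can be illustrated with a toy cost model. Nothing here is real driver behavior; the costs and the perfect-overlap assumption for two queues are made up purely for illustration:

```python
# Toy cost model of the "heavyweight switch" advice from the dos-and-donts.
# All numbers are hypothetical, chosen only to show the shape of the problem.

SWITCH_COST = 0.5   # assumed cost of a graphics<->compute context switch
WORK_COST = 1.0     # assumed cost of each work item

def single_queue_time(tasks):
    """One queue: pay a switch cost every time the task type changes."""
    total, prev = 0.0, None
    for kind in tasks:
        if prev is not None and kind != prev:
            total += SWITCH_COST
        total += WORK_COST
        prev = kind
    return total

def two_queue_time(tasks):
    """Idealized async compute: graphics and compute queues overlap
    perfectly, so frame time is just the longer of the two queues."""
    gfx = sum(WORK_COST for k in tasks if k == "gfx")
    comp = sum(WORK_COST for k in tasks if k == "compute")
    return max(gfx, comp)

frame = ["gfx", "compute"] * 4     # worst case: alternating task types
print(single_queue_time(frame))    # 8 work items + 7 switches -> 11.5
print(two_queue_time(frame))       # max(4, 4) -> 4.0
```

The point isn't the numbers, just that alternating types on one queue pays the switch cost over and over, which is exactly what the guide tells developers to avoid.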


----------



## Potatolisk

The question is whether any game developer will be using async the same way 3DMark does. If not, it would be a rather useless benchmark.


----------



## Kpjoslee

Quote:


> Originally Posted by *Potatolisk*
> 
> The question is whether any game developer will be using async the same way the 3Dmark does. If not it would be rather useless benchmark.


It would be a lot easier for game developers to utilize the GCN method of async because the consoles share the same architecture.


----------



## nagle3092

Quote:


> Originally Posted by *Dudewitbow*
> 
> I don't think any money was involved with anything, 3dmark just took the async compute sort of by definition, closer to what Nvidia had defined it, rather than the parallel nature in which AMD has defined it. *It's basically under-using AMD hardware*. By recommendations iirc Nvidia says it can do async compute, but it recommends developers to utilize switches sparsely (because it cannot do it in a parallel nature)
> 
> edit: found that page
> https://developer.nvidia.com/dx12-dos-and-donts


I think this is the case. The 1350/2250 run I did completed with no issues in Time Spy. I fired up Heaven and it crashed almost instantly.


----------



## JackCY

Quote:


> Originally Posted by *Mahigan*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> I am not disputing that Pascal can get a boost under light loads... never have. As for 3D Mark Time Spy... See concurrent vs parallel execution. All of the current games supporting Asynchronous Compute make use of parallel execution of compute and graphics tasks. 3D Mark Time Spy supports concurrent. It is not the same Asynchronous Compute.
> 
> Concurrency fills in gaps which are in the execution pipeline. Parallelism executes two tasks at the same time.
> 
> 
> Notice the context switch involved?
> 
> If 3D Mark Time Spy were using Parallel executions then there would be synchronization points between the two contexts (Graphics and Compute). There would also be pipeline stalls on Maxwell GPUs. Both the pipeline stalls and the flush required for a synchronization point would add latency thus leading to Maxwell losing performance when running this variant of Asynchronous compute. We do not see Maxwell losing performance under 3D Mark Time Spy. We see a tiny performance boost. Thus 3D Mark Time Spy is not running Asynchronous Compute + graphics. You see parallel executions = Asynchronous Compute + Graphics. Concurrent execution = Asynchronous Compute. They are not the same thing.
> They can but that is not what 3D Mark is doing.
> That is from 3DMark and can be found in the PC Per review. http://www.pcper.com/reviews/Graphics-Cards/3DMark-Time-Spy-Looking-DX12-Asynchronous-Compute-Performance
> 
> Yeah... even PCPer went a step further and attacked "AMD Fanboys" when in reality... PC Per do not even know the difference. Tech journalism....
> 
> 
> 
> 
> 
> 
> 
> 
> 
> What is concurrency?
> https://en.wikipedia.org/wiki/Concurrent_computing
> So yeah... 3D Mark does not use the same type of Asynchronous compute found in all of the recent game titles. Instead... 3D Mark appears to be specifically tailored so as to show nVIDIA GPUs in the best light possible. It makes use of Context Switches (good because Pascal has that improved pre-emption) as well as the Dynamic Load Balancing on Maxwell through the use of concurrent rather than parallel Asynchronous compute tasks. If parallelism were used then we would see Maxwell taking a performance hit under Time Spy as admitted by nVIDIA in their GTX 1080 white paper and as we have seen from AotS.
> 
> 
> GCN can handle these tasks but performs even better when Parallelism is thrown in as seen in the Doom Vulkan results. How? By reducing the per Frame latency through the parallel executions of Graphics and Compute Tasks. A reduction in the per-frame latency means that each frame takes less time to execute and process. The net result is a higher frame rate. 3DMark lacks this. AotS makes use of both parallelism and concurrency... as does Doom with the new Vulkan patch. See below...
> 
> 
> If 3D Mark Time Spy had implemented a separate path and enabled both concurrency and parallelism for the FuryX... it would have caught up to the GTX 1070. No joke.
> 
> If both AMD and nVIDIA are running the same code then Pascal would either gain a tiny bit or even lose performance. This is why Bethesda did not enable the Asynchronous Compute + Graphics from the AMD path for Pascal. Instead... Pascal will get its own optimized path. They will also call it Asynchronous Compute... people will think it is the same thing when in reality... two completely different things are happening behind the scene.
> 
> See why understanding what is actually happening behind the scenes is important rather than just looking at numbers? Not all Asynchronous Compute implementations are equal. You would do well to take note of this.
> 
> 
> Where are the tech journalists these days?












Journalists? They copy-paste news and make money on ads; for about 10 years they couldn't be bothered to put any thought into proper reviews of technical equipment or software.

It's pretty obvious to anyone who knows software and hardware a bit that 3DMark is not a good benchmark for comparing GPUs, as it does not use each GPU to its maximum potential.


----------



## xboxshqip

3DMark dev on Steam.


----------



## magnek

eh screw it nvm


----------



## FLCLimax

Quote:


> Originally Posted by *xboxshqip*
> 
> 3dmark dev on steam.


Yeah, he's made an account here.


----------



## PriestOfSin

Very cool!

Maybe we will see a rise in AMD builds... Zen + whatever AMD's new hotness GPU is? Here's hoping!


----------



## Bauxno

So they can throw in a tessellation test that heavily favors Nvidia on DX11, but can't do the same for AMD using heavily parallel work.


----------



## rosade

Quote:


> Originally Posted by *Mahigan*
> 
> I am not disputing that Pascal can get a boost under light loads... never have. As for 3D Mark Time Spy... See concurrent vs parallel execution. All of the current games supporting Asynchronous Compute make use of parallel execution of compute and graphics tasks. 3D Mark Time Spy supports concurrent. It is not the same Asynchronous Compute.
> 
> Concurrency fills in gaps which are in the execution pipeline. Parallelism executes two tasks at the same time.
> 
> 
> Notice the context switch involved?
> 
> If 3D Mark Time Spy were using Parallel executions then there would be synchronization points between the two contexts (Graphics and Compute). There would also be pipeline stalls on Maxwell GPUs. Both the pipeline stalls and the flush required for a synchronization point would add latency thus leading to Maxwell losing performance when running this variant of Asynchronous compute. We do not see Maxwell losing performance under 3D Mark Time Spy. We see a tiny performance boost. Thus 3D Mark Time Spy is not running Asynchronous Compute + graphics. You see parallel executions = Asynchronous Compute + Graphics. Concurrent execution = Asynchronous Compute. They are not the same thing.
> They can but that is not what 3D Mark is doing.
> That is from 3DMark and can be found in the PC Per review. http://www.pcper.com/reviews/Graphics-Cards/3DMark-Time-Spy-Looking-DX12-Asynchronous-Compute-Performance
> 
> Yeah... even PCPer went a step further and attacked "AMD Fanboys" when in reality... PC Per do not even know the difference. Tech journalism....
> 
> 
> 
> 
> 
> 
> 
> 
> 
> What is concurrency?
> https://en.wikipedia.org/wiki/Concurrent_computing
> So yeah... 3D Mark does not use the same type of Asynchronous compute found in all of the recent game titles. Instead... 3D Mark appears to be specifically tailored so as to show nVIDIA GPUs in the best light possible. It makes use of Context Switches (good because Pascal has that improved pre-emption) as well as the Dynamic Load Balancing on Maxwell through the use of concurrent rather than parallel Asynchronous compute tasks. If parallelism were used then we would see Maxwell taking a performance hit under Time Spy as admitted by nVIDIA in their GTX 1080 white paper and as we have seen from AotS.
> 
> 
> GCN can handle these tasks but performs even better when Parallelism is thrown in as seen in the Doom Vulkan results. How? By reducing the per Frame latency through the parallel executions of Graphics and Compute Tasks. A reduction in the per-frame latency means that each frame takes less time to execute and process. The net result is a higher frame rate. 3DMark lacks this. AotS makes use of both parallelism and concurrency... as does Doom with the new Vulkan patch. See below...
> 
> 
> If 3D Mark Time Spy had implemented a separate path and enabled both concurrency and parallelism for the FuryX... it would have caught up to the GTX 1070. No joke.
> 
> If both AMD and nVIDIA are running the same code then Pascal would either gain a tiny bit or even lose performance. This is why Bethesda did not enable the Asynchronous Compute + Graphics from the AMD path for Pascal. Instead... Pascal will get its own optimized path. They will also call it Asynchronous Compute... people will think it is the same thing when in reality... two completely different things are happening behind the scene.
> 
> See why understanding what is actually happening behind the scenes is important rather than just looking at numbers? Not all Asynchronous Compute implementations are equal. You would do well to take note of this.
> 
> Where are the tech journalists these days?


Partially wrong explanations. From AMD itself, async is "concurrent execution from parallel queues":
https://www.youtube.com/watch?v=v3dUhep0rBs
This is what NVIDIA also does. The only difference is that NVIDIA implements it in the form of a dynamic load balancer, and AMD in the form of Async Shaders. Async compute is parallel queues with concurrent execution, not parallel execution. NVIDIA too has separate buffers/queues for compute and graphics.

So overall, async compute does concurrent execution.


----------



## deadman3000

Futuremark developer responds to accusations of Time Spy cheating.

http://steamcommunity.com/app/223850/discussions/0/366298942110944664/


----------



## Remij

Quote:


> Originally Posted by *deadman3000*
> 
> Futuremark developer responds to accusations of Time Spy cheating.
> 
> http://steamcommunity.com/app/223850/discussions/0/366298942110944664/


What an absolutely embarrassing thread. I knew this would happen. No amount of work put in by developers will ever be enough if AMD is not on top or doesn't show the improvements they think async should bring.


----------



## pengs

Quote:


> Originally Posted by *Glottis*
> 
> I hope you are joking. Pretty much every time new AMD drivers is released all big sites re-run benchmarks and publish gains. *Nvidia hardly ever get same treatment*.











You can't be serious....
Release day benchmarks are all that matter. Release day makes headlines, gets attention, and burns a perception of how the game performs into gamers' heads. Performance enhancements after the fact are only interesting to enthusiasts and a small portion of people.


----------



## NuclearPeace

Quote:


> Originally Posted by *deadman3000*
> 
> Futuremark developer responds to accusations of Time Spy cheating.
> 
> http://steamcommunity.com/app/223850/discussions/0/366298942110944664/


That whole thread pretty much sums up everything I dislike about how we discuss hardware today.

You have the developer of the benchmark clearly saying that the software supports asynchronous compute, and he explains how it affects both AMD and NVIDIA cards. The developer also explains why NVIDIA cards don't have a performance penalty. His explanations of the results people are finding in Time Spy are corroborated by other developers, who also found little gain from using asynchronous compute.

But this can't be! All of those equally uninformed YouTube pundits with no industry experience told me that asynchronous compute was supposed to roflstomp NVIDIA! Obviously that makes the benchmarks biased, since I didn't see the results. It's really, really sad how people these days are more interested in filling their heads with garbage so that they can justify their purchases.


----------



## Slomo4shO

3DMark has always favored Nvidia cards, so why is it worthy of discussion all of a sudden?


----------



## magnek

Quote:


> Originally Posted by *NuclearPeace*
> 
> That whole thread pretty much sums up everything I dislike about how we discuss hardware today.
> 
> You have the developer of the benchmark clearly saying that the software supports asynchronous compute, and he explains how it affects both AMD and NVIDIA cards. The developer also explains why NVIDIA cards don't have a performance penalty. His explanations of the results people are finding in Time Spy are corroborated by other developers, who also found little gain from using asynchronous compute.
> 
> But this can't be! All of those equally uninformed YouTube pundits with no industry experience told me that asynchronous compute was supposed to roflstomp NVIDIA! Obviously that makes the benchmarks biased, since I didn't see the results. It's really, really sad how people these days are more interested in filling their heads with garbage so that they can justify their purchases.


Well, db_smooth et al. are already knee-deep in decompiled code, and will totally prove 100% that this is an Nvidia-tailored benchmark and is using load balancing and not concurrency.

So let's just wait for the big reveal.


----------



## orlfman

http://steamcommunity.com/app/223850/discussions/0/366298942110944664/
Quote:


> FM_Jarnis [developer] 4 hours ago
> Yes it does.
> 
> http://www.futuremark.com/downloads/3DMark_Technical_Guide.pdf
> 
> It was not tailored for any specific architecture. It overlaps different rendering passes for asynchronous compute, in parallel when possible. Drivers determine how they process these - multiple parallel queues are filled by the engine.
> 
> The reason Maxwell doesn't take a hit is because NVIDIA has explicitly disabled async compute in Maxwell drivers. So no matter how much we pile things into the queues, they cannot be set to run asynchronously because the driver says "no, I can't do that". Basically the NV driver tells Time Spy to go "async off" for the run on that card. If NVIDIA enables Async Compute in the drivers, Time Spy will start using it. Performance gain or loss depends on the hardware & drivers.


Quote:


> Edit: Quoting 3DMark Technical guide
> Asynchronous Compute
> With DirectX 11, all rendering work is executed in one queue with the driver deciding the order of the tasks.
> 
> With DirectX 12, GPUs that support asynchronous compute can process work from multiple queues in parallel.
> 
> There are three types of queue: 3D, compute, and copy. A 3D queue executes rendering commands and can also handle other work types. A compute queue can handle compute and copy work. A copy queue only accepts copy operations. The queues all race for the same resources so the overall benefit depends on the workload.
> 
> In Time Spy, asynchronous compute is used heavily to overlap rendering passes to maximize GPU utilization. The asynchronous compute workload per frame varies between 10 - 20%.
> 
> To observe the benefit on your own hardware, you can optionally choose to disable async compute using the Custom run settings.


So 3DMark is using proper async; the graphics drivers determine how the async work will be processed. AMD and Nvidia just do async differently. AMD, I guess, is pure hardware, while Nvidia is 50/50 software and hardware?

edit:
All these people claiming 3DMark favors Nvidia... I really don't get it. If anything, 3DMark proved AMD's strength with async: the 200 series and the 300/Fury series all show HIGHER gains than Nvidia with async. The 900 series has async disabled, and Pascal only has small gains, while AMD has large gains.

This does nothing but benefit AMD. If AMD had released their 1070/1080 counterparts, they would most likely have topped the charts.
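The three queue types quoted from the 3DMark technical guide (a 3D queue handles any work, a compute queue handles compute and copy work, a copy queue only copies) form a simple capability hierarchy. A tiny sketch of that rule set; the `can_submit` helper is purely illustrative, not a real API:

```python
# Capability sets for the DirectX 12 queue types described in the
# 3DMark technical guide: each queue type accepts a subset of work.
QUEUE_ACCEPTS = {
    "3d":      {"draw", "compute", "copy"},  # 3D queue handles everything
    "compute": {"compute", "copy"},          # compute queue: compute + copy
    "copy":    {"copy"},                     # copy queue: copy only
}

def can_submit(queue_type, work_type):
    """True if the given queue type accepts the given kind of work."""
    return work_type in QUEUE_ACCEPTS[queue_type]

print(can_submit("3d", "compute"))       # True
print(can_submit("compute", "draw"))     # False
print(can_submit("copy", "compute"))     # False
```

This is why the guide says the queues "all race for the same resources": overlapping passes across these queues only helps to the extent the hardware can actually service more than one at a time.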


----------



## Kpjoslee

Quote:


> Originally Posted by *magnek*
> 
> Well, db_smooth et al. are already knee-deep in decompiled code, and will totally prove 100% that this is an Nvidia-tailored benchmark and is using load balancing and not concurrency.
> 
> So let's just wait for the big reveal.


who are they anyways?







Their steam profile doesn't reveal anything.


----------



## Slomo4shO

Quote:


> Originally Posted by *magnek*
> 
> Well db_smooth et al are already *knee deep* in decompiled code


Such a waste of paper


----------



## magnek

Quote:


> Originally Posted by *Kpjoslee*
> 
> who are they anyways?
> 
> 
> 
> 
> 
> 
> 
> Their steam profile doesn't reveal anything.


Obviously programmers who know their stuff duh.
Quote:


> Originally Posted by *Slomo4shO*
> 
> Such a waste of paper


It's for a noble cause.


----------



## Remij

Quote:


> Originally Posted by *Kpjoslee*
> 
> who are they anyways?
> 
> 
> 
> 
> 
> 
> 
> Their steam profile doesn't reveal anything.


Probably AMD staff who are going to go through every line of code in every benchmark/game ever, to find ways it should have taken advantage of GCN hardware, and tarnish developers' names in the process.


----------



## SuperZan

Totally. I forget, are they in financial trouble, or a monolithic organisation able to spare NSA-level resources investigating and discrediting their rivals? If the latter, why haven't these coordinated campaigns dented their -actual- rivals?


----------



## Bauxno

But then they should have said they are using Nvidia async on one part and AMD async on another, and given the option to force one or the other on both vendors' hardware. If it is done that way, then in a 30-second bench we will not know how much of the async is Nvidia's and how much is AMD's.


----------



## GorillaSceptre

Quote:


> Originally Posted by *SuperZan*
> 
> Totally. I forget, are they in financial trouble or a monolithic organisation able to spare NSA-level resources investigating and discrediting their rivals. If the latter, why haven't these coordinated campaigns dented their -actual- rivals?


Don't underestimate AMD.. They can investigate Intel and Nvidia in parallel.


----------



## Kpjoslee

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Don't underestimate AMD.. They can investigate Intel and Nvidia in parallel.


But Nvidia has already begun their pre-emptive attack


----------



## GorillaSceptre

Quote:


> Originally Posted by *Kpjoslee*
> 
> But Nvidia has already begun their pre-emptive attack


----------



## Remij

Quote:


> Originally Posted by *SuperZan*
> 
> Totally. I forget, are they in financial trouble or a monolithic organisation able to spare NSA-level resources investigating and discrediting their rivals. If the latter, why haven't these coordinated campaigns dented their -actual- rivals?


If random forum members can do it, why not they themselves?


----------



## magnek

Are you actually taking what they said at face value?


----------



## SuperZan

Sure, they can spare someone to look at some code. I seriously doubt they're investing in a smear campaign. I could be wrong, but I really doubt it. Nobody's ever needed one to find flaws in a benchmark.


----------



## daunow

bias everywhere


----------



## Remij

Quote:


> Originally Posted by *magnek*
> 
> Are you actually taking what they said at face value?


lol no









Most of my posts on this topic haven't been serious and admittedly were a bit too troll-y. I deserved to have them deleted in the Futuremark thread.

Quite honestly, I think this is the first time in a long time that good competition has started to brew, and it's this kind of competition that pushes technology forward. AMD really needs this right now, and AMD is needed to keep Nvidia in check.

I was being serious about the fact that I think AMD fans will definitely be really vocal about their perceived injustice if games/benchmarks don't show the improvements they think these new APIs should bring.

Unfortunately, the architectures are currently very different, and as such, developers will naturally have to make choices to benefit their market. This is normal, and AMD has a good head start with all consoles having their hardware. Once we see Nvidia adopt a similar architecture, we can get back to focusing on which architecture is better instead of the shady business of blaming developers.


----------



## PontiacGTX

Also, can Vulkan/OpenGL be monitored at a low level like DX11/DX12, to see how asynchronous compute works?


----------



## sugarhell

Quote:


> Originally Posted by *PontiacGTX*
> 
> also vulkan/opengl can be monitored at low level like DX11/DX12 to see how asynchronous compute works?


What does that even mean?

You can create asynchronous compute workloads with DX12/Vulkan. Do you expect that they can't debug them?


----------



## PontiacGTX

Quote:


> Originally Posted by *sugarhell*
> 
> What that even means?
> 
> You can create asynchronous compute with dx12/vulcan. Do you expect that they cant debug them?


I mean whether you can check how the game's queues are handled, like GPUView allows.


----------



## magnek

Quote:


> Originally Posted by *Remij*
> 
> lol no
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Most of my posts on this topic haven't been serious and admittedly were a bit too troll-y. I deserved to have them deleted in the Futuremark thread.
> 
> Quite honestly, I think this is the first time in a long time that good competition has started to brew, and it's this kind of competition that pushes technology forward. AMD really needs this right now, and AMD is needed to keep Nvidia in check.
> 
> I was being serious about the fact that I think AMD fans will definitely be really vocal about their perceived injustice if games/benchmarks don't show the improvements they think these new APIs should bring.
> 
> Unfortunately, the architectures are currently very different and as such, developers will naturally have to make choices to benefit their market. This is normal, and AMD has a good start with all consoles having their hardware. As we see Nvidia adopt a similar architecture, then we can get back to focusing on which architecture is better instead of the shady area of blaming developers.


lol I see

As for perceived injustices, I think it's fair to say people shouldn't expect more than ~15% improvement with async, given the results we've seen with AotS. But I can agree that nobody should have the expectation that Fury X is going to suddenly see 60% gains in every title across the board.


----------



## Eroticus

Quote:


> Originally Posted by *magnek*
> 
> lol I see
> 
> As for perceived injustices, I think it's fair to say people shouldn't expect more than ~15% improvement with async, given the results we've seen with AotS. But I can agree that nobody should have the expectation that Fury X is going to suddenly see 60% gains in every title across the board.


Ashes of the Singularity

31 > 41 = +32%

DX11
http://i.imgur.com/BsE3XRH.jpg

DX12
http://i.imgur.com/noR7OWl.jpg

Total War: Warhammer DX11 vs DX12

290X [1080p Ultra settings]: 54 > 81 = +50% performance gain.

DX11
http://i.imgur.com/mgmozPw.jpg

DX12
http://i.imgur.com/YF5fBgG.jpg
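For what it's worth, the arithmetic behind those percentages is just the ratio of the two FPS figures. A minimal Python sketch (note that these DX11 vs DX12 ratios measure the whole API switch, not async compute in isolation):

```python
# Relative FPS gain between two runs, as a percentage. A DX11 vs DX12
# comparison like this captures the entire API/driver change, not just async.
def gain(before_fps, after_fps):
    return (after_fps / before_fps - 1) * 100

print(round(gain(31, 41), 1))  # AotS: 32.3
print(round(gain(54, 81), 1))  # Total War: Warhammer, 290X: 50.0
```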


----------



## xboxshqip

OK, wait a second now: since 3DMark invalidates our score whenever we disable tessellation in the driver, why isn't it the same with async on Maxwell?
That's a driver cheat too.


----------



## Remij

Quote:


> Originally Posted by *Eroticus*
> 
> Ashes of the Singularity
> 
> 31 > 41 = +32%
> 
> DX11
> http://i.imgur.com/BsE3XRH.jpg
> 
> DX12
> http://i.imgur.com/noR7OWl.jpg
> 
> Total War: Warhammer DX11 vs DX12
> 
> 290X [1080p Ultra settings]: +50% performance gain.
> 
> http://i.imgur.com/YF5fBgG.jpg
> 
> http://i.imgur.com/mgmozPw.jpg


See, this is the problem. First off, people think DX11 = async off and DX12 = async on. Those gains are due to much more than async, so it's not even comparable... unless there is a test where you can specifically toggle async on/off. And even then it might not be 'fully' on or off.

Async utilization will naturally vary between games/benchmarks, and even between GPUs within the GCN architecture. It's about constantly keeping the GPU fed: a lower-end GPU with fewer resources sitting idle will not see the same potential benefit as a higher-end GPU with more CUs to spare for handling things in parallel. So it's not just a case of turning it on or off and expecting gains. It's going to be highly specific to each game and how the developers choose to offload and balance the work distribution to benefit the game's performance.

Also, it's not trivial, and developers are themselves learning as they go. So blaming developers by citing improvements from other games in completely different scenarios is dubious.


----------



## DeathMade

Timespy Graphics test 1 Async view. R9 290.


----------



## dagget3450

Quote:


> Originally Posted by *xboxshqip*
> 
> OK, wait a second now: since 3DMark invalidates our score whenever we disable tessellation in the driver, why isn't it the same with async on Maxwell?
> That's a driver cheat too.


Very interesting way to look at it...


----------



## magnek

Quote:


> Originally Posted by *Eroticus*
> 
> Ashes of the Singularity
> 
> 31 > 41 = +32%
> 
> DX11
> http://i.imgur.com/BsE3XRH.jpg
> 
> DX12
> http://i.imgur.com/noR7OWl.jpg
> 
> Total War: Warhammer DX11 vs DX12
> 
> 290X [1080p Ultra settings]: 54 > 81 = +50% performance gain.
> 
> DX11
> http://i.imgur.com/mgmozPw.jpg
> 
> DX12
> http://i.imgur.com/YF5fBgG.jpg


No no no no no that's not how you calculate gains from async compute. You look at DX12 async on vs off, NOT DX11 vs DX12, which just completely muddles the data.

Here, look at these results: https://www.computerbase.de/2016-02/ashes-of-the-singularity-directx-12-amd-nvidia/3/

1080p:

Fury X DX12 async on: 67.9 FPS
Fury X DX12 async off: 61.0 FPS
*Async gain = 11.3%*

1440p:

Fury X DX12 async on: 70.9 FPS
Fury X DX12 async off: 64.8 FPS
*Async gain = 9.4%*

4K:

Fury X DX12 async on: 57.3 FPS
Fury X DX12 async off: 51.0 FPS
*Async gain = 12.4%*

Average gain: 11%
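For anyone who wants to reproduce the arithmetic, here's a quick Python sketch using the same computerbase.de numbers, with the gain computed within the same API as described:

```python
# Fury X DX12 FPS from computerbase.de: (async on, async off) per resolution.
results = {
    "1080p": (67.9, 61.0),
    "1440p": (70.9, 64.8),
    "4K":    (57.3, 51.0),
}

# Async gain is simply on/off - 1, measured within the SAME API.
gains = {res: on / off - 1 for res, (on, off) in results.items()}
for res, g in gains.items():
    print(f"{res}: +{g:.1%}")

print(f"Average: +{sum(gains.values()) / len(gains):.0%}")
```

This prints +11.3%, +9.4%, and +12.4%, with an average of +11%, matching the figures above.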


----------



## Eroticus

Quote:


> Originally Posted by *magnek*
> 
> No no no no no that's not how you calculate gains from async compute. You look at DX12 async on vs off, NOT DX11 vs DX12, which just completely muddles the data.
> 
> Here, look at these results: https://www.computerbase.de/2016-02/ashes-of-the-singularity-directx-12-amd-nvidia/3/
> 
> 1080p:
> 
> Fury X DX12 async on: 67.9 FPS
> Fury X DX12 async off: 61.0 FPS
> *Async gain = 11.3%*
> 
> 1440p:
> 
> Fury X DX12 async on: 70.9 FPS
> Fury X DX12 async off: 64.8 FPS
> *Async gain = 9.4%*
> 
> 4K:
> 
> Fury X DX12 async on: 57.3 FPS
> Fury X DX12 async off: 51.0 FPS
> *Async gain = 12.4%*
> 
> Average gain: 11%


Oh great, thanks...

Can you run the same game, with the same settings? You should get a 20% performance increase minimum, no? Or is something else broken?


----------



## magnek

Why don't you tell me what issues you have with the data I presented? And no "it doesn't show 20%+ gains so they did something wrong" is NOT a valid reason.


----------



## JackCY

Well, there is a lot of confusion about what async is and isn't, and you really should look it up and educate yourself.
There are cases, like 3DMark and some DX12 titles, that just scratch the surface of the possible parallelism, which gives a tiny boost to Paxwell and GCN.
And then you have Doom with Vulkan and apparently a lot of parallelism, which gives GCN a massive boost and utilizes the GPUs properly = fully.

Here is something to read to see the limits of the various architectures.


----------



## Slomo4shO

Quote:


> Originally Posted by *magnek*
> 
> No no no no no that's not how you calculate gains from async compute.
> 
> 
> Spoiler: Data
> 
> 
> 
> 1080p:
> 
> Fury X DX12 async on: 67.9 FPS
> Fury X DX12 async off: 61.0 FPS
> *Async gain = 11.3%*
> 
> 1440p:
> 
> Fury X DX12 async on: 70.9 FPS
> Fury X DX12 async off: 64.8 FPS
> *Async gain = 9.4%*
> 
> 4K:
> 
> Fury X DX12 async on: 57.3 FPS
> Fury X DX12 async off: 51.0 FPS
> *Async gain = 12.4%*
> 
> Average gain: 11%


Please leave logic at the door henceforth.


----------



## magnek

Quote:


> Originally Posted by *JackCY*
> 
> Well, there is a lot of confusion about what async is and isn't, and you really should look it up and educate yourself.
> There are cases, like 3DMark and some DX12 titles, that just scratch the surface of the possible parallelism, which gives a tiny boost to Paxwell and GCN.
> And then you have Doom with Vulkan and apparently a lot of parallelism, which gives GCN a massive boost and utilizes the GPUs properly = fully.


Do you not realize the absurdity of trying to calculate async compute gains *across completely different APIs*, some of which don't even support the function in the first place?

The only proper way to do it is to use the same API, then toggle async on and off, and see what the difference ends up being.


----------



## criminal

Quote:


> Originally Posted by *Eroticus*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Ashes of the Singularity
> 
> 31 > 41 = +32%
> 
> DX11
> http://i.imgur.com/BsE3XRH.jpg
> 
> DX12
> http://i.imgur.com/noR7OWl.jpg
> 
> Total Warhammer DX11VS DX12
> 
> 290x +[1080p Ultra Settings] >54 > 81 = +49.9% Performance gain.
> 
> DX11
> http://i.imgur.com/mgmozPw.jpg
> 
> DX12
> http://i.imgur.com/YF5fBgG.jpg


DX11 = AMD driver overhead sucks = Lower performance than it should be
DX12 = AMD driver overhead fixed = Correct performance
DX12 with Async = AMD driver overhead fixed + takes advantage of AMD strength = Better performance

Regarding Vulkan... that can be chalked up to AMD finally having good drivers instead of the crap OpenGL drivers.


----------



## Eroticus

Quote:


> Originally Posted by *magnek*
> 
> Why don't you tell me what issues you have with the data I presented? And no "it doesn't show 20%+ gains so they did something wrong" is NOT a valid reason.




You just said async = 5-15%.

So why did Nvidia lose performance?

While the Fury X gained that much?

Why did I gain that much? Why did Nvidia lose performance everywhere... even with next-gen GPUs?


----------



## GorillaSceptre

Quote:


> Originally Posted by *magnek*
> 
> Do you not realize the absurdity of trying to calculate async compute gains *across completely different APIs*, some of which don't even support the function in the first place?
> 
> The only proper way to do it is to use the same API, then toggle async on and off, and see what the difference ends up being.


Quote:


> Originally Posted by *criminal*
> 
> DX11 = AMD driver overhead sucks = Lower performance than it should be
> DX12 = AMD driver overhead fixed = Correct performance
> DX12 with Async = AMD driver overhead fixed + takes advantage of AMD strength = Better performance


Yup.

Async is just the cherry on top. It won't give massive boosts unless more of the game is done in compute and then run concurrently (AFAIK). Doom takes advantage of Vulkan really well, and there are many more things involved besides async; Shader Intrinsic Functions are another, which Doom also takes advantage of.


----------



## criminal

Quote:


> Originally Posted by *Eroticus*
> 
> 
> 
> You just said async = 5-15%.
> 
> So why did Nvidia lose performance?
> 
> While the Fury X gained 22%?
> 
> 22% - async = ~10% without async?
> 
> What happened to Nvidia's extra 10%? So it's not only async that's broken?


Lost performance? That looks like margin of error to me.


----------



## Mahigan

Actually... 3DMark is wrong, and we should all know this by now from our interactions with Kollock. Async compute + graphics is not enabled in the NVIDIA driver for their Maxwell cards... yet if you enable it in AotS, what happens? You get a performance loss, attributed to the synchronization points implemented in the AotS path. Try it yourself. This is because AotS makes use of asynchronous compute + graphics which is both parallel and concurrent. Each time a parallel workload is requested, there is a sync point between the graphics and compute contexts involved in that workload.

Since we do not see this loss in performance for a GTX 980 Ti under 3DMark Time Spy, it is obviously not doing parallel execution, because if it were, there would be sync points adversely affecting Maxwell performance when async is turned on (even if the driver does not support the feature).

This was my argument. It stands.

There are two other possibilities. One is that the graphics and compute workloads were specifically tailored to take the same amount of time to complete; this would make 3DMark unreliable as a gaming-performance metric, because games do not behave in this manner.

The other is that a special Maxwell path was created within the benchmark which did not include the sync points, but this would be misleading, as the benchmark allows you to "turn on" async compute for Maxwell GPUs. In any other game in which async compute can be turned on or disabled, Maxwell loses performance due to these sync points. Kollock had a post explaining it all.


----------



## magnek

Quote:


> Originally Posted by *Eroticus*
> 
> 
> 
> You just said async = 5-15%.
> 
> So why did Nvidia lose performance?
> 
> While the Fury X gained that much?
> 
> Why did I gain that much? Why did Nvidia lose performance everywhere... even with next-gen GPUs?


Quote:


> Originally Posted by *magnek*
> 
> *Do you not realize the absurdity of trying to calculate async compute gains across completely different APIs, some of which don't even support the function in the first place?*
> 
> The only proper way to do it is to use the same API, then toggle async on and off, and see what the difference ends up being.


Quote:


> Originally Posted by *criminal*
> 
> DX11 = AMD driver overhead sucks = Lower performance than it should be
> DX12 = AMD driver overhead fixed = Correct performance
> DX12 with Async = AMD driver overhead fixed + takes advantage of AMD strength = Better performance
> 
> Regarding Vulkan... that can be chalked up to AMD finally having good drivers instead of the crap OpenGL drivers.


----------



## Eroticus

Quote:


> Originally Posted by *criminal*
> 
> Lost performance? That looks like margin of error to me.


https://www.youtube.com/watch?v=RqK4xGimR7A

Sorry, but did you see any gains? Three games, three resolutions.

Lost FPS everywhere; gained only in 3DMark.

Conclusion:
3DMark is right, game developers are wrong.


----------



## sugarhell

I will compare DX8 and DX12 to check these async compute bonus gains.

It should be valid, right?


----------



## SuperZan

Quote:


> Originally Posted by *criminal*
> 
> DX11 = AMD driver overhead sucks = Lower performance than it should be
> DX12 = AMD driver overhead fixed = Correct performance
> DX12 with Async = AMD driver overhead fixed + takes advantage of AMD strength = Better performance
> 
> Regarding Vulkan... that can be chalked up to AMD finally having good drivers instead of the crap OpenGL drivers.


Perfecto.

Quote:


> Originally Posted by *Mahigan*





Spoiler: Warning: Spoiler!






> Actually... 3DMark is wrong, and we should all know this by now from our interactions with Kollock. Async compute + graphics is not enabled in the NVIDIA driver for their Maxwell cards... yet if you enable it in AotS, what happens? You get a performance loss, attributed to the synchronization points implemented in the AotS path. Try it yourself. This is because AotS makes use of asynchronous compute + graphics which is both parallel and concurrent. Each time a parallel workload is requested, there is a sync point between the graphics and compute contexts involved in that workload.
> 
> Since we do not see this loss in performance for a GTX 980 Ti under 3DMark Time Spy, it is obviously not doing parallel execution, because if it were, there would be sync points adversely affecting Maxwell performance when async is turned on (even if the driver does not support the feature).
> 
> This was my argument. It stands.
> 
> There are two other possibilities. One is that the graphics and compute workloads were specifically tailored to take the same amount of time to complete; this would make 3DMark unreliable as a gaming-performance metric, because games do not behave in this manner.
> 
> The other is that a special Maxwell path was created within the benchmark which did not include the sync points, but this would be misleading, as the benchmark allows you to "turn on" async compute for Maxwell GPUs. In any other game in which async compute can be turned on or disabled, Maxwell loses performance due to these sync points. Kollock had a post explaining it all.






This all makes sense, and it doesn't mean that 'Nvidia cheated/paid off the devs' or anything like that, but is simply a flaw in the benchmark. All benchmarks have them, so it's silly when some of the people in these threads try to extrapolate data that the benchmark just doesn't support due to its design/execution.


----------



## EightDee8D

Just look at Nvidia's stance on 4-way SLI only being supported in benchmarks like this: they care about benchmarks more than actual games, because benchmarks sell more GPUs.

That's why they made sure it uses async in a way that lets them utilize idle shaders, but just that; that's why it has smaller gains than GCN.
It's not an indicator of actual gaming performance in future titles. It's only there so fanboys can say they support async too, and that Pascal is future-proof and a good buy over AMD. But we know the reality.

If they actually support it, why not enable it in actual games that people actually play? Heck, even their sponsored games don't support it. And Maxwell, lol.


----------



## magnek

Quote:


> Originally Posted by *sugarhell*
> 
> I will compare DX8 and DX12 to check these async compute bonus gains.
> 
> It should be valid, right?


No, you have to use DX9, because AMD only fixed frame pacing all the way back to DX9.


----------



## Mahigan

Here is what Kollock said...
Quote:


> *The tasks have fences and signals on them, and I believe as part of the D3D12 specification a fence ends up flushing the GPU, which could mean a 100 us stall or so. TLDR is that adding a command to a compute queue, or any queue, or rather the act of synchronizing it, will have a tiny bit of overhead. Thus, if the hardware doesn't have some intrinsic gain from doing it in parallel, you'll likely end up with a tiny loss. Even an architecture that can do them in parallel will likely lose a little bit. It's just that the net gain is more than the loss.*
> 
> Unfortunately, fences in D3D12 are a bit expensive because they operate at an OS level. I don't believe Vulkan would have this limitation. Fine grained synchronization probably won't be expensive if the hardware supports it.


- Kollock
http://www.overclock.net/t/1592431/anand-ashes-of-the-singularity-revisited-a-beta-look-at-directx-12-asynchronous-shading/790#post_24969132
Quote:


> I'm talking OS level pre-emption. What you are talking about on GCN is more like hyperthreading. You have the equivalent of multiple threads being executed at hardware level, but you can't really pre-empt a specific thread. That is, the OS can't stop a task on a specific GPU queue, then switch something, then switch the old job out. It has to let the current GPU job continue on that queue. For Multi-GPU, we actually submit our command lists in sections so that the OS has a chance to swap in our present during the middle of a frame so we can flip the backbuffer. I suspect the way D3D12 works is that the GPU creates a CPU interrupt when a queue signals something, then the Windows kernel submits the next task to the GPU for completion if there is a ->Wait sitting on the queue. You could see that this round trip to the CPU back to the GPU could slow it down a tiny bit, because *AFAIK the GPU is required to flush before continuing. You can also see why MS made this setup - because now dozens of applications could actually be giving work to the GPU, and theoretically the OS could schedule them all. Windows is more than a game OS, after all.*
> 
> *Yes, you are correct about what happens with async off. We submit the tasks to the universal queue and then don't bother submitting fences and signals because we don't need to.*
> 
> *GCN has multiple queues, which can execute in parallel, but that's not the same thing as pre-emption. Time slicing on GCN occurs on a hardware scheduler, and GCN can actually synchronize at a very fine grain. The best way to think about it is that the ACEs basically look like extra GPUs. However, D3D12 synchronization primitives are at an OS level; the hardware doesn't actually see them. I think they did it this way so that it's more unified across different hardware types. AFAIK, fences and signals aren't actually directly visible to the driver. TL;DR: GCN can actually do much more fine-grained synchronization than D3D12 allows right now.*
> *All of this can be seen by an ETW trace of Ashes, you can actually see tiny GPU stalls on GCN where the signals happen. I'd guess we are losing 2-3% perf because of it, but that's the price you pay for being on a multi-tasking OS.*


- Kollock
http://www.overclock.net/t/1592431/anand-ashes-of-the-singularity-revisited-a-beta-look-at-directx-12-asynchronous-shading/820#post_24970191


----------



## Mahigan

Quote:


> Originally Posted by *SuperZan*
> 
> Perfecto.
> 
> This all makes sense, and it doesn't mean that 'Nvidia cheated/paid off the devs' or anything like that, but is simply a flaw in the benchmark. All benchmarks have them, so it's silly when some of the people in these threads try to extrapolate data that the benchmark just doesn't support due to its design/execution.


It does not mean that 3DMark cheated. Some people love to take what I say out of context, and that includes AMD fans. What it means is that 3DMark chose NVIDIA's implementation over AMD's. This will likely not be the case for games, because games use AMD's implementation for the consoles.


----------



## Slomo4shO

Quote:


> Originally Posted by *Eroticus*
> 
> Conclusion:
> 3DMark is right, game developers are wrong.


That is a perfectly valid conclusion to draw from the available data.


Wondering how long it's going to take to sink in that DX11 --> DX12 or OpenGL --> Vulkan doesn't equate to no async compute --> async compute.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> It does not mean that 3DMark cheated. Some people love to take what I say out of context, and that includes AMD fans. What it means is that 3DMark chose NVIDIA's implementation over AMD's. This will likely not be the case for games, because games use AMD's implementation for the consoles.


It is too early to judge at the moment, since we do not know precisely how it runs on AMD GPUs versus how it runs on Nvidia GPUs. AMD GPUs still gain more from turning async compute on than off.
As for Maxwell, they will probably have to release a statement, since async doesn't function on Maxwell even with async turned on (it is disabled on the Maxwell driver side), which gives an inaccurate result.


----------



## magnek

Quote:


> Originally Posted by *Slomo4shO*
> 
> Wondering how long it's going to take to sink in that DX11 --> DX12 or OpenGL --> Vulkan doesn't equate to no async compute --> async compute.


Something about Brawndo and electrolytes


----------



## sugarhell

Doom supports shader intrinsic functions. That is a bigger feature for GCN GPUs than async support. I bet shader intrinsic functions + async compute are the big reason GCN performs so well in Doom.

We can't compare 3DMark and Doom, no matter what.


----------



## Mahigan

Quote:


> Originally Posted by *Kpjoslee*
> 
> It is too early to judge at the moment, since we do not know precisely how it runs on AMD GPUs versus how it runs on Nvidia GPUs. AMD GPUs still gain more from turning async compute on than off.
> As for Maxwell, they will probably have to release a statement, since async doesn't function on Maxwell even with async turned on (it is disabled on the Maxwell driver side), which gives an inaccurate result.


AMD would still gain from using nVIDIAs implementation but it simply would not enjoy the same potential gains. We still have Asynchronous Compute (execution of tasks without a defined order) happening in a concurrent manner. What we do not appear to have is the parallel execution of Graphics and Compute tasks making use of a Synchronization point (fence). If a Fence were used... we would see a pretty hefty performance loss on Maxwell generation GPUs. Of course this pretty hefty performance loss would be determined by the amount of parallel tasks being submitted as each task requires a fence and the more fences... the more GPU idle time (stalls) are introduced.

3DMark claim that they make heavy use of Async Compute (20-30%)... so this sort of does not play into there being only a few parallel tasks per frame.

If you look at the Pascal gains under AotS (especially under the crazy preset) and you compare them to the gains under 3DMark Time Spy you start to see that 3DMark Time Spy is not very realistic because... well... AotS is an actual game. The same can be said for Doom under Vulkan. The Async gains are quite hefty for GCN yet tiny in comparison when running 3DMark Time Spy. Of course Vulkan does not have this OS level pre-emption that Kollock was talking about. Vulkan is a better gaming API than DX12 overall.

We saw similar behavior under Total War: Warhammer (lower Pascal gains), and we will likely keep seeing this behavior in titles which incorporate async compute. 3DMark Time Spy just stands out as being out of the norm.

This can be explained by the type of implementation being used by 3DMark... you are right though... only time will tell what is going on. I could be wrong and 3DMark were magically able to enable Async Compute without a performance loss for Maxwell. That would be an industry first.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> AMD would still gain from using nVIDIAs implementation but it simply would not enjoy the same potential gains. We still have Asynchronous Compute (execution of tasks without a defined order) happening in a concurrent manner. What we do not appear to have is the parallel execution of Graphics and Compute tasks making use of a Synchronization point (fence). If a Fence were used... we would see a pretty hefty performance loss on Maxwell generation GPUs. Of course this pretty hefty performance loss would be determined by the amount of parallel tasks being submitted as each task requires a fence and the more fences... the more GPU idle time (stalls) are introduced.
> 
> 3DMark claim that they make heavy use of Async Compute (20-30%)... so this sort of does not play into there being only a few parallel tasks per frame.
> 
> If you look at the Pascal gains under AotS (especially under the crazy preset) and you compare them to the gains under 3DMark Time Spy you start to see that 3DMark Time Spy is not very realistic because... well... AotS is an actual game. The same can be said for Doom under Vulkan. The Async gains are quite hefty for GCN yet tiny in comparison when running 3DMark Time Spy. Of course Vulkan does not have this OS level pre-emption that Kollock was talking about. Vulkan is a better gaming API than DX12 overall.
> 
> We saw similar behavior under Total War: Warhammer (lower Pascal gains), and we will likely keep seeing this behavior in titles which incorporate async compute. 3DMark Time Spy just stands out as being out of the norm.
> 
> This can be explained by the type of implementation being used by 3DMark... you are right though... only time will tell what is going on. I could be wrong and 3DMark were magically able to enable Async Compute without a performance loss for Maxwell. That would be an industry first.


I think you are overestimating a little how much async compute gains in AoTS and Doom. AMD already benefited a lot from going from DX11 to DX12 in AoTS without async, and far more from going from OpenGL to Vulkan.
However, within DirectX 12, turning async on only resulted in gains of 10-15% in AoTS. Under Doom, it gained a similar percentage (from MSAA to async-enabled TSSAA). A lot of people are trying to use Doom as an example of async providing a massive performance boost, but most of the gains came from moving from a poor OpenGL driver to a great Vulkan driver, lol, not from utilizing async.

But yeah, I think it is better to wait until we can get some analysis of how it runs on either GPU.
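The decomposition described above can be sketched with illustrative numbers (my own assumptions, not measurements): the API/driver gain and the async gain multiply to give the headline OpenGL-to-Vulkan gain.

```python
# Illustrative FPS figures (assumed, not measured): OpenGL baseline,
# Vulkan with async off, Vulkan with async on.
opengl, vulkan_off, vulkan_on = 90.0, 130.0, 145.0

api_gain = vulkan_off / opengl - 1       # driver/API component: ~+44%
async_gain = vulkan_on / vulkan_off - 1  # async component alone: ~+12%
total_gain = vulkan_on / opengl - 1      # headline number: ~+61%

# The components compound multiplicatively, not additively.
assert abs((1 + api_gain) * (1 + async_gain) - (1 + total_gain)) < 1e-9
```

This is why quoting the headline OpenGL-to-Vulkan number as an "async gain" overstates async's contribution by a wide margin.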


----------



## Mahigan

Quote:


> Originally Posted by *Eroticus*
> 
> 
> 
> You just said async = 5-15%.
> 
> So why did Nvidia lose performance?
> 
> While the Fury X gained that much?
> 
> Why did I gain that much? Why did Nvidia lose performance everywhere... even with next-gen GPUs?


Yes... when running the Crazy preset, Pascal gains no performance from using async compute in AotS. Why? Because the GPU is overloaded and there are no idling resources in each GPC to balance the load onto and speed up the processing of a long-running compute task.

Like I have said before, Pascal can only handle minute amounts of asynchronous compute before it becomes overloaded. The architecture is not as parallel as GCN in that respect.

- We have to remember that a GTX 1080 is made up of 4 GPCs.
- Each GPC can only handle one type of context (graphics or compute), not both at the same time.
- A single async compute task occupies resources in two separate GPCs.
- These resources (when running asynchronous compute + graphics) operate under a fence.
- A fence is a synchronization point.
- Under Maxwell, if a graphics task finishes processing in one GPC but the compute task runs longer in another GPC, then the GPC with the graphics task stalls (goes idle) and waits for the compute task in the other GPC to complete before taking on new work. This is due to the fence, which forces a synchronization point between the two. It is why Oxide disabled async compute on Maxwell, and why NVIDIA removed the exposed feature from their Maxwell drivers when AotS was in the alpha stages of development (see the async compute controversy).
- Pascal improves upon this by allowing more resources in the GPC containing the compute task to be allocated (dynamically) in order to complete the long-running compute task quicker. This reduces the amount of idle time.
- If there are no extra resources available in the GPC containing the long-running compute task, because the GPU is under heavy load, then you end up with no performance benefit from enabling async compute.
- You can even lose performance, due to the GPU stalls introduced by a fence/synchronization point, which is the case for Maxwell.
- This is what happens in AotS at higher resolutions under the Extreme preset (1440p and higher), or even at 1080p under the Crazy preset.
- This should help many people understand what Pascal has as it pertains to asynchronous compute capability (dynamic load balancing).
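The stall argument above can be sketched as a toy timing model in Python (my own simplification; the millisecond figures are made up for illustration, not vendor data):

```python
def frame_time_serial(graphics_ms, compute_ms):
    # Async off: graphics and compute run back to back on one queue.
    return graphics_ms + compute_ms

def frame_time_parallel(graphics_ms, compute_ms, fence_ms):
    # True concurrent execution joined by a fence: the frame ends when
    # the longer of the two tasks finishes, plus the synchronization cost.
    return max(graphics_ms, compute_ms) + fence_ms

def frame_time_no_overlap(graphics_ms, compute_ms, fence_ms):
    # Hardware that cannot overlap the two contexts (the Maxwell case
    # described above): the work still serializes, but the fence adds
    # a stall on top, so "async on" is slower than "async off".
    return graphics_ms + compute_ms + fence_ms

print(frame_time_serial(8.0, 6.0))           # 14.0 ms
print(frame_time_parallel(8.0, 6.0, 0.1))    # about 8.1 ms -> async wins
print(frame_time_no_overlap(8.0, 6.0, 0.5))  # 14.5 ms -> slower than async off
```

In this model, the parallel gain also disappears when there are no spare resources to shorten the longer task, which matches the overloaded-GPU (Crazy preset) case described above.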


----------



## SuperZan

Just as support for the above, https://developer.nvidia.com/dx12-dos-and-donts hints at a lot of what Mahigan has been saying in these threads. They're using a lot of the same jargon as AMD, but read the full text and it's taking a different tack.


----------



## orlfman

Quote:


> Originally Posted by *Mahigan*
> 
> Actually... 3DMark is wrong, and we should all know this by now from our interactions with Kollock. Async compute + graphics is not enabled in the NVIDIA driver for their Maxwell cards... yet if you enable it in AotS, what happens? You get a performance loss, attributed to the synchronization points implemented in the AotS path. Try it yourself. This is because AotS makes use of asynchronous compute + graphics which is both parallel and concurrent. Each time a parallel workload is requested, there is a sync point between the graphics and compute contexts involved in that workload.
> 
> Since we do not see this loss in performance for a GTX 980 Ti under 3DMark Time Spy, it is obviously not doing parallel execution, because if it were, there would be sync points adversely affecting Maxwell performance when async is turned on (even if the driver does not support the feature).
> 
> This was my argument. It stands.
> 
> There are two other possibilities. One is that the graphics and compute workloads were specifically tailored to take the same amount of time to complete; this would make 3DMark unreliable as a gaming-performance metric, because games do not behave in this manner.
> 
> The other is that a special Maxwell path was created within the benchmark which did not include the sync points, but this would be misleading, as the benchmark allows you to "turn on" async compute for Maxwell GPUs. In any other game in which async compute can be turned on or disabled, Maxwell loses performance due to these sync points. Kollock had a post explaining it all.


Couldn't it just be that Nvidia has it enabled for AotS in the drivers, but doesn't have it enabled for Time Spy or other things?


----------



## Potatolisk

Quote:


> Originally Posted by *Mahigan*
> 
> 3DMark claim that they make heavy use of Async Compute (20-30%)... so this sort of does not play into there being only a few parallel tasks per frame.


Didn't they say 10-20%?

Anyway, would it be realistic to code a game in such a way that it gets the most out of both GCN and Pascal? Implementing it both ways?


----------



## EightDee8D

Quote:


> Originally Posted by *Potatolisk*
> 
> Didn't they say 10-20%?
> 
> Anyway would it be realistic to code a game in such way that would get the most out of GCN and Pascal? Implementing both ways?


Games already take full advantage of Pascal, but not of GCN, because Pascal doesn't have much extra to offer compared to GCN.

think of it like this -

a 100cc bike will already be maxed out in local traffic and on normal roads, but a 1000cc bike won't. The bigger bike needs better roads, and probably a race track, to show its full potential. That doesn't mean the 100cc bike will be under-utilized on the race track, because it doesn't have much left to show anyway.


----------



## Mahigan

Quote:


> Originally Posted by *orlfman*
> 
> couldn't it just be nvidia has it enabled for AoTS in the drivers but doesn't have it enabled for time spy or other things?


Well as Kollock explained... this is outside driver or hardware control. This is OS based. DX12 requires these fences in order for Asynchronous Compute + Graphics to operate.
Quote:


> Asynchronous compute and graphics example
> 
> This next example allows graphics to render asynchronously from the compute queue. There is still a fixed amount of buffered data between the two stages, however now graphics work proceeds independently and uses the most up-to-date result of the compute stage as known on the CPU when the graphics work is queued. This would be useful if the graphics work was being updated by another source, for example user input. There must be multiple command lists to allow the ComputeGraphicsLatency frames of graphics work to be in flight at a time, and the function UpdateGraphicsCommandList represents updating the command list to include the most recent input data and read from the compute data from the appropriate buffer.
> The compute queue must still wait for the graphics queue to finish with the pipe buffers, but a third fence (pGraphicsComputeFence) is introduced so that the progress of graphics reading compute work versus graphics progress in general can be tracked. This reflects the fact that now consecutive graphics frames could read from the same compute result or could skip a compute result. A more efficient but slightly more complicated design would use just the single graphics fence and store a mapping to the compute frames used by each graphics frame.


Here is an example of Asynchronous Compute + Graphics code provided by Microsoft...
```cpp
void AsyncPipelinedComputeGraphics()
{
    const UINT CpuLatency = 3;
    const UINT ComputeGraphicsLatency = 2;

    // Compute is 0, graphics is 1
    ID3D12Fence *rgpFences[] = { pComputeFence, pGraphicsFence };
    HANDLE handles[2];
    handles[0] = CreateEvent(nullptr, FALSE, TRUE, nullptr);
    handles[1] = CreateEvent(nullptr, FALSE, TRUE, nullptr);
    UINT FrameNumbers[] = { 0, 0 };

    ID3D12GraphicsCommandList *rgpGraphicsCommandLists[CpuLatency];
    CreateGraphicsCommandLists(ARRAYSIZE(rgpGraphicsCommandLists),
        rgpGraphicsCommandLists);

    // Graphics needs to wait for the first compute frame to complete; this is
    // the only wait that the graphics queue will perform.
    pGraphicsQueue->Wait(pComputeFence, 1);

    while (1)
    {
        for (auto i = 0; i < 2; ++i)
        {
            if (FrameNumbers[i] > CpuLatency)
            {
                rgpFences[i]->SetEventOnFenceCompletion(
                    FrameNumbers[i] - CpuLatency,
                    handles[i]);
            }
            else
            {
                SetEvent(handles[i]);
            }
        }

        auto WaitResult = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
        auto Stage = WaitResult - WAIT_OBJECT_0;
        ++FrameNumbers[Stage];

        switch (Stage)
        {
        case 0:
        {
            if (FrameNumbers[Stage] > ComputeGraphicsLatency)
            {
                pComputeQueue->Wait(pGraphicsComputeFence,
                    FrameNumbers[Stage] - ComputeGraphicsLatency);
            }
            pComputeQueue->ExecuteCommandLists(1, &pComputeCommandList);
            pComputeQueue->Signal(pComputeFence, FrameNumbers[Stage]);
            break;
        }
        case 1:
        {
            // Recall that the GPU queue started with a wait for pComputeFence, 1
            UINT64 CompletedComputeFrames = min(1,
                pComputeFence->GetCurrentFenceValue());
            UINT64 PipeBufferIndex =
                (CompletedComputeFrames - 1) % ComputeGraphicsLatency;
            UINT64 CommandListIndex = (FrameNumbers[Stage] - 1) % CpuLatency;
            // Update graphics command list based on CPU input and using the
            // appropriate buffer index for data produced by compute.
            UpdateGraphicsCommandList(PipeBufferIndex,
                rgpGraphicsCommandLists[CommandListIndex]);

            // Signal *before* new rendering to indicate what compute work
            // the graphics queue is DONE with
            pGraphicsQueue->Signal(pGraphicsComputeFence, CompletedComputeFrames - 1);
            pGraphicsQueue->ExecuteCommandLists(1,
                rgpGraphicsCommandLists + PipeBufferIndex);
            pGraphicsQueue->Signal(pGraphicsFence, FrameNumbers[Stage]);
            break;
        }
        }
    }
}
```


Source https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

Even if the driver does not have Asynchronous Compute + Graphics support and even if the feature is disabled in the driver... the game code executes the Fence/Synchronization point at the API (OS) level before the driver even has a say in the matter.
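The fence behavior described above can be modeled in a few lines of plain Python (a toy sketch of mine, not D3D12; the class and function names are made up for illustration). The point it shows: the graphics "queue" must wait on the compute fence before consuming compute results, and that wait is part of the submitted work itself, regardless of whether anything underneath actually runs concurrently.

```python
import threading

class Fence:
    """A monotonically increasing counter one queue signals and another waits on."""
    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value):
        with self._cond:
            self._value = max(self._value, value)
            self._cond.notify_all()

    def wait(self, value):
        """Block until the fence has reached at least `value`."""
        with self._cond:
            self._cond.wait_for(lambda: self._value >= value)

def run_frame(log):
    fence = Fence()

    def compute_queue():
        log.append("compute done")   # produce the frame's compute result
        fence.signal(1)              # publish it via the fence

    def graphics_queue():
        fence.wait(1)                # sync point: cannot start any earlier
        log.append("graphics done")  # consume the compute result

    threads = [threading.Thread(target=graphics_queue),
               threading.Thread(target=compute_queue)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

`run_frame([])` always returns `["compute done", "graphics done"]`: the ordering is forced by the fence, not by which thread happens to start first, which is why the sync point costs time even on hardware that serializes the two workloads anyway.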


----------



## ZealotKi11er

The way I see it, in the DX11 3DMark test AMD was not suffering from CPU overhead. The only advantage AMD has over Nvidia in DX12 is async. That makes up the difference between the 980 Ti and Fury X. The only anomaly is the Pascal cards.


----------



## Bauxno

Quote:


> Originally Posted by *Potatolisk*
> 
> Didn't they say 10-20%?
> 
> Anyway would it be realistic to code a game in such way that would get the most out of GCN and Pascal? Implementing both ways?


well maybe make two separate bench sections, one for nvidia and another for amd. I mean, they could easily do that, and the score for each bench would be a fixed percentage of the total bench score.


----------



## Master__Shake

Quote:


> Originally Posted by *flopper*
> 
> question then seeem to be to ask this, *did Nvidia pay off Futuremark then?*
> or is there another reason for this to be implemented this way and not to show graphics compute?


remember the 3dmark vantage and the great physx debacle?

yeah, probably on purpose.


----------



## mtcn77

I think Eurogamer's verdict on Doom, "the most well-optimized game in existence at this point", sums up the picture quite nicely. [email protected] FPS at maximum settings; what is left to criticize? Asynchronous compute shaders on Vulkan with GPU intrinsics should be the reference point for other titles.


----------



## Mahigan

Quote:


> Originally Posted by *Potatolisk*
> 
> Didn't they say 10-20%?
> 
> Anyway would it be realistic to code a game in such way that would get the most out of GCN and Pascal? Implementing both ways?


DX12 is supposed to operate on separate paths for each architecture. The same goes for Vulkan. So yes... you can code a game to offer the best possible performance for both GCN and Pascal.

And you are right.. they did say 10-20%.. just checked..
Quote:


> The asynchronous compute workload per frame varies between 10-20%. To observe the benefit on your own hardware, you can optionally choose to disable async compute using the Custom run settings in 3DMark Advanced and Professional Editions.


That is some very light Asynchronous Compute going on...
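As a back-of-envelope check on just how light that is (my own arithmetic, under the generous assumptions of perfect overlap and zero fence cost): if only 10-20% of a frame's work can be hidden behind the rest, the frame-rate gain is capped at roughly 11-25%.

```python
def max_async_speedup(overlap_fraction):
    """Best-case speedup if `overlap_fraction` of the frame's work can be
    fully hidden behind the remaining work (perfect overlap, no sync cost):
    frame time shrinks from 1.0 to (1 - overlap_fraction)."""
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    return 1.0 / (1.0 - overlap_fraction)

# 3DMark's stated 10-20% async share:
# max_async_speedup(0.10) ~= 1.11x, max_async_speedup(0.20) = 1.25x
```

Real gains would be smaller still, since overlap is never perfect and fences are not free; the point is only that the stated 10-20% puts a hard ceiling on what async can show in this benchmark.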


----------



## Defoler

Quote:


> Originally Posted by *flopper*
> 
> question then seeem to be to ask this, *did Nvidia pay off Futuremark then?*
> or is there another reason for this to be implemented this way and not to show graphics compute?


It could also be that AMD paid id to specifically tailor Doom's Vulkan path to fit GCN better than Pascal, which is why you see such a gain in Vulkan while 3DMark shows only a small increase.
After all, AotS, Hitman, RotTR and others have been heavily AMD-sponsored games. Especially since Doom's Vulkan patch was also co-announced with AMD, with the claim that only AMD can do async at all.


----------



## Pyrotagonist

Doom Vulkan was demoed on a GTX 1080 far in advance of the demo on Polaris.


----------



## sweetusernames

Quote:


> Originally Posted by *Defoler*
> 
> It could also be that AMD paid ID to specifically tailor doom for vulkan to fit better on GCN than pascal, reasons why you see such a gain in vulkan, while 3dmark shows only a small increase.
> After all, aots, hitman, rotr, and others, have been heavily AMD sponsored games. Especially since doom vulkan was also co-announced with AMD claiming that only AMD can do async at all.


nVidia unveiled Vulkan on Pascal before AMD did (well, the Macau event was under NDA). Plus, Rise of the Tomb Raider is an Nvidia-sponsored title.


----------



## EightDee8D

Quote:


> Originally Posted by *Defoler*
> 
> It could also be that AMD paid ID to specifically tailor doom for vulkan to fit better on GCN than pascal, reasons why you see such a gain in vulkan, while 3dmark shows only a small increase.
> After all, aots, hitman, *rotr*, and others, have been heavily AMD sponsored games. Especially since doom vulkan was also co-announced with AMD claiming that only AMD can do async at all.


Again, it is an Nvidia-sponsored game, so stop with that BS. And it doesn't support async on Nvidia anyway. How about an async driver for Maxwell? They enabled it for Pascal in a useless benchmark, but not in actual games. It's time to ask Nvidia some questions instead of spreading BS.

Where is nvidia's army of engineers? One year and still no async on Maxwell? Even in the so-called "unbiased" 3DMark benchmark it's not enabled. I think they switched sides or something.


----------



## orlfman

Quote:


> Originally Posted by *Mahigan*
> 
> Well as Kollock explained... this is outside driver or hardware control. This is OS based. DX12 requires these fences in order for Asynchronous Compute + Graphics to operate.
> Here is an example of Asynchronous Compute + Graphics code provided by Microsoft...
> Source https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx
> 
> Even if the driver does not have Asynchronous Compute + Graphics support and even if the feature is disabled in the driver... the game code executes the Fence/Synchronization point at the API (OS) level before the driver even has a say in the matter.


Would have been nice if AMD released their 1070/1080 counterparts instead of holding off. I would gladly send back my 1080 for one.


----------



## Defoler

Quote:


> Originally Posted by *EightDee8D*
> 
> Again it is a Nvidia sponsored game stop with that BS, which doesn't support async on nvidia, and how about async driver for maxwell ? they enabled it for pascal in a useless benchmark but not on actual games. it's time to ask nvidia some questions instead of spreading BS.
> 
> Where is nvidia's army of engineers ? 1 year and still no async on maxwell ? even on so called "unbiased" 3dmark benchmark it's not enabled. i think they switched sides or something


Crystal Dynamics have been the ones developing TressFX with AMD, on the game, before Square Enix even had dealings with nvidia. The game was AMD's main showcase of TressFX tech.
And the deal with nvidia was about streaming support and the new GFE features, as well as adding to GPU sales, not necessarily development.
So about spreading BS... you should take your own advice sometimes

And I guess it is now a "useless benchmark" because nvidia shows it can run async there? If a game showed it but a benchmark didn't, then nvidia must have paid the developers, right?
Of course it can't run in a game if the developer deliberately disables it.

Where are the nvidia engineers? According to id, they are only working with them now. Why hadn't they worked with them before? Maybe because id were working with AMD, and AMD didn't want nvidia included?


----------



## EightDee8D

Quote:


> Originally Posted by *Defoler*
> 
> Crystal dynamics blablabla my maxwell is obsolete mimimi




Should I also post the 1080 launch stream where they showed the 1080 running Doom on Vulkan? Or will you stop spreading BS? A dev having worked with AMD in the past doesn't mean they won't get sponsored by nvidia.

And if 3DMark can run async on Pascal, why can't it on Maxwell? Who is stopping them from enabling it in the driver? Oh, no conspiracy for that one?

And AotS was a game, yet it got called out as "just a benchmark". 3DMark is an actual benchmark, so why does it count now? Hypocrisy? lol


----------



## flopper

Quote:


> Originally Posted by *Mahigan*
> 
> Actually... 3DMark are wrong and we should all know this by now from our interactions with Kollock. Async Compute + Graphics is not enabled in the nVIDIA driver for their Maxwell cards... yet if you enable it in AotS, what happens? You get a performance loss attributable to the synchronization points implemented in the AotS path. Try it yourself. This is because AotS makes use of Asynchronous Compute + Graphics, which is both parallel and concurrent. Each time a parallel workload is requested, there is a sync point between the Graphics and Compute contexts involved in that workload.
> 
> Since we do not see this loss in performance for a GTX 980 Ti under 3DMark Time Spy, it is obviously not doing parallel execution, because if it were, there would be sync points adversely affecting Maxwell performance when Async is turned on (even if the driver does not support the feature).
> 
> This was my argument. It stands.
> 
> There are two other possibilities. One is that the Graphics and Compute workloads were specifically tailored to take the same amount of time to complete; this would make 3DMark unreliable as a gaming performance metric, because games do not behave in this manner.
> 
> The other is that a special Maxwell path was created within the benchmark which did not include the sync points, but this would be misleading, as the benchmark allows you to "turn on" async compute for the Maxwell GPUs. In any other game in which we can turn async compute on or off, Maxwell loses performance due to these sync points. Kollock had a post explaining it all.


seems obvious that Futuremark are PBN and it's now a worthless test of DX12.


----------



## flopper

Quote:


> Originally Posted by *Mahigan*
> 
> It does not mean that 3D Mark cheated. Some people love to take what I say out of context and that includes AMD fans. What it means is that 3D Mark chose nVIDIAs implementation over AMDs. This will likely not be the case for games because games are using AMDs implementation for the consoles.


and confirmed, PBN.

(PBN = Paid By Nvidia.)

If a test uses the wrong approach for the real-world application (games), then the test is worthless for showing what those games will do with the same API, i.e. DX12.

The only choice Futuremark have now is to remove Time Spy.


----------



## airfathaaaaa

so here is the trick question

how difficult is it to use both the nvidia way and the amd way in games? (because rest assured, nvidia will pay a lot to have their way used)


----------



## Dudewitbow

Quote:


> Originally Posted by *airfathaaaaa*
> 
> so here is the trick question
> 
> how diffucult is to use nvidia way and amd way on games?(cause rest assure nvidia will pay a lot to use their way)


Making it more parallel is probably slightly more difficult. The only catch is that if it's on console (which most DX12/Vulkan games are), the devs already know how to implement it, or else the consoles would get really bad performance (e.g. Lichdom: Battlemage on console, an example of making a game on PC and then doing a poor console port). So the complication isn't as big of a factor.


----------



## Semel

Quote:


> Originally Posted by *Mahigan*
> 
> This will likely not be the case for games because games are using AMDs implementation for the consoles.


Nvidia has tons of money. Make no mistake, they are going to do everything to ensure their win. Like in the case of Crapworks.


----------



## PontiacGTX

Quote:


> Originally Posted by *ZealotKi11er*
> 
> The way I see it in DX11 3Dmark AMD was not suffering from CPU overhead. The only difference AMD has in DX12 from Nvidia is ASync advantage. That makes up the difference of 980 Ti and Fury X. The only anomaly is the Pascal cards.


It is a synthetic benchmark, quite different from DX11 games. Even games that have overhead don't show it on all levels/maps, and some DX11 games don't have issues with the DX11 draw-call limit at all.


----------



## FLCLimax

Quote:


> Originally Posted by *EightDee8D*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Defoler*
> 
> Crystal dynamics blablabla my maxwell is obsolete mimimi
> 
> 
> 
> 
> 
> Should i also post 1080 launch stream where they showed 1080 running doom on vulkan ? or you will stop spreading bs ? a dev who has worked with amd in past doesn't mean they won't get sponsored by nvidia.
> 
> And if 3dmark can run async on pascal, why can't on maxwell ? who is stopping them to enable it on driver ? oh nothing for conspiracy now ? awww
> 
> And Aots was a game, but called out as a benchmark but nothing, 3dmark is a freaking benchmark, why does it count now ? because hypocrisy ? .lol


----------



## Bauxno

Is the Fire Strike bench tuned for AMD on the tessellation part of the bench?


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> It is a synthethic benchmark it is quite a bit different to Dx11 games, even games which have overhead dont show it on all levels/maps, and some DX11 games dont have issues with DX11 draw calls limit at all


What I liked was hearing that 3DMark spokesperson claiming that the "driver" is responsible for the behavior we see in a DX12 benchmark. DX12 and driver... let that sink in.

This may be true for nVIDIA (due to their inclusion of static scheduling) but it most certainly is not for AMD (due to their hardware scheduling). Kollock stated the same thing regarding AMD's hardware scheduler.

It seems to me that optimizations are lacking for the AMD path (if there are actually separate AMD and nVIDIA paths to begin with). The programmer is the one responsible for "marking up" the tasks he/she wants executed in parallel (as the Microsoft sample code I shared shows and as Kollock explained). So if the programmer did not mark up many of these tasks for the AMD hardware then of course you are not going to receive all of the potential performance. A low amount of marked up work would fit well for nVIDIAs Pascal architecture but would end up under-utilizing GCN for the reasons I mentioned in a previous post (Pascal GPCs and Dynamic Load Balancing explanation).

All of the games we have seen "mark up" a lot more work to be executed in parallel than what 3DMark stated with their "10-20%" claim. It seems to me that 3DMark should have gone for 40% of a frame being executed in parallel for AMD (which is what AotS does) and stuck to 10-20% for nVIDIA. That way they would have two perfectly optimized paths for both architectures. This is how games are being programmed (like Ashes of the Singularity) with separate optimized paths for both AMD and nVIDIA. The kicker is that it is nVIDIAs driver which is responsible for handling the scheduling of such tasks to the nVIDIA hardware. This means that nVIDIA would incur a larger CPU overhead (as we have seen under AotS). We also see that this will be the case for nVIDIA hardware under Doom Vulkan as absent Asynchronous Compute + Graphics... the nVIDIA hardware is tied with the AMD hardware in terms of CPU overhead. Once the Async path is implemented... nVIDIAs CPU overhead will be higher as I had mentioned in my initial coverage of nVIDIAs Async Compute capabilities.

We will likely end up with a version of 3DMark which will not at all represent the performance we will be seeing in upcoming DX12 titles for AMD. I think that the nVIDIA performance is perfectly optimized though... so what we see in 3DMark perfectly highlights what we can expect from Pascal.

As for Maxwell... when Async Compute is enabled... we should be seeing a drop in performance due to the GPU stalls caused by the fences. Even if the nVIDIA driver says "No Async Compute" the fences remain. This is what Kollock mentioned and what we have seen thus far in actual games making use of the technology.
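That last point about Maxwell can be put in one line of arithmetic (a toy model of mine, not measured data): the fence cost is paid whether or not the hardware can overlap, so an architecture that serializes the marked-up work pays the sum of both workloads plus the sync cost, and turning async on only ever makes it slower.

```python
def frame_time_ms(graphics_ms, compute_ms, can_overlap, sync_cost_ms=0.0):
    """Toy frame-time model. An architecture that runs the marked-up compute
    concurrently with graphics pays max(g, c); one that serializes them pays
    g + c. The fence/sync cost is paid either way, which is why enabling
    async on hardware that cannot overlap only ever loses performance."""
    work = max(graphics_ms, compute_ms) if can_overlap else graphics_ms + compute_ms
    return work + sync_cost_ms
```

With a 10 ms graphics load and 4 ms of compute: overlap gives 10 ms, serialization 14 ms. Add 0.5 ms of fence cost and the serializing GPU lands at 14.5 ms, slower than the 14 ms it would take with async simply left off, while the overlapping GPU still comes out ahead at 10.5 ms.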


----------



## LionS7

Quote:


> Originally Posted by *Bauxno*
> 
> Is the firestrike bench tuned for AMD on the tesselation part of the bench?


No, indeed the tessellation is killing the Radeon cards.


----------



## EightDee8D

I think there's not enough tessellation, otherwise the 480 would be faster than the 390/X; it has better tessellation performance.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> What I liked was hearing that 3DMark spokesperson claiming that the "driver" is responsible for the behavior we see in a DX12 benchmark. DX12 and driver... let that sink in.
> 
> This may be true for nVIDIA (due to their inclusion of static scheduling) but it most certainly is not for AMD (due to their hardware scheduling). Kollock stated the same thing regarding AMD's hardware scheduler.
> 
> It seems to me that optimizations are lacking for the AMD path (if there are actually separate AMD and nVIDIA paths to begin with). The programmer is the one responsible for "marking up" the tasks he/she wants executed in parallel (as the Microsoft sample code I shared shows and as Kollock explained). So if the programmer did not mark up many of these tasks for the AMD hardware then of course you are not going to receive all of the potential performance. A low amount of marked up work would fit well for nVIDIAs Pascal architecture but would end up under-utilizing GCN for the reasons I mentioned in a previous post (Pascal GPCs and Dynamic Load Balancing explanation).
> 
> All of the games we have seen "mark up" a lot more work to be executed in parallel than what 3DMark stated with their "10-20%" claim. It seems to me that 3DMark should have gone for 40% of a frame being executed in parallel for AMD (which is what AotS does) and stuck to 10-20% for nVIDIA. That way they would have two perfectly optimized paths for both architectures. This is how games are being programmed (like Ashes of the Singularity) with separate optimized paths for both AMD and nVIDIA. The kicker is that it is nVIDIAs driver which is responsible for handling the scheduling of such tasks to the nVIDIA hardware. This means that nVIDIA would incur a larger CPU overhead (as we have seen under AotS). We also see that this will be the case for nVIDIA hardware under Doom Vulkan as absent Asynchronous Compute + Graphics... the nVIDIA hardware is tied with the AMD hardware in terms of CPU overhead. Once the Async path is implemented... nVIDIAs CPU overhead will be higher as I had mentioned in my initial coverage of nVIDIAs Async Compute capabilities.
> 
> We will likely end up with a version of 3DMark which will not at all represent the performance we will be seeing in upcoming DX12 titles for AMD. I think that the nVIDIA performance is perfectly optimized though... so what we see in 3DMark perfectly highlights what we can expect from Pascal.
> 
> As for Maxwell... when Async Compute is enabled... we should be seeing a drop in performance due to the GPU stalls caused by the fences. Even if the nVIDIA driver says "No Async Compute" the fences remain. This is what Kollock mentioned and what we have seen thus far in actual games making use of the technology.


But then wasn't the overhead of AMD's DX11 driver solely about draw calls? Because the overhead you are talking about with DX12 and async is about compute+graphics. nVidia can either avoid using async compute+graphics and just focus on the DX12 path, or optimize async compute only, like 3DMark Time Spy does (but then it is a synthetic benchmark, not a game).
Quote:


> Originally Posted by *EightDee8D*
> 
> I think there's not enough tessellation, otherwise the 480 would be faster than the 390/X; it has better tessellation performance.


From anandtech


Quote:


> Under the hood, the engine only makes use of FL 11_0 features, which means it can run on video cards as far back as GeForce GTX 680 and Radeon HD 7970. At the same time it doesn't use any of the features from the newer feature levels, so while it ensures a consistent test between all cards, it doesn't push the very newest graphics features such as conservative rasterization.
> 
> That said, Futuremark has definitely set out to make full use of FL 11_0. Futuremark has published an excellent technical guide for the benchmark, which should go live at the same time as this article, so I won't recap it verbatim. But in brief, everything from asynchronous compute to resource heaps get used. In the case of async compute, Futuremark is using it to overlap rendering passes, though they do note that "the asynchronous compute workload per frame varies between 10-20%." On the work submission front, they're making full use of multi-threaded command queue submission, noting that every logical core in a system is used to submit work.


----------



## Bauxno

Quote:


> Originally Posted by *LionS7*
> 
> No, indeed the tesselation is killing Radeon cards.


So for some unknown reason they made a code path so that even when async is set to on, Maxwell cards don't get the performance hit they take in almost every DX12 game, but they didn't do the same for a critical part of their DX11 bench to balance the tessellation between AMD and Nvidia?

That tells me a lot about this company.


----------



## EightDee8D

Quote:


> Originally Posted by *PontiacGTX*
> 
> From anandtech


I saw that, but why isn't the 480 faster than the 390/X? Maybe it's not optimized for Polaris?


----------



## LionS7

Quote:


> Originally Posted by *Bauxno*
> 
> So for some unknown reason they made a code path so that even when async is set to on, Maxwell cards don't get the performance hit they take in almost every DX12 game, but they didn't do the same for a critical part of their DX11 bench to balance the tessellation between AMD and Nvidia?
> 
> That tells me a lot about this company.


Richard Huddy said it in an interview with PCPer. He said it about Batman's cape in Arkham City: Nvidia over-tessellated it, which hurt Radeon performance more than GeForce. So Huddy's point was that Radeon is weaker than GeForce in tessellation. You can find the interview on PCPer's official YouTube channel. We can find many examples if we want.


----------



## PontiacGTX

Quote:


> Originally Posted by *EightDee8D*
> 
> I saw that, but why isn't the 480 faster than the 390/X? Maybe it's not optimized for Polaris?


It isn't faster overall?

Or in graphics?
Quote:


> **Graphics test 1**
> Graphics test 1 focuses more on rendering of transparent elements. It utilizes the A-buffer heavily to render transparent geometries and big particles in an order-independent manner. Graphics test 1 draws particle shadows for selected light sources. Ray-marched volumetric illumination is enabled only for the directional light. All post-processing effects are enabled.
> 
> **Graphics test 2**
> Graphics test 2 focuses more on ray-marched volume illumination with hundreds of shadowed and unshadowed spot lights. The A-buffer is used to render glass sheets in an order-independent manner. Also, lots of small particles are simulated and drawn into the A-buffer. All post-processing effects are enabled.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> But then the overhead for DX11 AMD driver werent sorely for draw calls? because it seems the overhead you are talking about DX12 and async compute is about compute+graphics nvidia can either avoid using async compute+graphics and just focus on DX12 path. or optimize async compute only like 3dmark time spy is doing (but then this is a syncthehic benchmark, it isnt a game)


AMD's overhead was due to the parallel nature of the GCN architecture and the sequential nature of DX11. DX11 could not issue enough draw calls on a single CPU core to keep the GCN architecture fed. GCN would often stall and wait on the CPU for further instructions. This stall occurred when the Ultra Threaded Dispatch Processor ran out of instructions to assign, from a lack of instructions being fed to it by the Graphics Command Processor. The Graphics Command Processor stopped executing instructions due to a lack of instructions being fed to it by the CPU. Compute-wise it was the same issue: sequentially feeding GCN with compute instructions was not an efficient way to utilize GCN's potential compute capabilities under DX11.

With DX12... many CPU cores can feed GCN. The hardware scheduler (Ultra Threaded Dispatch Processor) takes care of the rest. As resources become available... the Ultra Threaded Dispatch Processor assigns new work to those newly freed resources. Compute wise... those resources are the Instruction buffers in each CU. FYI... These buffers have been increased in Polaris and Vega from 12 DWORD to 16 if memory serves me right. The buffers then feed instructions to the actual ALU blocks as they become available.

nVIDIA did not have this issue because the static scheduler (software) could feed many more instructions (due to the multi-threaded nature of the nVIDIA driver) and could feed larger amount of instructions per block to the Gigathread Unit found in the nVIDIA GPU architecture.

nVIDIA retains this ability under DX12. Not much has changed, other than that more CPU cores can potentially be used as part of the nVIDIA static scheduling mechanism (depending on how many cores your CPU has). Of course this also means that if you add more scheduling work (say, Asynchronous Compute + Graphics work), then you will incur larger CPU overhead than on AMD hardware, because AMD hardware has dedicated ACEs handling the task. This extra CPU overhead could become problematic in titles which are already heavily multi-threaded and already hammering the CPU hard. Other than that, it is simply more CPU overhead, and it could affect nVIDIA's GPU performance at lower resolutions, where the CPU is more of a bottleneck than the GPU. So we might see a reversal of fortunes under certain DX12/Vulkan titles, where nVIDIA performs worse than AMD at lower resolutions but catches up as the resolution is increased.
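The single-core DX11 feeding versus multi-core DX12 feeding described above can be sketched as follows (a conceptual Python model, nothing here is a real graphics API): under DX12, every core can record its own command list in parallel, and the final submission to the queue stays cheap and ordered.

```python
from concurrent.futures import ThreadPoolExecutor

def record_command_list(draw_calls):
    """Stand-in for per-thread command-list recording (the CPU-side cost)."""
    return [("draw", call) for call in draw_calls]

def submit_frame(draw_calls, workers):
    """DX12-style model: split the frame's draw calls across `workers`
    threads, record the command lists in parallel, then hand them to the
    queue. The DX11 equivalent is workers=1: one core records everything,
    and a wide GPU ends up waiting on that one core."""
    chunks = [draw_calls[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        command_lists = list(pool.map(record_command_list, chunks))
    # Single, ordered submission to the "GPU queue".
    return [cmd for cl in command_lists for cmd in cl]
```

`submit_frame(calls, 1)` models the DX11 case; with `workers=4` the same frame's recording cost is spread over four cores, which is the relief DX12 gives a parallel front end like GCN's.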


----------



## Bauxno

I think it's more that the 390X is brute-forcing that part, while the RX 480 is using a hardware feature to make up for its tessellation deficit.


----------



## Mahigan

Quote:


> Originally Posted by *LionS7*
> 
> Richard Huddy said it in the interview with PCPER. He said it about the cape of Batman in Arkham City. It was overtesselated from Nvidia to kill more performance of Radeon other then GeForce. So Huddy said that Radeon is weaker than GeForce in tesselation. You can find the interview on the official channel of the PCPER in youtube. We can find many examples if we want.


Yeah... when you over-tessellate, you end up with tiny triangles which are smaller than a pixel yet take up the same amount of space as a pixel in the geometry cache. In other words, you pretty much fill the cache with useless data which cannot be seen on the screen.

Polaris fixed this by introducing primitive discard acceleration. Basically... these small triangles are culled early enough so as to not hit the cache and/or geometry processing resources. Culled as in "deleted".

So you end up with the same image quality but remove useless junk from being rendered.
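That culling step can be sketched in a few lines (a simplified model of mine; real primitive discard hardware also considers sample coverage and degenerate cases, not just raw area):

```python
def screen_area(p0, p1, p2):
    """Screen-space triangle area in pixels^2, via the cross-product formula."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0

def discard_small_primitives(triangles, min_area=1.0):
    """Model of primitive discard acceleration: triangles too small to ever
    cover a pixel are culled before they can pollute the geometry cache."""
    return [t for t in triangles if screen_area(*t) >= min_area]
```

A triangle spanning several pixels survives, while a sub-pixel sliver produced by over-tessellation is dropped, so the same image reaches the screen with less junk in flight.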


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> AMD's overhead was due to the parallel nature of the GCN architecture and the sequential nature of DX11. DX11 could not issue enough draw calls on a single CPU core to keep the GCN architecture fed. GCN would often stall and wait on the CPU for further instructions. This stall occurred when the Ultra Threaded Dispatch Processor ran out of instructions to assign, from a lack of instructions being fed to it by the Graphics Command Processor. The Graphics Command Processor stopped executing instructions due to a lack of instructions being fed to it by the CPU. Compute-wise it was the same issue: sequentially feeding GCN with compute instructions was not an efficient way to utilize GCN's potential compute capabilities under DX11.
> 
> With DX12... many CPU cores can feed GCN. The hardware scheduler (Ultra Threaded Dispatch Processor) takes care of the rest. As resources become available... the Ultra Threaded Dispatch Processor assigns new work to those newly freed resources. Compute wise... those resources are the Instruction buffers in each CU. FYI... These buffers have been increased in Polaris and Vega from 12 DWORD to 16 if memory serves me right. The buffers then feed instructions to the actual ALU blocks as they become available.
> 
> nVIDIA did not have this issue because the static scheduler (software) could feed many more instructions (due to the multi-threaded nature of the nVIDIA driver) and could feed larger amount of instructions per block to the Gigathread Unit found in the nVIDIA GPU architecture.
> 
> nVIDIA retains this ability under DX12. Not much has changed other than more CPU cores can potentially be used as part of the nVIDIA static scheduling mechanism (varying on how many CPU cores your CPU encompasses). Of course this also means that if you add more scheduling work (like say Asynchronous Compute + Graphics work) then you will incur larger CPU overhead than on AMD hardware due to the fact that AMD hardware has dedicated ACEs handling the task. This extra CPU overhead could become problematic in game titles which are already heavily Multi-threaded and already hammering the CPU hard. Other than that... it would simply be more CPU overhead and this could affect nVIDIAs GPU performance at lower resolutions when the CPU is more of a bottleneck than the GPU. So we might see a reversal of fortunes under certain DX12/Vulkan titles whereas nVIDIA perform worse than AMD at lower resolutions but then catch up as the resolution is increased.


But then some games under DX11 manage to work fine despite the DX11 overhead, without a big performance difference (probably they aren't as CPU-dependent, but some, like Crysis 3 and Star Citizen, perform equally well on AMD GCN and on Maxwell/Kepler). That would mean game devs could equally well optimize AMD performance into their engines.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> But then some games under DX11 manages to work fine with DX11 overhead without a big performance difference(probably it isnt as CPU dependent, but some like Crysis 3 and Star CItizen performs equally fine on AMD GCN and Maxwell/Kepler) but then it means game devs could equally optimize AMD performance into the game´s engines


Yes.

If a developer takes the time to multi-thread their engine in order to alleviate some of the load on the primary CPU thread, then it can properly feed the GPU. Sadly... most developers used DX11 features instead (like multi-threaded command lists and deferred contexts) which only recorded commands on other CPU threads but still used the primary thread to feed the GPU.

This is why Polaris increased the CU instruction buffers from 12 to 16 DWORD values each. That way the CUs can buffer up more instructions... freeing space in the Ultra Threaded Dispatch Processor, which can then take on more instructions per clock from the Graphics Command Processor, which means hitting the CPU less often for instructions.


----------



## solariss

http://steamcommunity.com/app/223850/discussions/0/366298942110944664/?ctp=7#c359543951697776388
Quote:


> FM_Jarnis [developer] 1 hour ago
> Note, as a benchmark, 3DMark specifically does not have any vendor-specific code path optimizations because those would turn it an optimization contest, rather than hardware benchmark. All hardware runs the same code path. Yes, even on Maxwell. The driver determines how it handles the queues.


Well there you have it, no special branches.


----------



## PontiacGTX

Quote:


> Originally Posted by *solariss*
> 
> http://steamcommunity.com/app/223850/discussions/0/366298942110944664/?ctp=7#c359543951697776388
> Well there you have it, no special branches.


then it uses Paxwell path


----------



## Mahigan

Quote:


> Originally Posted by *solariss*
> 
> http://steamcommunity.com/app/223850/discussions/0/366298942110944664/?ctp=7#c359543951697776388
> Well there you have it, no special branches.


So it is invalid. Games will be using optimized paths for ALL hardware (as per the shared nVIDIA and AMD presentation at GDC). Since 3DMark does not follow this trend, 3DMark is not representative of DX12 gaming performance.

Thank you for sharing. I think we have now received the clarifications needed.

"3DMark Time Spy - When you like to epeen over nothing"

Great slogan

PS... when you are dabbling in APIs whose whole purpose is to offer more flexibility in how you can optimize your code paths, by virtue of being closer to the hardware, then claiming that this "would turn it an optimization contest" is absurd. Optimization (the ability to extract more performance out of the hardware) is the sole purpose of DX12 and Vulkan.

As per AMD and nVIDIA (as well as Microsoft)... if you do not have the time to dedicate programmers to the task of optimizing for different hardware architectures then stick to DX11.

So 3DMark... stick to DX11.

peace.


----------



## CrazyElf

Quote:


> Originally Posted by *Mahigan*
> 
> Yeah... when you over tesselate you end up with tiny triangles which are smaller than a pixel yet which take up the same amount of space as a pixel in the Geometry cache. In other words... you pretty much fill up the cache with useless data which cannot be seen on the screen.
> 
> Polaris fixed this by introducing primitive discard acceleration. Basically... these small triangles are culled early enough so as to not hit the cache and/or geometry processing resources. Culled as in "deleted".
> 
> So you end up with the same image quality but remove useless junk from being rendered.


The big advantages with Polaris are that it was able to close the triangle-throughput gap with Nvidia, and the Primitive Discard Acceleration. I think that Gameworks should no longer have the negative impact that it used to. Have any review websites tried running Gameworks at full on the RX 480? There is no more triangle gap.

Now all we need to see is linear scaling. A 4096 SP Vega should have perhaps double the triangle output at 10.2 Gtri/s (double the theoretical 5.1 Gtri/s of the RX 480 and more than double the 4.2 Gtri/s of the Fury X and the 290X). Then the 6144 SP GPU, assuming it exists, would have 15.3 Gtri/s.

If only they increased the RBEs relative to the number of CUs, then it would be truly awesome. As I said, I'd like to see 32 RBEs on the 4096-shader part and 48 on the 6144-shader part.

Mahigan, any other ideas of what else they've done to the CUs themselves for Vega? Do you think they are still split into units of 2, 4, 8, and 16 and then power-gated, or something else? Any other ways to resolve the occupancy issues?

I'm expecting bigger changes for Vega than Polaris, because Vega is a new architecture (I expect GCN++ in a sense, versus Polaris, which is more of a refinement of GCN). I'm not expecting changes as big as, say, the migration from VLIW to GCN, but I am expecting some pretty big changes to GCN.

Vega apparently just hit a milestone in June, although we don't know what it was:
https://twitter.com/GFXChipTweeter/status/745887920809218049

Quote:


> Originally Posted by *Mahigan*
> 
> AMDs overhead was due to the parallel nature of the GCN architecture and the sequential nature of DX11. DX11 could not issue enough draw calls on a single CPU core to keep the GCN architecture fed. GCN would often stall and wait on the CPU for further instructions. This stall occured when the Ultra Dispatch Processor ran out of instructions to assign from a lack of instructions being fed to it by the Graphics Command Processor. The Graphics Command Processor stopped executing instructions due to a lack of instructions being fed to it by the CPU. Compute wise... it was the same issue. Sequentially feeding GCN with compute instructions was not an efficient way to utilize GCNs potential Compute capabilities under DX11.
> 
> With DX12... many CPU cores can feed GCN. The hardware scheduler (Ultra Threaded Dispatch Processor) takes care of the rest. As resources become available... the Ultra Threaded Dispatch Processor assigns new work to those newly freed resources. Compute wise... those resources are the Instruction buffers in each CU. FYI... These buffers have been increased in Polaris and Vega from 12 DWORD to 16 if memory serves me right. The buffers then feed instructions to the actual ALU blocks as they become available.
> 
> nVIDIA did not have this issue because the static scheduler (software) could feed many more instructions (due to the multi-threaded nature of the nVIDIA driver) and could feed larger amount of instructions per block to the Gigathread Unit found in the nVIDIA GPU architecture.
> 
> nVIDIA retains this ability under DX12. Not much has changed other than more CPU cores can potentially be used as part of the nVIDIA static scheduling mechanism (varying on how many CPU cores your CPU encompasses). Of course this also means that if you add more scheduling work (like say Asynchronous Compute + Graphics work) then you will incur larger CPU overhead than on AMD hardware due to the fact that AMD hardware has dedicated ACEs handling the task. This extra CPU overhead could become problematic in game titles which are already heavily Multi-threaded and already hammering the CPU hard. Other than that... it would simply be more CPU overhead and this could affect nVIDIAs GPU performance at lower resolutions when the CPU is more of a bottleneck than the GPU. So we might see a reversal of fortunes under certain DX12/Vulkan titles whereas nVIDIA perform worse than AMD at lower resolutions but then catch up as the resolution is increased.


Hmm, depending on how Volta pans out, wouldn't Nvidia end up with similar challenges? They would have to make something like a Command Processor + ACE equivalent to get a truly "parallel" GPU on their end. Is there a way they could combine the best of both worlds? All indications are that Volta will be more like GCN than anything else Nvidia has ever made before. Unless they do something truly unexpected (maybe if they made, say, CUDA somehow available for gaming - not sure how that could work).

Unless there is another way to do "parallel" that we don't know about?

Of course, when that happens, I suppose it won't be Polaris or Vega that they will be fighting against, but Navi. I am still very interested to see what "NextGen Memory" could possibly be. The other part is "scalability" - a lot of people are speculating that it means multiple GPU dies packaged together that can somehow act as one, especially when combined with the interposers used for HBM. The NextGen memory might be the link between those.


----------



## Mahigan

Quote:


> Well, that was slightly inaccurate. Time Spy does not know this. Time Spy simply sends work to the card, using multiple queues. If the card chooses to run them synchronously because async is not enabled in the driver, Time Spy doesn't know that.


Umm...

It does not matter if Async is not enabled in the driver. That is not the argument. The argument is how Time Spy sends work to the GPU. You see, Time Spy has to "mark" every parallel task (Compute and Graphics) using fences (synchronization points). These fences remain in place even if the driver does not support Asynchronous Compute. They show up as a performance loss because the GPU resources handling a graphics task will still "wait" on a long-running compute task. That wait means part of the GPU goes idle (idle as in not doing anything) while it waits for another task to finish.

For Maxwell... this leads to a performance loss. So with 3DMark Time Spy... where is this performance loss on Maxwell?

I mean there are ways you can avert this performance loss...

1. You specifically optimize the code for nVIDIA hardware, ensuring that there are no long-running compute tasks.
2. You keep the level of Async Compute low (which results in fewer fences), which is perfectly suited to nVIDIA hardware.
3. You do not run Asynchronous Compute + Graphics but instead just Asynchronous Compute (concurrent executions filling up gaps in the pipeline).

Either way... you have just optimized everything for nVIDIA hardware, while at the same time claiming that creating separate paths per architecture would lead to an optimization war. So AMD hardware received no optimizations while the entire code is optimized for nVIDIA hardware.

That does not sound like an objective way of building a benchmark.

This is precisely why AMD/nVIDIA and Microsoft stated that optimized paths are required for every GPU architecture under DX12. It was an understanding between all three parties which also stated that if you do not have the development time to afford such optimizations then sticking to DX11 would be better.

I think that 3DMark should re-work this benchmark so that it can properly represent the best case scenario for both AMD and nVIDIA.


----------



## Mahigan

Quote:


> Originally Posted by *CrazyElf*
> 
> Q. Mahigan, any other ideas of what else they've done for Vega to the CUs themselves?


A. I think that they may have implemented that power gating, which is why Vega appears to have a higher perf/watt than Polaris, although that could simply be due to the inclusion of HBM2 memory.
Quote:


> Originally Posted by *CrazyElf*
> 
> Q. Do you think that they are still split into units of 2, 4, 8, and 16 then power-gated or something else?


A. AMD did patent this new approach, but for all we know it may only arrive with Navi (hence the "scalability" nomenclature).
Quote:


> Originally Posted by *CrazyElf*
> 
> Q. An other ways to resolve the Occupancy issues?


A. Not using the current GCN uarch as it stands... no.
Quote:


> Originally Posted by *CrazyElf*
> 
> Q. Hmm, depending on how Volta pans out, wouldn't Nvidia end up with similar challenges?


A. Yes. I also think that Volta will likely be including Hardware-side scheduling.
Quote:


> Originally Posted by *CrazyElf*
> 
> Q. They would have to make something like a Command Processor + ACE equivalent to get a truly "parallel" GPU on their end. is there a way they could combine the best of both worlds?


A. Yes... by increasing the size of the instruction buffers found in each SM... but then again most of the industry will have moved to DX12/Vulkan by then, so there would be no reason to include the best of both worlds.
Quote:


> Originally Posted by *CrazyElf*
> 
> Q. Unless there is another way to do "parallel" that we don't know about?


A. Possible.


----------



## KarathKasun

Even if you optimize everything for the specific target GPUs, each will have strengths and weaknesses depending on how they handle the tasks given to them.

We have seen this argument before with the now-ancient 8800 GTX vs HD 2900 XT debate (SIMD vs VLIW respectively). You can't change the fact that the two approaches process data in intrinsically different ways.


----------



## Greenland

Quote:


> Originally Posted by *Mahigan*
> 
> Umm...
> 
> It does not matter if Async is not enabled in the driver. That is not the argument. The argument is how Time Spy sends work to the GPU. You see Time Spy has to "mark" every task parallel task (Compute and Graphics) using fences (synchronization point). These fences remain in place even if the driver does not support Asynchronous Compute. These fences show up as a performance loss due to the fact that the GPU resources handling a Graphics task will still "wait" on a long running compute task. This wait is equal to part of the GPU going Idle (Idle as in not doing anything) while it waits for another task to finish.
> 
> For Maxwell... this leads to a performance loss. So with 3DMark Time Spy... where is this performance loss on Maxwell?
> 
> I mean there are ways you can avert this performance loss...
> 
> 1. You specifically optimize the code for nVIDIA hardware in that you ensured that there are no long running compute tasks.
> 2. You keep the level of Async Compute low (which results in less fences) which is perfectly suited for nVIDIA hardware.
> 3. You do not run Asynchronous Compute + Graphics but instead just Asynchronous Compute (concurrent executions filling up gaps in the pipeline).
> 
> Either way... you just optimized everything for nVIDIA hardware yet at the same time claim that creating separate paths for hardware would be tantamount to leading to an optimization war. So AMD hardware received no optimizations while the entire code is optimized for nVIDIA hardware.
> 
> That does not sound like an objective way of building a benchmark.
> 
> This is precisely why AMD/nVIDIA and Microsoft stated that optimized paths are required for every GPU architecture under DX12. It was an understanding between all three parties which also stated that if you do not have the development time to afford such optimizations then sticking to DX11 would be better.
> 
> I think that 3DMark should re-work this benchmark so that it can properly represent the best case scenario for both AMD and nVIDIA.


"The application doesn't know what the driver does with the queues. Switching async compute on/off in 3DMark Time Spy doesn't really change anything because Maxwell driver does the exact same thing in both cases, so any difference is within the error margins of the test."

From FM_Jarnis.


----------



## Xuper

Question: with Async ON for Maxwell, is it possible you get a massive performance hit if the driver says "No, I can't do that"?


----------



## Bidz

Quote:


> Originally Posted by *Greenland*
> 
> "The application doesn't know what the driver does with the queues. Switching async compute on/off in 3DMark Time Spy doesn't really change anything because Maxwell driver does *the exact same thing in both cases*, so any difference is within the error margins of the test."
> 
> From FM_Jarnis.


So even if it's being asked to do something different, it does the same thing?

Sounds like BS to me.


----------



## Mahigan

Quote:


> Originally Posted by *Greenland*
> 
> "The application doesn't know what the driver does with the queues. Switching async compute on/off in 3DMark Time Spy doesn't really change anything because Maxwell driver does the exact same thing in both cases, so any difference is within the error margins of the test."
> 
> From FM_Jarnis.


I am not sure if people have asked about fences or not. The fences are the basis of my argument. It does not matter whether the driver switches Async Compute on or off. 3DMark has two paths: one with Async Compute and another without. When you toggle Async on... the Async path is used. When you toggle Async off... the non-Async path is used.

The argument has to do with the fences in the Async path. Those fences still incur a performance penalty regardless of what the nVIDIA Maxwell driver is doing. Fences cause tiny delays... the more fences, the more delays are introduced. The more delays... the lower your FPS due to the introduced latency.

What I do not see is the performance hit associated with those delays when running Time Spy on an nVIDIA Maxwell-based GPU. Those fences should be negatively affecting the performance of the Maxwell GPU, since they synchronize graphics and compute tasks. I have already mentioned ways in which this would not be the case: namely, if the application was coded to favor nVIDIA hardware (utilizing short-running shaders and very few Asynchronous Compute + Graphics workloads).

Since 3DMark apparently only uses a single path for both AMD and nVIDIA hardware, then if that path was coded to favor nVIDIA hardware the benchmark is not objective. Why not? Because AMD hardware can take on even more Asynchronous Compute + Graphics work in order to gain even more performance than is currently being shown. If a separate path were coded for AMD... more tasks could be "marked" to run in parallel (more Graphics + Compute jobs), leading to a reduction in per-frame latency (resulting in higher FPS).

The main reason both AMD and nVIDIA (as well as Microsoft) stated that separate paths are the way to go for DX12 applications is precisely this issue. If a developer codes in favor of nVIDIA then AMD will suffer bad performance, and vice versa. In order to avert this... both architectures ought to have their own optimized paths.

This should be mentioned to that 3DMark rep on steam.


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> Question: with Async ON for Maxwell, is it possible you get a massive performance hit if the driver says "No, I can't do that"?


Yes... because the "fences" in the game code are still there. The driver cannot magically remove them. Those fences are part of the API specification under DX12.


----------



## Greenland

So where can I find these fences, any documentation ?


----------



## lolerk52

Quote:


> Originally Posted by *Mahigan*
> 
> A. I think that they may have implemented the power gating which is why Vega appears to have a higher perf/watt than Polaris although that could simply be due to the inclusion of HBM2 memory.
> 
> A. AMD did patent this new approach but for all we know it may only arrive with Navi (hence the scalability nomencloture).
> 
> A. Not using the current GCN uarch as it stands... no.
> 
> A. Yes. I also think that Volta will likely be including Hardware-side scheduling.
> 
> A. Yes... by increasing the size of the instructions buffers found in each SM... but then again most of the industry will have moved to DX12/Vulkan so there would be no reason to include the best of both worlds.
> 
> A. Possible.


What power gating? Did I miss something?


----------



## Mahigan

Quote:


> Originally Posted by *Greenland*
> 
> So where can I find these fences, any documentation ?


https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

Straight from Microsoft. I have supplied Microsoft code samples for Asynchronous Compute + Graphics as well as quotes from Kollock (Oxide developer) in the previous pages. Have a gander.
Quote:


> void AsyncPipelinedComputeGraphics()
> {
>     const UINT CpuLatency = 3;
>     const UINT ComputeGraphicsLatency = 2;
> 
>     // Compute is 0, graphics is 1
>     ID3D12Fence *rgpFences[] = { pComputeFence, pGraphicsFence };
>     HANDLE handles[2];
>     handles[0] = CreateEvent(nullptr, FALSE, TRUE, nullptr);
>     handles[1] = CreateEvent(nullptr, FALSE, TRUE, nullptr);
>     UINT FrameNumbers[] = { 0, 0 };
> 
>     ID3D12GraphicsCommandList *rgpGraphicsCommandLists[CpuLatency];
>     CreateGraphicsCommandLists(ARRAYSIZE(rgpGraphicsCommandLists),
>         rgpGraphicsCommandLists);
> 
>     // Graphics needs to wait for the first compute frame to complete; this is the
>     // only wait that the graphics queue will perform.
>     pGraphicsQueue->Wait(pComputeFence, 1);
> 
>     while (1)
>     {
>         for (auto i = 0; i < 2; ++i)
>         {
>             if (FrameNumbers[i] > CpuLatency)
>             {
>                 rgpFences[i]->SetEventOnCompletion(
>                     FrameNumbers[i] - CpuLatency,
>                     handles[i]);
>             }
>             else
>             {
>                 SetEvent(handles[i]);
>             }
>         }
> 
>         auto WaitResult = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
>         auto Stage = WaitResult - WAIT_OBJECT_0;
>         ++FrameNumbers[Stage];
> 
>         switch (Stage)
>         {
>         case 0:
>         {
>             if (FrameNumbers[Stage] > ComputeGraphicsLatency)
>             {
>                 pComputeQueue->Wait(pGraphicsComputeFence,
>                     FrameNumbers[Stage] - ComputeGraphicsLatency);
>             }
>             pComputeQueue->ExecuteCommandLists(1, &pComputeCommandList);
>             pComputeQueue->Signal(pComputeFence, FrameNumbers[Stage]);
>             break;
>         }
>         case 1:
>         {
>             // Recall that the GPU queue started with a wait for pComputeFence, 1
>             UINT64 CompletedComputeFrames = min(1,
>                 pComputeFence->GetCompletedValue());
>             UINT64 PipeBufferIndex =
>                 (CompletedComputeFrames - 1) % ComputeGraphicsLatency;
>             UINT64 CommandListIndex = (FrameNumbers[Stage] - 1) % CpuLatency;
>             // Update graphics command list based on CPU input and using the appropriate
>             // buffer index for data produced by compute.
>             UpdateGraphicsCommandList(PipeBufferIndex,
>                 rgpGraphicsCommandLists[CommandListIndex]);
> 
>             // Signal *before* new rendering to indicate what compute work
>             // the graphics queue is DONE with
>             pGraphicsQueue->Signal(pGraphicsComputeFence, CompletedComputeFrames - 1);
>             pGraphicsQueue->ExecuteCommandLists(1,
>                 rgpGraphicsCommandLists + PipeBufferIndex);
>             pGraphicsQueue->Signal(pGraphicsFence, FrameNumbers[Stage]);
>             break;
>         }
>         }
>     }
> }


Basically the Graphics GPC in Maxwell would go idle until the Compute GPC is done with its work and signals the Graphics GPC that it is time to move on to other work.

This process introduces latency, and it is true for ALL hardware. AMD and nVIDIA hardware both incur this hit, but the benefits from Asynchronous Compute + Graphics more than make up for it.


----------



## Xuper

Are you really sure that the driver can't ignore a fence, even if it says "I can't do that"? (This is what FM_Jarnis mentioned.)


----------



## Sleazybigfoot

I've been reading this thread for the past hour and a half and I'd like to thank you guys who understand this a lot better than I do for the detailed explanations and comments (Mahigan being one of them).

Thank you, this thread cleared up a lot of false claims I used to think were correct.


----------



## Mahigan

Quote:


> Originally Posted by *Xuper*
> 
> Are you really sure that Driver can't ignore Fence? even it says "I can't do" ( this is what FM_Jarnis Mentioned) ?


Nope. A driver cannot ignore a fence, which is why Maxwell incurs a performance penalty under AotS when Async is turned on even though the driver does not support Asynchronous Compute. This is why AMD/nVIDIA and Microsoft suggested that every DX12 developer keep a non-Async Compute path in their code.

Have a read..
http://www.dualshockers.com/2016/03/14/directx12-requires-different-optimization-on-nvidia-and-amd-cards-lots-of-details-shared/

Below you can check out a summary of the most interesting points, and the slides that were showcased during the presentation. Most of the data was obviously developer-facing, and very technical, but there are definitely some points even gamers like us can take away from the presentation.


"*Consider architecture specific paths*"


*DirectX 12 is for those who want to achieve maximum GPU and CPU performance, but there's a significant requirement in engineering time, as it demands developers write code at a driver level that DirectX 11 takes care of automatically. For that reason, it's not for everyone.*
Since it's "closer to the metal" than DirectX 11, it requires different settings on certain things for Nvidia and AMD cards.
With DirectX 12 you're not CPU-bound for rendering.
The command lists written in DirectX 12 need to be running as much as possible, without any delay at any point. There should be 15-30 of them per frame, bundled into 5-10 "ExecuteCommandList" calls, each of which should include at least 200 microseconds of GPU Work. Preferably more, up to 500 microseconds.
*Scheduling latency on the operating system's side takes 60 microseconds, so developers should put at least more than that in each call, otherwise what's left of the 60 microseconds would be wasted idling.*
Bundles, which are the main new feature of DirectX 12, are great to send work to the GPU very early in each frame, and that's very advantageous for applications that require very low latency like VR.
They're not inherently faster on the GPU. The gain is all on the CPU side, so they need to be used wisely. *Optimizing bundles diverges for Nvidia and AMD cards, and require a different approach. In particular, for AMD cards bundles should be used only if the game is struggling on the CPU side.*
Compute queues still haven't been completely researched on DirectX 12. For the moment, they can offer 10% gains if done correctly, but there might be more gains coming as more research is done on the topic.
Since those gains don't automatically happen unless things are setup correctly, developers should always make sure whether they do or not, as poorly scheduled compute tasks can result in the opposite outcome.
*The use of root signature tables is where optimization between AMD and Nvidia diverges the most, and developers will need brand-specific settings in order to get the best benefits on both vendors' card.*
When developers find themselves with not enough video memory, DirectX 12 allows them to create overflow heaps in system memory, moving resources out of video memory at their own discretion.
Using aliased memory on DirectX 12 allows to save GPU memory even further.
*DirectX 12 introduces Fences, which are basically GPU semaphores, making sure that the GPU has finished working on a resources before it moves on to the next.*
Multi-GPU functionality is now embedded in the DirectX 12 API.
It's important for developers to keep in mind the limitations in bandwidth of different version of PCI (the interface between motherboard and video card), as PCI 2.0 is still common, and grants half the bandwidth of PCI 3.0.
DirectX 12 includes a "Set Stable Power State" API, and some are using it. It's only really useful for profiling, and even then only some times. It reduces performance and should not be used in a shipped game.
*When deciding whether to use a pixel shader or a compute shader, there are "extreme" difference in pros and cons on Nvidia and AMD cards (as shown by the table in the gallery).*
Conservative rasterization lets you draw all the pixels touched by a triangle of your 3D models. It was possible before using a geometry shader trick, but it was quite slow. Now it's possible to enable neat effects like the ray-traced shadows in Tom Clancy's The Division. In the picture in the gallery below you can see the detail of the shadow, with the bike's spokes visible on the ground. That wasn't possible without using a ray-traced technique, which is enabled only with conservative rasterization.
Tiled resources can now be used on 3D assets, and grant "extreme" performance and memory saving benefits.
DirectX 11 is still "very much alive" and will continue to be on the side of DirectX 12 for a while.
Developers can't mix and match DirectX 11 and DirectX 12. *Either they commit to DirectX 12 entirely, or they shouldn't use it.*


----------



## Greenland

Then why isn't Maxwell taking a loss in Time Spy with Async ON?


----------



## airfathaaaaa

Quote:


> Originally Posted by *Greenland*
> 
> Then why isn't Maxwell taking a loss in Time Spy with Async ON?


Either it's making very low use of it, or the async isn't really the async MS dictates.


----------



## Mahigan

Quote:


> Originally Posted by *Greenland*
> 
> Then why isn't Maxwell taking a loss in Time Spy with Async ON?


It could be due to...

1. Running compute shaders specifically optimized for nVIDIA GPUs (meaning optimized, short-running shaders).
2. Making very little use of Asynchronous Compute (less than any game title making use of it so far).
3. Not running Graphics and Compute in parallel.


----------



## Slomo4shO

Quote:


> Originally Posted by *Mahigan*


The powerpoint is available directly from Nvidia.


----------



## specopsFI

Is it possible that the fences are there even when async toggle is off in the benchmark settings? Wouldn't that explain why Maxwell has no penalty from switching it on?


----------



## lolerk52

@Mahigan, unrelated, but you said that 480 might be ROP bottlenecked, and that it could be 980 Ti performance if it had 64 ROPs.

Do you still think that's the case? That was a couple of weeks ago.


----------



## Xuper

Quote:


> Originally Posted by *specopsFI*
> 
> Is it possible that the fences are there even when async toggle is off in the benchmark settings? Wouldn't that explain why Maxwell has no penalty from switching it on?


Then it's not a benchmark! What is it, if they're the same?


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> Since 3DMark apparently only uses a single path for both AMD and nVIDIA hardware, then if that 3DMark path was coded to favor nVIDIA hardware, the benchmark is not objective. Why not? Because the AMD hardware can take even more Asynchronous Compute + Graphics workloads in order to gain even more performance than is currently being shown. Therefore if a separate path were coded for AMD... more tasks could be "marked" to run in parallel (more Graphics + Compute jobs) leading to a reduction in per frame latency (resulting in higher FPS).
> 
> The main reason both AMD and nVIDIA (as well as Microsoft) stated that separate paths are the way to go for DX12 applications is specifically due to this issue. If a developer codes in favor of nVIDIA then AMD will suffer bad performance and vice versa. In order to avert this... both architectures ought to have their own optimized paths.
> 
> This should be mentioned to that 3DMark rep on steam.


That would be a major dilemma. The last thing a benchmark program should do is create a separate optimized path for each GPU.


----------



## Mahigan

Quote:


> Originally Posted by *specopsFI*
> 
> Is it possible that the fences are there even when async toggle is off in the benchmark settings? Wouldn't that explain why Maxwell has no penalty from switching it on?


No. Because the fences are what mark up Async Compute + Graphics tasks. If the fences were always there, then every other GPU (Pascal and GCN) would not show a performance loss when Async is disabled.


----------



## Mahigan

Quote:


> Originally Posted by *Kpjoslee*
> 
> That would be a major dilemma. Last thing benchmark program should do is create separate optimized path for each GPU.


If we cling to the DX11 world mindset then I would agree with you but in the DX12/Vulkan future... that mindset does not fit. DX12/Vulkan is all about optimizations in order to get the most out of the hardware. It is the point of both of these APIs.


----------



## Kpjoslee

Quote:


> Originally Posted by *Mahigan*
> 
> If we cling to the DX11 world mindset then I would agree with you but in the DX12/Vulkan future... that mindset does not fit. DX12/Vulkan is all about optimizations in order to get the most out of the hardware. It is the point of both of these APIs.


Then it is going to create a new set of arguments about which benchmark was better optimized for a certain architecture, etc., and lose the whole point of its purpose to begin with.
I think it is just harder to create a DX12/Vulkan benchmark than a DX11 one: with the latter you can just play by its rules, but the former requires optimization for each differing architecture.

A benchmark should just stick to providing a single path, but it has to be more transparent about it.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Mahigan*
> 
> If we cling to the DX11 world mindset then I would agree with you but in the DX12/Vulkan future... that mindset does not fit. DX12/Vulkan is all about optimizations in order to get the most out of the hardware. It is the point of both of these APIs.


Exactly..

Pretty disappointed in a lot of developers.. I've heard for years from them how they all want control over the hardware, DX12, Vulkan, etc. Now they get everything they wanted, and most of them have changed their tune: "DX12 is difficult", "don't underestimate how much work driver teams did under DX11", etc., etc.

All big talk until it comes time to put their money where their mouths are.. I tell you, I've nearly had it with PC (high-end GPUs anyway).. Spending $500 on GPUs just to watch PS4 titles like UC4 frankly embarrass anyone who spent that kind of money.

Until these new APIs are used and developers give PC consumers the same amount of effort they give the consoles (we are paying the same amount for the games, after all), I'm only buying mid-range cards.. It's a waste of damn money seeing a GPU like the Fury X go to waste because it's "too much work"..

Considering Time Spy isn't a game, I was hoping they would be on the cutting edge and push some of the features DX12 offers to the limit.. I guess that's asking too much from a piece of software designed to test PC hardware..


----------



## specopsFI

Quote:


> Originally Posted by *Mahigan*
> 
> No. Because the fences are what mark up Async Compute + Graphics tasks. If the fences are there then every other GPU (Pascal and GCN) would not receive a performance loss when Async is disabled.


I'm definitely not an expert on this, so I most likely am getting it wrong, but wouldn't the benefit come from actually executing something inside those async tasks? So suppose the fences were there regardless of whether async is set on or off, and regardless of the GPU's uarch. Then Maxwell would perform the same either way (nothing executed async, just the fences there regardless of settings), while Pascal and GCN would pay the same fence penalty when the toggle is off, and get both the penalty and the benefit of actual async execution when the toggle is on. I'm not saying that would make any sense, because it would obviously be very bad coding, but couldn't it still, in principle, explain the situation? If the async-off setting removes everything from the async tasks but leaves the fences there?


----------



## Bidz

Quote:


> Originally Posted by *Kpjoslee*
> 
> Then it is going to create new set of argument which benchmark was optimized better for certain architecture etc, and lose the whole point of its purpose to begin with.
> I think it is just difficult to create DX12/Vulkan benchmark than DX11, latter you can just play by its rule, but former requires optimization per each differing architectures.
> 
> Benchmark should just stick to providing single path, but just has to be more *transparent* about it.


That's not enough, nor really the point. A benchmark is meant to measure the capabilities of the card.

If we look at the past, tessellation was a huge deal in favor of Nvidia, and there was no mercy for AMD's lower capability, which is fair. But here, IF we are indeed sticking to Nvidia's async capabilities and leaving AMD's potential aside, it's a clear bias, since in the past they didn't "hold tessellation to levels AMD could afford".


----------



## Mahigan

Quote:


> Originally Posted by *lolerk52*
> 
> @Mahigan, unrelated, but you said that 480 might be ROP bottlenecked, and that it could be 980 Ti performance if it had 64 ROPs.
> 
> Do you still think that's the case? That was a couple of weeks ago.


I still believe that to be the case. When you consider that, with half the ROPs of a 390/390X, it is often capable of competing with the two Grenada-based GPUs, it is not too much of a stretch to imagine what it could do against a GTX 980 Ti if armed with 64 ROPs.

It is not just the ROPs but the Render Back End units themselves. Polaris has half the Render Back End units of Hawaii/Grenada and Fiji.

RBEs are responsible for depth/stencil tests and any blending, with the ROPs specifically helping with final pixel output. Of course this has an impact on anti-aliasing performance as well.


----------



## Mahigan

Quote:


> Originally Posted by *Kpjoslee*
> 
> Then it is going to create new set of argument which benchmark was optimized better for certain architecture etc, and lose the whole point of its purpose to begin with.
> I think it is just difficult to create DX12/Vulkan benchmark than DX11, latter you can just play by its rule, but former requires optimization per each differing architectures.
> 
> Benchmark should just stick to providing single path, but just has to be more transparent about it.


It would pretty much lead to an optimizations armament war between AMD and nVIDIA and this would directly translate into them both working more with developers and giving us the utmost performance for our dollar.

In other words... to our benefit as consumers.

I do not think we should concern ourselves with what a few fans are saying but rather with what we get for the money we spend.


----------



## Slomo4shO

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Considering Time Spy isn't a game i was hoping they would be on the cutting-edge and push some of the features DX12 offers to the limit.. I guess that's asking to much from a piece of software designed to test PC hardware..


They should have just stuck to the DX 11.3 feature set. Instead, we got a half-arsed DX 12 benchmark that provides zero insight into DX 12 performance.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Slomo4shO*
> 
> They should have just stuck to DX 11.3 feature set. Instead, we got a half-arsed DX 12 benchmark that provides zero insight into DX 12 performance.


Yup.. Pretty annoyed honestly.

Looks like PC is destined to be held back by archaic APIs.. But then the same people raving about Nvidia's DX11 performance will be the first to moan about AC Unity's performance problems.. If only there were APIs that could handle large crowds without ******* the bed..


----------



## comprodigy

Mahigan, your assumption about fences/async/nvidia is off. Case in point: the async demo provided by MS and altered by AMD. I can run this on my 980 Ti and have the same performance as I do with async off. The software isn't written to detect whether async is present or not, it's just issuing command lists into multiple queues.


----------



## lolerk52

Quote:


> Originally Posted by *Mahigan*
> 
> I still believe that to be the case. When you consider that with half the ROPs of a 390/390x it is often capable of competing with the two Grenada based GPUs then it is not too much of a stretch to imagine what it could do when faced with a GTX 980 Ti and being armed with 64 ROPs.
> 
> It is not just the ROPs but the Render Back End units themselves. Polaris has half the Render Back End units of Hawaii/Grenada and Fiji.
> 
> RBEs are responsible for depth/stencil tests and any blending with the ROPs specifically helping in terms of final pixel resolution. Of course this has an impact on anti-aliasing performance as well.


Well, it does appear to do better at lower resolutions than its brethren, but nowhere near 980 Ti perf.


----------



## pengs

Quote:


> Originally Posted by *Kpjoslee*
> 
> Then it is going to create new set of argument which benchmark was optimized better for certain architecture etc, and lose the whole point of its purpose to begin with.
> I think it is just difficult to create DX12/Vulkan benchmark than DX11, latter you can just play by its rule, but former requires optimization per each differing architectures.
> 
> Benchmark should just stick to providing single path, but just has to be more transparent about it.


Anyone arguing against DX12 and Vulkan in favor of DX11 isn't worth the time; there's a direct correlation with bias or misinformation there, and if it's bias, it's not worth the breath.
Quote:


> Originally Posted by *GorillaSceptre*
> 
> Exactly..
> 
> Pretty disappointed in a lot of developers.. I've heard for years from them how they all want control over the hardware, DX12, Vulkan, etc. Now they get everything they wanted and most of them have now changed their tune, "DX12 is difficult", "don't underestimate how much work driver teams did under DX11", etc., etc.
> 
> All big talk until it comes time to put their money where their mouths are.. I tell you, I've nearly had it with PC (high-end GPU's anyway).. Spending $500 on GPU's just to watch the PS4 titles like UC4 frankly embarrass anyone who spent that kind of money.
> 
> Until these new API's are used and developers give PC consumers the same amount of effort they give to the consoles (we are paying the same amount for the games after all) then I'm only buying the mid-range cards.. Waste of damn money seeing a GPU like the Fury X go to waste because "too much work"..
> 
> Considering Time Spy isn't a game i was hoping they would be on the cutting-edge and push some of the features DX12 offers to the limit.. I guess that's asking to much from a piece of software designed to test PC hardware..


Developers are complaining because it's back to base-level engine design. Switching completely to a DX12 rendering path requires tailoring the engine for it, which means their previous work largely needs to be re-written, and they've already spent time and money developing for and around DX11.

The paradox being that while devs need to sink a lot of time into this, they will end up with a framework which they can reuse for each of their IPs, indefinitely.
Low level is low level. It's going to be hard for Microsoft to transition DirectX back into a friendly API, and it doesn't matter at this point either; the landscape is immensely different than it used to be. Studios like Epic and engines like Unity will bridge the void for their users (possibly a market of studios developing engines for lease), while large developers with the manpower develop their own proprietary engines unique to the studio, reused and reformed as needed; this has basically already happened.

Take control away from the industry and you end up with lots of cookie-cutter pastries; give the industry control and you allow them to make whatever they want (including the cookie cutter itself).
It will take time, patience, and lots of bitc.... uh, complaining.


----------



## GorillaSceptre

Quote:


> Originally Posted by *pengs*
> 
> Anyone arguing against DX12 and Vulkan favoring DX11 isn't worth the time, there's a direct correlation with bias or misinformation there and if it's a bias it's not worth the breath.
> Developers are complaining because it's back to base level engine design. To switch completely to a DX12 rendering path requires tailoring the engine for it which means their previous work needs to be largely re-written and they've already spent time and money developing for and around DX11.
> 
> The parallax being that while devs need to sink a lot of time into this they will end up with a frame which they will use for each of their IP's, indefinitely.
> Low level is low level. It's going to be hard for Microsoft to transition DirectX back into a friendly API and it doesn't matter at this point either, the landscape is immensely different than it used to be. Studio's like Epic and engines like Unity will bridge the void for their users (possibly a market of studio's developing engines for lease) - large developers with the man power developing their own proprietary engines unique to the studio which will be reused and reformed as needed, this has basically already happened.
> 
> You take control away from the industry and end up with lot's of cookie cutter pastries, to give the industry control is to allow them to make whatever they want (including the cookie cutter itself).
> It will take time, patients and lots of bitc.... uh, complaining.


I agree with all of that.









Just taking jabs at certain developers. The ones who blame their lackluster, botched releases on the fact that they have to deal with DX11 and all it's limitations. Now they have what they wanted and they're now making excuses the other way around.


----------



## Bidz

Today, tailoring your own engine is a luxury in game development, not a requirement.

And let's be honest: most "custom engines" are bad.


----------



## provost

Quote:


> Originally Posted by *Mahigan*
> 
> It would pretty much lead to an optimizations armament war between AMD and nVIDIA and this would directly translate into them both working more with developers and giving us the utmost performance for our dollar.
> 
> In other words... to our benefit as consumers.
> 
> I do not think we should concern ourselves with what a few fans are saying but rather with what we get for the money we spend.


I suppose AMD can always enlist Microsoft as a stalwart ally in this "optimization armament race," since their respective interests seem to be aligned. Microsoft potentially carries a bigger "carrot and stick" than both AMD and Nvidia combined... lol
I can't see MS not wanting DX12 to be as popular and as successful as possible, given how invested it is in this new direction. So get together with Microsoft and go to town on the developers..
I wouldn't be surprised if this wasn't already happening....


----------



## Yttrium

Quote:


> Originally Posted by *Kpjoslee*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Mahigan*
> 
> Since 3DMark apparently only uses a single path for both AMD and nVIDIA hardware, then if that 3DMark path was coded to favor nVIDIA hardware, the benchmark is not objective. Why not? Because the AMD hardware can take even more Asynchronous Compute + Graphics workloads in order to gain even more performance than is currently being shown. Therefore if a separate path were coded for AMD... more tasks could be "marked" to run in parallel (more Graphics + Compute jobs) leading to a reduction in per frame latency (resulting in higher FPS).
> 
> The main reason both AMD and nVIDIA (as well as Microsoft) stated that separate paths are the way to go for DX12 applications is specifically due to this issue. If a developer codes in favor of nVIDIA then AMD will suffer bad performance and vice versa. In order to avert this... both architectures ought to have their own optimized paths.
> 
> This should be mentioned to that 3DMark rep on steam.
> 
> 
> 
> That would be a major dilemma. Last thing benchmark program should do is create separate optimized path for each GPU.

Well, you sort of can. Turn off tessellation and up the resolution for the Fury, and there you have an AMD-favouring benchmark.

I guess the best they can do is make async optional for AMD and Nvidia GPUs until Nvidia releases drivers that implement async at the hardware level. (Yes, that last part was satire.)

At first I thought they were simply inept and fools for not at least taking a look at async, but the Steam discussion someone posted made that look more and more unlikely.


----------



## GorillaSceptre

Quote:


> Originally Posted by *provost*
> 
> I suppose AMD can always enlist Microsoft as a stalwart ally in this "optimization armament race" since their respective interests seem to be aligned. Microsoft potentially carries a bigger "carrot and stick" than both AMD and Nvidia combined...lol
> Can't see MS not wanting dx12 to as popular and as successful as possible, given how invested it is in this new direction. So get together with Microsoft and go to town on the developers..
> *I wouldn't be surprised if this wasn't already happening...*.


That's the only saving grace right now. If the X1 wasn't struggling against its competition, MS wouldn't even give a damn..

Vulkan and DX12 are extremely similar, so I'm personally hoping Vulkan is the go-to API moving forward. That way everyone who isn't on, or doesn't want to deal with, Win10 gets the benefits too.

Besides support from Microsoft or deals, I don't know why anyone would choose DX12 over Vulkan. And now that DX12 has competition, I think we'll see MS pushing it very hard going forward.


----------



## Remij

People need to stop blaming developers. It's the GPU vendors job to make sure their hardware is taken advantage of.

Why is it so hard to understand that developers will code for whatever hardware has the biggest install base in the market they are developing for? Why is it hard to understand that developers will often choose the path of least resistance? Just because there is a new API out there doesn't mean developers have to take advantage of it. It's the GPU vendors who stand to gain from it that need to do the work so that it's adopted.

Console ports are heavily designed for GCN architecture, and Nvidia has to combat that by working with developers and adding their own incentives for people to choose their hardware over the competition's.

Devs have limited time and resources and make logical decisions based not on fanboy-ish behavior, but on market research. If anything, it's AMD's time to bring the heat with DX12 and really work with developers to heavily code for async compute + graphics. If developers don't, or AMD doesn't have enough resources to work with everyone, then it is what it is; they need to pick their battles. If games are developed on consoles then ported to PC, Nvidia has to make it work. If they can't.. then I'd be switching to Team Red. Developers are making decisions that make sense for them. AMD has done a lot of work recently. It's pretty much impossible for game developers to ignore them, and they have 2 fresh new APIs well suited to their hardware. They are finally more competitive, and regardless of what people think is happening behind the scenes, this is good for competition.

AMD fans need to take their wins and losses graciously. Nvidia fans need to realize that AMD is back in a big way and they're not the same AMD as before. There will be some losses lol


----------



## Themisseble

Quote:


> Originally Posted by *lolerk52*
> 
> Well, it does appear to do better at lower resolutions than its brethren, but nowhere near 980 Ti perf:


Yeah, TPU is showing mostly nVIDIA titles or old games which run well on NVIDIA. Put Anno 2205 away and you get +5% on all AMD cards.


----------



## Kpjoslee

Quote:


> Originally Posted by *GorillaSceptre*
> 
> That's the only saving grace right now. If the X1 wasn't struggling against it's competition MS wouldn't even give a damn..
> 
> Vulkan and DX12 are extremely similar so I'm personally hoping Vulkan is the go-to API moving forward. That way everyone who isn't/doesn't want to deal with Win10 gets the benefits too.
> 
> *Besides support from Microsoft* or deals i don't know why anyone would choose DX12 over Vulkan. And now that DX12 has competition i think we'll have MS pushing it very hard going forward.


That is the biggest reason why DirectX was preferred over OpenGL, and I think that is the edge DirectX 12 will have over Vulkan. I would definitely prefer Vulkan to be the go-to API instead of DirectX 12, for the sake of PC games relying less on Windows going forward, but developers unfortunately, and understandably, prefer having support to not having it.


----------



## magnek

Quote:


> Originally Posted by *Remij*
> 
> *People need to stop blaming developers. It's the GPU vendors job to make sure their hardware is taken advantage of.*
> 
> Why is it so hard to understand that developers will code for what hardware has the biggest install base in the market they are developing for? Why is it hard to understand that developers will often choose the path of least resistence? Just because there is a new API out there doesn't mean developers have to take advantage of it. It's the GPU vendors who stand to gain from it that need to do the work so that it's adopted.
> 
> Console ports are heavily designed for GCN architecture and Nvidia has to combat that by working with developers and adding their own incentives for people to choose their hardware over the competitions.
> 
> Devs have limited time and resources and make logical decisions based not on fanboy-ish behavior, but market research. If anything it's AMDs time to bring the heat with DX12 and really work with developers to heavily code for async compute+graphics. If developers don't, or AMD doesn't have enough resources to work with everyone, then it is what it is, they need to pick their battles. If games are developed on consoles then ported to PC, Nvidia has to make it work. If they can't.. then I'd be switching to Team Red. Developers are making decisions that make sense for them. AMD has done a lot of work recently. It's pretty much impossible for game developers to ignore them, and they have 2 new fresh APIs we'll suited to their hardware. They are finally more competitive and regardless of what people think is happening behind the scenes, this is good for competition.
> 
> AMD fans need to take their wins and losses graciously. Nvidia fans need to realize that AMD is back in a big way and they're not the same AMD as before. There will be some losses lol


Then how do you explain Ubisoft.









But seriously, ever since the unholy trinity that was Watch Dogs, Ass Creed Unity and Far Cry 4, I've pretty much written them off.


----------



## Remij

Quote:


> Originally Posted by *magnek*
> 
> Then how do you explain Ubisoft.


I'm French myself and trust me, there's no explaining Ubisoft. They're.. uhhh... just special.


----------



## Kuivamaa

Quote:


> Originally Posted by *Remij*
> 
> People need to stop blaming developers. It's the GPU vendors job to make sure their hardware is taken advantage of.
> 
> Why is it so hard to understand that developers will code for what hardware has the biggest install base in the market they are developing for? Why is it hard to understand that developers will often choose the path of least resistence? Just because there is a new API out there doesn't mean developers have to take advantage of it. It's the GPU vendors who stand to gain from it that need to do the work so that it's adopted.
> 
> Console ports are heavily designed for GCN architecture and Nvidia has to combat that by working with developers and adding their own incentives for people to choose their hardware over the competitions.
> 
> Devs have limited time and resources and make logical decisions based not on fanboy-ish behavior, but market research. If anything it's AMDs time to bring the heat with DX12 and really work with developers to heavily code for async compute+graphics. If developers don't, or AMD doesn't have enough resources to work with everyone, then it is what it is, they need to pick their battles. If games are developed on consoles then ported to PC, Nvidia has to make it work. If they can't.. then I'd be switching to Team Red. Developers are making decisions that make sense for them. AMD has done a lot of work recently. It's pretty much impossible for game developers to ignore them, and they have 2 new fresh APIs we'll suited to their hardware. They are finally more competitive and regardless of what people think is happening behind the scenes, this is good for competition.
> 
> AMD fans need to take their wins and losses graciously. Nvidia fans need to realize that AMD is back in a big way and they're not the same AMD as before. There will be some losses lol


Not sure if serious. A console-level API better than DX9-DX11 has been a constant demand from developers for several years. It was Johan Andersson of EA DICE who kickstarted the API renaissance we are experiencing today (Mantle, Metal, DX12, Vulkan, etc.). He shared the concern of the rest of the PC devs, but he did not just nag. He took it to the next level and pitched the idea, the need if you like, for a better API to Intel, MS, Nvidia and AMD. It was AMD that shared his vision, and they came up with Mantle, which is the father of DX12 and Vulkan. Without devs expressing their discontent and their need for better tools, we would still be left with the obsolete DX11.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Remij*
> 
> People need to stop blaming developers. It's the GPU vendors job to make sure their hardware is taken advantage of.


I'm not blaming indie devs working out of a tiny office trying to make their dreams come true..

I'm talking about big-business Triple-A studios; those are the titles that actually need the new APIs (according to them). 99% of the games on Steam will run on weak hardware, so DX12 and Vulkan plus the extra work that comes with them aren't needed there.

These big studios that release games like Arkham Knight, Unity, etc., etc., all blame their broken garbage on restrictive APIs like DX11. As for the vendors being responsible, well... I agree to some extent, and AMD went as far as to create their own API and push for DX12 and Vulkan in the first place.. But the onus is also on studios who are more than happy to charge $60 plus season passes for their products.

Studios like id and DICE are few and far between.. Most of the others who have been crying for these new APIs are now backtracking. Now that refunds are the norm, I think they'll all put a bit more effort in.


----------



## Mahigan

Quote:


> Originally Posted by *comprodigy*
> 
> Mahigan, you're assumption about fences/async/nvidia is off. Proof in point is the async demo provided by MS and altered by AMD. I can run this on my 980ti and have the same performance as I do with async off. The software isnt written to detect if async is present or not, its just issuing command lists into mulitple queues.


Not off at all... that Async Compute test you are referencing is not a heavy test at all. It is only meant to show off the feature itself, and it involves no user interaction and no need for more than a single fence between the compute and graphics contexts.

In a game, things are much different. Have a read here, or check out the important parts below: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx


Spoiler:



*GPU engines*

The following diagram shows a title's CPU threads, each populating one or more of the copy, compute and 3D queues. The 3D queue can drive all three GPU engines, the compute queue can drive the compute and copy engines, and the copy queue simply the copy engine.
As the different threads populate the queues, there can be no simple guarantee of the order of execution, *hence the need for synchronization mechanisms - when the title requires them*.


The following image illustrates how a title might schedule work across multiple GPU engines, including inter-engine synchronization where necessary: it shows the per-engine workloads with inter-engine dependencies. In this example, the copy engine first copies some geometry necessary for rendering. The 3D engine waits for these copies to complete, and renders a pre-pass over the geometry. This is then consumed by the compute engine. The results of the compute engine Dispatch, along with several texture copy operations on the copy engine, are consumed by the 3D engine for the final Draw call.


The following pseudo-code illustrates how a title might submit such a workload.
Quote:


> // Get per-engine contexts. Note that multiple queues may be exposed
> // per engine, however that design is not reflected here.
> copyEngine = device->GetCopyEngineContext();
> renderEngine = device->GetRenderEngineContext();
> computeEngine = device->GetComputeEngineContext();
> copyEngine->CopyResource(geometry, ...); // copy geometry
> copyEngine->Signal(copyFence, 101);
> copyEngine->CopyResource(tex1, ...); // copy textures
> copyEngine->CopyResource(tex2, ...); // copy more textures
> copyEngine->CopyResource(tex3, ...); // copy more textures
> copyEngine->CopyResource(tex4, ...); // copy more textures
> copyEngine->Signal(copyFence, 102);
> renderEngine->Wait(copyFence, 101); // geometry copied
> renderEngine->Draw(); // pre-pass using geometry only into rt1
> renderEngine->Signal(renderFence, 201);
> computeEngine->Wait(renderFence, 201); // prepass completed
> computeEngine->Dispatch(); // lighting calculations on pre-pass (using rt1 as SRV)
> computeEngine->Signal(computeFence, 301);
> renderEngine->Wait(computeFence, 301); // lighting calculated into buf1
> renderEngine->Wait(copyFence, 202); // textures copied
> renderEngine->Draw(); // final render using buf1 as SRV, and tex[1-4] SRVs


The following pseudo-code illustrates synchronization between the copy and 3D engines to accomplish heap-like memory allocation via a ring buffer. Titles have the flexibility to choose the right balance between maximizing parallelism (via a large buffer) and reducing memory consumption and latency (via a small buffer).
Quote:


> device->CreateBuffer(&ringCB);
> for (int i = 1; ; i++) {
> if (i > length) copyEngine->Wait(fence1, i - length);
> copyEngine->Map(ringCB, i % length, WRITE, pData); // copy new data
> copyEngine->Signal(fence2, i);
> renderEngine->Wait(fence2, i);
> renderEngine->Draw(); // draw using copied data
> renderEngine->Signal(fence1, i);
> }
> 
> // example for length = 3:
> // copyEngine->Map();
> // copyEngine->Signal(fence2, 1); // fence2 = 1
> // copyEngine->Map();
> // copyEngine->Signal(fence2, 2); // fence2 = 2
> // copyEngine->Map();
> // copyEngine->Signal(fence2, 3); // fence2 = 3
> // copy engine has exhausted the ring buffer, so must wait for render to consume it
> // copyEngine->Wait(fence1, 1); // fence1 == 0, wait
> // renderEngine->Wait(fence2, 1); // fence2 == 3, pass
> // renderEngine->Draw();
> // renderEngine->Signal(fence1, 1); // fence1 = 1, copy engine now unblocked
> // renderEngine->Wait(fence2, 2); // fence2 == 3, pass
> // renderEngine->Draw();
> // renderEngine->Signal(fence1, 2); // fence1 = 2
> // renderEngine->Wait(fence2, 3); // fence2 == 3, pass
> // renderEngine->Draw();
> // renderEngine->Signal(fence1, 3); // fence1 = 3
> // now render engine is starved, and so must wait for the copy engine
> // renderEngine->Wait(fence2, 4); // fence2 == 3, wait


*Multi-engine scenarios*

D3D12 allows developers to avoid accidentally running into inefficiencies caused by unexpected synchronization delays. It also allows developers to introduce synchronization at a higher level where the required synchronization can be determined with greater certainty. A second issue that multi-engine addresses is to make expensive operations more explicit, which includes transitions between 3D and video that were traditionally costly because of synchronization between multiple kernel contexts.
In particular, the following scenarios can be addressed with D3D12:


Asynchronous and low priority GPU work. This enables concurrent execution of low priority GPU work and atomic operations that enable one GPU thread to consume the results of another unsynchronized thread without blocking.
High priority compute work. With background compute it is possible to interrupt 3D rendering to do a small amount of high priority compute work. The results of this work can be obtained early for additional processing on the CPU.
Background compute work. A separate low priority queue for compute workloads allows an application to utilize spare GPU cycles to perform background computation without negative impact on the primary rendering (or other) tasks. Background tasks may include decompression of resources or updating simulations or acceleration structures. Background tasks should be synchronized on the CPU infrequently (approximately once per frame) to avoid stalling or slowing foreground work.
Streaming and uploading data. A separate copy queue replaces the D3D11 concepts of initial data and updating resources. Although the application is responsible for more details in the D3D12 model, this responsibility comes with power. The application can control how much system memory is devoted to buffering upload data. The app can choose when and how (CPU vs GPU, blocking vs non-blocking) to synchronize, and can track progress and control the amount of queued work.
Increased parallelism. Applications can use deeper queues for background workloads (e.g. video decode) when they have separate queues for foreground work.
In D3D12 the concept of a command queue is the API representation of a roughly serial sequence of work submitted by the application. Barriers and other techniques allow this work to be executed in a pipeline or out of order, but the application only sees a single completion timeline. This corresponds to the immediate context in D3D11.
*Synchronization APIs*

*Devices and Queues*
The D3D 12 device has methods to create and retrieve command queues of different types and priorities. Most applications should use the default command queues because these allow for shared usage by other components. Applications with additional concurrency requirements can create additional queues. Queues are specified by the command list type that they consume.
Refer to the following creation methods of ID3D12Device:
CreateCommandQueue : creates a command queue based on information in a D3D12_COMMAND_QUEUE_DESC structure.
CreateCommandList : creates a command list of type D3D12_COMMAND_LIST_TYPE.
CreateFence : creates a fence, noting the flags in D3D12_FENCE_FLAGS. Fences are used to synchronize queues.
Queues of all types (3D, compute and copy) share the same interface and are all command-list based. Resource mapping operations remain on the queue interface, but are only allowed on 3D and compute queues (not copy).
Refer to the following methods of ID3D12CommandQueue:
ExecuteCommandLists : submits an array of command lists for execution. Each command list being defined by ID3D12CommandList.
Signal : sets a fence value when the queue (running on the GPU) reaches a certain point.
Wait : the queue waits until the specified fence reaches the specified value.
Note that bundles are not consumed by any queues and therefore this type cannot be used to create a queue.

*Fences*
The multi-engine API provides explicit APIs to create and synchronize using fences. A fence is a synchronization construct determined by monotonically increasing a UINT64 value. Fence values are set by the application. A signal operation increases the fence value and a wait operation blocks until the fence has reached the requested value. An event can be fired when a fence reaches a certain value.
Refer to the methods of the ID3D12Fence interface:
GetCompletedValue : returns the current value of the fence.
SetEventOnCompletion : causes an event to fire when the fence reaches a given value.
Signal : sets the fence to the given value.
Fences allow CPU access to the current fence value, and CPU waits and signals. Independent components can share the default queues but create their own fences and control their own fence values and synchronization.
The Signal method on the ID3D12Fence interface updates a fence from the CPU side. The Signal method on ID3D12CommandQueue updates a fence from the GPU side.
All nodes in a multi-engine setup can read and react to any fence reaching the right value.
Applications set their own fence values; a good starting point might be increasing a fence once per frame.
The fence APIs provide powerful synchronization functionality but can create potentially difficult to debug issues.
*Asynchronous compute and graphics example*

This next example allows graphics to render asynchronously from the compute queue. There is still a fixed amount of buffered data between the two stages, however now graphics work proceeds independently and uses the most up-to-date result of the compute stage as known on the CPU when the graphics work is queued. This would be useful if the graphics work was being updated by another source, for example user input. There must be multiple command lists to allow the ComputeGraphicsLatency frames of graphics work to be in flight at a time, and the function UpdateGraphicsCommandList represents updating the command list to include the most recent input data and read from the compute data from the appropriate buffer.
The compute queue must still wait for the graphics queue to finish with the pipe buffers, but a third fence (pGraphicsComputeFence) is introduced so that the progress of graphics reading compute work versus graphics progress in general can be tracked. This reflects the fact that now consecutive graphics frames could read from the same compute result or could skip a compute result. A more efficient but slightly more complicated design would use just the single graphics fence and store a mapping to the compute frames used by each graphics frame.
Quote:


> void AsyncPipelinedComputeGraphics()
> {
> const UINT CpuLatency = 3;
> const UINT ComputeGraphicsLatency = 2;
> 
> // Compute is 0, graphics is 1
> ID3D12Fence *rgpFences[] = { pComputeFence, pGraphicsFence };
> HANDLE handles[2];
> handles[0] = CreateEvent(nullptr, FALSE, TRUE, nullptr);
> handles[1] = CreateEvent(nullptr, FALSE, TRUE, nullptr);
> UINT FrameNumbers[] = { 0, 0 };
> 
> ID3D12GraphicsCommandList *rgpGraphicsCommandLists[CpuLatency];
> CreateGraphicsCommandLists(ARRAYSIZE(rgpGraphicsCommandLists),
> rgpGraphicsCommandLists);
> 
> *// Graphics needs to wait for the first compute frame to complete, this is the
> // only wait that the graphics queue will perform.
> pGraphicsQueue->Wait(pComputeFence, 1);*
> 
> while (1)
> {
> for (auto i = 0; i < 2; ++i)
> {
> if (FrameNumbers[i] > CpuLatency)
> {
> rgpFences[i]->SetEventOnCompletion(
> FrameNumbers[i] - CpuLatency,
> handles[i]);
> }
> else
> {
> SetEvent(handles[i]);
> }
> }
> 
> auto WaitResult = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
> auto Stage = WaitResult - WAIT_OBJECT_0;
> ++FrameNumbers[Stage];
> 
> switch (Stage)
> {
> case 0:
> {
> if (FrameNumbers[Stage] > ComputeGraphicsLatency)
> {
> pComputeQueue->Wait(pGraphicsComputeFence,
> FrameNumbers[Stage] - ComputeGraphicsLatency);
> }
> pComputeQueue->ExecuteCommandLists(1, &pComputeCommandList);
> pComputeQueue->Signal(pComputeFence, FrameNumbers[Stage]);
> break;
> }
> case 1:
> {
> // Recall that the GPU queue started with a wait for pComputeFence, 1
> UINT64 CompletedComputeFrames = max(1,
> pComputeFence->GetCompletedValue());
> UINT64 PipeBufferIndex =
> (CompletedComputeFrames - 1) % ComputeGraphicsLatency;
> UINT64 CommandListIndex = (FrameNumbers[Stage] - 1) % CpuLatency;
> // Update graphics command list based on CPU input and using the appropriate
> // buffer index for data produced by compute.
> UpdateGraphicsCommandList(PipeBufferIndex,
> rgpGraphicsCommandLists[CommandListIndex]);
> 
> // Signal *before* new rendering to indicate what compute work
> // the graphics queue is DONE with
> pGraphicsQueue->Signal(pGraphicsComputeFence, CompletedComputeFrames - 1);
> pGraphicsQueue->ExecuteCommandLists(1,
> rgpGraphicsCommandLists + CommandListIndex);
> pGraphicsQueue->Signal(pGraphicsFence, FrameNumbers[Stage]);
> break;
> }
> }
> }
> }






Now pair all of that with what the Oxide dev Kollock stated... (I asked Kollock that very question about fences, if you read the last exchange)


Spoiler: Warning: Spoiler!


----------



## magnek

Quote:


> Originally Posted by *GorillaSceptre*
> 
> I'm not blaming indie devs working out of a tiny office trying to make their dreams come true..
> 
> I'm talking about big business Tripple-A studios, those are the titles that actually need the new API's (according to them). 99% of the games on Steam will run on weak hardware, DX12 and Vulkan + the extra work that comes with them aren't needed.
> 
> These big studios that release games like Arkham Knight, Unity, etc., etc., all blame there broken garbage on restrictive API's like DX11. As for the vendors being responsible, well... i agree to some extent, and AMD went as far to create their own API and push for DX12 and Vulkan in the first place.. But the onus is also on studios who are more than happy to charge $60 + season passes for their products.


The situation is exactly analogous to the GPU market. As long as people keep buying Battlefield Calls XXX Remastered Diamond Premium Edition with Season Pass and *exclusive preorder bonuses* regardless of what kind of garbage the big studios keep churning out, why would they have any incentive to do anything differently?


----------



## GorillaSceptre

Quote:


> Originally Posted by *magnek*
> 
> The situation is exactly analogous to the GPU market. As long as people keep buying Battlefield Calls XXX Remastered Diamond Premium Edition with Season Pass and *exclusive preorder bonuses* regardless of what kind of garbage the big studios keep churning out, why would they have any incentive to do anything differently?


Put the reason in my edit. One word - Refunds.


----------



## Remij

Quote:


> Originally Posted by *Kuivamaa*
> 
> Not sure if serious. A console level API better than DX9-DX11 has been a constant demand from developers for several years. It was Johan Andersson of EA DICE that instigated this API renaissance we are experiencing today (Mantle, Metal,DX12, Vulkan etc). He shared the concern of the rest of pc devs but he did not just nag. He took it to the next level and pitched the idea,the need if you like for a better API to Intel,MS,nvidia and AMD. It was AMD that shared his vision and they came up with Mantle, which is the father of DX12 and Vulkan. Without devs expressing their discontent and their need for better tools,we would be still left with the obsolete DX11.


Big developers with their own engines of course have much to gain from low level APIs where they push the latest and greatest hardware to show off their games and engines.. You'll see those devs take initiative and support the hardware better regardless of the API used because they have huge teams with highly specialized engineers and programmers that know exactly how to code close to the metal. It's no surprise they want to push technology forward.

But then you have to remember about the point I made about developing for the largest potential market of hardware out there. Even Johan was debating pushing for DX12 only vs coding two separate paths. He said the benefits would be there, but they have to consider the market.

So relax... don't blame developers just yet screaming bloody murder when something isn't fully taken advantage of. This is a transition period and the two architectures are quite different as we already know... So in the games that AMD gets ahead, celebrate and be happy. But when Nvidia wins games here and there.. just take solace in the fact that AMDs performance will likely be much better than it would have been before DX12. So progress is being made.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> I am not sure if people have asked about fences or not. The fences are the basis of my argument. It does not matter if the driver switches Async Compute on or off. 3DMark has two paths. One with Async Compute and another without. When you toggle Async On... the Async path is used. When you toggle Async Off... the non-Async path is used.
> 
> The argument has to do with the fences in the Async path. Those fences would still incur a performance penalty regardless of what the nVIDIA Maxwell driver is doing. Those fences cause tiny delays... the more fences the more delays are introduced. The more delays... the lower your FPS due to the introduced latency.
> 
> What I do not see is the performance hit associated with those delays when running TimeSpy on an nVIDIA Maxwell based GPU. Those fences should be negatively affecting the performance of the Maxwell GPU as they synchronize Graphics and Compute tasks. I have already mentioned ways in which this would not be the case and those ways would be if the application was coded to favor nVIDIA hardware (utilizing short running shaders and very few Asynchronous Compute + Graphics workloads).
> 
> Since 3DMark apparently only uses a single path for both AMD and nVIDIA hardware then if that 3DMark path was coded to favor nVIDIA hardware then the benchmark is not objective. Why not? Because the AMD hardware can take even more Asynchronous Compute + Graphics workloads in order to gain even more performance than it curently being shown. Therefore if a separate path were coded for AMD... more tasks could be "marked" to run in parallel (more Graphics + Compute jobs) leading to a reduction in per frame latency (resulting in higher FPS).
> 
> The main reason both AMD and nVIDIA (as well as Microsoft) stated that separate paths are the way to go for DX12 applications is specifically due to this issue. If a developer codes in favor of nVIDIA then AMD will suffer bad performance and vice versa. In order to avert this... both architectures ought to have their own optimized paths.
> 
> This should be mentioned to that 3DMark rep on steam.


it seems devs can control how many fences/barriers are used and when

and Nvidia tries to show that this breaks their DX12 perf

Do's



Spoiler: Warning: Spoiler!




Minimize the use of barriers and fences
We have seen redundant barriers and associated wait for idle operations as a major performance problem for DX11 to DX12 ports
The DX11 driver is doing a great job of reducing barriers - now under DX12 you need to do it
Any barrier or fence can limit parallelism
Make sure to always use the minimum set of resource usage flags
Stay away from using D3D12_RESOURCE_USAGE_GENERIC_READ unless you really need every single flag that is set in this combination of flags
Redundant flags may trigger redundant flushes and stalls and slow down your game unnecessarily
To reiterate: We have seen redundant and/or overly conservative barrier flags and their associated wait for idle operations as a major performance problem for DX11 to DX12 ports.
Specify the minimum set of targets in ID3D12CommandList::ResourceBarrier
Adding false dependencies adds redundancy
Group barriers in one call to ID3D12CommandList::ResourceBarrier
This way the worst case can be picked instead of sequentially going through all barriers
Use split barriers when possible
Use the _BEGIN_ONLY/_END_ONLY flags
This helps the driver doing a more efficient job
Do use fences to signal events/advance across calls to ExecuteCommandLists




Dont's


Spoiler: Warning: Spoiler!



Quote:


> Don't insert redundant barriers
> This limits parallelism
> A transition from D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE to D3D12_RESOURCE_STATE_RENDER_TARGET and back without any draw calls in-between is redundant
> Avoid read-to-read barriers
> Get the resource in the right state for all subsequent reads
> Don't use D3D12_RESOURCE_USAGE_GENERIC_READ unless you really needs every single flag
> Don't sequentially call ID3D12CommandList::ResourceBarrier with just one barrier
> This doesn't allow the driver to pick the worst case of a set of barriers
> Don't expect fences to trigger signals/advance at a finer granularity than once per ExecuteCommandLists call.






Also, the work batches/command lists submitted for Maxwell can be big but few, and therefore there are fewer synchronization points, which fits Maxwell and Pascal

Do´s



Spoiler: Warning: Spoiler!




Submit work in parallel and evenly across several threads/cores to multiple command lists
Recording commands is a CPU intensive operation and no driver threads come to the rescue
Command lists are not free threaded so parallel work submission means submitting to multiple command lists
Be aware of the fact that there is a cost associated with setup and reset of a command list
You still need a reasonable number of command lists for efficient parallel work submission
Fences force the splitting of command lists for various reasons ( multiple command queues, picking up the results of queries)
Try to aim at a reasonable number of command lists in the range of 15-30 or below. Try to bundle those CLs into 5-10 ExecuteCommandLists() calls per frame.




Quote:


> Originally Posted by *KarathKasun*
> 
> Even if you optimize everything for the specific target GPU's, each will have strengths and weaknesses depending on how they handle the tasks given to them.
> 
> We have seen this argument before with the now ancient 8800 GTX vs HD 2900 XT argument (SIMD vs VLIW respectively). You cant change the fact that both approaches deal with data processing in intrinsically different ways.


Maybe they use the same path regardless of the architecture: a basic procedure for every GPU, which sends work to the available CUs in a way common to all architectures, even if the asynchronous compute queues don't take full advantage of each architecture, like using a Pascal path which also works on Maxwell and GCN


----------



## Xuper

If you give async code to the driver and the driver is doing something to make it shine while it's disabled on Maxwell cards, *the score is not valid!* because the driver is cheating. Either the developer should disable async compute (by graying out the button), or we should see a perf hit with async on.


----------



## Kuivamaa

Quote:


> Originally Posted by *Remij*
> 
> Big developers with their own engines of course have much to gain from low level APIs where they push the latest and greatest hardware to show off their games and engines.. You'll see those devs take initiative and support the hardware better regardless of the API used because they have huge teams with highly specialized engineers and programmers that know exactly how to code close to the metal. It's no surprise they want to push technology forward.
> 
> But then you have to remember about the point I made about developing for the largest potential market of hardware out there. Even Johan was debating pushing for DX12 only vs coding two separate paths. He said the benefits would be there, but they have to consider the market.
> 
> So relax... don't blame developers just yet screaming bloody murder when something isn't fully taken advantage of. This is a transition period and the two architectures are quite different as we already know... So in the games that AMD gets ahead, celebrate and be happy. But when Nvidia wins games here and there.. just take solace in the fact that AMDs performance will likely be much better than it would have been before DX12. So progress is being made.


I am not blaming anyone. I just expressed my objection to the notion that devs have this passive role of getting served a specific type of hardware or API and can't do anything about it, only code for whatever the majority uses. In fact it is the top devs and their games that shape the hardware. Carmack did it back in the day, now it is Andersson. Radeons and the modern APIs essentially had to meet his demands. Not that he is some sort of king or anything - his wishes echoed the industry as a whole. Indie devs and non-AAA studios in general will compromise, sure. But even they affect the shape of things to come by supporting certain advanced engines over older ones. As for Johan and DX12, he is dead set on using it. The thing he considered was whether to drop the DX11 burden altogether or not. My guess is that if DX12 was not tied to Windows 10, BF1 would be DX12 only, just like BF3 was DX11 only back in 2011. In other words his market concerns are at the OS level, not the hardware one - non-DX12-compliant GPUs are far too weak these days to matter for this discussion.


----------



## KarathKasun

Seems the NV approach relies on doing compute kernels in a single wavefront or whatever they call the smallest slice of GPU time they can allot to a task. Effectively limiting the chip to being treated as a single resource. This is similar to the way a single CPU core was used for multitasking before multiple core PC's became commonplace.

GCN OTOH is operated more like a cluster of weaker components, where each can do its own task without disturbing the jobs taking place in adjacent resources. It has more transistors dedicated to control and scheduling logic, so it gets less raw performance per transistor but can handle multiple tasks concurrently.

Both can do the same things, but you have to optimize for either to get the most out of them.


----------



## Remij

Quote:


> Originally Posted by *Xuper*
> 
> If you give Async code to Driver and driver is doing something to make it shine while It's disabled on maxwell card, *score is not Valid!* because driver is doing Cheat.either Developer should disable Async compute ( by using gray out on Button ) Or we should see Perf hit with Async On.


I honestly couldn't care either way... as long as image quality isn't affected.

If Nvidia can do something in drivers to disable async on Maxwell gpus so there is no performance hit, then it's the right thing to do imo.

Again.. as long as there is no effect on image quality, who cares what drivers do to get the performance they do? Async itself is a way of gaining more performance. In the end, the numbers are important.. not how they got there.


----------



## Mahigan

Quote:


> Originally Posted by *PontiacGTX*
> 
> it seems devs can control how many fences/ barrier are used and when
> 
> and Nvidia tries to show that this breaks their DX12 perf
> 
> Do's
> 
> 
> Dont's
> Maybe they use same patch regardless of the architecture they use a basic procedure for every GPU which sends info to the numbers of CUs with the common way it works along all architecture even if the asynchronous compute queues dont take advantage of the architecture, like using Pascal path which works in Maxwell and GCN


They are the only ones who can control how many fences are used and when they are used. The driver has no control over fences according to Kollock. Kollock stated that the fences are invisible to the driver contrary to what the 3DMark rep is saying.

Every single Asynchronous + Graphics task requires a fence in order to synchronize both the Compute and Graphics context. So the more Asynchronous + Graphics work... the more fences.


----------



## Xuper

Quote:


> Originally Posted by *Remij*
> 
> I honestly couldn't care either way... as long as image quality isn't affected.
> 
> If Nvidia can do something in drivers to disable async on Maxwell gpus so there is no performance hit, then it's the right thing to do imo.
> 
> Again.. as long as there is no affect to image quality who cares what drivers do to get the performance they do? Async itself is a way gaining more performance. In the end, the numbers are important.. not how they got there.


Sorry, I don't accept this kind of logic! When I say do it, you should do exactly what I'm saying, not use some trick to get a result. Test A = async off, Test B = async on. If you can't do Test B, then your score should be much lower than Test A, because I told you to do exactly what I said, not to use some optimization path.


----------



## comprodigy

The test in question actually does do a good bit of async. It was altered by AMD to use a good amount. You say we see much different results in games, but so far in games that hasn't really been true. In fact, the difference in Ashes between on and off isn't much at all. So where exactly are you going with this? Are you saying that it's impossible for async to be turned off in the driver by Nvidia? That no matter what, you're going to have async running on Maxwell if the programmer doesn't explicitly disable it in code? That's easy enough to prove. Since AMD's own demonstration of async isn't enough, do you care to write your own program? I'll gladly test.


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> They are the only ones who can control how many fences are used and when they are used. The driver has no control over fences according to Kollock. Kollock stated that the fences are invisible to the driver contrary to what the 3DMark rep is saying.
> 
> Every single Asynchronous + Graphics task requires a fence in order to synchronize both the Compute and Graphics context. So the more Asynchronous + Graphics work... the more fences.


that is exactly what the text says
Quote:


> Submit work in parallel and evenly across several threads/cores to multiple command lists
> Recording commands is a CPU intensive operation and no driver threads come to the rescue
> Command lists are not free threaded so parallel work submission means submitting to multiple command lists
> 
> Reuse fragments recorded in bundles if you can
> No need to spend CPU time once again
> ...
> ...
> The DX11 driver is doing a great job of reducing barriers - now under DX12 you need to do it
> 
> To reiterate: We have seen redundant and/or overly conservative barrier flags and their associated wait for idle operations as a major performance problem for DX11 to DX12 ports.
> 
> Don't insert redundant barriers
> This limits parallelism
> A transition from D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE to D3D12_RESOURCE_STATE_RENDER_TARGET and back without any draw calls in-between is redundant
> Avoid read-to-read barriers
> 
> Don't sequentially call ID3D12CommandList::ResourceBarrier with just one barrier
> This doesn't allow the driver to pick the worst case of a set of barriers
> Don't expect fences to trigger signals/advance at a finer granularity than once per ExecuteCommandLists call.
> Multi GPU


----------



## Remij

Quote:


> Originally Posted by *Xuper*
> 
> Sorry I don't accept this kind of logic! When I say Do it , You should do it exactly what I am saying !! Not by using any trick to get any result.Test A = Async OFF , Test B = Async ON , If you Can't do Test B then your Score should be much lower than Test A.because I ordered you Do it exactly what I say not by using any optimization path.


There is no Test A and Test B though, and there doesn't need to be. This isn't a image quality affecting optimization. There's Async = On, which benefits AMD. There's Async = Disabled on Nvidia, which still benefits AMD, and benefits Nvidia Maxwell GPUs.

I doubt many games in the future will come with an async on/off option, because it's just something you'd naturally want to use on AMD, and something that Nvidia will disable in drivers to maintain better performance on their older hardware.

I might catch hell for this, but I'm a fan of driver cheats. *I'm a fan of anything they can do to improve performance as long as Image Quality isn't affected.*


----------



## PontiacGTX

Quote:


> Originally Posted by *Remij*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> There is no Test A and Test B though, and there doesn't need to be. This isn't a image quality affecting optimization. There's Async = On, which benefits AMD. There's Async = Disabled on Nvidia, which still benefits AMD, and benefits Nvidia Maxwell GPUs.
> 
> I doubt many game in the future will come with an Async on/off option because it's just something you'd naturally want to use on AMD, and something that Nvidia will disable in drivers to maintain better performance on their older hardware.
> 
> 
> I might catch hell for this, but I'm a fan of driver cheats. *I'm a fan of anything they can do to improve performance as long as Image Quality isn't affected.*


http://www.dsogaming.com/news/oxide-developer-nvidia-was-putting-pressure-on-us-to-disable-certain-settings-in-the-benchmark/


----------



## Xuper

Quote:


> Originally Posted by *Remij*
> 
> There is no Test A and Test B though, and there doesn't need to be. This isn't a image quality affecting optimization. There's Async = On, which benefits AMD. There's Async = Disabled, which still benefits AMD, and benefits Nvidia Maxwell GPUs.
> 
> I doubt many game in the future will come with an Async on/off option because it's just something you'd naturally want to use on AMD, and something that Nvidia will disable in drivers to maintain better performance on their older hardware.
> 
> I might catch hell for this, but I'm a fan of driver cheats. *I'm a fan of anything they can do to improve performance as long as Image Quality isn't affected.*


It's a benchmark; we're talking about valid results. People are using this benchmark for their arguments. How can I trust it?

Read this post.


----------



## Mahigan

Quote:


> Originally Posted by *comprodigy*
> 
> The test in question actually does do a good bit of async. It was altered by AMD to use a good amount. You say in games we see much different, but so far in games, that hasnt been true really. In fact, the difference in ashes between on and off isnt much at all. So where exactly are you going with this? Are you saying that its impossible for async to be turned off in driver by Nvidia? That no matter what you're going to have async running on Maxwell if the programmer doesnt explicitly disable it in code? Thats easy enough to prove. Since AMDs own demonstration of async isnt enough, do you care to write your own program? I'll gladly test.


Maxwell drops in performance once Async Compute + Graphics is enabled under AotS, as seen here...


In Time Spy, we see this odd behavior where the performance stays the same...



That is the point of contention here.


----------



## Remij

Quote:


> Originally Posted by *PontiacGTX*
> 
> http://www.dsogaming.com/news/oxide-developer-nvidia-was-putting-pressure-on-us-to-disable-certain-settings-in-the-benchmark/


Yep, no sense in enabling things that adversely affect performance and make no difference to image quality.

Speaking of that though, what's the end result of the whole 'Nvidia doesn't render the terrain properly' in Ashes? Did they fix it or something, because completely maxed out, they're looking exactly the same these days.


----------



## magnek

Quote:


> Originally Posted by *GorillaSceptre*
> 
> Put the reason in my edit. One word - Refunds.


It helps level the playing field, but it's by no means perfect. I mean, as long as you don't buy a GPU from a certain vendor *cough*Newegg*cough*, you can always return it for a full refund within 14 or 30 days.

To get a Steam refund you have 2 hours to try out the game. I mean, who's to say devs won't make it so that the first couple of hours of the game are excellent and bug free, but after that it's just a steaming pile of dog turd on the fast track to trainwreck town.

And yes I'm extremely cynical (as if that wasn't obvious).


----------



## GorillaSceptre

Quote:


> Originally Posted by *Remij*
> 
> Yep, no sense in enabling things that adversely affect performance and make no difference to image quality.
> 
> Speaking of that though, what's the end result of the whole 'Nvidia doesn't render the terrain properly' in Ashes? Did they fix it or something, because completely maxed out, they're looking exactly the same these days.


Talking about image quality.. This has been popping off around the web.





Maybe we should take a closer look at what kind of image quality differences there are in Time Spy. Nvidia's drivers have some tricks up their sleeves.


----------



## PontiacGTX

Quote:


> Originally Posted by *Remij*
> 
> Yep, no sense in enabling things that adversely affect performance and make no difference to image quality.
> 
> Speaking of that though, what's the end result of the whole 'Nvidia doesn't render the terrain properly' in Ashes? Did they fix it or something, because completely maxed out, they're looking exactly the same these days.


I think that issue should have been fixed; the game was patched recently, and it was only happening on Nvidia.


----------



## Dargonplay

Holy, and I just got a GTX 1070 instead of a $300 Fury.

It's also impressive that the RX 480 at 232mm² is reaching the same level of performance as the Pascal 1070 at 312mm².

Guess I'll keep it until Battlefield 1 comes out with DirectX 12.

The Doom thing is also unnerving. I'm really starting to doubt my purchase; hope this new driver fixes it.


----------



## Remij

Quote:


> Originally Posted by *PontiacGTX*
> 
> http://www.dsogaming.com/news/oxide-developer-nvidia-was-putting-pressure-on-us-to-disable-certain-settings-in-the-benchmark/


I read it.

At the end of the day, I understand it completely. I just don't think that one is invalid over the other because they are still doing the same amount of work, whether it's done consecutively or concurrently.

However, I honestly wish that all this async business was entirely invisible to the consumer. I wish it was just 'hey look, we're really well optimized and handle parallel workloads, our performance is better than before' and on the other hand, 'hey look, we're really super fast at performing serial workloads, our performance is as good as it's ever been', and the drivers would just do what they do to optimize performance.
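To put the serial-vs-concurrent point in toy form: the total work is identical either way, only the wall time per frame changes when compute can overlap the graphics. A minimal Python sketch with invented numbers (nothing measured from real hardware):

```python
# Toy model of "same work, consecutively vs concurrently".
# All numbers are made up purely for illustration.

def frame_time_serial(graphics_ms, compute_ms):
    """Graphics and compute run back to back on one queue."""
    return graphics_ms + compute_ms

def frame_time_concurrent(graphics_ms, compute_ms, overlap=1.0):
    """Compute overlaps the graphics work on a second queue.
    overlap=1.0 means perfect overlap (compute fully hidden behind
    graphics); real hardware lands somewhere in between."""
    hidden = min(compute_ms * overlap, graphics_ms)
    return graphics_ms + compute_ms - hidden

# Same total work either way: 10 ms graphics + 4 ms compute.
print(frame_time_serial(10, 4))           # 14 ms per frame
print(frame_time_concurrent(10, 4))       # 10 ms per frame
print(frame_time_concurrent(10, 4, 0.5))  # 12 ms with partial overlap
```

Either way the GPU did 14 ms worth of work; the only question is how much of it got hidden.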


----------



## infranoia

Quote:


> Originally Posted by *Kpjoslee*
> 
> That would be a major dilemma. Last thing benchmark program should do is create separate optimized path for each GPU.


This is exactly the dilemma I expected from them. There's no way to have a single render path in a DX12 benchmark without optimizing it for the lowest common denominator and punishing the silicon that has extra features.

"Impartial" benchmarking has become an oxymoron with DX12. You have to optimize for each vendor or you're unfairly punishing one of them. It just about makes the whole concept of "benchmark" meaningless.

They had no problem doing this with tessellation. Now suddenly they've got morals?


----------



## Remij

Quote:


> Originally Posted by *infranoia*
> 
> They had no problem doing this with tessellation. Now suddenly they've got morals?


I'd say there's a difference between doing the same workload (serially vs in parallel) and actively reducing the amount of workload, as with tessellation (geometry), is there not? Or am I not understanding this correctly?


----------



## Bidz

Quote:


> Originally Posted by *infranoia*
> 
> This is exactly the dilemma I expected from them. There's no way to have a single render path in a DX12 benchmark without optimizing it for the lowest common denominator and punishing the silicon with extra features.
> 
> "Impartial" benchmarking has become an oxymoron with DX12. You have to optimize for each vendor or you're unfairly punishing one of them. It just about makes the whole concept of "benchmark" meaningless.
> 
> They had no problem doing this with tessellation. Now suddenly they've got morals?


Exactly, this is a joke; the double standards are too much. Now that these statements are coming directly from a Futuremark developer, I call this benchmark completely pointless. No matter how close they come to reality, it's clear they favor Nvidia's philosophy.


----------



## infranoia

Quote:


> Originally Posted by *Remij*
> 
> I'd say there's a difference between doing the same workload (serially vs in parallel) and actively reducing the amount of workload with tessellation (geometry) is there not? Or am I not understanding this correctly?


I get you, but DX12 is not a one-size-fits-all API. Arguably DX11 was, but AMD suffered with high tess and had driver optimizations to keep such punishment within architectural limits. These driver optimizations became invalid within 3dmark, so they were left competing one-for-one with Nvidia.

OK 3dmark, that's fine if you want to look neutral, but now with DX12 AMD isn't allowed to shine with its parallel hardware-- it must remain on a level playing field with an NV-optimized render path. It's not an indication of game performance, unless that game is specifically NV-optimized and has very few if any AMD async shader optimizations.

See the theme here? The last 3dmark was NV-optimized with tessellation levels. The limitation was on the AMD side, and the fix was ignored / bypassed. This 3dmark is NV-optimized in its avoidance of Async Compute + Graphics, aka Async Shaders. The limitation is on the Nvidia side, and the fix is honored.

It's a valid benchmark as long as AMD knows its place.


----------



## Remij

Quote:


> Originally Posted by *infranoia*
> 
> I get you, but DX12 is not a one-size-fits-all API. Arguably DX11 was, but AMD suffered with high tess and had driver optimizations to keep such punishment within architectural limits. These driver optimizations became invalid within 3dmark, so they were left competing one-for-one with Nvidia.
> 
> OK 3dmark, that's fine if you want to look neutral, but now with DX12 *AMD isn't allowed to shine with its parallel hardware-- it must remain on a level playing field with an NV-optimized render path.* It's not an indication of game performance, unless that game is specifically NV-optimized and has very few if any AMD async shader optimizations.


Well, I'd argue that AMD's hardware is still being allowed to shine. But then again I guess it's also true that Nvidia's hardware isn't being allowed to look worse than it would have under a different codepath.

I see where everyone is coming from.

I think Futuremark should make a highly parallel async compute benchmark (separate from labeling it a DX12 benchmark that uses async compute) and really show what a highly parallel GPU architecture can do. And pull no punches: if the hardware doesn't support it, it doesn't even run; if something's altered in drivers, the results are invalid. Nvidia will be heading in that direction eventually.. it's pretty well understood that it's the future, so that when things can be performed the exact same way on both IHVs' hardware, we can see which architecture handles parallel workloads better.


----------



## infranoia

Quote:


> Originally Posted by *Remij*
> 
> I think Futuremark should make a highly parallel Async compute benchmark (separate from labeling it a DX12 benchmark that uses async compute) and really show what a highly parallel GPU architecture can do. And hold no punches back. Nvidia will be heading in that direction eventually.. it's pretty well understood that it's the future, so that when things can be performed the exact same way on both IHVs hardware, then we can see which architecture handles parallel workloads better.


They will surely do this, but not one moment before Volta shows up. They are a business, after all, and true vendor "neutrality" isn't possible outside of an open-source project and codebase.


----------



## infranoia

...Car analogy taken out behind the shed...

Carry on.


----------



## comprodigy

That's a 2% difference, actually within the margin of error. And that's not a drastic loss in performance either. With the disparity between runs in AotS, I have async runs that show higher fps than my non-async ones.


----------



## p00q

Quote:


> Originally Posted by *Remij*
> 
> Well, I'd argue that AMD's hardware is still being allowed to shine. But then again I guess it's also true that Nvidia's hardware isn't being allowed to look worse than it would have under a different codepath.
> 
> I see where everyone is coming from.
> 
> I think Futuremark should make a highly parallel Async compute benchmark (separate from labeling it a DX12 benchmark that uses async compute) and really show what a highly parallel GPU architecture can do. And hold no punches back. If the hardware doesn't support it. It doesn't even run. If somethings altered in drivers, then the results are invalid. Nvidia will be heading in that direction eventually.. it's pretty well understood that it's the future, so that when things can be performed the exact same way on both IHVs hardware, then we can see which architecture handles parallel workloads better.


If async compute was held back because it gave AMD too much of a lead (and NOT because it punished Nvidia, which could run the test just as well with it off), then the test is not valid. Async compute is a feature that brings performance, just like hyperthreading does, when it's properly implemented (or it can bring higher image quality if you want to keep the same performance but add some effects). For AMD, it's the way the hardware was designed to work in the first place in order to achieve maximum performance.

At the moment, the only logical explanation would be that 3DMark is trying not to upset Nvidia too much. If you take out the 1070 and 1080, their older products are being left behind by even older products from AMD, such as the R9 290/290X.


----------



## Glottis

Funny that people cry foul when 3DMark maybe isn't using more async to make AMD shine more, but no one cares that DOOM's Vulkan path doesn't even support async for Nvidia, and that Pascal is losing about 6% in Vulkan benchmarks because of this.
Quote:


> Does DOOM support asynchronous compute when running on the Vulkan API?
> 
> Asynchronous compute is a feature that provides additional performance gains on top of the baseline id Tech 6 Vulkan feature set.
> 
> *Currently asynchronous compute is only supported on AMD GPUs and requires DOOM Vulkan supported drivers to run. We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.*


https://community.bethesda.net/thread/54585?tstart=0


----------



## EightDee8D

Quote:


> Originally Posted by *Glottis*
> 
> Funny that people cry foul when 3DMark maybe isn't using more async to make AMD shine more, but no one cares that DOOM's Vulkan path doesn't even support async for Nvidia, and that Pascal is losing about 6% in Vulkan benchmarks because of this.
> https://community.bethesda.net/thread/54585?tstart=0


Because what you believe is async after seeing Time Spy is not actually async. Read what Mahigan explained before crying.

Why doesn't Nvidia's own sponsored game have async? Hint: they don't support it, meaning Pascal doesn't support graphics + compute at the same time.


----------



## mtcn77

Quote:


> Originally Posted by *p00q*
> 
> If async compute was hold back because it gave AMD too much of a lead (and NOT by punishing nVIDIA, which could run the test just as well with it off), then the test is not vaid. Async compute is a feature that brings performance, just like hyperthreading does, when it's properly implemented ( or it can bring higher image quality if you want to keep the same performance, but add some effects). For AMD is the way the hardware was designed to work in the first place in order to achieve maximum performance.
> 
> At the moment, the only logical explanation would be that 3DMark is trying not to upset nVIDIA too much. If you take out 1070 and 1080, their older products are being left behind by the even older products from AMD, such as R290/x


370 is beating 780Ti in all seriousness.


----------



## daviejams

Quote:


> Originally Posted by *mtcn77*
> 
> 370 is beating 780Ti in all seriousness.


Is the 370 not the old HD 7850 rebrand?


----------



## dmasteR

Quote:


> Originally Posted by *daviejams*
> 
> Is the 370 not the old HD 7850 rebrand ?


7870 I think actually.


----------



## DaaQ

Quote:


> Originally Posted by *Glottis*
> 
> Funny that people cry foul when 3DMark maybe isn't using more async to make AMD shine more, but no one cares that DOOM's Vulkan path doesn't even support async for Nvidia, and that Pascal is losing about 6% in Vulkan benchmarks because of this.
> https://community.bethesda.net/thread/54585?tstart=0


How about we keep waiting on that driver? IIRC it's not the devs who deliver the drivers. I'm sure once the driver is released, all sites will rerun their DOOM performance benchmarks.


----------



## Dudewitbow

Quote:


> Originally Posted by *daviejams*
> 
> Is the 370 not the old HD 7850 rebrand ?


The 370 is a 7870 with a single 6-pin connector, basically a power-limited 7870


----------



## EightDee8D

Quote:


> Originally Posted by *Dudewitbow*
> 
> 370 is a 7970 with 1 6 pin connector, basically power limited 7870


It's a 7850 actually. http://www.tomshardware.com/reviews/amd-radeon-r9-390x-r9-380-r7-370,4178.html


----------



## KarathKasun

Quote:


> Originally Posted by *EightDee8D*
> 
> It's a 7850 actually. http://www.tomshardware.com/reviews/amd-radeon-r9-390x-r9-380-r7-370,4178.html


Which is also the R7 265. I have one and the performance gains are huge on that card.


----------



## dmasteR

Quote:


> Originally Posted by *EightDee8D*
> 
> It's a 7850 actually. http://www.tomshardware.com/reviews/amd-radeon-r9-390x-r9-380-r7-370,4178.html


The 7850 didn't have 1280 cores / 32 ROPs. The 7870, on the other hand, did.
Quote:


> Originally Posted by *KarathKasun*
> 
> Which is also the R7 265. I have one and the performance gains are huge on that card.


The 265 was the 7850 I thought?

1024 cores, 32 ROPs, which is exactly what the 7850 was running.

EDIT:


----------



## daviejams

A revised version of the 7850 beating a 780 Ti. What a time to be alive.


----------



## mtcn77

Quote:


> Originally Posted by *dmasteR*
> 
> 7850 didn't have 1280 cores 32 Rops. 7870 on the other hand did.
> The 265 was the 7850 I thought?
> 
> 1024 cores, 32Rops which is exactly what the 7850 was running.
> 
> EDIT:


The 1280-shader one is exclusive to MSI.


Spoiler: 1280 core R9 370X









Spoiler: 1024 core R9 370


----------



## dmasteR

Quote:


> Originally Posted by *mtcn77*
> 
> 1280 shader one is exclusive to MSI.
> 
> 
> Spoiler: 1280 core R9 370X
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Spoiler: 1024 core R9 370


Any idea why MSI got an exclusive? I don't remember this at all, apparently.


----------



## mtcn77

Quote:


> Originally Posted by *dmasteR*
> 
> Any idea why MSI got a exclusive? Don't remember this at all apparently.


Me neither, it wasn't mainstream. Though I recall some had arrived here. You could pick them by the difference in their reference clocks.


----------



## KarathKasun

Did a bunch of research on the SKUs when I got the R7 265. It was the same card as the R9 370, and the R9 370X (1280 shaders) was nearly vaporware. Picked up the R7 265 for $50 less than the R9 370 because it was on firesale due to it being "old".

One of the best ~$100 cards I've ever picked up. Overclocks by 20% or so to boot.


----------



## Doothe

I took three screenshots, one of each game, in GPUView. From left to right: DOOM, AOTS, and Time Spy w/ Async On. Each timeline is roughly the same length of time. I'm still learning how to read and interpret this information, but I figured I'd share some of the images with you guys and maybe get a better understanding of what's going on.










The image is 4800x2560; I recommend opening it up in a separate tab.


----------



## AmericanLoco

Quote:


> Originally Posted by *Doothe*
> 
> I took three screenshots, one of each game, in GPUView. From left to right, DOOM, AOTS, and Time Spy. Each timeline is roughly the same length of time. I'm still learning how to read, and interpret this information but I figured I'd share some of the images with you guys and maybe get a better understanding of whats going on.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> the image is 4800x2560. i recommend opening it up in a separate tab.


So what's going on there? I don't know what I'm looking at or how to interpret it. The only thing I notice different about 3DMark is that it seems like one of the 3D queues stops when some compute work gets loaded in, so they're not being executed in parallel?


----------



## EightDee8D

Quote:


> Originally Posted by *Doothe*
> 
> I took three screenshots, one of each game, in GPUView. From left to right, DOOM, AOTS, and Time Spy. Each timeline is roughly the same length of time. I'm still learning how to read, and interpret this information but I figured I'd share some of the images with you guys and maybe get a better understanding of whats going on.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> the image is 4800x2560. i recommend opening it up in a separate tab.


Seems like there's no parallel graphics and compute in Time Spy, as Mahigan was saying, and the compute load is much smaller compared to the other two games.


----------



## JackCY

What kind of Doom is that, OGL or Vulkan? Plus it's really impossible for us to tell anything without having the stats files opened in GPUView to see all the details.

The many blue/pink regions are, IMHO, work being done; a fence shows up as a synchronization point. But it's been a while since I profiled any 3D. I'd have to see it in GPUView myself, not just from a picture.


----------



## Doothe

I need someone with a maxwell card to create a GPUView log of Time Spy's Graphics Test 2.


The blue region immediately below the Timeline Ruler is the GPU Hardware Queue Area. This is the area where GPUView displays information that represents the actual work done by the video hardware. As you zoom into smaller increments of time, the data view becomes more meaningful. Also, there will be one GPU Hardware Queue for each video adapter you have in the system. GPU Hardware Queue represents workflow on the hardware. In these workflow queues, the item on the bottom of the stack of rectangles represents the work that is currently executing. The rectangles stacked above it represent work that is in waiting.

The associated process to the hardware queue will have the matching color and matching selection in process Device Context Queue(Green Area). You can see I clicked one of the Hardware Queues in AOTS, and how it correlates to the AOTS_DX12.exe Device Context Queue below.
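If it helps, the reading rule boils down to a simple FIFO: the packet at the bottom of the stack is executing, everything stacked above it is waiting. A toy Python model (packet names made up, not from the actual trace):

```python
from collections import deque

# Tiny model of reading a GPUView hardware queue: index 0 is the
# bottom of the stack (currently executing); the rest are waiting.
queue = deque(["draw_A", "draw_B", "compute_C"])  # bottom first

def executing(q):
    """The packet the hardware is working on right now."""
    return q[0] if q else None

def waiting(q):
    """Packets stacked above the executing one, in line."""
    return list(q)[1:]

print(executing(queue))  # draw_A
print(waiting(queue))    # ['draw_B', 'compute_C']
queue.popleft()          # draw_A retires; draw_B drops to the bottom
print(executing(queue))  # draw_B
```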


----------



## Guthra

I can't wait to see Vulkan implemented more places, and I think it's incredible how AMD's graphics cards somehow get a little better every year like a fine wine. Unlike wine, however, one should not consume a graphics card.


----------



## Doothe

Time Spy has a pre-emption packet (black rectangle) in the 3D queue that shows up every time a compute queue is processed.

From Nvidia's whitepaper:
"Compute Preemption is another important new hardware and software feature added to GP100 that allows compute tasks to be preempted at instruction-level granularity, rather than thread block granularity as in prior Maxwell and Kepler GPU architectures. Compute Preemption prevents long-running applications from either monopolizing the system (preventing other applications from running) or timing out."

BTW, this Doom run is Vulkan. I don't know if Vulkan is properly picked up by GPUView, so disregard it if you want.
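A toy sketch of what that granularity difference from the whitepaper means for latency, with entirely invented numbers (the real costs depend on the workload):

```python
# Toy model of preemption granularity (numbers invented for illustration):
# thread-block granularity (Maxwell/Kepler) must wait for the running
# block to drain; instruction-level granularity (Pascal) can stop almost
# immediately at the next instruction boundary.

def preemption_latency(block_remaining_ms, granularity):
    """Time before a waiting compute task can take over the GPU."""
    if granularity == "block":
        return block_remaining_ms       # drain the current thread block
    elif granularity == "instruction":
        return 0.01                     # near-immediate context save
    raise ValueError("unknown granularity")

print(preemption_latency(5.0, "block"))        # 5.0 ms worst case
print(preemption_latency(5.0, "instruction"))  # 0.01 ms
```

This is also why a long-running compute kernel could time out the system on older architectures, which is the problem the whitepaper says Compute Preemption prevents.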


----------



## Slomo4shO

Quote:


> Originally Posted by *Doothe*
> 
> I took three screenshots, one of each game, in GPUView. From left to right, DOOM, AOTS, and Time Spy. Each timeline is roughly the same length of time. I'm still learning how to read, and interpret this information but I figured I'd share some of the images with you guys and maybe get a better understanding of whats going on.
> 
> 
> Spoiler: Image


Compute queues as a % of total run time:

Doom: 43.70%
AOTS: 90.45%
Time Spy: 21.38%
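For anyone wanting to reproduce this kind of number from their own GPUView trace: it's just the merged busy intervals of the compute queue divided by the capture length. A Python sketch (the intervals below are invented, not from the actual traces):

```python
# How a "compute queue busy %" like the numbers above could be derived
# from a GPUView-style timeline. Intervals are (start_ms, end_ms) spans
# where the queue had a packet executing; overlaps must be merged so
# they aren't double-counted.

def busy_percent(intervals, capture_ms):
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping/adjacent packet: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    busy = sum(end - start for start, end in merged)
    return 100.0 * busy / capture_ms

# Overlapping packets (10-30 and 25-40) count once after merging.
print(busy_percent([(0, 10), (10, 30), (25, 40), (60, 70)], 100))  # 50.0
```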


----------



## EightDee8D

Quote:


> Originally Posted by *Slomo4shO*
> 
> Compute queues as a % of total run time:
> 
> Doom: 43.70%
> AOTS: 90.45%
> Time Spy: 21.38%


But not 100% of those run in async mode or in parallel with graphics (not 100% sure here, btw).


----------



## AmericanLoco

From what I understand based on Doothe's post, Time Spy is basically only just doing that new feature that Pascal has - it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.

So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, and the manner in which it does its async is friendly to Pascal hardware. I don't think it's necessarily "optimized" for nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.


----------



## caswow

Quote:


> Originally Posted by *AmericanLoco*
> 
> From what I understand based on Doothe's post, Time Spy is basically only just doing that new feature that Pascal has - it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.
> 
> So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, and the manner in which it does its async is friendly to Pascal hardware. I don't think it's necessarily "optimized" for nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.


I bet it could be even faster if it was proper async. It automatically favours Nvidia's arch, because GCN can do it the same way and better. Why wouldn't they implement proper async...


----------



## EightDee8D

Quote:


> Originally Posted by *AmericanLoco*
> 
> From what I understand based on Doothe's post, Time Spy is basically only just doing that new feature that Pascal has - it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.
> 
> So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, and the manner in which it does its async is friendly to Pascal hardware. I don't think it's necessarily "optimized" for nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.


But Nvidia was allowed to take full advantage of tessellation, and any driver optimizations were flagged as invalid for AMD.


----------



## JackCY

Quote:


> Originally Posted by *AmericanLoco*
> 
> It's just not being allowed to take full advantage of GCN hardware.


That's what I keep saying







They simply reused their older, DX11-like approach with DX12, and the features they use are quite limited as well so that they can support old hardware; new HW features that older hardware doesn't have go unused. I bet they also want one engine with one path that runs on all GPUs to make their benchmark "valid". That makes it valid to them, but invalid to me, since it doesn't use each piece of HW to its maximum potential, be it NV, AMD, or some other GPU.

Figuratively: say there are two architectures, one with 1 thread to do the work and the other with 16 threads. Now they make an engine that only uses 1 thread and tries to compute parallel work on that 1 thread, switching context like mad to get it done. Of course this engine works on both the 1- and 16-threaded HW and in theory runs at the same speed, but the 16-threaded HW is underutilized: it could do 16 times more work at the same time if used in parallel with 16 submission threads. Context switching is expensive, and so on.
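The analogy in toy-calculation form (all numbers invented, just to show the shape of it):

```python
import math

# One "hardware queue" grinding through everything with context switches
# vs 16 queues taking the same independent work items in parallel.

def time_one_queue(items, item_ms, switch_ms):
    """Single queue: every item also pays a context-switch cost."""
    return items * (item_ms + switch_ms)

def time_parallel(items, item_ms, queues):
    """Ideal parallel hardware: items spread across queues, no switching."""
    return math.ceil(items / queues) * item_ms

print(time_one_queue(16, 1.0, 0.25))  # 20.0 ms
print(time_parallel(16, 1.0, 16))     # 1.0 ms
```

Same 16 items of work either way; the single-queue engine just never lets the wide hardware stretch its legs.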

This article has a bit of explanation of the differences between architectures and their features.


----------



## Doothe

Quote:


> Originally Posted by *Slomo4shO*
> 
> Compute queues as a % of total run time:
> 
> Doom: 43.70%
> AOTS: 90.45%
> Time Spy: 21.38%


I zoomed out for a much larger portion of Time Spy's run. Time Spy's compute % actually goes down the further you zoom out. The 390 has a 3D queue and a compute queue; the 3D queue is packed at 99%, while the compute queue has less than 20% in this screenshot.
http://i.imgur.com/scB83OA.jpg

Also, at first I couldn't tell if the pre-emption packet ever gets executed, since it sits above the other rectangles being processed and never seems to reach the bottom, but it does get executed (I realized this in a later post). Anyway, you can tell a queue is being processed if the rectangle is at the bottom of the queue; the rectangles above are waiting in line, AFAIK.

Sorry for all the edits. I think I've worded it correctly now


----------



## Doothe

I'll run GPUView/Time Spy without Async and post the results in a second.


----------



## JackCY

Quote:


> Originally Posted by *Doothe*
> 
> I zoomed out for a much larger portion of Time Spy's run. Time Spy's Compute % actually goes down the further you zoom out.The 390 has a 3D queue and a compute queue. The 3D queue is packed 99%. The compute queue has less than 20% in this screenshot.
> http://i.imgur.com/scB83OA.jpg
> 
> Also, I can't tell if the pre-emption packet ever gets executed. It doesn't look like it does because it's above the other rectangles being processed but never actually gets to the bottom. You can tell a queue is being processed if the rectangle is at the bottom of the queue. The rectangles above are waiting in line afaik.
> 
> Sorry for all the edits. I think I've worded it correctly now


In other words they don't bother to use the Compute power of AMD GCN? Correct? Somewhat correct? False?


----------



## Slomo4shO

Quote:


> Originally Posted by *Doothe*
> 
> I zoomed out for a much larger portion of Time Spy's run. Time Spy's Compute % actually goes down the further you zoom out.The 390 has a 3D queue and a compute queue. The 3D queue is packed 99%. The compute queue has less than 20% in this screenshot.
> http://i.imgur.com/scB83OA.jpg


So even lower at 18.76%...

The bench definitely isn't compute heavy.


----------



## Bidz

This can only get worse if Futuremark decides to increase the compute load only when Volta gets released with full async capabilities.


----------



## Doothe

Quote:


> Originally Posted by *Slomo4shO*
> 
> So even lower at 18.76%...
> 
> The bench definitely isn't compute heavy.


It does look that way compared to AOTS and DOOM. I don't have ROTR, Hitman, or any other DX12/Vulkan titles to test this theory against. In the two other games, GPUView shows two rectangles (compute queues) stacked on top of each other; Time Spy never needs to process more than one at a time.

BTW, forget what I said earlier about the pre-emption not being executed. I looked closer and it is definitely being executed. Again, I don't know what it's there for, and I couldn't find the device context associated with it. I don't see pre-emption in AOTS or DOOM though, so there's that. Maybe something worth looking into? Who knows.

Time Spy Async off/on comparison


----------



## PontiacGTX

Quote:


> Originally Posted by *AmericanLoco*
> 
> So what's going on there? I don't what I'm looking at it, or how to interpret it. The only thing I notice different about 3D Mark, is that it seems like one of the 3D Queues stops when some compute work gets loaded in, so they're not being executed in parallel?


Time Spy's compute queues are fewer than AotS's; most of the work is graphics, and it seems they use double fences, which could be throttling AMD's compute+graphics performance and/or parallelism.
Quote:


> Minimize the use of barriers and fences
> We have seen redundant barriers and associated wait for idle operations as a major performance problem for DX11 to DX12 ports
> The DX11 driver is doing a great job of reducing barriers - now under DX12 you need to do it
> Any barrier or fence can limit parallelism




So 3DMark could run a single path that fits most hardware, with pre-emption.
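The quoted point about fences is easy to see in a toy model (invented numbers): a fence that makes the compute submission wait on the graphics submission turns overlap into back-to-back execution.

```python
# Toy model of "any barrier or fence can limit parallelism": two queues
# either run side by side, or serialize because compute waits on a fence
# signaled at the end of the graphics work. Numbers are made up.

def wall_time(graphics_ms, compute_ms, fence_between=False):
    if fence_between:
        # Compute may not start until graphics signals the fence.
        return graphics_ms + compute_ms
    # No dependency: the two queues run concurrently.
    return max(graphics_ms, compute_ms)

print(wall_time(8, 6, fence_between=False))  # 8 ms
print(wall_time(8, 6, fence_between=True))   # 14 ms
```

Double the fences and you get this serialization at every hand-off, which would fit the behavior described above.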


----------



## Bidz

Quote:


> Originally Posted by *PontiacGTX*
> 
> then 3dmark could run a single path where It fits most hardware, with pre emption


Incorrect: it fits the Pascal architecture, THE ONLY architecture that is currently made for preemption.


----------



## PontiacGTX

Quote:


> Originally Posted by *Bidz*
> 
> Incorrect, it fits Pascal architecture. THE ONLY architecture that currently is made for preemption.


It can be used on GCN, but it won't take advantage of parallelism and its performance gains. Maxwell can do some degree of pre-emption and doesn't get negative performance (given how the fences are limiting the context switching), and it works on Pascal given its improved pre-emption.

When people compare them, Maxwell seems to have some degree of async compute (the benchmark is aimed at it, but it's really doing pre-emption), GCN can do pre-emption but it doesn't deliver the same gains as async compute, and Pascal shows the gains from its improved pre-emption.

The devs say they use a single path, but it only favors one side.


----------



## Bidz

Quote:


> Originally Posted by *PontiacGTX*
> 
> ...
> Devs tell they use a single path but this only favors one side


With the given evidence, we can say that the Time Spy benchmark, intentionally or not, by design fits the capabilities of Pascal perfectly; other Nvidia architectures are not capable of async compute at all, and most AMD architectures are, in theory, left with spare room and could be handed much heavier async compute loads.

It's as if tessellation loads had been designed to fit the inferior AMD capabilities back in the day. There is a clear pattern with Futuremark's controversies regardless of who's right or wrong: they always favor Nvidia.


----------



## Master__Shake

Quote:


> Originally Posted by *Bidz*
> 
> With the given evidence, we can say that Time Spy benchmark, intentionally or not, by design, fits perfectly for the capabilites of Pascal, other Nvidia architectures are not capable of async computing at all, and most of the AMD architectures in theory are left with spare room to be requested of much heavier async computing loads.
> 
> *It's like Tessellation loads were designed to fit the inferior AMD capabilities back in the day. There is a clear pattern with Futuremark controversies regardless of who's on the right or wrong, and it's that they always favor Nvidia.*


Are you going to sell a benchmark that favours the minority?

People who buy this are users like you and I, and if we know from forums and whatnot that it won't work well on our cards, we won't buy it.


----------



## sugarhell

They clearly said that they have only a neutral optimization path. They support only FL_11.0 and the only way to achieve async compute on both architectures is by supporting the method that both GCN and Paxwell can support.

That means preemption.

Oh well it doesn't matter, it is just a benchmark.


----------



## mtcn77

Quote:


> Originally Posted by *Master__Shake*
> 
> 
> are you going to sell a benchmark that favours the minority?
> 
> people who buy this are users like you and i and if we know from forums and what not that it won't work well on our cards we won't buy it.


Wait! They were losing money when people got to test it for free, right?


----------



## Bidz

Quote:


> Originally Posted by *Master__Shake*
> 
> 
> are you going to sell a benchmark that favours the minority?
> 
> people who buy this are users like you and i and if we know from forums and what not that it won't work well on our cards we won't buy it.


So now it's OK to drop objectivity in favor of sales?

You know, in cases of highly important measurements like safety, contamination, etc., dropping objectivity in favor of "sales" can send you to prison.
Quote:


> Originally Posted by *sugarhell*
> 
> They clearly said that they have only a neutral optimization path. They support only FL_11.0 and the only way to achieve async compute on both architectures is by supporting the method that both GCN and Paxwell can support.
> 
> That means preemption.
> 
> Oh well it doesn't matter, it is just a benchmark.


It's not neutral if it's not really measuring the full extent of async capabilities, limiting them to fit one side's capabilities.


----------



## ZealotKi11er

Quote:


> Originally Posted by *sugarhell*
> 
> They clearly said that they have only a neutral optimization path. They support only FL_11.0 and the only way to achieve async compute on both architectures is by supporting the method that both GCN and Paxwell can support.
> 
> That means preemption.
> 
> Oh well it doesn't matter, it is just a benchmark.


Yeah. It's only a benchmark, and now it means nothing to me. Going to skip 3DMark from this point on. If anything, 3DMark should be pushing both vendors' GPUs to the max to show what they are capable of.


----------



## airfathaaaaa

Quote:


> Originally Posted by *sugarhell*
> 
> They clearly said that they have only a neutral optimization path. They support only FL_11.0 and the only way to achieve async compute on both architectures is by supporting the method that both GCN and Paxwell can support.
> 
> That means preemption.
> 
> Oh well it doesn't matter, it is just a benchmark.


That's the problem.
In the past they never optimised around a feature that both parties had; they optimised their code against the API.
Now they suddenly go the other way around, being "neutral" by tailoring their bench to suit only one party.


----------



## mtcn77

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Yeah. It's only a benchmark and now it's nothing for me. Going to skip 3DMark from this point on. If anything 3DMark should be pushing both sides of the GPUs to the MAX to show what the GPUs are capable of.


For those missing out, it was demonstrated at GOC Asia _Nvidia_ Event. What else could you be looking out for?


----------



## Bauxno

Then they shouldn't call that setting just "async". Maybe put up a disclaimer, or some information about what their async path is doing and why it makes AMD cards look less impressive in a DX12 bench. Or change the name to Time Spy DX12 FL_11.


----------



## Randomdude

Quote:


> Originally Posted by *sugarhell*
> 
> They clearly said that they have only a neutral optimization path. They support only FL_11.0 and the only way to achieve async compute on both architectures is by supporting the method that both GCN and Paxwell can support.
> 
> That means preemption.
> 
> Oh well it doesn't matter, it is just a benchmark.


You have two people, and both are told that in time they will have to pass a physical exam; the exam is the benchmark. Person A is 100% fit and has been working for years to be prepared. Person B is at 70% of person A's fitness level and hasn't done anything in those years. When exam day finally comes, the difficult hill both were supposed to climb turns out to stress only 70% of person A's potential, wasting some 30% of his hard work. Person B, who should have had to work harder because he was behind, is rewarded instead: what was supposed to be a task at 120% of person A's abilities turns out to be nothing more than an exam tailored to let person B pass, with complete disregard for A's hard work.


----------



## infranoia

The only thing this proves is that the whole concept of a neutral "benchmark" has been false for about 15 years.

It's meaningless to have one codepath that "benchmarks" different chip architectures with different feature sets; all you are doing is testing that one particular codepath. And Futuremark has always decided that the one codepath all chip architectures get measured against is the one matching Nvidia's feature set.

It's a good thing that GPU reviews tend to de-emphasize 3DMark in favor of the far more important game benchmarks. That's how it should be.


----------



## criminal

Wouldn't the last few pages of posts be more relevant posted here: http://www.overclock.net/t/1605899/various-futuremark-releases-3dmark-time-spy-directx-12-benchmark/500#post_25358347

Seems to be more griping in this thread about how one-sided Time Spy is than actual discussion about how good Vulkan is for AMD.


----------



## infranoia

Quote:


> Originally Posted by *criminal*
> 
> Wouldn't the last few pages of posts be more relevant posted here: http://www.overclock.net/t/1605899/various-futuremark-releases-3dmark-time-spy-directx-12-benchmark/500#post_25358347
> 
> Seems to be more griping about how one sided Time Spy is in this thread than actual discussion about how good Vulkan is for AMD.


Trying to reconcile the two, and landing on game vs. benchmark. But there are two of these Doom/Vulkan threads as well. I await the Überthread.


----------



## ZealotKi11er

There is nothing DX12 about Time Spy. DX12 reduces CPU overhead and makes it possible to achieve scenes never possible with DX11. Everything in the Time Spy benchmark could easily be achieved with DX11.


----------



## Kravicka

Show me


----------



## Ext3h

Quote:


> Originally Posted by *Mahigan*
> 
> Maxwell drops in performance once Async Compute + Graphics is enabled under AotS as seen here...
> 
> 
> In Time Spy.. we see this odd behavior whereas the performance stays the same..
> 
> 
> 
> That is the point of contention here.


Shall we solve this riddle now?

As we know, for Maxwell cards, the OS will apply CPU side cooperative scheduling to merge the command buffers submitted to multiple software queues into a single one for submission to the GPU.

When we compare async on and off, the application essentially does the same merging itself, since the developer did it manually when designing the application: with async off, all the work gets queued into the same queue.

With async on, and the OS performing the scheduling, we can run into two *different* cases though:

1. The OS queues the command buffers in precisely the same order as the developer would have.
2. The OS finds a different valid order for the command buffers, so the execution order differs.

I suspect that what we see, whenever Maxwell suffers from async on, is actually the second case. The OS made a bad choice when scheduling and induced some type of stall / wait on barrier / memory transfer which would have been hidden by the hand-tuned schedule the developer specified in the async-off case.

When the OS doesn't stumble, it's either a sign that the order of execution didn't matter (no additional stalls induced), or that the OS coincidentally arrived at the very same schedule the application would have used.

All of this obviously assumes that when you tell an application not to use async, that it won't perform additional optimizations internally instead (such as e.g. eliminating redundant barriers and fences by performing manual, application side state tracking, hence reducing the effective overhead).

For 3DMark at least, no such optimization happens on the application side. So if either a mismatching order is still stall-free, or if the software scheduler chose the same order, you just won't see a difference.

Oh, and why did I say OS? Because the software scheduler is apparently provided by Microsoft, it's NOT part of the driver.

Nvidia can't fix it or tune it better; it's simply not in their domain. That poor performance is a bug in Windows 10, not in Nvidia's driver.
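The two cases above can be illustrated with a toy model (plain Python, not real D3D12 code; the states, tick counts, and switch cost are all made-up numbers chosen only to show the effect): when command buffers stay grouped by pipeline state, most barriers are cheap, while an equally valid reordered schedule pays for a state switch at every transition.

```python
# Toy model of the two scheduling outcomes described above. NOT real D3D12:
# all states and costs are hypothetical, for illustration only.

STATE_SWITCH_COST = 5  # made-up cost (in "ticks") of a barrier that must do real work

def total_cost(schedule):
    """Sum each buffer's execution time, plus a penalty whenever the
    required pipeline state changes between consecutive buffers."""
    cost, current = 0, None
    for state, ticks in schedule:
        if state != current:        # this barrier actually has to switch state
            cost += STATE_SWITCH_COST
            current = state
        cost += ticks
    return cost

# The developer's hand-tuned order: buffers grouped by state, so only
# two state switches are paid across the whole frame.
hand_tuned = [("gfx", 10), ("gfx", 10), ("compute", 10), ("compute", 10)]

# A different but still *valid* order a software scheduler might produce:
# the same buffers, but states alternate, so every barrier now stalls.
os_merged = [("gfx", 10), ("compute", 10), ("gfx", 10), ("compute", 10)]

print(total_cost(hand_tuned))  # grouped order: fewer switches
print(total_cost(os_merged))   # identical work, extra stall time
```

Same buffers, same total work; only the order differs, and the reordered schedule is strictly slower, which is the "second case" in a nutshell.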


----------



## Kravicka

Or:

In the old AotS bench, Maxwell tried async and took a performance hit...

....
...so Nvidia forced async off for Maxwell in the drivers...
....

...and now Time Spy on Maxwell is not using async even if you have async turned on in the app.

MAGIC


----------



## kaosstar

Quote:


> Originally Posted by *ZealotKi11er*
> 
> DX12 ... makes it possible to achieve scenes never possible with DX11.


I wasn't aware of this. Is there an example?


----------



## ZealotKi11er

Quote:


> Originally Posted by *kaosstar*
> 
> I wasn't aware of this. Is there an example?


Not really because there are no true DX12 only games.


----------



## Ext3h

Quote:


> Originally Posted by *Kravicka*
> 
> Time Spy on Maxwell is not using async whether you have async on or off in the app ..


You can't see that with GPUView or the like. Those show you the hardware queues after the OS has merged them, not the software queues the application sees.
The application can't tell whether the queues are just emulated or mapped directly to hardware.

Nvidia wasn't wrong when they said they had never "enabled" async compute in the Maxwell driver. They didn't, and neither is it enabled today. It's the operating system forcibly adding the emulation and the (partially subpar) software scheduling when it detects that the application is requesting multiple queues but the driver provides fewer than requested.


----------



## moustang

Quote:


> Originally Posted by *Bidz*
> 
> So now it's ok to drop objectivity in favor of sales?
> 
> You know, in cases of highly important stuff like measuring safety, contamination, etc, dropping objectivity in favor of "sales" can send you to prison.


Good thing video card benchmarks aren't important, eh? I mean, no one has died or had their health seriously affected because a benchmark favored one chipset over another.

Quote:


> It's not neutral if it's not really measuring the full extent of async capabilities by limiting em to fit one side capabilities.


Sort of like it not really being neutral because it runs tessellation on the CPU rather than the GPU, because Nvidia cards would kill the AMD cards if tessellation were performed by the GPU, right?

I mean, let's be totally "neutral" here and run the physics test using GPU-based PhysX. If you're not doing that, then you're limiting your test to fit AMD's capabilities, right?

Or does that testing bias only apply when the benchmark would favor AMD?


----------



## Exeed Orbit

Quote:


> Originally Posted by *moustang*
> 
> Good thing video card benchmarks aren't important, eh? I mean, no one has died or had their health seriously effected because a benchmark favored one chipset over another.
> Sort of like it not really being neutral because it runs tessellation on the CPU rather than the GPU because Nivida cards would kill the AMD cards if tessellation was performed by the GPU, right?
> 
> I mean, let's be totally "neutral" here and run the Physics test using GPU based Physx. If you're not doing that then your limiting your test to fit AMDs capabilities, right?
> 
> Or does that testing bias only apply when the benchmark would favor AMD?


Tessellation is not exclusive to one GPU vendor. PhysX is. Nice try.


----------



## mypickaxe

Quote:


> Originally Posted by *ZealotKi11er*
> 
> Quote:
> 
> 
> 
> Originally Posted by *kaosstar*
> 
> I wasn't aware of this. Is there an example?
> 
> 
> 
> Not really because there are no true DX12 only games.

So...what qualifies as a "true DX12 only game"? I'm curious since there are at least a couple of them on the Windows Store. Quantum Break, Forza 6 Apex to name two.


----------



## Mahigan

Quote:


> Originally Posted by *Ext3h*
> 
> Shall we solve this riddle now?
> 
> As we know, for Maxwell cards, the OS will apply CPU side cooperative scheduling to merge the command buffers submitted to multiple software queues into a single one for submission to the GPU.
> 
> When we compare async on and off, the application essentially does the same, respectively the developer did manually when designing the application. With async off, all the work gets queued into the same queue.
> 
> With async on, and the OS performing the scheduling, we can run into two *different* cases though:
> 
> The OS queues the command buffers in the precise same order as the developer would had.
> The OS finds a different valid order for the command buffers, so the execution order differs.
> I suspect what we see, whenever Maxwell suffers from async on, is actually the second case. The OS made a bad choice when scheduling, and induced some type of stall / wait on barrier / memory transfer which would have been hidden with the hand tuned schedule the developer specified in the case of async off.
> 
> When the OS doesn't stumble, it's either a sign that the order of execution didn't matter (no additional stalls induced), or that the OS coincidentally came to the very same schedule as the the application would have had.
> 
> All of this obviously assumes that when you tell an application not to use async, that it won't perform additional optimizations internally instead (such as e.g. eliminating redundant barriers and fences by performing manual, application side state tracking, hence reducing the effective overhead).
> 
> For 3DMark at least, no such optimization happens on the application side. So if either a mismatching order is still stall-free, or if the software scheduler chose the same order, you just won't see a difference.
> 
> Oh, and why did I say OS? Because the software scheduler is apparently provided by Microsoft, it's NOT part of the driver.
> 
> Nvidia can't fix it, or tune it better. It's simply not in their domain. That poor performance is a bug in Windows 10, not in Nvidias driver.


I am curious... is the case dependent on the amount of scheduling requests/instructions being made? Meaning that under heavier loads... does the second case become more likely?

Seems to me that Vulkan likely will not exhibit this behavior, then. It appears to be a Windows 10 DX12 issue.


----------



## StrongForce

For real though?? A +50% boost at 1080p with the Fury X? That's completely insane, lol. Seems like all this hard work on implementing a new API with AMD is finally paying off... curious to see future titles!


----------



## PontiacGTX

Quote:


> Originally Posted by *Mahigan*
> 
> I am curious... is the case dependent on the amount of scheduling requests/intructions being made? Meaning that under heavier loads... does the second case become more likely?
> 
> Seems to me that Vulkan likely will not exhibit this behavior then. It appears to be a Windows 10 DX12 issue.


Also, the barriers could be placed on purpose to achieve performance gains on Maxwell, and maybe Pascal; the added latency could leave some SPs waiting for their next tasks. That would make this the developer's doing rather than the OS's, since most say the developers are the ones who set the barriers/fences for the command lists/batches.


----------



## flippin_waffles

Vulkan is a home run for the open source community, and this debut seems to have created quite a splash! Having Google support the API should also help it spread. It looks very efficient, and Doom showcased it very nicely, and these early examples aren't even using the new low-level APIs to their full potential, with nearly all of them being bolted onto existing engines rather than built in from the ground up (it's amazing that these complex APIs can be added onto an already complex engine in the first place). Developers are probably adopting this low-level shift quite quickly given how much it has to offer.
With all this coming to market now and getting better as time goes on, I'm looking forward to more Doom Vulkan results. With NV already having a reputation for how quickly their cards become outdated, having GPUs that look dated for the games coming to market now doesn't bode well for the future. If the RX 480 beats the 1060 in Doom, that will be a good indication it will continue to do so in future games. And as the 1060 starts to lag further and further behind like the rest of NV's cards, NV also has to contend with GCN's strength in the new APIs. The resale value of practically all NV cards should be quite low considering how quickly their performance ranking drops.


----------



## Ext3h

Quote:


> Originally Posted by *Mahigan*
> 
> I am curious... is the case dependent on the amount of scheduling requests/intructions being made? Meaning that under heavier loads... does the second case become more likely?
> 
> Seems to me that Vulkan likely will not exhibit this behavior then. It appears to be a Windows 10 DX12 issue.


Well, the finer-grained the individual command buffers are, and the smaller the ratio of buffers to synchronization points, the more possible schedules there are, and hence the greater the chance of getting an alternate schedule. There are possibly other factors at play too, e.g. the order in which the buffers were submitted and the like, and the priorities which the scheduler effectively derives from these criteria.

Whether this actually makes a measurable difference depends strongly on the application.

It would be foolish to assume that a 3D application is purely synchronous just because there isn't an asynchronous compute queue.
You actually have quite a lot of concurrent tasks running in parallel, handling (implicit) resource transfers and preparing various resources for access in a specific context. Even on Maxwell GPUs. In the best case you don't notice, because the schedule of your draw and dispatch calls leaves plenty of time for the copy engine to work in the background, unnoticed. Barriers are needed to ensure that, if the timing doesn't work out, the shaders won't run until they can safely access the data. If you mess up the schedule, you can now start blocking on barriers which would have had no effect before. In the worst case, barriers that were previously trivial to skip, because they demanded a state that was already established, suddenly need to toggle state because the command buffers are no longer neatly sorted by state.

And no, the same issue shouldn't occur with Vulkan, mostly because the Vulkan API shouldn't claim to support multiple queues that aren't backed by hardware.

DirectX and Vulkan follow different ideologies on this subject.
Microsoft chose to include asynchronous scheduling in the smallest common denominator, even if that means including a bogus compatibility layer for the purpose.
Vulkan isn't primarily about exposing a common denominator, but about providing a low-level API which can sufficiently accurately represent the hardware architecture.
Both are low-level APIs, but they differ in their default strategies for hiding or exposing differences in feature capabilities: coarse-grained feature levels with tons of mandatory features on DX12, fine-grained capability bits on Vulkan.
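That split in philosophy can be sketched as a toy model (hypothetical Python, not real D3D12 or Vulkan API code; both function names are invented for illustration): a DX12-style runtime always grants the queues an application requests and silently emulates the surplus in software, while a Vulkan-style driver just reports the hardware's actual queue capabilities.

```python
# Toy contrast of the two philosophies, in hypothetical Python rather than
# real D3D12/Vulkan calls; both function names are invented for illustration.

def dx12_style_create_queues(hw_queues, requested):
    """DX12-style: queue creation always succeeds. Queues beyond the
    hardware count are emulated (the OS merges their command buffers
    into one hardware queue behind the application's back)."""
    real = min(hw_queues, requested)
    return {"granted": requested, "hardware": real, "emulated": requested - real}

def vulkan_style_query_queues(hw_queues):
    """Vulkan-style: the driver just reports the hardware's actual queue
    capabilities; the application adapts to what is really there."""
    return {"available": hw_queues}

# A Maxwell-like part with a single compute-capable hardware queue:
print(dx12_style_create_queues(hw_queues=1, requested=2))
# the app "sees" two queues, one of them silently software-emulated

print(vulkan_style_query_queues(hw_queues=1))
# the app sees exactly one queue and won't pretend to run async
```

Under this sketch, a Maxwell-like device still lets a DX12 app create two queues, which is exactly why the app can't tell the second one is emulated.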


----------



## Zero989

Quote:


> Originally Posted by *Ext3h*
> 
> Well, the finer grained the individual command buffers are, and the smaller the ratio of buffers to synchronization points, the more possible solutions you have for scheduling, and hence you increase the chance of getting an alternate schedule. There are possibly also other factors playing in, e.g. possibly the order in which the buffers have been submitted and alike, and the priorities which the scheduler effectively makes up from these criteria.
> 
> Whether this actually make a measurable difference depends strongly on the application.
> 
> It would be foolish to assume that an 3D application is purely synchronous just because there isn't an asynchronous compute queue.
> You actually have quite a lot of concurrent tasks running in parallel which are handling (implicit) resource transfers and preparing various resources for access in a specific context. Even on Maxwell GPUs. In the best case, you don't notice because the schedule of your draw and dispatch calls is such that there is plenty of time for the copy engine to work in the background, unnoticed. Barriers are needed to ensure that, if the timing doesn't work out, the shaders won't run unless they can safely access the data. If you mess up the schedule, it can now happen that you start blocking on barriers which wouldn't had any effect before. In the worst case, the barriers were even trivial to ignore previously because they demanded a state which was already established, while after the order was messed up, the state suddenly needs to be toggled as the command buffers are no longer neatly sorted by state.
> 
> And no, the same issue shouldn't occur with Vulkan. Mostly since the Vulkan API shouldn't claim to support multiple queues if not backed by the hardware.
> 
> DirectX and Vulkan follow a different ideology on this subject.
> Microsoft chose to include asynchronous scheduling in the smallest common denominator, even if that means including a bogus compatibility layer for that purpose.
> Vulkan isn't primarily about giving access to a common denominator, but to provide a low level API which can sufficiently accurately represent the hardware architecture.
> Both are low level APIs, but they differ in the default strategies for hiding or exposing differences in feature capabilities. Coarse grained feature levels on DX12 with tons of mandatory features, fine grained capability bits on Vulkan.


Just wanted you to know that my score improves with async off in Time Spy, using 2 Pascal GPUs in SLI.


----------



## jckaboom

So they would need a different path for AMD in order to get the full benefit of asynchronous compute?
But the 1080 and 1070 are way ahead of the Fury X.
All three cards get some benefit from async on vs. off, but the 1070 with async off should be on par with the 980 Ti, since Maxwell can't use async, and that is not the case.
Shouldn't the Fury X be on par with the 1070?
If we can't compare AMD vs. Nvidia here, why is the 1070 so much faster than the 980 Ti with async off?


----------



## Exeed Orbit

Quote:


> Originally Posted by *moustang*
> 
> Good thing video card benchmarks aren't important, eh? I mean, no one has died or had their health seriously effected because a benchmark favored one chipset over another.
> Sort of like it not really being neutral because it runs tessellation on the CPU rather than the GPU because Nivida cards would kill the AMD cards if tessellation was performed by the GPU, right?
> 
> I mean, let's be totally "neutral" here and run the Physics test using GPU based Physx. If you're not doing that then your limiting your test to fit AMDs capabilities, right?
> 
> Or does that testing bias only apply when the benchmark would favor AMD?


Tessellation is not exclusive to one GPU vendor. PhysX is. Nice try.
Quote:


> Originally Posted by *jckaboom*
> 
> So they need to do a different path for amd in order to get full benefits from asynchronous compute?
> But 1080 and 1070 are way ahead of fury X.
> All 3 cards are getting some benefit from asynchronous on vs off, but 1070 with async off should be on par with 980ti since maxwell can't use async but is not the case.
> Fury X should be on par with 1070?
> If we can't compare amd vs nvidia here, why 1070 is so much faster that 980ti with async off?


Because the Maxwell architecture wasn't designed to take advantage of async. Neither is the new Pascal architecture, but it does a decent job of emulating it with clever scheduling.

Whereas GCN (all the way back to 1.0) was designed with async compute in mind. So any API that uses the feature will benefit AMD cards while not really benefitting Nvidia's. Make sense?


----------



## rcfc89

Quote:


> Originally Posted by *Exeed Orbit*
> 
> Tesselation is not exclusive to one GPU. PhysX is. Nice try.
> Because the Maxwell architecture wasn't designed to take advantage of Async. Neither is the new Pascall architecture, but it does a decent job at emulating it by using clever scheduling protocols.
> 
> Whereas, GCN (all the way back to 1.0) was designed with Async Compute in mind. So any API that uses this feature will benefit AMD cards, while not really benefitting Nvidia's. Make sense?


Name 5 upcoming AAA titles that will use async.

This misinformation floating around that only AMD GPUs will benefit from DX12 is utterly ridiculous. Nvidia will see the exact same benefit; it's only games that use async that will give AMD the more significant boost. But in all honesty, with their current line-up they need all the help they can get.


----------



## Exeed Orbit

Quote:


> Originally Posted by *rcfc89*
> 
> Name 5 games that are AAA title's upcoming that will use Async?
> 
> This misinformation floating around that only Amd gpu's will benefit from Dx12 is utterly ridiculous. Nvidia will see the exact same benefit. Its only games that use Async that will give Amd the more significant boost. But in all honesty with their current line-up they need all the help they can get.


I don't remember seeing previous generation Nvidia cards benefit from Async. I could be wrong. But even then, no, Nvidia's GPUs will not see the *exact* same benefit. The reason being that on DX11, AMD GPUs had a pretty bad CPU overhead issue. This is no longer the case in DX12. So AMD GPUs are getting the boost from the overhead issue being somewhat resolved, and async compute being employed. All I'm saying is, AMD stands to gain more from DX12 than Nvidia does.

And my point wasn't to indicate that Nvidia GPUs *DON'T* benefit from Async at all. I simply stated that the architecture wasn't designed with it in mind.


----------



## mtcn77

Quote:


> Originally Posted by *rcfc89*
> 
> Name 5 games that are AAA title's upcoming that will use Async?
> 
> This misinformation floating around that only Amd gpu's will benefit from Dx12 is utterly ridiculous. *Nvidia will see the exact same benefit.* Its only games that use Async that will give Amd the more significant boost. *But in all honesty with their current line-up they need all the help they can get.*


Fair enough. Then why aren't "GPU intrinsics" & "asynchronous shaders" taken advantage of in Futuremark? Notice that Nvidia specifically calls their implementation "preemption"; they know what they are doing. It would clear up the misunderstanding about what 'async' really is if we stuck with vendor conventions rather than coming up with alternative meanings ourselves.
Clearly, Pascal needs all the help it can get in order to hold onto the fastest-card title.


----------



## jckaboom

So if this benchmark is garbage for comparing asynchronous compute between Nvidia and AMD, what happens when we turn async off? Is it all messed up too?


----------



## rcfc89

Quote:


> Originally Posted by *Exeed Orbit*
> 
> I don't remember seeing previous generation Nvidia cards benefit from Async. I could be wrong. But even then, no, Nvidia's GPUs will not see the *exact* same benefit. The reason being that on DX11, AMD GPUs had a pretty bad CPU overhead issue. This is no longer the case in DX12. So AMD GPUs are getting the boost from the overhead issue being somewhat resolved, and async compute being employed. All I'm saying is, AMD stands to gain more from DX12 than Nvidia does.
> 
> And my point wasn't to indicate that Nvidia GPUs *DON'T* benefit from Async at all. I simply stated that the architecture wasn't designed with it in mind.


Turn off async in DOOM and watch AMD's performance increase drop tremendously. It's not DX12; both Nvidia and AMD will benefit equally from DX12 unless async compute is used. So again I'm asking anyone with more knowledge on the subject than I: name 5 big AAA games hitting the market in the next year that utilize async.


----------



## Dargonplay

Quote:


> Originally Posted by *jckaboom*
> 
> So they need to do a different path for amd in order to get full benefits from asynchronous compute?
> But 1080 and 1070 are way ahead of fury X.
> All 3 cards are getting some benefit from asynchronous on vs off, but 1070 with async off should be on par with 980ti since maxwell can't use async but is not the case.
> Fury X should be on par with 1070?
> If we can't compare amd vs nvidia here, why 1070 is so much faster that 980ti with async off?


The Fury X is even ahead of the GTX 1080 when using Vulkan + async shaders (TSSAA) in Doom, so go figure.

You are dead wrong.

Also, async shaders aren't used in Doom unless you select the TSSAA anti-aliasing setting under Vulkan. To see how big an impact async shaders have in Doom, simply select SMAA or any other AA setting, which disables async compute (the developers used async shaders as a way to offload TSSAA's performance impact on GCN cards).

The biggest gain we see on GCN cards comes from the CPU overhead being completely eliminated, not from async compute.

Please watch the entirety of this video, as it goes into detail beyond simply benchmarking the game with different cards. It explains why and how things happen, so you can have a clearer view of what's going on, and gives the exact percentage gain async shaders provide in Vulkan Doom on GCN cards.




Quote:


> Originally Posted by *jckaboom*
> 
> Can you post the link?
> I remember see than 1080 in ogl was faster than fury X on Vulkan.


I just posted the video above, and of course a GTX 1080 is going to be faster than a Fury X in OpenGL; a GTX 1080 in OpenGL will trash the Fury X, fact.

Another fact is that the situation reverses under Vulkan. It's not so much that the Fury X trashes the GTX 1080, but it does beat it when using TSSAA (async shaders) on Vulkan.


----------



## jckaboom

Can you post the link?
I remember see than 1080 in ogl was faster than fury X on Vulkan.


----------



## Exeed Orbit

Quote:


> Originally Posted by *rcfc89*
> 
> Turn off Async on DOOM and watch the performance increase from AMD drop tremendously. Its not DX12. Both Nvidia and Amd will benefit equally on Dx12 unless Async compute is used. So again I'm asking anyone with more knowledge on the subject then I. Name 5 big AAA games hitting the market in the next year that utilize Async?


My question to you is: WHY would you turn off one of the most useful features of a new low-level API? Because it doesn't suit your brand? Various developers have said async compute adds real value to game performance.

And my problem with the topic is as follows:

"Some people will simply say, don't worry, just wait for Volta, it should have hardware level support for Async Compute"

If so, great.

But what of the people that spent money on Pascal? Does their money not matter?


----------



## mtcn77

Quote:


> Originally Posted by *rcfc89*
> 
> Turn off async in DOOM and watch the performance increase from AMD drop tremendously. It's not DX12; both Nvidia and AMD will benefit equally in DX12 unless async compute is used. So again I'm asking anyone with more knowledge on the subject than I: name 5 big AAA games hitting the market in the next year that utilize async?


This is a slippery slope. It depends on adoption by volume. Consoles are playing to AMD's hand, so it is a free incentive - on AMD's part - rather than a couple of millions to incentivize. And obviously, the company that accepts the incentive is actually rowing against the tide by giving up the consumers, so all the more likely to go out of business. They can't just publish games without public reception all the time.


----------



## rcfc89

Quote:


> Originally Posted by *Exeed Orbit*
> 
> My question to you is: WHY would you turn off one of the most useful features of a new low-level API? Because it doesn't suit your brand? Async compute has been said by various developers to add real value to game performance.
> 
> And my problem with the topic is as follows:
> 
> "Some people will simply say, don't worry, just wait for Volta, it should have hardware level support for Async Compute"
> 
> If so, great.
> 
> But what of the people that spent money on Pascal? Does their money not matter?


I use TSSAA 8TX in Doom either way because it looks fantastic. I actually saw a 15% jump with Vulkan on a single 980 Ti, so I'm happy with its use. With the game maxed out at 3440x1440 (Nightmare textures / sharpness 4x) I was floating between 80-85 fps on OpenGL. With Vulkan I never drop below 100 fps, which fits my display perfectly. I'm just curious to see some benchmarks with TSSAA turned off (async disabled). I'm also curious how well the 4 GB on the Fury X holds up at ultra settings in higher resolutions; I'm seeing 4.5-5 GB of memory usage at times.


----------



## Exeed Orbit

Quote:


> Originally Posted by *rcfc89*
> 
> I use TSSAA 8TX in Doom either way because it looks fantastic. I actually saw a 15% jump with Vulkan on a single 980 Ti, so I'm happy with its use. With the game maxed out at 3440x1440 (Nightmare textures / sharpness 4x) I was floating between 80-85 fps on OpenGL. With Vulkan I never drop below 100 fps, which fits my display perfectly. I'm just curious to see some benchmarks with TSSAA turned off (async disabled). I'm also curious how well the 4 GB on the Fury X holds up at ultra settings in higher resolutions; I'm seeing 4.5-5 GB of memory usage at times.


I'd imagine the Fury X getting the piss kicked out of it by the 980Ti. Luckily, people with a Fury have the option. But now that you mention it, I'm curious too.


----------



## jckaboom

"Another fact is that the situation reverses when using Vulkan. It's not so much that the Fury X trashes the GTX 1080, but it does beat it when using TSSAA (async shaders) on Vulkan."

Where I can see that?


----------



## Exeed Orbit

Quote:


> Originally Posted by *jckaboom*
> 
> "Another fact is that the situation reverses when using Vulkan. It's not so much that the Fury X trashes the GTX 1080, but it does beat it when using TSSAA (async shaders) on Vulkan."
> 
> Where I can see that?


I haven't seen it compared to a 1080 on Vulkan, only compared to the 1070. Did someone mention that the Fury X beat the 1080 on Vulkan?


----------



## jckaboom

Dargonplay just did, 30 min ago.
"Another fact is that the situation reverses when using Vulkan. It's not so much that the Fury X trashes the GTX 1080, but it does beat it when using TSSAA (async shaders) on Vulkan."


----------



## mtcn77

Quote:


> Originally Posted by *jckaboom*
> 
> Dargonplay just did, 30 min ago.
> "Another fact is that the situation reverses when using Vulkan. It's not so much that the Fury X trashes the GTX 1080, but it does beat it when using TSSAA (async shaders) on Vulkan."


It is probably subject to a non-disclosure agreement. I would guess they are rather close at stock.
That lead against the 1070 is uncontested; at no point do the lines intersect.



This is the CPU frame-time latency at 2K resolution. You can see that the Fury X is very much up there.


----------



## criznit

Quote:


> Originally Posted by *Dargonplay*
> 
> I just posted the video above, and of course a GTX 1080 is not only going to be faster than a Fury X in OpenGL; in OpenGL it will outright trash the Fury X, fact.
> 
> Another fact is that the situation reverses when using Vulkan. It's not so much that the *Fury X trashes the GTX 1080*, but it does beat it when using TSSAA (async shaders) on Vulkan.


Where is this shown in the video? All I gathered was that, going by the percentages, it should, but it isn't actually shown. The Fury X will fall in between the 1070 and the 1080 but won't beat it under Vulkan. BUT unless Nvidia starts being serious (lol) they will be in trouble until Volta.

edit - spelling


----------



## jckaboom

I finally found a benchmark with the Fury X vs the GTX 1080...

http://gamegpu.com/action-/-fps-/-tps/doom-api-vulkan-test-gpu



Seems like they are using SMAA; maybe with TSSAA enabled the Fury X is close to the 1080.


----------



## Dargonplay

Quote:


> Originally Posted by *jckaboom*
> 
> I finally found a benchmark with the Fury X vs the GTX 1080...
> 
> http://gamegpu.com/action-/-fps-/-tps/doom-api-vulkan-test-gpu
> 
> 
> 
> Seems like they are using SMAA; maybe with TSSAA enabled the Fury X is close to the 1080.


I don't understand these benchmark sites... Why didn't they use TSSAA (async shaders), when it is BY FAR the best AA setting in Doom, and enabling TSSAA is the only way to enable async shaders on AMD?

There is absolutely no reason not to use TSSAA other than to make Nvidia look better. When using TSSAA (async), the Fury X seems to perform just like a GTX 1080, if not a tiny bit better.


----------



## flippin_waffles

Quote:


> Originally Posted by *Dargonplay*
> 
> I don't understand these benchmark sites... Why didn't they use TSSAA (async shaders), when it is BY FAR the best AA setting in Doom, and enabling TSSAA is the only way to enable async shaders on AMD?
> 
> There is absolutely no reason not to use TSSAA other than to make Nvidia look better. When using TSSAA (async), the Fury X seems to perform just like a GTX 1080, if not a tiny bit better.


It really serves no other purpose. Review sites that do this are doing a disservice to the tech community and their readers! What a bunch of ****ers!
Anyway, it looks like the RX 480 has successfully fended off the 1060; opinion is quite favorable for the RX 480 and not so much for the 1060. Consumers seem to be realizing how quickly NV cards lose their value and performance standing, and will even more so now that the new APIs are rolling out. Doom with Vulkan was a big win for AMD. Consumers are now paying attention to how poorly NV cards age :|


----------



## Kpjoslee

Quote:


> Originally Posted by *Dargonplay*
> 
> I just posted the video above, and of course a GTX 1080 is not only going to be faster than a Fury X in OpenGL; in OpenGL it will outright trash the Fury X, fact.
> 
> Another fact is that the situation reverses when using Vulkan. It's not so much that the *Fury X trashes the GTX 1080*, but it does beat it when using TSSAA (async shaders) on Vulkan.


Oh, come on lol.


----------



## mtcn77

Quote:


> Originally Posted by *Kpjoslee*
> 
> Oh, come on lol.


Those results are off. You cannot have a CPU driving the frame rate at 100 FPS and still score 140 FPS. Also, the comment was based on DigitalFoundry's missing 4K results, which they left to the imagination.


----------



## Kpjoslee

Quote:


> Originally Posted by *mtcn77*
> 
> Those results are off. You cannot have a CPU driving the frame rate at 100 FPS and still score 140 FPS. Also, the comment was based on DigitalFoundry's missing 4K results, which they left to the imagination.


Perhaps frametime is the more accurate result, but it is not uncommon to see a big gap between FPS and frametime. It just means the 1080 had a lot more peaks; it doesn't mean the results are off.
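That gap can come purely from arithmetic: averaging per-frame FPS weights a few very fast frames (peaks) heavily, while FPS derived from the mean frametime is dominated by the slow frames. A minimal sketch with made-up frame times:

```python
# Why "average FPS" and FPS-from-average-frametime diverge: the mean of
# instantaneous FPS is inflated by a few very fast frames, while the mean
# frametime is dominated by the slow ones. Frame times below are invented.
frametimes_ms = [5.0, 5.0, 5.0, 20.0]  # three fast frames, one slow frame

mean_instant_fps = sum(1000.0 / t for t in frametimes_ms) / len(frametimes_ms)
fps_from_mean_frametime = 1000.0 / (sum(frametimes_ms) / len(frametimes_ms))

print(f"mean of per-frame FPS:   {mean_instant_fps:.1f}")         # 162.5
print(f"FPS from mean frametime: {fps_from_mean_frametime:.1f}")  # 114.3
```

The second number is the harmonic-mean view, which is what a frametime graph reflects; a card with more peaks shows a bigger spread between the two.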


----------



## airfathaaaaa

Quote:


> Originally Posted by *Dargonplay*
> 
> I don't understand these benchmark sites... Why didn't they use TSSAA (async shaders), when it is BY FAR the best AA setting in Doom, and enabling TSSAA is the only way to enable async shaders on AMD?
> 
> There is absolutely no reason not to use TSSAA other than to make Nvidia look better. When using TSSAA (async), the Fury X seems to perform just like a GTX 1080, if not a tiny bit better.


You have to learn how Computerbase works...

When they test DX12 games they forget the high-end AMD cards "by accident", and when they test GameWorks games they suddenly include the whole Fury line.


----------



## dagget3450

TSSAA in Doom is simply a violation of the International Intergalaxy of Equal and Fairness Performance Graphical Silicon ACT. We cannot allow oppressive innovators to influence the outcome of what should be fair and balanced graphical performance for all. If we intend to use specialized software that favors one vendor then we have lost all reason. We must be preemptive and vigilant with concurrent regulations that they should follow a standardized pipeline so that no user gets disenfranchised and suffers reduced performance. We know that parallelization of the system will only segregate its users and cause strife.

Stop the Tyranny before it rises. We must stop the Red Rebellion at all costs.


----------



## magnek

Quote:


> Originally Posted by *airfathaaaaa*
> 
> You have to learn how Computerbase works...
> 
> When they test DX12 games they forget the high-end AMD cards "by accident", and when they test GameWorks games they suddenly include the whole Fury line.


Computerbase.de is also the site that reported 60%+ gains for Fury X under Vulkan in DOOM. Should we discard that result now?


----------



## Kuivamaa

Computerbase does the best benches these days. They do no favors to either side. Gamegpu.ru is also unbiased, but their methodology is absolute trash.


----------



## airfathaaaaa

Quote:


> Originally Posted by *magnek*
> 
> Computerbase.de is also the site that reported 60%+ gains for Fury X under Vulkan in DOOM. Should we discard that result now?


There is the .de and the .ru (or did I mix up the sites?)
Nvm, it's the gpu-something .ru, my bad.


----------



## mtcn77

Quote:


> Originally Posted by *Kpjoslee*
> 
> Perhaps frametime is the more accurate result, but it is not uncommon to see a big gap between FPS and frametime. It just means the 1080 had a lot more peaks; it doesn't mean the results are off.


It could very well be that. Since the GPU latency is not reported correctly, they may have made a scripting error. That would be the easiest explanation for why the CPU latency is higher than the overall frametime latency, which should not be possible in principle, because such a value would simply become the dominant frame limit.
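The pipeline argument above can be sketched in a couple of lines. This is an illustrative model only; the function name and millisecond figures are invented, not taken from any benchmark:

```python
# In a pipelined renderer, the slower of the CPU and GPU stages sets the
# floor on frametime. So a reported per-frame CPU latency that exceeds the
# total frametime points to a measurement error, not a real result.

def min_frametime_ms(cpu_ms, gpu_ms):
    """The dominant stage becomes the frame limit."""
    return max(cpu_ms, gpu_ms)

print(min_frametime_ms(10.0, 7.0))  # CPU-bound: 10.0 ms floor (~100 fps cap)
print(min_frametime_ms(6.0, 12.5))  # GPU-bound: 12.5 ms floor (~80 fps cap)
```

In other words, a CPU driving frames at 10 ms each cannot coexist with a reported total frametime of ~7 ms (140 fps), which is mtcn77's objection.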


----------



## rcfc89

Quote:


> Originally Posted by *Kpjoslee*
> 
> Oh, come on lol.


Come on guys, let the AMD trolls have their "one game" of glory. They get punished in everything else, so let them just have this one. Is their bragging silly and annoying? Yes. But if you were in their shoes you would be thrilled too. It's like the ******ed kid who wins one match out of 40 in ping pong. Let them enjoy this.


----------



## OneB1t

In fact, it's now like 7 games..







and still growing


----------



## rcfc89

Quote:


> Originally Posted by *OneB1t*
> 
> in fact its now like 7 games..
> 
> 
> 
> 
> 
> 
> 
> and still growing


Let me guess: indie games that most of us haven't even heard of? Name them.


----------



## OneB1t

https://en.wikipedia.org/wiki/List_of_games_with_DirectX_12_support

Nearly all gave better performance on AMD than on the comparable NVIDIA card.


----------



## Exeed Orbit

Quote:


> Originally Posted by *rcfc89*
> 
> Let me guess: indie games that most of us haven't even heard of? Name them.


Doom
Quantum Break
Hitman (Then again, AMD won even before DX12)
AOTS
Warhammer

Those are the ones I can think of.

The new Civ will also have it.


----------



## OneB1t

Forza 6
Rise of the Tomb Raider

There you have 7.


----------



## NightAntilli

Quote:


> Originally Posted by *Kpjoslee*
> 
> Oh, come on lol.


You should learn to read. He did not say the Fury X trashes the 1080.


----------



## Exeed Orbit

Quote:


> Originally Posted by *OneB1t*
> 
> Forza 6
> Rise of the Tomb Raider
> 
> There you have 7.


To be fair, Forza and ROTR both see Nvidia performing much better than AMD.


----------



## poii

Quote:


> Originally Posted by *Kuivamaa*
> 
> *Computerbase does the best benches these days.* They do no favors to either side. Gamegpu.ru is also unbiased, but their methodology is absolute trash.


QFT, they even added a new feature to their benches with the 1060 review.
https://www.computerbase.de/2016-07/geforce-gtx-1060-test/3/#abschnitt_benchmarks_in_1920__1080_und_2560__1440
Just click on "Bearbeiten" (Edit) and uncheck the games you're not interested in, and watch the graph change accordingly. I mean, that is just awesome.


----------



## OneB1t

Not really, not with the latest patch + driver:









check this


and this
https://youtu.be/d-Jgf6rtEg8?t=84


----------



## Exeed Orbit

Quote:


> Originally Posted by *OneB1t*
> 
> Not really, not with the latest patch + driver:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> check this
> 
> 
> and this
> https://youtu.be/d-Jgf6rtEg8?t=84


Wow, I could've sworn I saw benchmarks showing Nvidia outpace AMD even in DX12. Nice.


----------



## GorillaSceptre

Quote:


> Originally Posted by *Exeed Orbit*
> 
> Wow, I could've sworn I saw benchmarks showing Nvidia outpace AMD even in DX12. Nice.


That would be members like Chev showing you outdated/cherry-picked things..

Best to double-check, people.


----------



## OneB1t

There is a huge difference even versus one-month-old benchmarks because of patches and new drivers.


----------



## kaosstar

It's pointless arguing before the Nvidia async patch is released.


----------



## Exeed Orbit

Quote:


> Originally Posted by *kaosstar*
> 
> It's pointless arguing before the Nvidia async patch is released.


Do we have an ETA on that?


----------



## kaosstar

From Bethesda: "We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon."


----------



## OneB1t

update is going to be:
Oh, NVIDIA can't do it on Maxwell, so maybe go and buy Pascal?


----------



## GorillaSceptre

Quote:


> Originally Posted by *kaosstar*
> 
> It's pointless arguing before the Nvidia async patch is released.


Quote:


> Originally Posted by *kaosstar*
> 
> From Bethesda: "We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon."


From Oxide Aug 2015 - "We are working with NVIDIA to enable asynchronous compute in Aots on NVIDIA GPUs. We hope to have an update soon."

Even the "DX12 standard" that is Time Spy shows no async gains on Maxwell cards. Pascal may get a bump in the future, but the writing's on the wall as far as Maxwell goes.


----------



## PontiacGTX

Quote:


> Originally Posted by *kaosstar*
> 
> From Bethesda: "We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon."


Then it will be asynchronous compute via pre-emption.


----------



## magnek

Quote:


> Originally Posted by *Exeed Orbit*
> 
> Doom
> Quantum Break
> Hitman (Then again, AMD won even before DX12)
> AOTS
> Warhammer
> 
> Those are the ones I can think of.
> 
> The new Civ will also have it.


You can add Black Ops 3 to the list as well.





Spoiler: Warning: Spoiler!



RX 480 near 980 Ti performance confirmed!


----------



## nagle3092

Quote:


> Originally Posted by *kaosstar*
> 
> From Bethesda: "We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon."


Isn't it funny how Nvidia demoed Doom running Vulkan on the 1080 during the press conference three months ago, but they still don't have a driver for it?


----------



## Exeed Orbit

Quote:


> Originally Posted by *magnek*
> 
> You can add Black Ops 3 to the list as well.
> 
> 
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> RX 480 near 980 Ti performance confirmed!


Didn't know BO3 also had DX12 support.


----------



## Dargonplay

Vulkan is dead, OpenGL FOR EVERYBODY!


----------



## criminal

Quote:


> Originally Posted by *magnek*
> 
> You can add Black Ops 3 to the list as well.
> 
> 
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> RX 480 near 980 Ti performance confirmed!


That turd? I wouldn't take credit for better performance in that game... lol


----------



## daviejams

Quote:


> Originally Posted by *Dargonplay*
> 
> Vulkan is dead, OpenGL FOR EVERYBODY!


Toms hardware









pretty good critique of the tech press though


----------



## Dargonplay

Quote:


> Originally Posted by *daviejams*
> 
> Toms hardware
> 
> 
> 
> 
> 
> 
> 
> 
> 
> pretty good critique of the tech press though


I don't know what's up with the tech press; I literally have to go to AdoredTV to get the Tech without the Bulltech.


----------



## Kpjoslee

Quote:


> Originally Posted by *NightAntilli*
> 
> You should learn to read. He did not say the Fury X trashes the 1080.


I might have bolded the wrong part, but my point stands.


----------



## ToTheSun!

Quote:


> Originally Posted by *Dargonplay*
> 
> Quote:
> 
> 
> 
> Originally Posted by *daviejams*
> 
> Toms hardware
> 
> 
> 
> 
> 
> 
> 
> 
> 
> pretty good critique of the tech press though
> 
> 
> 
> I don't know what's up with the tech press, I literally have to go to AdoredTV to get the Tech without the Bulltech.
Click to expand...

While I think AdoredTV is annoying and biased towards AMD, his main point in that video (that it's fishy to present OGL results in comparisons) is very valid.


----------



## Dargonplay

Quote:


> Originally Posted by *ToTheSun!*
> 
> While i think AdoredTV is annoying and biased towards AMD, his main point in that video (that it's fishy to present OGL results in comparisons) is very valid.


I've been following AdoredTV for some time and I think he is extremely unbiased; what many people read as bias is just him being blunt, stating facts that are usually toned down in the tech press when they don't favor the popular brand.

He is unbiased enough to say in one of his videos that he recommends everyone get a 980 Ti over a Fury X because it is simply the best card, or to state in another video, and I quote, "We all know the Fury X OCs for (biological waste humans eject in the bathroom)".

Or how he explained that even though Fiji seemed just as power efficient, people shouldn't be fooled, because that was all HBM; Maxwell was still the most marvelous architecture regarding efficiency.

Or how he says Pascal is perfection regarding power efficiency.

No, he's not biased; he's blunt, and he will point out "biological waste" where it needs to be pointed out, and highlight good, overlooked points where it needs to.


----------



## magnek

Quote:


> Originally Posted by *Exeed Orbit*
> 
> Didn't know BO3 also had DX12 support.


It's DX11
Quote:


> Originally Posted by *criminal*
> 
> That turd? I wouldn't take credit for better performance in that game... lol


Well, Call of Kiddies Duty is a very popular franchise, so I just wanted to show there's also a popular DX11 game where AMD has an advantage. Because remember: AotS is a benchmark, people have already finished DOOM, Quantum Break and Warhammer are both broken ports, and Hitman is clearly AMD-biased, so they all don't count.









But I get what you're saying.


----------



## Exeed Orbit

Quote:


> Originally Posted by *Dargonplay*
> 
> He is unbiased enough to say in one of his videos that he recommends everyone to get a 980Ti over a Fury X because it is simply the best card, or when he stated in another video, and I quote "We all know the Fury X OC for (Biological waste humans eject in the bathroom)".


The Fury X overclocks for semen?!


----------



## SuperZan

Quote:


> Originally Posted by *Exeed Orbit*
> 
> The Fury X overclocks for semen?!


That must be why I could never get a decent OC out of it.


----------



## lolerk52

Quote:


> Originally Posted by *Exeed Orbit*
> 
> The Fury X overclocks for semen?!


...that's not thermal paste.


----------



## criminal

Quote:


> Originally Posted by *SuperZan*
> 
> That must be why I could never get a decent OC out of it.


You got a 1070? When did that happen?


----------



## ToTheSun!

Quote:


> Originally Posted by *Dargonplay*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ToTheSun!*
> 
> While i think AdoredTV is annoying and biased towards AMD, his main point in that video (that it's fishy to present OGL results in comparisons) is very valid.
> 
> 
> 
> I've been following AdoredTV for some time and I think he is extremely unbiased; what many people read as bias is just him being blunt, stating facts that are usually toned down in the tech press when they don't favor the popular brand.
> 
> He is unbiased enough to say in one of his videos that he recommends everyone get a 980 Ti over a Fury X because it is simply the best card, or to state in another video, and I quote, "We all know the Fury X OCs for (biological waste humans eject in the bathroom)".
> 
> Or how he explained that even though Fiji seemed just as power efficient, people shouldn't be fooled, because that was all HBM; Maxwell was still the most marvelous architecture regarding efficiency.
> 
> Or how he says Pascal is perfection regarding power efficiency.
> 
> No, he's not biased; he's blunt, and he will point out "biological waste" where it needs to be pointed out, and highlight good, overlooked points where it needs to.
Click to expand...

Fair enough, you got me.
Quote:


> Originally Posted by *lolerk52*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Exeed Orbit*
> 
> The Fury X overclocks for semen?!
> 
> 
> 
> ...that's not thermal paste.
Click to expand...

If mayonnaise is, I'll be damned if I can't cool down my things with my... stuff.


----------



## SuperZan

Quote:


> Originally Posted by *criminal*
> 
> You got a 1070? When did that happen?


A day or so ago.









I'm a classic impulse buyer, and I like to test everything. I'm looking forward to comparing UHD perf. with different settings on the Furries vs. the 1070. I'll just need a clear day and loads of coffee.


----------



## pengs

Quote:


> Originally Posted by *Dargonplay*


Yep. Tech journalism is horrible atm. Ed Crisler was on point when he called them pundits, which is almost giving them too much credit.

The neglect of Vulkan Doom is a result of laziness (getting and benching the cards early), fear of biting the hand that feeds, or needing to get fed. Easy peasy.


----------



## criminal

Quote:


> Originally Posted by *SuperZan*
> 
> A day or so ago.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I'm a classic impulse buyer, and I like to test everything. I'm looking forward to comparing UHD perf. with different settings on the Furries vs. the 1070. I'll just need a clear day and loads of coffee.


Oh okay. Good deal.


----------



## magnek

Quote:


> Originally Posted by *SuperZan*
> 
> A day or so ago.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> I'm a classic impulse buyer, and I like to test everything. I'm looking forward to comparing UHD perf. with different settings on the Furries vs. the 1070. I'll just need a clear day and loads of coffee.


nooooooooooooooooooooooooooooooooooooooooo now you can't say you have a Furries setup :sadface:


----------



## SuperZan

Quote:


> Originally Posted by *magnek*
> 
> nooooooooooooooooooooooooooooooooooooooooo now you can't say you have a Furries setup :sadface:


Oh, I still have them! I just have an inordinate need to test all the things.


----------



## magnek

Something something about disposable income and first world problems


----------



## toncij

It is clear that the 1080 is simply still the fastest card. Nvidia has the advantage in pure power there, so even with the failure of an architectural feature, it wins.

But I'd be seriously concerned about Vega. Such an early release of the Titan XP signals NV's fear.


----------



## ChevChelios

Quote:


> Such an early release of the Titan XP signals NV's fear.


no it does not

it means that Nvidia is pulling further and further away from AMD in the high-end segment

Vega will need to contend with 1080, Titan XP, possibly another card of that level later (either another Titan, more gaming oriented/non-cut or 1080 Ti) and then also with either Pascal v2 or Volta

and it won't win that battle


----------



## EightDee8D

That's why they held back their Titan X (Maxwell) and 980 Ti.









Nvidia always knows something we don't, and this early release of the big GPU tells us something, depending on how you read it.


----------



## ChevChelios

It tells us that they are selling Quadro P6000 rejects for $1200 to those who passed on the 1080?


----------



## EightDee8D

Quote:


> Originally Posted by *ChevChelios*
> 
> Tells that they are selling Quadro P6000 rejects for $1200 to those who passed on the 1080?


Maybe, or maybe it's what that guy was saying. We will see; interesting times ahead for GPUs.


----------



## Kuivamaa

Quote:


> Originally Posted by *toncij*
> 
> It is clear that the 1080 is simply still the fastest card. Nvidia has the advantage in pure power there, so even with the failure of an architectural feature, it wins.
> 
> But I'd be seriously concerned about Vega. Such an early release of the Titan XP signals NV's fear.


I think AMD is using HBM2 for all Vega products; this could explain why Nvidia is first to the market with its GDDR5X SKUs.


----------



## Ha-Nocri

Quote:


> Originally Posted by *Kuivamaa*
> 
> I think AMD is using HBM2 for all Vega products; this could explain why Nvidia is first to the market with its GDDR5X SKUs.


I think AMD just didn't have enough funds to cover development of the whole range of chips, from low to high end, at the same time. And since they hadn't refreshed the lower end in forever, they decided to go with it first. Which I think was the right decision.


----------



## toncij

Quote:


> Originally Posted by *Kuivamaa*
> 
> I think AMD is using HBM2 for all Vega products; this could explain why Nvidia is first to the market with its GDDR5X SKUs.


I'm afraid not. If that were the case, the announced FirePro would use HBM, and it doesn't. So it seems that this year both NV and AMD skipped HBM.


----------



## ku4eto

Quote:


> Originally Posted by *toncij*
> 
> I'm afraid not. If that were the case, the announced FirePro would use HBM, and it doesn't. So it seems that this year both NV and AMD skipped HBM.


Actually, they are trying to get rid of old stock. They are still using Hawaii chips and some Fiji. They do not really need them, as they are already selling new-generation cards; although the new FirePro products and Vega are still not out, they need to prepare the shelves.


----------



## mtcn77

Quote:


> Originally Posted by *ku4eto*
> 
> Actually, they are trying to get rid of old stock. They are still using Hawaii chips and some Fiji. They do not really need them, as they are already selling new-generation cards; although the new FirePro products and Vega are still not out, they need to prepare the shelves.


You've got to be kidding...
HBM is not a video buffer capable of big data sets. The interface has absolutely zero redundancy built in for manufacturing tolerances. It is therefore seeing very hesitant adoption.


----------



## Kuivamaa

Quote:


> Originally Posted by *toncij*
> 
> I'm afraid not. If that were the case, the announced FirePro would use HBM, and it doesn't. So it seems that this year both NV and AMD skipped HBM.


You mean the one they just announced? It is Polaris; it would not make sense for it to have HBM2, as it was not designed with that in mind. Vega is HBM2, we know that.


----------



## ku4eto

Quote:


> Originally Posted by *mtcn77*
> 
> You've got to be kidding...
> HBM is not a video buffer capable of big data sets. The interface has absolutely zero redundancy built in for manufacturing tolerances. It is therefore seeing very hesitant adoption.


Hesitant, as in being on a $600 GPU that is for enthusiasts? Of course it will see slow adoption; it's a new technology that costs a lot more than the current one, and the benefits to the average Joe are... not worth it. In a few years perhaps this will change.


----------



## JackCY

Quote:


> Originally Posted by *toncij*
> 
> I'm afraid not. If that were the case, the announced FirePro would use HBM, and it doesn't. So it seems that this year both NV and AMD skipped HBM.


Aren't the FirePro cards just P10 and P11? It seemed that way to me, so that's GDDR5 or GDDR5X only. Even NV uses GDDR5X only on the premium 1080, because it is still expensive even for them to put into everything.


----------



## mtcn77

Quote:


> Originally Posted by *ku4eto*
> 
> Hesitant, as in being on a $600 GPU *that is for enthusiasts*? Of course it will see slow adoption; it's a new technology that costs a lot more than the current one, and the benefits to the average Joe are... not worth it. In a few years perhaps this will change.


Hesitant as in not present in the FirePro series, for these reasons. AMD continues to position the Fury X as its full reference card for the enthusiast segment.


----------



## Bryst

This makes me slightly more excited for the 480 I have incoming, I still haven't beaten this game.


----------



## Gualichu04

Just started playing this game, and with Vulkan at 1440p on my R9 290X at 1150 MHz core and 1300 MHz memory I get 70-80 fps even in firefights.


----------



## magnek

Quote:


> Originally Posted by *toncij*
> 
> It is clear that the 1080 is simply still the fastest card. Nvidia has the advantage in pure power there, so even with the failure of an architectural feature, it wins.
> 
> But I'd be seriously concerned about Vega. Such an early release of the Titan XP signals NV's fear.


I have to laugh at comments like these.

If nVidia truly feared what Vega would bring, they wouldn't release a cut-down Quadro reject that isn't a full chip and doesn't have 24 GB of GDDR5X, slap a Titan name on it, and charge $1200 for it. You're delusional if you think this signifies "fear".

No, this is not "fear"; this is them flaunting their market position by giving us what is essentially a 1080 Ti and charging Titan prices for it.


----------



## FLaguy954

Quote:


> Originally Posted by *Ha-Nocri*
> 
> I think AMD just didn't have enough funds to cover development of the whole range of chips, from low to high end, at the same time. And since they hadn't refreshed the lower end in forever, they decided to go with it first. *Which I think was the right decision.*




I agree.


----------



## toncij

Quote:


> Originally Posted by *magnek*
> 
> I have to laugh at comments like these.
> 
> If nVidia truly feared what Vega would bring, they wouldn't be releasing a cut down Quadro reject that isn't a full chip and doesn't have 24 GB GDDR5X, then slap on a Titan name and charge $1200 for it. You're delusional if you think this signifies "fear".
> 
> No this is not "fear", this is them flaunting their market position by giving us what is essentially a 1080 Ti and charging Titan prices for it.


You might as well be right.







I won't pursue my original idea since it's just one happy theory. Yours is more probable.


----------



## magnek

Sorry if I came across as a bit harsh, that wasn't my intention.


----------



## costilletas

Quote:


> Originally Posted by *magnek*
> 
> I have to laugh at comments like these.
> 
> If nVidia truly feared what Vega would bring, they wouldn't be releasing a cut down Quadro reject that isn't a full chip and doesn't have 24 GB GDDR5X, then slap on a Titan name and charge $1200 for it. You're delusional if you think this signifies "fear".
> 
> No this is not "fear", this is them flaunting their market position by giving us what is essentially a 1080 Ti and charging Titan prices for it.


And people will buy it anyway, so who cares.


----------



## KyadCK

Quote:


> Originally Posted by *ku4eto*
> 
> Quote:
> 
> 
> 
> Originally Posted by *mtcn77*
> 
> You've got to be kidding...
> HBM is not a big data set capable video buffer. The interface has absolutely zero redundancies built in for manufacturing tolerances. It is therefore having a very hesitant adoption.
> 
> 
> 
> Hesitant, as in being on a $600 GPU that is for enthusiasts? Of course it will have slow adoption; it's a new technology that costs a lot more than the current one, and the benefits to the average Joe are... not worth it. In a few years perhaps this will change.

No, hesitant as in being on the largest Tesla nVidia has ever made, designed for supercomputers. I mean, c'mon man, duh.








Quote:


> Originally Posted by *JackCY*
> 
> Quote:
> 
> 
> 
> Originally Posted by *toncij*
> 
> I'm afraid not. If that were the case, the announced FirePro would use HBM. And it doesn't. So it seems this year both NV and AMD skipped HBM.
> 
> 
> 
> Aren't the FirePro cards just P10 and P11? It seemed that way to me, so that's G5 or G5X only. Even NV uses G5X only on the premium 1080 because it is still expensive even for them to put it into everything.

That would be correct.
Quote:


> Originally Posted by *mtcn77*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ku4eto*
> 
> Hesitant, as in being on a $600 GPU *that is for enthusiasts*? Of course it will have slow adoption; it's a new technology that costs a lot more than the current one, and the benefits to the average Joe are... not worth it. In a few years perhaps this will change.
> 
> 
> 
> Hesitant as in not present in FirePRO Series due to these reasons. AMD continues to make full reference to Fury X for the enthusiast segment.

Ah crap, I knew I left that link around here somewhere...

*digging* *shuffle shuffle* Aha!

http://www.anandtech.com/show/10209/amd-announces-firepro-s9300-x2
https://www.amd.com/Documents/s9300-x2-datasheet.pdf

Oh wait, they don't use them for FirePros. Or Teslas. I'm mistaken, obviously.


----------



## toncij

Quote:


> Originally Posted by *magnek*
> 
> Sorry if I came across as a bit harsh, that wasn't my intention.


Your explanation IS more probable, but something was bugging me there, so I went that way.







I know Nvidia for stalling everything they can for as long as they can. They never move further up if they don't have to, and they didn't have to announce a Titan yet. Quadros? Yes, but why a Titan? As a counter to AMD's FirePro in the professional market? Hardly, with the same 12GB. Performance-wise? No, AMD only has Polaris and weak FirePros... Not sure why a Titan so early, and why it's made of unobtainium (distributed only through NV's own website).


----------



## mushroomboy

Quote:


> Originally Posted by *rcfc89*
> 
> Let me guess Indie games that most of us haven't even heard of? Name them


You know both consoles support DX12 features as well as async compute? And as much as you'd probably love to disbelieve it, they outnumber all the Nvidia fans out there. With that in mind, and since it's a known performance boost... you really want to keep talking about how this isn't a big deal?

Really man, developers now gain MORE by favoring AMD this time around. So those ports, where async was already implemented?

It's no coincidence that AMD designed an architecture tailored to the big benefits they introduced into DX12/Vulkan. GCN was built for this.


----------



## ChevChelios

Quote:


> You know both consoles support dx12 features as well as async compute?


but they have old GCN1 GPUs in them IIRC

not that much support compared to GCN3/4


----------



## EightDee8D

^ Still a million times better support than the competition.









B b bb. but it's weak so irrelevant, comment incoming .....


----------



## ChevChelios

Quote:


> Still a million times better support than the competition.


nope








Quote:


> but it's weak so irrelevant, comment incoming


you beat me to it


----------



## EightDee8D

Quote:


> Originally Posted by *ChevChelios*
> 
> nope
> 
> 
> 
> 
> 
> 
> 
> 
> you beat me to it


Heh yeah, hypocrisy. I forgot about the green goblin motto.









The 370 > the OG GTX Titan, in Vulkan. It's like a Honda Civic beating a Ferrari on a race track. How pathetic, lol.

DX FL_11.1 vs DX FL_11.0, and the so-called DX12.1 on Maxwell/Paxwell, where most of the relevant features are still a lower tier compared to older GCN. This is why their performance doesn't last long.

Talk about better support.


----------



## ChevChelios

Do the red goblins only play Doom?








Quote:


> GTX titan OG. in vulkan


use OpenGL for the Titan


----------



## EightDee8D

Quote:


> Originally Posted by *ChevChelios*
> 
> use OpenGl for the titan


See? Less support; even the green goblins know it. Thanks for admitting it.


----------



## pas008

lol, typical red vs. green OCN crap.
Hey, let's argue about the fastest-depreciating item in everyone's systems, along with SSDs.
DX12/Vulkan/etc. is great stuff, love to see this,
but it won't matter for a while. It's too early to say, but sure, let's argue, because we can predict the future and will still be using the same components.


----------



## EightDee8D

Typical ignorant posters. Learn what the argument actually was instead of living in wonderland.

LOL


----------



## Remij

At this point I'm simply hoping that AMD can remain competitive enough to win back a good chunk of market share (and mind share) while they have the advantages they have with regards to the new APIs.

I usually buy enthusiast level PC gaming hardware, and on the GPU side that's meant Nvidia for the past few years. However, things are becoming a bit too ridiculous for me. I'm anxious to see if anything comes from Roy Taylor's "Excited for Christmas" comment, but there's one problem I have. AMD are always half a year too late for me.

I might buy the Titan XP... depending on benchmarks... but if I do, I'm cutting back and dropping SLI. I've bought two GPUs at a time for a long time now, but there'll be no more buying the top-of-the-line Titan cards in pairs. It's just a ridiculous amount of money to spend on GPUs. I mean, I'm all about SLI and having an absurdly powerful gaming PC, but I refuse to give Nvidia that much money any longer.

I find my preferences changing as well. A few years ago I wanted the biggest PC I could build. I wanted a top of the line fully featured eATX motherboard with tons of options and expansion slots, and dual GPUs, tons of room for storage, and a fully custom water loop I designed. So I bought it and built it, and it was awesome. I loved it and was glad that I did it. But as the years went on, I started going in the opposite direction and wanted the smallest case I could get while still supporting plenty of features and with dual GPUs and an AIO watercooling solution. It was a great change, and I'm happy I did it.

So now I find myself wanting to downsize again. Since GPUs like Titan XP are so powerful, and Nvidia is becoming insulting with their pricing, SLI truly isn't worth it anymore for me. I'm a big fan of tons of power in small form factors. I would love to have something from AMD be relatively close to the Titan XP and in the realm of sane pricing. So while Nvidia might get more money from me this current generation, my opinion of the scene is changing with these prices, and it's driving me to just going back to the basics and looking at other ways I can improve my setup instead of just adding more GPUs.


----------



## Dargonplay

Quote:


> Originally Posted by *ChevChelios*
> 
> do the red goblins only play Doom ?




Goblins are green...


----------



## pas008

Quote:


> Originally Posted by *Remij*
> 
> At this point I'm simply hoping that AMD can remain competitive enough to win back a good chunk of market share (and mind share) while they have the advantages they have with regards to the new APIs.
> 
> I usually buy enthusiast level PC gaming hardware, and on the GPU side that's meant Nvidia for the past few years. However, things are becoming a bit too ridiculous for me. I'm anxious to see if anything comes from Roy Taylor's "Excited for Christmas" comment, but there's one problem I have. AMD are always half a year too late for me.
> 
> I might buy the Titan XP... depending on benchmarks.. but if I do, I'm cutting back and dropping SLI. I've always bought two gpus together for a long time now. But there'll be no more buying the top of the line Titan cards and buying 2 of them. It's just a ridiculous amount of money to spend on GPUs. I mean, I'm all about SLI and having an absurdly powerful gaming PC, but I refuse to give Nvidia so much money like that any longer.
> 
> I find my preferences changing as well. A few years ago I wanted the biggest PC I could build. I wanted a top of the line fully featured eATX motherboard with tons of options and expansion slots, and dual GPUs, tons of room for storage, and a fully custom water loop I designed. So I bought it and built it, and it was awesome. I loved it and was glad that I did it. But as the years went on, I started going in the opposite direction and wanted the smallest case I could get while still supporting plenty of features and with dual GPUs and an AIO watercooling solution. It was a great change, and I'm happy I did it.
> 
> So now I find myself wanting to downsize again. Since GPUs like Titan XP are so powerful, and Nvidia is becoming insulting with their pricing, SLI truly isn't worth it anymore for me. I'm a big fan of tons of power in small form factors. I would love to have something from AMD be relatively close to the Titan XP and in the realm of sane pricing. So while Nvidia might get more money from me this current generation, my opinion of the scene is changing with these prices, and it's driving me to just going back to the basics and looking at other ways I can improve my setup instead of just adding more GPUs.


Agreed on all except the downsizing thing; I like my MM case and multiple cards.
I'm not liking the news on SLI/XFire/multi-GPU so far. I know it's like 1% of users, but still, to take something away after it's been around so long, or even leave it up to lazy developers? I really hope there will be a lot of 3rd-party fixes/support, cuz I'm no programmer.
I'd go single card every gen if we didn't get the typical Intel-style petty next-gen jumps.


----------



## magnek

Quote:


> Originally Posted by *toncij*
> 
> Your explanation IS more probable, but something was bugging me there, so I went that way.
> 
> 
> 
> 
> 
> 
> 
> I know Nvidia for stalling everything they can for as long as they can. They never move further up if they don't have to, and they didn't have to announce a Titan yet. Quadros? Yes, but why a Titan? As a counter to AMD's FirePro in the professional market? Hardly, with the same 12GB. Performance-wise? No, AMD only has Polaris and weak FirePros... Not sure why a Titan so early, and why it's made of unobtainium (distributed only through NV's own website).


Yes, the timing is definitely unexpected, I agree. My best guess is they didn't want to sit on a pile of Quadro rejects, so they figured why not sell the salvage parts as a Titan card and watch everybody trip over each other to be first in line to buy it.


----------



## toncij

Quote:


> Originally Posted by *magnek*
> 
> Yes the timing is definitely unexpected I agree. My best guess is they didn't want to sit on a pile of Quadro rejects, so they figured why not sell the salvage parts as a Titan card and watch everybody trip over each other to be the first in line to buy it.


What makes it even stranger is that they've decided to sell it exclusively through their own shop, cutting off a huge market. For example: my colleagues and I bought 12 Titan Xes last year, and this year we simply can't. There is no NV shop for our country...


----------

