# [Various] AMD's Zen To Have 10 Pipelines Per Core - Details Leaked In Patch (Updated)



## Blameless

Looks like AMD's first true four-wide core (Intel's x86 CPUs have been able to retire four instructions per cycle since Core 2), with 8 or 10 execution ports depending on how you count them (Haswell has eight).


----------



## alcal

Cool stuff. Is your use of the term "Hyperthreads" accurate though? My understanding was that hyperthreads basically context switch between two simulated processes on a single physical core, whereas this system seems to actually be able to execute multiple instructions simultaneously.


----------



## cookieboyeli

Why didn't they design it to exceed Haswell's specifications?

Wouldn't they want to match Haswell at least?

ZEN:
Quote:


> FPU: 2 128bit FMAC (2 128bit add + 2 128bit mul)


Haswell:
Quote:


> FPU: 2 256bit FMAC (256bit add + 256bit mul)


And why not use 384-bit or something so it's "better"?

(I have no idea how any of this works, please don't kill me if this is a stupid thing to ask).


----------



## Clocknut

So now we are taking someone's blog as rumor as well?


----------



## epic1337

Quote:


> Originally Posted by *cookieboyeli*
> 
> Why didn't they design it to exceed Haswell's specifications?
> 
> Wouldn't they want to match Haswell at least?
> 
> ZEN:
> Haswell:
> And why not use 384bit or something so it's "better".
> 
> (I have no idea how any of this works, please don't kill me if this is a stupid thing to ask).


You'd end up with a huge die, which is counterproductive to margins and yield per wafer. In other words, it gets stupid expensive.

What they have to do is aim for the smallest die size they can design, without sacrificing average performance. In this case, they have to make their design more efficient, so it takes fewer transistors to perform better.


----------



## 8800GT

Quote:


> Originally Posted by *Clocknut*
> 
> So now we are taking someone's blog as rumor as well?


Technically anything someone says that isn't confirmed is a rumor. I could say that you like to eat golden retriever puppies. Is it true? No (most likely). Is it a credible source in retrospect? No. But that's what makes it a rumor.

It is entirely possible what is shown here is 100% true and on point. Not likely, but possible.


----------



## geoxile

Quote:


> Originally Posted by *cookieboyeli*
> 
> Why didn't they design it to exceed Haswell's specifications?
> 
> Wouldn't they want to match Haswell at least?
> 
> ZEN:
> Haswell:
> And why not use 384bit or something so it's "better".
> 
> (I have no idea how any of this works, please don't kill me if this is a stupid thing to ask).


Zen is aimed at servers, where scaling isn't as big an issue as it is with personal use, and hence perf/power means they can scale up more. Given that they're producing Zen on GF's 14nm process, they probably can't make something that's both large and efficient. So they're making something kinda big and will most likely be trying to maximize power efficiency.


----------



## Blameless

Quote:


> Originally Posted by *cookieboyeli*
> 
> Wouldn't they want to match Haswell at least?


It does match them, perhaps even exceeds them, if the AVX/AVX2 capabilities are on their own ports.
Quote:


> Originally Posted by *cookieboyeli*
> 
> And why not use 384bit or something so it's "better".


Because there are 128-bit, 256-bit, and 512-bit SIMD instruction sets that need to be supported.

384-bit would never be used by any native instruction, and would thus be a less efficient option.


----------



## Noufel

If the price is right AMD could have a hit, especially with an 8-core, 16-thread Zen CPU to compete with the i7, and in the end we'll see price drops, hopefully.


----------



## ku4eto

Very interesting stuff. Where are those "patch notes" coming from? If it's only using 2x128, that wouldn't be too good :/


----------



## Dom-inator

Quote:


> Originally Posted by *cookieboyeli*
> 
> Why didn't they design it to exceed Haswell's specifications?
> 
> Wouldn't they want to match Haswell at least?
> 
> ZEN:
> Haswell:
> And why not use 384bit or something so it's "better".
> 
> (I have no idea how any of this works, please don't kill me if this is a stupid thing to ask).


I have no idea either lol (but who actually does on OCN?). Maybe a wider unit is not needed because the components around it are faster.


----------



## icanhasburgers

Quote:


> Originally Posted by *ku4eto*
> 
> Very interesting stuff. Where are those "patch notes" coming from ?If its only using 2x128, that wouldn't be too good :/


The core design is better than Bulldozer's design philosophy, so I personally would say that not all hope is lost just because of 2x128.


----------



## Cybertox

Despite all that, this upcoming AMD CPU line-up still won't be able to beat Intel in terms of raw performance; it will just provide a better price-to-performance ratio up to a certain point.


----------



## looncraz

Quote:


> Originally Posted by *ku4eto*
> 
> Very interesting stuff. Where are those "patch notes" coming from ?If its only using 2x128, that wouldn't be too good :/


2x128 on this design will be just as effective as 2x256 on Intel's design for all non-AVX instructions.

Zen can also do 4x64 instructions, and certain operations should be able to do 8x32 (32-bit being the near-universal floating point size).
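The idea behind "2x128 keeping pace with 2x256" can be pictured with a toy model (plain Python, not real SIMD): a 256-bit add over eight float32 lanes splits into two independent 128-bit micro-ops. The function name and lane layout here are purely illustrative, not AMD's actual scheme:

```python
# Toy model: a 256-bit vector add covers 8 x 32-bit lanes. A core with
# 128-bit FP pipes can crack it into two independent 4-lane micro-ops
# and issue them to separate pipes, so for code that only uses 128-bit
# (non-AVX) operations there is no width disadvantage at all.
def add256_as_two_128(a, b):
    assert len(a) == len(b) == 8                 # 8 lanes = 256 bits
    lo = [x + y for x, y in zip(a[:4], b[:4])]   # first 128-bit micro-op
    hi = [x + y for x, y in zip(a[4:], b[4:])]   # second 128-bit micro-op
    return lo + hi

print(add256_as_two_128([1.0] * 8, [2.0] * 8))
# [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```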

I wouldn't be surprised to learn that AMD didn't prioritize AVX performance, with their strong desire to lean on GPU compute (for obvious reasons).

I think they're moving towards being able to directly move certain floating point vector math to the GPU without any special coding tools or compilers needed; all on the CPU side. In theory, it wouldn't be that difficult, but it would absolutely introduce latency, so they would need to fetch a significant number of instructions ahead of time and have the GPU work on all of the likely branches at once; then, when the logic units know which results are needed, simply pull in the finished results. The end result would be an apparent near-zero latency for certain floating point operations.


----------



## MoGTy

Quote:


> Originally Posted by *cookieboyeli*
> 
> Why didn't they design it to exceed Haswell's specifications?
> 
> Wouldn't they want to match Haswell at least?
> 
> ZEN:
> Haswell:
> And why not use 384bit or something so it's "better".
> 
> (I have no idea how any of this works, please don't kill me if this is a stupid thing to ask).


"It's not about how big it is, it's about how you use it." Sort of applies here.

While larger may equal better, it doesn't have to be that way.

I.E. I could design a CPU with 256MB of L1 cache, and it'd be a damn fine chip. Except it would be so impractical even my grandma would lose her patience trying to open Outlook 2003.

P.S. I'm not sure if this is entirely impossible, I was just hyperbolizing.


----------



## Redwoodz

Quote:


> Originally Posted by *Cybertox*
> 
> Despite all that, this upcoming AMD CPUs line-up still wont be able to beat Intel in terms of raw performance, will just provide a better price per performance ration until a certain point.


It's not that hard to make a cpu that would outperform Intel's.

The hard thing is to make a cpu with that performance that is affordable. It remains to be seen where those lines will cross.


----------



## Kuivamaa

Quote:


> Originally Posted by *Cybertox*
> 
> Despite all that, this upcoming AMD CPUs line-up still wont be able to beat Intel in terms of raw performance, will just provide a better price per performance ration until a certain point.


Actually this design appears poised to beat Intel CPUs in a good chunk of integer-based workloads, and perhaps even in 128-bit SIMD. Most likely Intel will remain stronger in anything wider, like AVX, but if Zen's 128-bit parts can be effectively combined into 256-bit units (the Bulldozer family was awful at that), AMD won't be left too far behind this time round. In any case, when it comes to games, where AVX is not a thing, Zen might achieve parity with Haswell/Skylake after all. If Zen doesn't have hidden faults or some major oversight, we might see reviews where AMD and Intel trade blows in benchmarks depending on the application, and I am talking throughout their whole range of products (since the FX8 will be a 5960X competitor), not like with Bulldozer or Piledriver, where they could only score some wins vs mainstream SB/IB quad i7s. Provided that GloFo offers a node that allows clocks in the 3.5GHz region, that is...

Quote:


> Originally Posted by *Redwoodz*
> 
> It's not that hard to make a cpu that would outperform Intel's.
> 
> The hard thing is to make a cpu with that performance that is affordable. It remains to be seen where those lines will cross.


That might not be that hard after all. Intel likes to sell their mainstream quads, which are designed around laptops, to the desktop crowd. That means half their die or more is the iGPU, which many people find unnecessary. AMD could offer four hyperthreaded Zen cores on Samsung/GloFo 14nm FinFET without an iGPU at half the price of a 4790K or 6700K and still be as profitable.


----------



## ZeSy

I'm sorry for being an idiot, but can someone please explain in layman's terms what this actually means?


----------



## KarathKasun

The core can execute 10 functions concurrently. This is instruction level parallelism (doing int/fp/store/etc all at the same time) rather than thread level parallelism that can use more cores.

Think of breaking down a complex algebra problem into chunks that can be done at the same time. (A+B)*(C-D) is a good simple example: you would do A+B and C-D at the same time in the same CPU. Then you could do the multiplication in one of the FPU units, freeing the other general-purpose math pipelines to start the next equation. All of this is decided inside the CPU at run time (in the decode/scheduling hardware), if my memory serves right, rather than the compiler doing the work ahead of time (pre-computed).
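The (A+B)*(C-D) example can be written out step by step (arbitrary values, just to show which operations are independent of each other):

```python
# (A + B) * (C - D): the two inner operations share no operands, so a
# superscalar core can issue them to two ALU pipes in the same cycle;
# only the multiply has to wait for both results.
A, B, C, D = 3, 4, 10, 2   # arbitrary example values

t0 = A + B        # can go to pipe 0
t1 = C - D        # can go to pipe 1, same cycle (independent of t0)
result = t0 * t1  # depends on both t0 and t1, so it issues afterwards

print(result)  # 7 * 8 = 56
```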


----------



## Cybertox

Quote:


> Originally Posted by *Redwoodz*
> 
> It's not that hard to make a cpu that would outperform Intel's.
> 
> The hard thing is to make a cpu with that performance that is affordable. It remains to be seen where those lines will cross.


Making an affordable enthusiast CPU line-up better than Intel's is not impossible, it just requires things that AMD doesn't have or isn't capable of doing. AMD wasn't even able to beat Nvidia despite having an entirely new architecture and the newest HBM technology. Meanwhile I am pretty sure that Pascal is going to be a significant leap in performance, even though it will use a later HBM generation. It sounds like I am bashing AMD, but the truth is that since the 2XX series, AMD has been going downhill. It took ages to release a new line-up, which didn't even meet expectations and ended up being only as good as that time's current Nvidia GPUs. The only benefits are the cheaper price and smaller cards; performance is on par if not worse, as mentioned previously. So I wouldn't be surprised if the situation with the upcoming CPUs has the same fate. AMD is welcome to prove me wrong with Zen, but I really doubt that will happen.


----------



## Tojara

Quote:


> Originally Posted by *Cybertox*
> 
> Making an affordable enthusiast CPUs line-up better than the one of Intel is not impossible, just requires things that AMD doesnt have or isnt capable of doing. AMD wasnt even able to beat Nvidia despite having an an entirely new architecture and newest HBM technology. While I am pretty sure that Pascal is going to be a significant leap in performance even though it will use a later HBM generation. It sounds like I am bashing AMD but the truth is that since the 2XX series, AMD has been going downhill. Took ages to release a new line-up which didnt even meet expectations and ended up being as good as that times current Nvidia GPUs. The only benefit is the cheaper price and smaller cards, performance is on par if not worse as mentioned previously. So I wouldnt be surprised if the the situation with the upcoming CPUs is going to have the same fate. AMD is welcome to prove me wrong with their upcoming Zen but I really doubt that will happen.


I'm sorry, what? Fiji is very much the same GCN as in Tahiti, even in name, just with clock gating, improved ROPs and new memory. It's very much not a new architecture by any definition. Even the 200-series is only a bit over a year old, and it's rather obvious they didn't bother creating several new chips on 28nm when they would just lose money doing it compared to keeping the rebrands.

And don't get me wrong, I'm by no means saying that Zen will be faster than Skylake or whatever-lake it will be up against. Having a smaller core that is within spitting distance on most instructions is an advantage in itself. While having faster threads is nice, there is something to be said for power and die sizes if your cores are, say, 20% smaller and practically at the same level. The worst thing they could do is go straight head-to-head with Intel with a massive core, against a Skylake which they have spent however many billions perfecting on a very good process node.


----------



## Asterox

Quote:


> Originally Posted by *looncraz*
> 
> 2x128 on this design will be just as effective as 2x256 on Intel's design for all non-AVX instructions.
> 
> Zen can also do 4x64 instructions as well, and certain operations should be able to do 8x32 (32-bit being the near-universal floating point size).
> 
> I wouldn't be surprised to learn that AMD didn't prioritize AVX performance, with their strong desire to lean on GPU compute (for obvious reasons).
> 
> I think they're moving towards being able to directly move certain floating point vector math to the GPU without any special coding tools or compilers needed. All-on-CPU. In theory, it wouldn't be that difficult, but it would absolutely introduce latency, so they would need to be able to fetch a significant amount of instructions and just have the GPU work on all of the likely branches all at once, then when the logic units know which results are needed, simply pull in the finished results. The end result would be an apparent near-zero latency for certain floating point operations


Big and wicked Radeon Cores are waiting, and they are very impatient to take over its APU tasks.


----------



## imran27

So much AMD news and no @Seronx around here?


----------



## Seronx

Quote:


> Originally Posted by *imran27*
> 
> Too many AMD news' and no @Seronx around here?


Rather wait for Crane/Sweeper actually.


----------



## ZeSy

Quote:


> Originally Posted by *KarathKasun*
> 
> The core can execute 10 functions concurrently. This is instruction level parallelism (doing int/fp/store/etc all at the same time) rather than thread level parallelism that can use more cores.
> 
> Think of breaking down a complex algebra problem into chunks that can be done at the same time. (A+B)*(C-D) is a good simple example. You would do A+B and C-D at the same time in the same CPU. Then you could do the multiplication in one of the FPU units to free the other general purpose math pipelines to start the next equation. All of this is decided in the instruction decoder (real time) usually if my memory serves right, rather than the compiler doing the work (pre-computed).


Thank you so much, that's an amazing answer!

I think I now understand, thanks again <3


----------



## magnek

Quote:


> Originally Posted by *Asterox*
> 
> Big and wicked Radeon Cores are waiting, and they are very impatient to take over its APU tasks.


What is this, Crysis meets Star Wars?


----------



## dave12

I do not understand OP. Will OCN tell me if this means Zen will be roughly equivalent to Sandy?


----------



## santi2104

Quote:


> Originally Posted by *Noufel*
> 
> If the price is right AMD could have a hit especialy with an 8 cores 16 threads zen cpu to compete with the i7, and in the end we'll see price drops hopefully


i would laugh my ass off looking at a 16 core zen cpu being beaten by a 4690k


----------



## looncraz

Quote:


> Originally Posted by *dave12*
> 
> I do not understand OP. Will OCN tell me if this means Zen will be roughly equivalent to Sandy?


You can't really tell 100%, but it has enough resources that it should beat out Sandy in integer, in terms of IPC.

It's definitely a good way to gain a 40% increase in IPC and potentially provide exceptional SMT performance (which I didn't really expect from them).


----------



## BiG StroOnZ

Seems there is more to add to this blog post via WCCFTech (breaking down the blog post and adding imagery):
Quote:


> AMD has just uploaded a patch to the patchwork project detailing many aspects of its hotly anticipated Zen CPU microarchitecture.
> 
> The patch was uploaded by [email protected] and is titled "[x86_64] znver1 enablement", re-affirming that there will indeed be multiple generations of AMD's brand new CPU core; this particular patch only covers the first iteration of the core that's coming out next year.
> 
> The Patch Allows Us To Get A Glimpse Into The Inner Workings Of AMD's Next Generation High Performance x86 Zen CPU Core. Today, with the information that has been revealed through the patch, we can get a better idea of what Zen looks like from a high-level design standpoint.
> Quote:
> 
> 
> 
> +;; Integer unit 4 ALU pipes.
> 
> +(define_cpu_unit "znver1-ieu0" "znver1_ieu")
> 
> +(define_cpu_unit "znver1-ieu1" "znver1_ieu")
> 
> +(define_cpu_unit "znver1-ieu2" "znver1_ieu")
> 
> +(define_cpu_unit "znver1-ieu3" "znver1_ieu")
> 
> +(define_reservation "znver1-ieu" "znver1-ieu0|znver1-ieu1|znver1-ieu2|znver1-ieu3")
> 
> +
> 
> +;; 2 AGU pipes.
> 
> +(define_cpu_unit "znver1-agu0" "znver1_agu")
> 
> +(define_cpu_unit "znver1-agu1" "znver1_agu")
> 
> +(define_reservation "znver1-agu-reserve" "znver1-agu0|znver1-agu1")
> +;; Floating point unit 4 FP pipes.
> 
> +(define_cpu_unit "znver1-fp0" "znver1_fp")
> 
> +(define_cpu_unit "znver1-fp1" "znver1_fp")
> 
> +(define_cpu_unit "znver1-fp2" "znver1_fp")
> 
> +(define_cpu_unit "znver1-fp3" "znver1_fp")
> 
> +
> 
> +(define_reservation "znver1-fpu" "znver1-fp0|znver1-fp1|znver1-fp2|znver1-fp3")
> 
> 
> 
> This gives us a beautifully high-level insight into what a Zen core looks like. Each core has four ALU pipes, two AGU pipes and four FP pipes. ALU is short for Arithmetic Logic Unit, AGU is short for Address Generation Unit and FP is short for Floating Point. The four ALU pipes in this context represent the core's integer pipeline and the four FP pipes represent the floating point pipeline inside the core's Floating Point Unit. The AGUs work in tandem with the integer front-end to facilitate communication between the ALUs and a 2-read, 1-write L1 cache, according to an AMD engineer's LinkedIn profile that Mr. Waldhauer has spotted.
> 
> If we create a diagram of the core's high-level design based on the integer and floating point pipes mentioned in the patch, then we get something that looks like this:
> 
> 
> 
> For better perspective we put Zen side by side with AMD's Steamroller, as the company has unfortunately not published a block diagram for Excavator. However, according to what AMD revealed at this past Hot Chips, Excavator should have a very similar high-level layout to Steamroller. Just a quick note to refresh everyone's memory: Steamroller is the CPU core that AMD introduced with its 7000 series Kaveri and Godavari APUs.
> 
> 
> 
> The first thing that is easily discernible is that there is only one integer cluster in a Zen core, rather than two like there are in a Steamroller module. These two integer clusters in Steamroller are what form the two separate CPU cores / threads in each module. Zen takes on a more traditional AMD CPU layout resembling that of the Phenom and Athlon K series cores, with a single integer cluster and one equally large floating point unit.
> 
> Zen forgoes the CMT design of the Bulldozer family, so Zen should have a single fetch and a single decode unit in the front end, as opposed to the double decoders that were introduced with Steamroller. Comparing both floating point units, with four FP pipes Zen's floating point unit is effectively twice as wide as that of Steamroller.
> 
> 
> 
> Interestingly, the two 128-bit FMAC units in the Bulldozer family can each process one 128-bit SIMD instruction per cycle, or fuse together to process a single 256-bit AVX instruction per cycle.
> 
> 
> 
> If this capability to fuse and process larger instructions is carried over to Zen, it would enable the two 256-bit FMAC units - 4 128-bit pipes - to fuse and process 512-bit AVX instructions, which would make the core compatible with Intel's AVX-512 instruction set extension, currently only supported by Intel's Knights Landing Xeon Phi microarchitecture.
> 
> The wider floating point unit also means that Zen will be able to process less complex instructions at double the rate of Steamroller. A massive boost in floating point performance, an area where AMD had historically excelled with Phenom II and prior CPUs.
> 
> There was also one particularly important improvement with Zen that Mr. Waldhauer has managed to spot in a number of patents filed by AMD CPU engineers working on Zen.
> 
> A lot of the new functionality has been filed for patenting. For example, there was a mention of checkpointing, which is good for quick reversion of mispredicted branches and other reasons for restarting the pipelines. Some patents suggest that Zen might use a slightly modified Excavator branch predictor.
> 
> The branch misprediction penalty on the Bulldozer family of cores was particularly significant due to the deeply pipelined nature of the microarchitecture. Intel's Sandy Bridge, which was introduced to the market at the same time as Bulldozer, had an equally deep pipeline. However, with Sandy Bridge Intel introduced a micro-op cache, which significantly reduced the performance penalty of mispredicting a branch. Zen should be the first AMD CPU core to introduce a technology that, while perhaps not similar to the solution in Sandy Bridge, is still focused on reducing branch misprediction penalties.
> 
> In summary, compared to Bulldozer family cores, Zen has double the floating point pipes as well as a better way of handling mispredicted branches, coupled with a more streamlined front-end as well as faster and more efficient cache sub-systems. All of these combined have undoubtedly contributed to the massive 40% IPC improvement that AMD has announced back in May.

*Source:* http://wccftech.com/amd-zen-cpu-core-microarchitecture-detailed/2/


----------



## epic1337

They should stop comparing this to Skylake in terms of IPC, because if it does match, it won't end with us having a good option to choose from.
Look at the Fury series: just because the Fury X slightly matched and sometimes outperformed the GTX 980 Ti at higher resolutions, they put a $650 price tag on it.

Now what does this have to do with Zen? If IPC were to match Haswell or Skylake, then expect a 4-core Zen to be priced at $250, and an 8-core Zen to be priced at $600~$1000.
I mean, look at Intel's 8-core at $1000; if Zen were to be on par with it, obviously AMD would put a price tag on it in the same bracket.


----------



## infranoia

Quote:


> Originally Posted by *epic1337*
> 
> they should stop comparing this to skylake in terms of IPC, because if it does match, it wont end with us having a good option to choose from.
> look at Fury series, just because Fury X had slightly matched and sometimes outperformed GTX980Ti in higher resolutions they've put a $650 price tag on it.
> 
> now what does this have to do with Zen? if IPC were to match haswell or skylake, then do expect Zen 4core to be priced at $250, and Zen 8core to be priced at $600~$1000.
> i mean, look at intel's 8core at $1000, if Zen were to be on-par with it, obviously AMD would put it with a price tag at the same bracket.


Lisa Su has already publicly said that they can no longer be just the low-cost lower-performance competitor, but rather one that competes on performance at a given price point. That has already set the stage for higher price points, and we saw the first shot of that with Fiji. There shouldn't be any surprises about this going forward, AMD has been very clear about that.

You can criticize the strategy all you want, but the cost for AMD is going up. What we need (and what AMD is aiming for), is for performance to match that higher price. Only then would we see the Nvidias and Intels actually begin to compete, and prices fall for the entire industry, instead of AMD being some low-rent VIA option for po' folks while everyone else gets fleeced.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *infranoia*
> 
> Lisa Su has already publicly said that they can no longer be just the low-cost lower-performance competitor, but rather one that competes on performance at a given price point. That has already set the stage for higher price points, and we saw the first shot of that with Fiji. There shouldn't be any surprises about this going forward, AMD has been very clear about that.
> 
> You can criticize the strategy all you want, but the cost for AMD is going up. What we need (and what AMD is aiming for), is for performance to match that higher price. Only then would we see the Nvidias and Intels actually begin to compete, and prices fall for the entire industry, instead of AMD being some low-rent VIA option for po' folks while everyone else gets fleeced.


While this might be true, people don't want the second best if they are paying the prices for the best. Meaning even if Zen can compete with Kaby Lake or Skylake-E, AMD cannot charge those prices, because people will just go for the best anyway (unless of course you are an avid AMD fan).

So while they might not want to be viewed as the cost-to-performance competitor any longer, they still have to lower their prices to compete properly. If someone can get an 8-core Zen processor for $200 less than an 8-core Intel processor, that makes sense, especially if the performance difference is 10%. However, if they attempt to charge the same price as Intel, they will only be marketing to AMD fans. They won't get Intel users to switch, because an Intel user will just say, "Well, I might as well buy the Intel 8-core because it is 10% faster and costs the same."


----------



## epic1337

Quote:


> Originally Posted by *infranoia*
> 
> Lisa Su has already publicly said that they can no longer be just the low-cost lower-performance competitor, but rather one that competes on performance at a given price point. That has already set the stage for higher price points, and we saw the first shot of that with Fiji. There shouldn't be any surprises about this going forward, AMD has been very clear about that.
> 
> You can criticize the strategy all you want, but the cost for AMD is going up. What we need (and what AMD is aiming for), is for performance to match that higher price. Only then would we see the Nvidias and Intels actually begin to compete, and prices fall for the entire industry, instead of AMD being some low-rent VIA option for po' folks while everyone else gets fleeced.


And that's why it's better for us if AMD slowly catches up. If Zen is comparable to Sandy Bridge or Ivy Bridge in terms of IPC, the 8-core Zen would be just about in between Intel's 6-core and 8-core Skylake in terms of raw performance.
This means we can expect the 8-core Zen to be priced much closer to the 6-core Skylake, around $400~$500, while their 6-core Zen would be at $200~$300, directly competing against Intel's mainstream line.

This isn't a bad thing for us. Sandy Bridge-level IPC is far from slow; yes, it can slightly bottleneck games that are highly dependent on single-thread performance, but we'd still get a far better deal than Intel's offering.
Simply put, if you were given a choice between a 6-core Sandy Bridge and a 4-core Skylake, the 6-core Sandy Bridge is the better deal, especially at the same price, even if it's considerably slower in single-thread performance.


----------



## SystemTech

I think Zen has potential, if they can release it within the next year; otherwise they may be too late... although Intel is only doing 5% performance increases now, so maybe they will still be in contention.


----------



## Clocknut

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> While this might be true, people don't want the second best if they are paying the prices for the best. Meaning even if Zen can compete with Kaby Lake, or Skylake-E they cannot charge those prices for them because people will just go for the best anyway (unless of course you are an avid AMD fan).
> 
> So while they might not want to be viewed as the cost to performance competitor any longer they still have to lower their prices to compete properly. If someone can get an 8 Core Zen processor for $200 less than an 8 Core Intel processor. That makes sense to do especially if the performance differences are 10%. However, if they attempt to charge the same price as Intel they will only be marketing to AMD fans. They won't get Intel users to switch because an Intel user will just say, "Well I might as well buy the Intel 8 Core because it is 10% faster and costs the same."


For the same price/performance I'm still gonna go Intel.

AMD needs to offer an extra 15% performance for me to switch sides.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Clocknut*
> 
> for the same price/performance I still gonna Intel.
> 
> AMD need to offer extra 15% performance for me to switch sides.


I severely doubt they will be able to provide 15% better performance, however 15% less performance but for hundreds of dollars cheaper might make people switch sides.

Like imagine an AMD Zen 8 Core with IPC of Haswell-E or Ivy-E for $600 competing against a Skylake-E 8 Core for $1000.

That's quite a competitive product.


----------



## Wishmaker

The good old days are gone when a good chip could be priced in the mid range. When Conroe came out, manufacturing cost was not as high as today. Consequently, they could price it as they wanted and still beat the FX-52/FX-69. Truth be told, a good-performing ZEN will be expensive, and this will not bode well for those who used to buy AMD for 'budget' builds. I am also sure ZEN motherboards will increase in price due to the new 'we won't be the cheap solution anymore' strategy Lisa Su is promoting.


----------



## KarathKasun

Eh, each performance level of tech gadgets should be going up in price by about 5-10% per year because of inflation. People who compare prices now to prices 10 years ago need to learn about economics.

A CPU in the same performance category as a $100 CPU from 10 years ago should be in the $160-250 price bracket. As this is not the case, it's obvious that cost per CPU has come down quite a bit just by virtue of the price tiers staying fairly consistent.
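For what it's worth, the 5-10%-per-year figure does compound out to roughly that bracket; a quick sketch:

```python
# Compounding 5% and 10% yearly inflation over 10 years on a $100 CPU.
base = 100.0
low = base * 1.05 ** 10    # 5% per year for a decade
high = base * 1.10 ** 10   # 10% per year for a decade

print(round(low, 2), round(high, 2))  # 162.89 259.37
```

So the low end lands on ~$163 and the high end a touch above $250, which is where the $160-250 bracket comes from.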


----------



## infranoia

All other things being equal, I wouldn't necessarily go with Intel over AMD. What's the draw? Warranty? The color blue? You hate change? Their flux smells different? It's a new motherboard either way.

If Zen is engineered well enough to trade blows with Intel, then absolutely I would REWARD that.

The instruction set is called AMD64, people. Don't worry. It's native.


----------



## Clocknut

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> I severely doubt they will be able to provide 15% better performance, however 15% less performance but for hundreds of dollars cheaper might make people switch sides.
> 
> Like imagine an AMD Zen 8 Core with IPC of Haswell-E or Ivy-E for $600 competing against a Skylake-E 8 Core for $1000.
> 
> That's quite a competitive product.


What I meant is that they need to offer 15% better price/performance.

If an equivalently performing CPU is priced the same as Intel's, just as with their poorly priced Fury X versus the 980 Ti, then I will stick with Intel.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *Clocknut*
> 
> What I meant is that they need to offer 15% better price/performance.
> 
> *If an equivalently performing CPU is priced the same as Intel's, just as with their poorly priced Fury X versus the 980 Ti, then I will stick with Intel*


I agree with this. If Zen vs Skylake-E/Kaby Lake becomes another 980 Ti vs Fury X, then it will definitely be problematic. The majority of people simply buy a 980 Ti because it is clearly faster and overclocks better, giving you more for your $650, whereas the Fury X performs worse than a 980 Ti, barely overclocks at all, and yet still maintains a $650 price tag.

If AMD does this again with their Zen CPUs, they will have a problem moving units as seen with their Fury lineup.


----------



## Seronx

5 pipelines per thread!

Integer/Memory is a downgrade. Floating Point is an improvement. It would be a nice phone processor, but for servers it is subpar.


----------



## Carniflex

That was a good read. Of course, in the end the benchmarks will tell... once it releases, which is so far off that it's even a bit too early for hype.


----------



## 364901

I think a lot of people are assuming that AMD will simply price themselves out of competition by keeping some kind of parity with Intel. I'm not sure if they're basing this solely off the value of the brand, or the public's reception of the FX-9000 chips.

AMD did the smart thing by pricing Fury X and Nano like they did. There's nothing out there like these products, and they priced them at $650 because they expected consumers to snap them up despite that price tag. What ended up happening in the launch week? All available Fiji-based cards got snapped up within hours. The market supported that $650 price point and so long as AMD continues to sell out on every Fiji chip they make, they're perfectly happy with not dropping the price.

If Zen costs $1000 for a dual-threaded, eight-core chip that keeps up with the Core i7-5960X, who are we to deny them the right to price it like that? If the market doesn't support that price point, AMD will notice it in their partners' reported sales figures, and drop the price accordingly in markets where it isn't selling. Tom Petersen, in a PCPer interview, told the audience that this is why they priced the GTX 980 like they did, and why it is sold as a high-end product - consumers were buying it up in droves, so why sell themselves short on a product they're seeing healthy margins on? Intel does the exact same thing with the Xeon family.

Zen's going to be good, that's what I believe through following all the leaks and hints about the chip's structure. It'll be a strong contender for multi-threaded performance, it has more of a focus on compatibility with what Intel's doing rather than trying to play their own game, and it's going to be the first new CPU architecture from AMD in almost five years. A lot has changed in that time and they're not going to throw away good opportunities now like previous management did in the past. This is going to be exciting stuff to play with.


----------



## sumitlian

Quote:


> Originally Posted by *CataclysmZA*
> 
> I think a lot of people are assuming that AMD will simply price themselves out of competition by keeping some kind of parity with Intel. I'm not sure if they're basing this solely off the value of the brand, or the public's reception of the FX-9000 chips.
> 
> AMD did the smart thing by pricing Fury X and Nano like they did. There's nothing out there like these products, and they priced them at $650 because they expected consumers to snap them up despite that price tag. What ended up happening in the launch week? All available Fiji-based cards got snapped up within hours. The market supported that $650 price point and so long as AMD continues to sell out on every Fiji chip they make, they're perfectly happy with not dropping the price.
> 
> If Zen costs $1000 for a dual-threaded, eight-core chip that keeps up with the Core i7-5960X, who are we to deny them the right to begin pricing it like that? If the market doesn't support that price point, AMD will notice it in their partner's reported sales figures, and drop the price accordingly in markets where it isn't selling. Tom Peterson, in a PCPer interview, told the audience that this is why they priced the GTX 980 like they did, and why it is sold as a high-end product - consumers were buying it up in droves, so why sell themselves short on a product they're seeing healthy margins on? Intel does the exact same thing with the Xeon family.
> 
> Zen's going to be good, that's what I believe through following all the leaks and hints about the chip's structure. It'll be a strong contender for multi-threaded performance, it has more of a focus on compatibility with what Intel's doing rather than trying to play their own game, and it's going to be the first new CPU architecture from AMD in almost five years. A lot has changed in that time and they're not going to throw away good opportunities now like previous management did in the past. This is going to be exciting stuff to play with.


I couldn't agree more with ^this.


----------



## mcg75

Quote:


> Originally Posted by *CataclysmZA*
> 
> I think a lot of people are assuming that AMD will simply price themselves out of competition by keeping some kind of parity with Intel. I'm not sure if they're basing this solely off the value of the brand, or the public's reception of the FX-9000 chips.


They are basing it off people's perception of the brand. Being the price/performance leader doesn't necessarily do much for the bottom line.

AMD needs to shake off that label. The only way to do it is to offer a great product at a proper price point instead of discounting it.
Quote:


> Originally Posted by *CataclysmZA*
> 
> AMD did the smart thing by pricing Fury X and Nano like they did. There's nothing out there like these products, and they priced them at $650 because they expected consumers to snap them up despite that price tag. What ended up happening in the launch week? All available Fiji-based cards got snapped up within hours. The market supported that $650 price point and so long as AMD continues to sell out on every Fiji chip they make, they're perfectly happy with not dropping the price.


Absolutely true for the Fury X, but not for the Nano. If you wanted a Nano, Newegg has had at least 1 of the 4 brands in stock since launch.

But that's ok too. Nano was meant as a niche product and this is kind of an experiment by AMD to see if there is a big enough market to support it. They didn't intend to sell them in the same quantities as the normal Fury cards.


----------



## Clocknut

With their recent product quality and performance, it will take multiple years of consistently great products before they can shake off that bad brand image.

The Fury X is selling quickly because there is enough hype for an HBM GPU, it is small-form-factor friendly, quantities are very limited, and there are enough AMD fans to buy it. I highly doubt it could sell as well as the 980 Ti if the Fury X were mass produced.

Zen overall needs to be 15% better than Intel on price/performance, otherwise it won't work.


----------



## looncraz

Quote:


> Originally Posted by *Seronx*
> 
> 5 pipelines per thread!
> 
> Integer/Memory is a downgrade. Floating Point is an improvement. It would be a nice phone processor, but for servers it is subpar.


There's no downgrade. The module has 2x ALU and 2x AGU per integer core; Zen has 4x ALU and 2x AGU per integer core.

The AGUs are undoubtedly capable of handling all the needs of the system, as the module design has ample AGU performance and the AGUs are probably now being better utilized and are interacting with a massively improved cache system.

It's all wins.


----------



## 47 Knucklehead

Quote:


> Originally Posted by *Seronx*
> 
> 5 pipelines per thread!
> 
> Integer/Memory is a downgrade. Floating Point is an improvement. It would be a nice phone processor, but for servers it is subpar.


Pretty much this.


----------



## Wishmaker

Zenfone anyone? Oh wait, Asus has one!


----------



## ebduncan

Quote:


> Originally Posted by *Clocknut*
> 
> With their recent product quality and performance, it will take multiple years of consistently great products before they can shake off that bad brand image.
> 
> The Fury X is selling quickly because there is enough hype for an HBM GPU, it is small-form-factor friendly, quantities are very limited, and there are enough AMD fans to buy it. I highly doubt it could sell as well as the 980 Ti if the Fury X were mass produced.
> 
> Zen overall needs to be 15% better than Intel on price/performance, otherwise it won't work.


Brand image is not as important with computer parts, according to history anyway. Point blank, if someone offers a better product at the same price, people are going to go with the better product, unless they have some sort of bias toward the other.

Zen doesn't need to be XX% better than Intel in price and performance. It could have worse performance and a lower price, or it could have better performance and a higher price. They just need to find the median and price it competitively to attract sales. If their product is good, they will begin to recapture market share. AMD could outsell Intel 9 to 1 for the next 5 years and still not get past 50% market share, thanks to the current volume of the PC market.


----------



## spurdomantbh

It's hard to tell at the moment what the performance will be, but theoretically, compared to Haswell this should win a lot of benchmarks. We still don't know what ops the ALUs can do, and we don't know the AGU width, but let's do some basic comparison to Haswell.

Integer-wise, if the AGUs can feed the ALUs, both Haswell and Zen should theoretically be equal. However, if I understand this correctly, back-end wise Zen seems able to issue 8 ops, so Haswell would hit a bottleneck when doing integer and floating-point operations at the same time, whereas Zen would not.

The floating-point advantage for desktop users actually goes to Zen here. 256-bit ops are still rarely used these days, and on 128-bit ops Zen can issue more FADD or FMUL ops. Haswell really only has an advantage doing 2x 256-bit FMA, which is rare in desktop applications. Zen should also use less power thanks to the smaller FPU.

But it's hard to say how the performance will be in reality; we still don't know a lot of details about Zen. Most importantly, we don't know what clocks the 14/16nm node will reach. I have a feeling Intel has a better 14nm process and will reach higher clocks, which would make them victorious in overall performance. But at the same clocks I think Zen can compete with Skylake in the desktop space.


----------



## EniGma1987

Quote:


> Originally Posted by *spurdomantbh*
> 
> However if I understand this correctly, back-end wise, seems like Zen can issue 8 ops.


Maybe I read it wrong, but I thought it was able to decode only 4 instructions per cycle, which means only 4 ops could be issued if that is all that can be decoded, right? So it should be equal to Haswell in that area too, unless I am mistaken.


----------



## Seronx

Quote:


> Originally Posted by *EniGma1987*
> 
> Maybe I read it wrong, but I thought it was able to decode only 4 instructions per cycle, which means only 4 ops could be issued if that is all that can be decoded, right? So it should be equal to Haswell in that area too, unless I am mistaken.


It can decode 4 macro-ops. It is unknown if the decoders can decode a FastPath double singularly or if they need to fuse.

If we are going based on the AMD64 manuals, the decoders should handle 4 FastPath doubles, 8 FastPath singles, ~~length Vector{Microcode}. If not, it just points out that Zen is a stopgap for NG Bulldozer 5 {Crane} and 6 {Sweeper}.


----------



## spurdomantbh

Quote:


> Originally Posted by *EniGma1987*
> 
> Maybe I read it wrong, but I thought it was able to decode only 4 instructions per cycle, which means only 4 ops could be issued if that is all that can be decoded, right? So it should be equal to Haswell in that area too, unless I am mistaken.


Sorry if I'm unclear. I'm talking about the execution pipeline. From the GCC patch, it seems like Zen is 4 int + 4 FPU, while Haswell seems to have only 4 total for both int and FPU across ports 0, 1, 5 and 6. I'm not 100% sure I'm right on this; I'm not an expert.


----------



## btupsx

Zen looks very promising indeed. The big (HUGE) question mark is going to be the process node characteristics. It's going to be essential for Samsung's 14nm FF LPP to translate appropriately to a desktop/server design. If the process performs as needed, Zen is going to be very close to Haswell, but it needs to clock right around 4 GHz, with a little OC headroom. On the other hand, if the process struggles to achieve 3.2-3.5 GHz, Zen could be seen as a "what could have been" semi-disappointment.

Fingers crossed.


----------



## muffins

I was going to wait for Zen, but after hearing about its delayed release at the end of 2016 / early 2017, I went ahead and upgraded my 980X to Skylake with a 6700K.

Intel is already on Skylake, and Skylake brought a decent boost over Haswell and a worthwhile upgrade over all generations before it. Intel is planning to release its Skylake refresh, Kaby Lake, at the end of 2016 / early 2017. If Zen releases then and only matches Haswell, Intel's architecture from 2013, AMD will once again be two generations behind Intel in performance.

I'm not saying Haswell performance isn't great, it's awesome, but if AMD wants to be successful and really cause people to switch over, it needs to at least match Intel's current generation in performance. Toss in a nice platform with a rich feature set and competitive pricing, and AMD can be hugely successful. If it just matches Haswell, it will be another Bulldozer for them.


----------



## btupsx

Quote:


> Originally Posted by *muffins*
> 
> I was going to wait for Zen, but after hearing about its delayed release at the end of 2016 / early 2017, I went ahead and upgraded my 980X to Skylake with a 6700K.
> 
> Intel is already on Skylake, and Skylake brought a decent boost over Haswell and a worthwhile upgrade over all generations before it. Intel is planning to release its Skylake refresh, Kaby Lake, at the end of 2016 / early 2017. If Zen releases then and only matches Haswell, Intel's architecture from 2013, AMD will once again be two generations behind Intel in performance.
> 
> I'm not saying Haswell performance isn't great, it's awesome, but if AMD wants to be successful and really cause people to switch over, it needs to at least match Intel's current generation in performance. Toss in a nice platform with a rich feature set and competitive pricing, and AMD can be hugely successful. If it just matches Haswell, it will be another Bulldozer for them.


It depends how Zen ultimately clocks, and how they price it. Attaining near parity with Haswell would be a monumental accomplishment. That would be ~60% improvement over Vishera silicon. Even if Zen does not quite hit Haswell parity, it most certainly would NOT be "another Bulldozer." BD failed to exceed its predecessor.


----------



## maarten12100

Quote:


> Originally Posted by *KarathKasun*
> 
> Eh, each performance level of tech gadgets should be going up in price by about %5-10 per year because of inflation. People who compare prices now to prices 10 years ago need to learn about economics.
> 
> A CPU in the same performance category as a $100 CPU from 10 years ago should be in the $160-250 price bracket. As this is not true, its obvious that cost per CPU has come down quite a bit just by virtue of the price tiers staying fairly consistent.


That only works if people get paid more; otherwise they feel that things are getting more expensive relative to what they earn.
Such exponential growth needs to stop at some point.
Quote:


> Originally Posted by *Seronx*
> 
> It can decode 4 macro-ops. It is unknown if the decoders can decode FastPath double singularly or if they need to fuse.
> 
> If we are going based on AMD64 manuals, the decodes should be 4 FastPath doubles, 8 FastPath singles, ~~length Vector{Microcode}. If not it just points out Zen is just a stop gap for NG Bulldozer 5{Crane} and 6{Sweeper}.


Please no more Bulldozer iterations unless fully fixed and competitive.


----------



## schmotty

Zen looks very promising. I think this is going to be a winner.

I can't imagine why AMD would not be trying to beat, or at least match, Intel's current offerings *at the time of release*. I'm pretty much an AMD CPU fan, but if they release Zen only to match Intel's previous generation, I'll keep my Phenom until it dies and then switch. I see no point in AMD dumping R&D $$$ into a product they know will be inferior.

They should fire their marketing and PR departments altogether and save the money. Let the product speak for itself, and let it scream awesomeness.


----------



## Hattifnatten

Spoiler: Warning: Spoiler!



Quote:


> Originally Posted by *muffins*
> 
> I was going to wait for Zen, but after hearing about its delayed release at the end of 2016 / early 2017, I went ahead and upgraded my 980X to Skylake with a 6700K.
> 
> Intel is already on Skylake, and Skylake brought a decent boost over Haswell and a worthwhile upgrade over all generations before it. Intel is planning to release its Skylake refresh, Kaby Lake, at the end of 2016 / early 2017. If Zen releases then and only matches Haswell, Intel's architecture from 2013, AMD will once again be two generations behind Intel in performance.
> 
> I'm not saying Haswell performance isn't great, it's awesome, but if AMD wants to be successful and really cause people to switch over, it needs to at least match Intel's current generation in performance. Toss in a nice platform with a rich feature set and competitive pricing, and AMD can be hugely successful. If it just matches Haswell, it will be another Bulldozer for them.





Zen wasn't delayed; it has always been scheduled for a 2H/late-2H 2016 release, with the Zen APUs coming in 2017. I guess some people were not fully aware that the APUs were coming later, freaked out when they saw Zen and 2017 together, and caused a rumour to spread.

Nice to see that we get some high-level info on the cores. Looks exciting.


----------



## PiOfPie

This looks like the antithesis of Bulldozer in many ways; the general idea seems to be to minimize the amount of latency in the core, with checkpointing and the rest of the branch-misprediction improvements, and to spit things out ASAP. I bet that's one of the things Jim brought over; the 21264/21464 had deep pipes like Bulldozer, but Alpha also had a lot of countermeasures against branch misprediction, so Jim probably knew of some ways to further reduce mispredictions. AMD themselves did a lot of work reducing branch misprediction in the subsequent iterations of the Construction Cores as well; PD was a 30% reduction, and I think SR was also around a 30% reduction.

They also seem to think they've fixed the cache latency problem if they doubled the amount of L2 they're putting on it.

The salient question: how many consumer-grade applications heavily use 256-bit FP operations? That appears to be Zen's primary bottleneck.


----------



## EniGma1987

Quote:


> Originally Posted by *PiOfPie*
> 
> The salient question: how many consumer-grade applications heavily use 256-bit FP operations? That appears to be Zen's primary bottleneck.


Don't PCSX2 and Dolphin make fairly good use of 256-bit AVX2 instructions for performance boosts? I doubt I will get Zen if it has gimped AVX2 performance like past AMD processors. I do like to go back and play some of those old games from time to time, or have a LAN party and play some old Mario Kart and such, and it is nice to upscale to 4K resolution with AA + AF, high-def texture packs, and still get 60fps.


----------



## spurdomantbh

Quote:


> Originally Posted by *EniGma1987*
> 
> Dont PCSX2 and Dolphin make fairly good use of 256-bit AVX2 instructions for performance boosts?


I believe they can. But Zen isn't really gimped in AVX2. It can do 2x FADD/FMUL ops just like Haswell. However, it seems like Haswell can do 2x FMA while Zen can do 1x FMA, though I think there might be a typo.

Code:

+                     "znver1-double,(znver1-fp0+znver1-fp3)|(znver1-fp1+znver1-fp3)")

That doesn't make much sense. It could be a typo, so we still can't be 100% sure how the FPU will work. Either way, on most workloads it shouldn't lag behind Haswell.
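For anyone unfamiliar with the notation: these strings are GCC DFA scheduler reservations, where `a+b` reserves both units together and `|` separates alternative unit choices. If that line really is a typo, a symmetric pairing of the four 128-bit pipes might look something like this sketch (purely illustrative; the reservation name, latency and `type` attribute here are made up, only the unit string follows the patch's pattern):

```
;; Hypothetical sketch of a symmetric znver1 reservation: a 256-bit FP add
;; issued as a "double" op and executed on either pair of adjacent 128-bit
;; FP pipes, rather than always ganging fp3 as the quoted line does.
(define_insn_reservation "znver1_fp_add_256" 3
  (and (eq_attr "cpu" "znver1")
       (eq_attr "type" "sseadd"))
  "znver1-double,(znver1-fp0+znver1-fp1)|(znver1-fp2+znver1-fp3)")
```

That would be the even/odd-style ganging you'd normally expect, which is why the fp3-everywhere version looks suspicious.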


----------



## sumitlian

Quote:


> Originally Posted by *EniGma1987*
> 
> Dont PCSX2 and Dolphin make fairly good use of 256-bit AVX2 instructions for performance boosts? I doubt I will get Zen if it does have gimped AVX2 performance like the past AMD processors. I do like to go back and play some of those old games from time to time or have a lan party and play some old Mario Kart and stuff, and it is nice to upscale to 4K resolution with AA + AF, high def texture packs, and still get 60fps.


Haswell Pentium G is performing significantly better per clock in both applications than its predecessors, and it doesn't have AVX/AVX2.


----------



## PiOfPie

One other point: if you look at StroOnZ's third source, scroll all the way down, and do a Ctrl+F for avx512, you can see that the flags for AVX512 are set to 0, so Zen won't support it, at least not until Zen+.


----------



## Redwoodz

Quote:


> Originally Posted by *Clocknut*
> 
> What I meant is that they need to offer 15% better price/performance.
> 
> If an equivalently performing CPU is priced the same as Intel's, just as with their poorly priced Fury X versus the 980 Ti, then I will stick with Intel


Quote:


> Originally Posted by *Clocknut*
> 
> With their recent product quality and performance, it will take multiple years of consistently great products before they can shake off that bad brand image.
> 
> The Fury X is selling quickly because there is enough hype for an HBM GPU, it is small-form-factor friendly, quantities are very limited, and there are enough AMD fans to buy it. I highly doubt it could sell as well as the 980 Ti if the Fury X were mass produced.
> 
> Zen overall needs to be 15% better than Intel on price/performance, otherwise it won't work.


That's just ludicrous. Better to send your money to Intel now; don't bother waiting to get something in return for it. You have just proved AMD can be every bit as good as Intel and the fanboys will still cry that Intel/Nvidia is better. You should be ashamed.


----------



## sumitlian

Quote:


> Originally Posted by *PiOfPie*
> 
> One other point: if you look at Stroonz's third source and scroll all the way down do a control+f for avx512, you can see that the flags for AVX512 are set to 0, so Zen won't support it, at least not until Zen+.


After seeing these AMD Zen slides, let's assume Intel may only be faster in FP workloads; Intel's last remaining advantage would be its FPU power.
And I would be surprised if Intel let AMD use AVX512 first, before they introduce it themselves with Cannonlake.

I am also skeptical of all currently available documents and optimization guides (if there are any) for AVX512 on Xeon, as there is no guarantee those AVX intrinsics would work "optimally" with AMD's Zen+ or whatever. This is after watching all the instruction-set wars we have seen these past years.


----------



## looncraz

Quote:


> Originally Posted by *spurdomantbh*
> 
> I believe they can. But Zen isn't really gimped in AVX2. It can do 2xFADD/FMUL ops just like Haswell. However seems like Haswell can do 2xFMA, while Zen can do 1xFMA. But there might be a typo I think.
> 
> Code:
> 
> 
> 
> Code:
> 
> 
> +                     "znver1-double,(znver1-fp0+znver1-fp3)|(znver1-fp1+znver1-fp3)")
> 
> Doesn't make much sense. It could be a typo so we still can't be 100% sure how the FPU will work. But either way on most workloads it shouldn't lag behind Haswell.


Wow, yeah, that must be a typo. I'd expect even/odd or adjacent ganging, not fp3 always being ganged. But, if that's the case, then maybe they use fp2 for something else?

Or, maybe, I'm just not understanding what the code is meant to represent.


----------



## Vesku

It's going to mainly come down to how much AMD can improve their cache system and memory controller. The Apple chips showed this.


----------



## SpeedyVT

Quote:


> Originally Posted by *geoxile*
> 
> Zen is aimed at servers where scaling isn't as big of an issue as it is with personal-use and hence perf/power means they can scale up more. Given that they're producing Zen on GF's 14nm process they probably can't make something that's both large and efficient. So they're making something kinda big and will most likely be trying to maximize power efficiency.


Plus, with it aiming at up to 16 cores with HT, I highly doubt the need for large cores.


----------



## PiOfPie

Quote:


> Originally Posted by *Vesku*
> 
> It's going to mainly come down to how much AMD can improve their cache system and memory controller. The Apple chips showed this.


They seem to have confidence in their cache if the design is relatively close to Haswell's but with double the cache size.
Jim supposedly really likes tuning IMCs, and it shows, if his work on Cyclone is anything to go by.

I think the clincher will be the node and how high it clocks.


----------



## cookieboyeli

Quote:


> Originally Posted by *PiOfPie*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Vesku*
> 
> It's going to mainly come down to how much AMD can improve their cache system and memory controller. The Apple chips showed this.
> 
> 
> 
> They seem to have confidence in their cache if the design is relatively close to Haswell's but double the cache size.
> Jim supposedly really likes tuning IMCs, and it shows if his work on Cyclone is anything to go by.
> 
> I think the clincher will be the node and how high it clocks.

Absolutely. Because there's no way they're blowing past Kaby Lake, but if it clocks well then it's really going to be a winner.

Is there anything else on that node we can compare with? Any clues as to its potential or downfalls?


----------



## BiG StroOnZ

Seems HeXus wants to get in on this too with some more charts:
Quote:


> AMD's first generation Zen processors will have; four instruction decoders, four Integer units (ALUs), two address units (AGUs) and four floating-point units (FP) at 128 bit wide. Compared to Bulldozer and Steamroller, Zen will have double the execution units and quadruple floating point units. SMT, similar to Intel Hyperthreading, will be enabled by Zen. The cache sizes are speculation, just to complete the diagram. Further speculation is that the clock speeds that Zen will run at will be between 3.5 to 4.0GHz - even though it will be manufactured at 14 or 16nm.
> 
> *Edit Chart 2:*


*Source:* http://hexus.net/tech/news/cpu/86954-zen-processor-block-diagram-devised-amd-software-patch/


----------



## epic1337

Quote:


> Originally Posted by *cookieboyeli*
> 
> Absolutely. Because there's no way they're blowing past Kaby Lake, but if it clocks well then it's really going to be a winner.
> 
> Is there anything else on that node we can compare with? Any clues as to its potential or downfalls?


Soldered + good clock per voltage = pure win.
AMD's processors have always had good clock potential, and that's something that could also be expected from Zen.
I just hope AMD doesn't put the base clock at 4GHz with a boost of +500MHz; that high a base clock ruins the fun in overclocking.


----------



## Clocknut

Quote:


> Originally Posted by *epic1337*
> 
> Soldered + good clock per voltage = pure win.
> AMD's processors have always had good clock potential, and that's something that could also be expected from Zen.
> I just hope AMD doesn't put the base clock at 4GHz with a boost of +500MHz; that high a base clock ruins the fun in overclocking.


If they put it at, for example, 3.5GHz, the price is going to be 3.5GHz pricing, and I am going to compare that with the equivalent stock Intel CPU's performance.

Overclockability has always been the last part of product valuation, right after heat and power consumption.


----------



## Wishmaker

Relax guys! It is all fine! They had a decade of copying Intel designs! The same way Intel copied them in the past. The CPU industry has come full circle. Zen will be good and we will ride all our noble steeds into the sunshine!


----------



## sumitlian

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Seems HeXus wants to get in on this too with some more charts:
> *Source:* http://hexus.net/tech/news/cpu/86954-zen-processor-block-diagram-devised-amd-software-patch/


Hadn't they added two more execution ports with Haswell? One for an ALU and another for an AGU.
Or am I missing something?


----------



## Seronx

1 * BP -> 1 * 128 KB L1i -> 1 * F -> 1 * P -> 1 * D -> 1 * gRT{FE} -> 4 * disP -> 4 * 16KB L0i -> 4 * IntMemFP Sch{LoRT} -> 4 * RegRd -> 4 * Int/Mem/FP Exe -> 4 * 16 KB L0d WT -> 1 * 128KB L1d WB{FE} -> 1 * 1 MB L2


----------



## Artikbot

It'd be amazing if you could actually explain whatever it is that you wrote there.


----------



## DweeB0

Quote:


> Originally Posted by *Artikbot*
> 
> It'd be amazing if you could actually explain whatever it is that you wrote there.


He's speaking in tongues.
Only the processor Gods can understand him now.


----------



## Artikbot

Well, using words instead of acronyms tends to do the job.


----------



## delboy67

What about predictions of performance per watt? Any ideas? Surely this is as important a metric in servers as outright performance.


----------



## Seronx

Quote:


> Originally Posted by *Artikbot*
> 
> It'd be amazing if you could actually explain whatever it is that you wrote there.


I'll do it vertically; there is too little space for horizontal paths.

4-way SMT Branch Predictor
128 KB L1 Instruction Cache
Fetch
Pick//Pre-decode Cache
Decode
Global Instruction Retirement <-> 128KB L1 Data Cache // Write-back to L2 Cache
Dispatch {4-way}
4 * 16 KB L0 Instruction Cache/Macro-op caches
4 * Integer/Memory/Floating Point - Scheduler/Local Instruction Retire <-> L0d
4 * Integer/Floating Point - Physical Register Files
4 * Integer/Memory/Floating Point - Execution Cores
4 * 16 KB L0 Data Cache // Write-through to L1 Data Cache
1 MB L2

Execution resources{From speculative stance}:
2 * 128-bit MMX units // or from disassemblement? // 2 * 64-bit Integer ALUs + 2 * 64-bit Integer AGUs
2 * 128-bit FMAC units // or from disassemblement? // 2 * 64-bit Integer iMUL/iDIV + 1 * 64-bit Integer SQRT/XBAR

A bunch of weird patent lingo for nondescript architectures led to weird phrases whose meaning I can't work out.


----------



## LuckyStarV

Quote:


> Originally Posted by *sumitlian*
> 
> Hadn't they added two more execution ports with Haswell ? One for ALU and another one for AGU.
> or am I missing something ?


They added an ALU + a port for it, as well as another one for an AGU.

Sandy/Ivy only had 3 integer ALUs, if I am reading it correctly.

http://www.anandtech.com/show/6355/intels-haswell-architecture/8


----------



## sumitlian

Quote:


> Originally Posted by *LuckyStarV*
> 
> They added an ALU + port for it as well as another one
> 
> Sadny/Ivy only had 3 Integer ALUs if I am reading it correctly
> 
> http://www.anandtech.com/show/6355/intels-haswell-architecture/8


This is what I'd believed too, until I saw the following from hexus.net:


----------



## spurdomantbh

Quote:


> Originally Posted by *looncraz*
> 
> Wow, yeah, that must be a typo. I'd expect even/odd or adjacent ganging, not fp3 always being ganged. But, if that's the case, then maybe they use fp2 for something else?
> 
> Or, maybe, I'm just not understanding what the code is meant to represent.


Yes you are understanding it correctly. It's always using fp3 for adding. The only way this is not a typo is if fp2 is gimped for whatever reason? That really doesn't make much sense...

edit: looked over some more of the code. Seems like fp2 is fully used everywhere else. I think it really is a typo.

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Seems HeXus wants to get in on this too with some more charts:
> *Source:* http://hexus.net/tech/news/cpu/86954-zen-processor-block-diagram-devised-amd-software-patch/


Hexus is wrong.
Sandy/Ivy: 3 ALU, 3 AGU (2 load/store, 1 store), 1x 256-bit FPU
Haswell/Broadwell: 4 ALU, 4 AGU (2 load/store, 2 store), 2x 256-bit FPU
Quote:


> Originally Posted by *epic1337*
> 
> soldered + good clock per voltage = pure win
> AMD's processors had always gotten good clock potential, this is something which could also be expected from Zen.
> i just hope AMD doesn't put the base clock at 4Ghz with boost of +500Mhz, that high of a baseclock ruins the fun in overclocking.


Don't expect a high base clock. Chances are the base could be 3.2GHz, with boost up to 4GHz. Samsung/GloFo's 14nm process is designed with mobile in mind, after all, so it may not sustain high base clocks. Personally I hope the base is at least 3.5GHz and it boosts to at least 4GHz.


----------



## BiG StroOnZ

Turns out the chart is not actually from Hexus, but instead from a German website called 3DCenter.org which Hexus borrowed.

I will edit the chart myself and then re-post it.



Better?


----------



## spurdomantbh

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Turns out the chart is not actually from Hexus, but instead from a German website called 3DCenter.org which Hexus borrowed.
> 
> I will edit the chart myself and then re-post it.
> 
> 
> 
> Better?


I believe it's correct now.
edit: Bulldozer's decoder is 1x4; it's 2x4 for Steamroller and Excavator


----------



## LuckyStarV

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> Turns out the chart is not actually from Hexus, but instead from a German website called 3DCenter.org which Hexus borrowed.
> 
> I will edit the chart myself and then re-post it.
> 
> 
> 
> Better?


Better









Iirc, Haswell is 1x 256bit add + 1x 256bit mul, which is the same as Sandy, making it a 1x256 FPU. However, they can act as two when you use FMA3 instructions with them, allowing each 1x256 unit to do both add/mul. At least I think that is what it is like.


----------



## sumitlian

Quote:


> Originally Posted by *spurdomantbh*
> 
> I believe it's correct now.
> edit: bulldozer decoder has 1x4 and 2x4 for steamroller and excavator


Hahaha.....lets edit it again.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *spurdomantbh*
> 
> I believe it's correct now.
> edit: bulldozer decoder has 1x4 and 2x4 for steamroller and excavator


I'll see what I can do give me a minute.

Edit:

How would you want me to represent that in the chart exactly?

Tell me how you want me to revise it and I'll fix it tomorrow night, need to get to bed for now.


----------



## spurdomantbh

Quote:


> Originally Posted by *LuckyStarV*
> 
> Better
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Iirc, Haswell is 1x 256bit add + 1x 256bit mul, which is the same as Sandy, making it a 1x256 FPU. However, they can act as two when you use FMA3 instructions with them, allowing each 1x256 unit to do both add/mul. At least I think that is what it is like.


I believe Haswell has full 2x 256bit FMA units.
Quote:


> Originally Posted by *sumitlian*
> 
> Hahaha.....lets edit it again.


I kept trying to type one thing and ended up typing something else. I think I'm semi ******ed DDDDDDDDDDDDDD
But Bulldozer has a 1x4 decoder and SR/XV have 2x4


----------



## sumitlian

Quote:


> Originally Posted by *LuckyStarV*
> 
> Iirc, Haswell is 1x 256bit add + 1x 256bit mul, which is the same as Sandy, making it a 1x256 FPU. However, they can act as two when you use FMA3 instructions with them, allowing each 1x256 unit to do both add/mul. At least I think that is what it is like.


True.
FMA3 and AVX2 make Haswell do 32 flops/cycle as compared to 16 flops/cycle with Sandy/Ivy.
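
For anyone wondering where those figures come from, the peak-rate arithmetic can be sketched in a few lines. The assumptions here are mine, not from the patch: single-precision 32-bit lanes, one op per FP port per cycle, and an FMA counted as two flops:

```python
# Back-of-the-envelope peak flops/cycle for a SIMD FP unit.
# Assumptions: 32-bit (single precision) lanes, one issue per port per cycle,
# a fused multiply-add counts as 2 flops.

def peak_flops_per_cycle(ports, vector_bits, lane_bits=32, fused=True):
    """Peak flops/cycle = FP ports * SIMD lanes * (2 if FMA, else 1)."""
    lanes = vector_bits // lane_bits
    return ports * lanes * (2 if fused else 1)

# Sandy/Ivy: 1x 256-bit FADD + 1x 256-bit FMUL, no fusion.
sandy = peak_flops_per_cycle(ports=2, vector_bits=256, fused=False)

# Haswell: 2x 256-bit FMA units.
haswell = peak_flops_per_cycle(ports=2, vector_bits=256, fused=True)

print(sandy, haswell)  # 16 32
```

Swap in `lane_bits=64` for double precision and both numbers halve.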


----------



## sumitlian

Quote:


> Originally Posted by *spurdomantbh*
> 
> I kept trying to type one thing and ended up typing something else. I think I'm semi ******ed DDDDDDDDDDDDDD
> But Bulldozer has 1x4 decoder and SR/XV has 2x4


No no....You ain't ******ed at all.








We were just focusing on Sandy/Ivy vs Haswell, that's it.


----------



## spurdomantbh

Quote:


> Originally Posted by *sumitlian*
> 
> No no....You ain't ******ed at all.
> 
> 
> 
> 
> 
> 
> 
> 
> We were just focusing on Sandy/ivy vs Haswell, that's it.


I am though, I typed it out correctly first then closed the browser by accident









Also, from what I've read, Haswell can do 256bit 2xFMA and 2xFMUL but only 1xFADD. Haswell has a really complicated FPU.

edit: ok sandy/ivy has 1x 256bit FMUL and 1x 256bit FADD. Haswell has 2x FMA units. So technically both Haswell and Sandy have 2x FP units, but Haswell has 2xFMA while Sandy has 1xFADD and 1x FMUL (which result in 1xFMA).


----------



## FlanK3r

Now 1 year of waiting for the product, keeping my fingers crossed :thumb:


----------



## spurdomantbh

Made my own chart m8s. I think it's correct.


Quote:


> Originally Posted by *FlanK3r*
> 
> Now 1 year of waiting for product, keep my fingers cross:thumb:


I hope it's out in 1H 2016 tbh


----------



## Dom-inator

Quote:


> Originally Posted by *spurdomantbh*
> 
> I hope it's out on 1H 2016 tbh


Mate you're dreamin. Q1 2017 earliest


----------



## spurdomantbh

Quote:


> Originally Posted by *Dom-inator*
> 
> Mate you're dreamin. Q1 2017 earliest


It's already confirmed a 2016 product m8.


----------



## Jimbags

Why are they comparing to Sandy and Ivy? Even Haswell/Broadwell will be old tech by Zen's release time...


----------



## FlanK3r

Quote:


> Originally Posted by *spurdomantbh*
> 
> Made my own chart m8s. I think it's correct.
> 
> 
> I hope it's out on 1H 2016 tbh


I don't think so; the first plan was Q2/Q3 (so something around Computex or later, up to October). What is possible is that we will see AM4 motherboards at Computex 2016.


----------



## Cursedqt

So sad to see that this is news; I guess people will take anything as true. Ironically, this was on r/AMD a couple of days ago and all of a sudden it's all over the place. Sad, really: if someone makes up a theoretical CPU core on the basis of what AMD has stated in their PDF files, how willing are people to believe it? Quite a lot, it seems.


----------



## ku4eto

Quote:


> Originally Posted by *Cursedqt*
> 
> So sad to see that this is news; I guess people will take anything as true. Ironically, this was on r/AMD a couple of days ago and all of a sudden it's all over the place. Sad, really: if someone makes up a theoretical CPU core on the basis of what AMD has stated in their PDF files, how willing are people to believe it? Quite a lot, it seems.


Ahm, this CPU patch file is a rather good source if you ask me. And people have been believing rumours since the beginning of society. Just go and check what goes around in the nVidia/Intel release rumour threads.


----------



## Dom-inator

Quote:


> Originally Posted by *Cursedqt*
> 
> So sad to see that this is news; I guess people will take anything as true. Ironically, this was on r/AMD a couple of days ago and all of a sudden it's all over the place. Sad, really: if someone makes up a theoretical CPU core on the basis of what AMD has stated in their PDF files, how willing are people to believe it? Quite a lot, it seems.


I understand what you're saying, but it doesn't cost anything to believe it or not. It promotes discussion and speculation, and some people actually learn a bit more about how a CPU works. Looking at the big picture, it's part of what forums like OCN and hardware-related subreddits are for.

If someone is basing their future upgrade decisions on this rumour/news, well that's unwise and they shouldn't be doing it.


----------



## spurdomantbh

Quote:


> Originally Posted by *Cursedqt*
> 
> So sad to see that this is news; I guess people will take anything as true. Ironically, this was on r/AMD a couple of days ago and all of a sudden it's all over the place. Sad, really: if someone makes up a theoretical CPU core on the basis of what AMD has stated in their PDF files, how willing are people to believe it? Quite a lot, it seems.


It's from a compiler patch. They can't just throw random stuff in there. All the data here is more or less correct (with one possible typo for FMA). This isn't even a rumor; these specs for the core are pretty much confirmed.


----------



## Cursedqt

Quote:


> Originally Posted by *spurdomantbh*
> 
> It's from a compiler patch. They can't just throw random stuff in there. All the data here is more or less correct (with one possible typo for FMA). This isn't even a rumor; these specs for the core are pretty much confirmed.


Okay can you link AMD reporting this?

It piqued my interest.


----------



## spurdomantbh

Quote:


> Originally Posted by *Cursedqt*
> 
> Okay can you link AMD reporting this?
> 
> It piqued my interest.


All the info is from here: https://patchwork.ozlabs.org/patch/524324/
You can clearly see: Submitter [email protected]
If you scroll down to the patch itself you'll find

+ 32, /* size of l1 cache. */

+ 512, /* size of l2 cache. */

+;; Decoders unit has 4 decoders and all of them can decode fast path
+;; and vector type instructions.

+;; Integer unit 4 ALU pipes.

+;; 2 AGU pipes.

+;; Floating point unit 4 FP pipes.


----------



## Cursedqt

Quote:


> Originally Posted by *spurdomantbh*
> 
> All the info is from here: https://patchwork.ozlabs.org/patch/524324/
> You can clearly see: Submitter [email protected]
> If you scroll down to the patch itself you'll find
> 
> + 32, /* size of l1 cache. */
> 
> + 512, /* size of l2 cache. */
> 
> +;; Decoders unit has 4 decoders and all of them can decode fast path
> +;; and vector type instructions.
> 
> +;; Integer unit 4 ALU pipes.
> 
> +;; 2 AGU pipes.
> 
> +;; Floating point unit 4 FP pipes.


Good, thanks. Now if only tech sites reported this instead of a blog post.


----------



## EniGma1987

Quote:


> Originally Posted by *cookieboyeli*
> 
> Absolutely. Because there's no way they're blowing PAST Kaby Lake, but if it clocks well then it's really going to be a winner.
> 
> 
> 
> 
> 
> 
> 
> 
> Is there anything else on that node we can compare with?
> Any clues as to it's potential or downfalls?


Sadly no, not yet. If AMD is still using the 14nm FF+ node from GlobalFoundries, it is a tweaked node from a low-power mobile process. Personally I don't imagine we will be getting clocks higher than ~3.5GHz on such a node. There were other rumors about a switch to TSMC's 16nm FF+ instead because of GF node troubles. I don't believe we have any products on that yet either, but I believe it is also a node tweaked from a higher-density, lower-power type process. I am betting these will be amazing mobile chips, but from the little evidence of the nodes we have so far, it doesn't look promising for a high-clocking desktop design.









Quote:


> Originally Posted by *Artikbot*
> 
> It'd be amazing if you could actually explain whatever it is that you wrote there.


He probably can, and it isn't even close to as impressive as Seronx is trying to make it appear. The guy is well known for flip-flopping on any idea he posts just a week or two later. He likes to type things in the most technical and hard-to-understand way possible so people think he is smart; really he posts mostly misinformation and/or stuff stolen from other posters. Best to keep him on a block list.


----------



## spurdomantbh

Quote:


> Originally Posted by *EniGma1987*
> 
> Sadly no, not yet. If AMD is still using the 14nm FF+ node from GlobalFoundries, it is a tweaked node from a low-power mobile process. Personally I don't imagine we will be getting clocks higher than ~3.5GHz on such a node. There were other rumors about a switch to TSMC's 16nm FF+ instead because of GF node troubles. I don't believe we have any products on that yet either, but I believe it is also a node tweaked from a higher-density, lower-power type process. I am betting these will be amazing mobile chips, but from the little evidence of the nodes we have so far, it doesn't look promising for a high-clocking desktop design.


I can't remember who exactly, might've been an AMD guy talking about Hawaii, but they said 20nm was not considered because the high-performance node was performing pretty much the same as the low-power node, which is why the 20nm HP node was cancelled by both TSMC and GloFo. Meaning high-performance nodes now add very little extra performance over their low-power counterparts. This is why FinFET is so important: FinFET allows low-power nodes to achieve higher clocks. That's why even though 14/16nm is just a 20nm node + FinFET, it gives a big enough performance difference to be considered a whole node jump. Right now we're at a point where a high-performance node would probably only give you <10% higher frequency and cost a lot more compared to a low-power node.


----------



## Dom-inator

Hmm so you think they'll be good for laptops but no good for high frequencies on desktops? Makes sense really, that's all AMD need. They don't need to win us over we're such a minority.

@spurdomantbh so much knowledge, so little time signed up to OCN. Smurf account? Jim Keller confirmed?


----------



## AmericanLoco

Quote:


> Originally Posted by *Dom-inator*
> 
> Hmm so you think they'll be good for laptops but no good for high frequencies on desktops? Makes sense really, that's all AMD need. They don't need to win us over we're such a minority.
> 
> @spurdomantbh so much knowledge, so little time signed up to OCN. Smurf account? Jim Keller confirmed?


How did you come to that conclusion? He said there's little difference these days between a low-power node, and a high-performance node.


----------



## PPBottle

Quote:


> Originally Posted by *Cursedqt*
> 
> Good, thanks. Now if only tech sites reported this instead of a blog post.


Tech sites probably don't have a clue what the rest of the patch means beyond the obvious lines described above.

That particular blog is run by someone with more credibility than 90% of your regular "tech sites".

So no, in this very case that particular blog is a far better source than the usual clickbait, copy-pasting tech site.


----------



## spurdomantbh

Quote:


> Originally Posted by *Dom-inator*
> 
> Hmm so you think they'll be good for laptops but no good for high frequencies on desktops? Makes sense really, that's all AMD need. They don't need to win us over we're such a minority.
> 
> @spurdomantbh so much knowledge, so little time signed up to OCN. Smurf account? Jim Keller confirmed?


Quote:


> Originally Posted by *AmericanLoco*
> 
> How did you come to that conclusion? He said there's little difference these days between a low-power node, and a high-performance node.


Indeed, as AmericanLoco says, the nodes are now pretty much universal. Even though it's not a high-performance node, low-power nodes have pretty much caught up in performance, and HP nodes simply don't work very well beyond 20nm. Look how long it took Intel to get a high-performance 14nm node, and how many delays and extra billions it must've cost them. In the end there's no doubt it's a much better node than what TSMC and GloFo will offer, but when low-power FinFET nodes are almost at the same level of performance, it's not worth spending all that extra time and money on a slightly better node.

And I'm not 100% sure about all of this. It's pretty much what I've read up and conclusions I've come to. I know I haven't been here long, but most of my technology discussions used to happen on /g/ and seems like these days it's just filled with uneducated newfriends so I came to OCN hoping for better discussions


----------



## Olivon

Quote:


> Originally Posted by *spurdomantbh*
> 
> All the info is from here: https://patchwork.ozlabs.org/patch/524324/
> You can clearly see: Submitter [email protected]
> If you scroll down to the patch itself you'll find
> 
> + 32, /* size of l1 cache. */
> 
> + 512, /* size of l2 cache. */
> 
> +;; Decoders unit has 4 decoders and all of them can decode fast path
> +;; and vector type instructions.
> 
> +;; Integer unit 4 ALU pipes.
> 
> +;; 2 AGU pipes.
> 
> +;; Floating point unit 4 FP pipes.


Nice find man, and welcome to the forum.


----------



## iLeakStuff

If you dig a bit deeper you can find the new instruction sets for AMD's Zen CPUs:









*Excavator:*
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
+extensions.)

*Zen:*
+supersets BMI, *BMI2*, F16C, FMA, *FSGSBASE*, AVX, *AVX2*, *ADCX, RDSEED, MWAITX,*
+*SHA, CLZERO*, AES, PCL_MUL, CX16, *MOVBE*, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
+SSE4.1, SSE4.2, ABM, *XSAVEC, XSAVES, CLFLUSHOPT, POPCNT*, and 64-bit
+instruction set extensions.
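
If you want to see which of these extensions a Linux CPU actually reports, the "flags" line of /proc/cpuinfo lists them. A rough sketch, noting that the kernel's names differ a bit from the patch's (e.g. SHA shows up as `sha_ni`, PCL_MUL as `pclmulqdq`), and the sample string below is made up:

```python
# Parse the feature flags out of /proc/cpuinfo-style text.
# On a real Linux box you'd read open("/proc/cpuinfo").read() instead of the
# hand-written sample below.

def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

sample = "flags\t\t: fpu mmx sse sse2 avx avx2 bmi1 bmi2 sha_ni movbe popcnt"
flags = cpu_flags(sample)
print(sorted(f for f in ("avx2", "sha_ni", "tbm") if f in flags))  # ['avx2', 'sha_ni']
```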


----------



## ku4eto

Quote:


> Originally Posted by *iLeakStuff*
> 
> If you dig a bit deeper you can find the new instruction sets for AMDs Zen CPUs
> 
> 
> 
> 
> 
> 
> 
> 
> 
> *Excavator:*
> +supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
> +SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
> +extensions.)
> 
> *Zen:*
> +supersets BMI, *BMI2*, F16C, FMA, *FSGSBASE*, AVX, *AVX2*, *ADCX, RDSEED, MWAITX,*
> +*SHA, CLZERO*, AES, PCL_MUL, CX16, *MOVBE*, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
> +SSE4.1, SSE4.2, ABM, *XSAVEC, XSAVES, CLFLUSHOPT, POPCNT*, and 64-bit
> +instruction set extensions.


SHA? I think this will be a big plus in servers.


----------



## The Stilt

Quote:


> Originally Posted by *iLeakStuff*
> 
> If you dig a bit deeper you can find the new instruction sets for AMDs Zen CPUs
> 
> 
> 
> 
> 
> 
> 
> 
> 
> *Excavator:*
> +supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
> +SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
> +extensions.)
> 
> *Zen:*
> +supersets BMI, *BMI2*, F16C, FMA, *FSGSBASE*, AVX, *AVX2*, *ADCX, RDSEED, MWAITX,*
> +*SHA, CLZERO*, AES, PCL_MUL, CX16, *MOVBE*, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
> +SSE4.1, SSE4.2, ABM, *XSAVEC, XSAVES, CLFLUSHOPT, POPCNT*, and 64-bit
> +instruction set extensions.


Excavator already supports AVX2, BMI2, FSGSBASE, MOVBE, POPCNT.


----------



## PiOfPie

Quote:


> Originally Posted by *Jimbags*
> 
> why they comparing to sandy and Ivy? even haswell/broadwell will be old by release time in terms of tech...


Because Sandy and Ivy are good performance yardsticks for the price/perf crowd; if AMD matches Sandy or Ivy's IPC but provides moar coars at a similar or lower price point, they'll have a "good enough" Phenom II-esque CPU on their hands that a lot of people will buy. If it matches Haswell, they're sitting pretty.

Outside of niche applications (Dolphin and other emulators; applications that make heavy use of AVX2), Haswell and onward don't really provide much of a performance uplift over Sandy and Ivy. In some cases, Skylake's IPC *regresses* slightly against Haswell at the same clocks. Kaby looks to be a Devil's Canyon-esque refresh that may provide slightly higher clocks. Cannonlake is the only thing that represents a threat if Intel actually does increase core counts.


----------



## Dom-inator

Quote:


> Originally Posted by *AmericanLoco*
> 
> How did you come to that conclusion? He said there's little difference these days between a low-power node, and a high-performance node.


????
Quote:


> Originally Posted by *EniGma1987*
> 
> Sadly no, not yet. If AMD is still using the 14nm FF+ node from global foundries, it is a tweaked node from a low power mobile process. Personally I don't imagine we will be getting higher than 3.5~ clocks on such a node. There were other rumors about a switch to TSMC's 16nm FF+ instead because of GF node troubles. I don't believe we have any products on that yet either but I believe it is also a node tweaked from a more high density+lower power type process. I am betting these will be amazing mobile chips, but from the little evidence of the nodes we have so far it doesnt give very promising info to a high clocking desktop design


----------



## st0necold

i wish i stayed in school


----------



## looncraz

Quote:


> Originally Posted by *EniGma1987*
> 
> If AMD is still using the 14nm FF+ node from global foundries, it is a tweaked node from a low power mobile process. Personally I don't imagine we will be getting higher than 3.5~ clocks on such a node.


The node isn't quite as important to clock speeds as many seem to think. I've said it before, but I think I should be a bit more specific.

Low power nodes are primarily nodes that have prioritized transistor switching current (and voltage) over transistor switching speed. That's about all there is to it.

Transistor switching speed is important to clock speed based upon the architecture of the longest-running stage on the CPU (including execution units, caches, and the pipeline stages). Faster switching transistors mean the longest stage will take X amount less time to complete, allowing the clock signal to be increased.

The process node will have a certain degree of variance, such that the longest-running stage will vary from CPU to CPU. If you increase the clock signal such that the next signal arrives before the longest-running stage is complete, you will have instability. You need just enough time for the results to reach their destination when the next clock signal arrives.

This is why longer-pipeline CPUs hit higher clock speeds - they've simplified a stage or two by splitting them up so they take less time to complete.
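
The timing argument above fits a toy model: the clock period has to cover the slowest stage, so f_max = 1 / max(stage delay), and splitting the longest stage raises the ceiling. The picosecond figures below are invented for illustration, not real Zen numbers:

```python
# Toy model: the achievable clock is capped by the slowest pipeline stage.
# Stage delays are made-up picosecond values.

def max_clock_ghz(stage_delays_ps):
    """Highest clock (GHz) at which every stage finishes within one period."""
    return 1000.0 / max(stage_delays_ps)

stages = [220, 250, 400, 240]           # one slow 400 ps stage caps the clock
print(round(max_clock_ghz(stages), 2))  # 2.5

# Split the 400 ps stage into two ~200 ps stages (a deeper pipeline):
deeper = [220, 250, 200, 200, 240]
print(round(max_clock_ghz(deeper), 2))  # 4.0, now capped by the 250 ps stage
```

Faster-switching transistors shrink every delay in the list, which is the other route to a higher f_max.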

Now, with that in mind, we have to consider the transistor switching speed of 14nm LPP FinFET vs 32nm SOI. The 14nm FinFET LPP process has a 20% faster transistor speed than the failed 20nm node, which was supposed to be faster than 28nm, which was faster than 32nm. Of course, that doesn't directly translate into higher clock speeds, but it means that it COULD be clocked higher, even with the same design, provided a few other factors don't become more important.

For this case, transconductance becomes important to determine signal integrity. Compared to 28nm SHP, 14nm LPP is nearly twice as good in this regard, which is fantastic for power usage, but even better for higher clock speeds... provided the driving circuitry doesn't need higher voltages and the node is consistent enough with each layer.

In theory, Samsung's 14nm FinFET is superior to Intel's 14nm FinFET for saving power, and not terribly inferior in clock speed potential. However, we know that Intel had to rig up their process to handle higher current, to drive the transistors to switch fast enough, reliably enough. This *should* be a little less of a problem for 14nm LPP due to its lower drive-current requirements.

In the end, it all comes down to whether or not AMD planned for this process from the beginning (or a worse one, at least). If so, then they should have little difficulty achieving 3.5~4GHz stock speeds, but I don't know if they will be able to do that with the initial manufacturing runs. Part of me hopes the first few runs give us a lot of low-clocking samples, so we have much more stock of the middle-of-the-road Zen CPUs on release day. The rest of me hopes they have 4GHz 8-core CPUs pulling 45W with Haswell+ IPC and >= Intel SMT scaling, charging Intel-like prices and gaining some much-needed high-margin server sales while the rest of us look on jealously.


----------



## Faithh

Quote:


> Haswell core:
> 
> Core1: 4 ALU + 3 AGU
> FPU: 2 256bit FMAC (256bit add + 256bit mul)
> Instruction Decode: 4 Wide
> L1 Cache: 32 KB Instruction + 32 KB Data
> L2 Cache: 256 KB


Is this a joke or?

Haswell's SIMD is made out of:

- 256 bit FMA FBlend
- 256 bit FMA ADD
- 256 bit FShuffle FBlend
- 256 bit VALU VBlend
- 256 bit VALU VMUL VShift
- 256 bit VALU Vshuffle

http://www.realworldtech.com/haswell-cpu/4/
Quote:


> Bulldozer module:
> 
> Core1: 2 ALU + 2 AGU
> Core2: 2 ALU + 2 AGU
> FPU: 2 128bit FMAC *+ 2 MMX*
> Instruction Decode: 4 Wide
> L1 Cache: 64 KB Instruction per module + 16 KB data per core
> L2 Cache: 2 MB
> 
> Zen core:
> 
> Core1: 4 ALU + 2 AGU
> FPU: 2 128bit FMAC (2 128bit add + 2 128bit mul)
> Instruction Decode: 4 Wide
> L1 Cache: 32 KB data + ?
> L2 Cache: 512 KB


Where are our SIMD integer units? Obviously, if it has 10 pipelines, the second FMAC should be an MMX unit instead (MMX = SIMD integer unit).


----------



## The Stilt

Haswell like IPC, 4GHz and 8 core at 45W is...
Very wishful thinking.

14nm LPP is not better than Intel's P1272/3 in any aspect; it is worse in every aspect.
Unless you are designing an ARM SoC with power consumption measured in milliwatts, which I don't think will be the case with Zen.

The i7-6700K is rated for 91W TDP, and that's with 4C/8T at 4GHz.

I'm extremely confident that AMD can have 8C/16T Zen at 4GHz with 45W TDP









A 40% increase in IPC over Excavator doesn't add up to anything near Haswell-like IPC.
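
The arithmetic behind that, using this post's own Cinebench R15 figure, can be run in three lines (illustrative only):

```python
# If Haswell's IPC is ~65.4% above Excavator's, a +40% uplift over Excavator
# still leaves a sizable gap. Figures taken from the post above.

excavator = 1.0
haswell = excavator * 1.654   # ~65.4% higher IPC than Excavator
zen_est = excavator * 1.40    # projected +40% over Excavator

gap = zen_est / haswell
print(round(gap, 3))  # 0.846 -> Zen at ~85% of Haswell IPC on this estimate
```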


----------



## Solohuman

Interesting thread, I've been getting an 'edukachen' in how modern cpu's work...lol...

Personally, I'm not holding my breath waiting for Zen, in the meantime I"ll just push my FX-6300 &/or A10-7850K a little higher with OC _if needed_ needed for the gaming and 'general' desktop work I do on each rig... lol...


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> Haswell like IPC, 4GHz and 8 core at 45W is...
> Very wishful thinking.
> 
> 14nm LPP is not better than Intel's P1272/3 in any aspect; it is worse in every aspect.
> Unless you are designing an ARM SoC with power consumption measured in milliwatts, which I don't think will be the case with Zen.
> 
> The i7-6700K is rated for 91W TDP, and that's with 4C/8T at 4GHz.
> 
> I'm extremely confident that AMD can have 8C/16T Zen at 4GHz with 45W TDP
> 
> 
> 
> 
> 
> 
> 
> 
> 
> A 40% increase in IPC over Excavator doesn't add up to anything near Haswell-like IPC.


I was trying to say something outlandish with the 4GHz 8-core @ 45W, give a loon a break









In any event, from what little information I have been able to gather, 14nm LPP is superior to Intel's 14nm process in transconductance (perhaps exclusively this) by a notable margin (some 40~50%). Of course, this is a direct consequence of being a node specifically designed for low power applications. I have never been able to get any information for the curve each follows, though, so it is very possible the characteristics change to favor Intel as voltage and frequency increases.

However, as for IPC, 40% over Excavator is nearly exactly even with Haswell in most cases, only halfway there in some, and slightly faster in others.

For example, a 40% increase, normalized, puts Zen at 93% of Haswell's performance in Cinebench R15 (single-threaded, of course). Due to certain weaknesses in Bulldozer, it gets worse as the Cinebench versions get older: R11.5 is 91% of Haswell, and R10 a poor 86%. (Yes, I did all this math... and much, much more.)

However, that says nothing for what it would mean for overall performance. Cinebench is one app, so I went with every single reliable benchmark I could find (only about 15 of them), normalized the threads (4 vs 4, i7-4790K vs FX-4300, with a calculated improvement to represent Zen, taking into account known benchmark improvements for Steamroller and what could be found for Excavator directly). The spread was huge. 3DPM, multi-threaded, for example, favored Intel 2:1. But x264 encoding favored Zen with a healthy 20% margin, and WebXPRT had a 30% advantage for Zen.

Of course, this won't be reality. The performance profile will be quite different, so we can really only go by the average. If you do a simple average, the difference is only about an 8% IPC deficit for Zen, putting it about on par with Ivy Bridge overall. However, when you do a proper average by removing the outlier (only the 2:1 result qualified, since there are also 1.1, 1.2, and 1.27 results, which keeps the WebXPRT score in the mix), it jumps to 96%. Removing WebXPRT drops it to 94%.

This puts it almost dead on with the Ivy Bridge results on the same tests (the entire spread between Sandy Bridge and Haswell is a mere 13%).
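
A rough sketch of the averaging being described, with made-up Zen/Haswell ratios standing in for the actual per-benchmark data (the "proper average" drops the single most extreme outlier before taking the mean):

```python
# Simple vs outlier-trimmed averaging of per-benchmark performance ratios.
# The ratios below are hypothetical, NOT the post's actual numbers.

def mean(xs):
    return sum(xs) / len(xs)

def trimmed_mean(xs):
    """Mean after dropping the one ratio farthest from parity (1.0)."""
    rest = sorted(xs, key=lambda r: abs(r - 1.0))[:-1]  # drop worst outlier
    return mean(rest)

# 0.50 plays the role of the 2:1 Intel win in 3DPM.
ratios = [0.93, 0.91, 0.86, 1.20, 1.30, 0.50]
print(round(mean(ratios), 2))          # 0.95 (simple average)
print(round(trimmed_mean(ratios), 2))  # 1.04 (outlier removed)
```

One extreme data point can swing a small sample like this by ~10%, which is why the post's simple and trimmed averages disagree.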

Haswell's IPC, interestingly, is only about 40% faster than Penryn (Core 2 Quad), and Excavator is about equal in IPC with Penryn.

Given that my calculated results are so close to Haswell just using a mean, and the fact that Bulldozer's performance profile is notoriously inferior in certain respects, I feel it is very much possible that Zen will match Haswell... or come very close to it.

That said, matching Sandy Bridge is good enough. I have a 2600k and I have no compelling upgrade path available to me - nor any need to upgrade (though I'm itching, naturally). I could, however, always use more cores


----------



## The Stilt

Quote:


> Originally Posted by *looncraz*
> 
> I was trying to say something outlandish with the 4ghz 8/core @ 45W, give a loon a break
> 
> 
> 
> 
> 
> 
> 
> 
> 
> In any event, from what little information I have been able to gather, 14nm LPP is superior to Intel's 14nm process in transconductance (perhaps exclusively this) by a notable margin (some 40~50%). Of course, this is a direct consequence of being a node specifically designed for low power applications. I have never been able to get any information for the curve each follows, though, so it is very possible the characteristics change to favor Intel as voltage and frequency increases.
> 
> However, as for IPC, 40% over Excavator is nearly exactly even with Haswell in most cases, only halfway there in some, and slightly faster in others.
> 
> For example, a 40% increase, normalized, puts Zen at 93% of Haswell's performance in Cinebench R15 (single-threaded, of course). Due to certain weaknesses in Bulldozer, it gets worse as the Cinebench versions get older: R11.5 is 91% of Haswell, and R10 a poor 86%. (Yes, I did all this math... and much, much more.)
> 
> However, that says nothing for what it would mean for overall performance. Cinebench is one app, so I went with every single reliable benchmark I could find (only about 15 of them), normalized the threads (4 vs 4, i7-4790K vs FX-4300, with a calculated improvement to represent Zen, taking into account known benchmark improvements for Steamroller and what could be found for Excavator directly). The spread was huge. 3DPM, multi-threaded, for example, favored Intel 2:1. But x264 encoding favored Zen with a healthy 20% margin, and WebXPRT had a 30% advantage for Zen.
> 
> Of course, this won't be reality. The performance profile will be quite different, so we can really only go by the average. If you do a simple average, the difference is only about an 8% IPC deficit for Zen, putting it about on par with Ivy Bridge overall. However, when you do a proper average by removing the outlier (only the 2:1 result qualified, since there are also 1.1, 1.2, and 1.27 results, which keeps the WebXPRT score in the mix), it jumps to 96%. Removing WebXPRT drops it to 94%.
> 
> This puts it almost dead on with the Ivy Bridge results on the same tests (the entire spread between Sandy Bridge and Haswell is a mere 13%).
> 
> Haswell's IPC, interestingly, is only about 40% higher than Penryn's (Core 2 Quad), and Excavator is about equal in IPC with Penryn.
> 
> Given that my results are so close to Haswell with calculated results and just using a mean, and the fact that Bulldozer's performance profile is notoriously inferior in certain respects, I feel it is very much possible that Zen will match Haswell... or come very close to it.
> 
> That said, matching Sandy Bridge is good enough. I have a 2600k and I have no compelling upgrade path available to me - nor any need to upgrade (though I'm itching, naturally). I could, however, always use more cores


In Cinebench R15 Haswell has ~65.4% higher IPC and Skylake around 76.4% higher IPC than Excavator.
In similar FP critical workloads utilizing AVX2 (such as VP9 or X265), the difference will increase even further by 15-25% in favor of Haswell / Skylake.

With the projected increase of 40%, Zen would still be far, far behind.

AMD challenging Intel in performance is completely ridiculous and never was an option.
It is all about staying alive.

Zen should be available in similar time frame as Cannonlake.
Cannonlake will once again have a process advantage, more cores (over HW / SL / KL) and no doubt further IPC improvements.
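Putting the quoted figures on one scale, here is a quick sanity check (all percentages are from this post; nothing below is a new measurement):

```python
# Relative single-threaded IPC, with Excavator as the 1.00 baseline.
# Figures are the ones quoted above, not independent measurements.
excavator = 1.00
haswell = 1.654       # ~65.4% higher IPC than Excavator (Cinebench R15)
skylake = 1.764       # ~76.4% higher IPC than Excavator
zen_projected = 1.40  # AMD's projected +40% over Excavator

# Where the projection would land relative to Intel:
print(f"Zen vs Haswell: {zen_projected / haswell:.1%}")  # ~84.6%
print(f"Zen vs Skylake: {zen_projected / skylake:.1%}")  # ~79.4%
```

In other words, taking these numbers at face value, a +40% Zen lands roughly 15% behind Haswell and 20% behind Skylake per clock.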


----------



## Wishmaker

I find it difficult to believe that, all of a sudden, on a completely new node, Intel is behind their competitors. I will change my view when I see finalized products versus the revised 14nm Kaby Lake, but until then, speculate away.


----------



## spurdomantbh

Quote:


> Originally Posted by *Faithh*
> 
> Is this a joke or?
> 
> SIMD is made out of;
> 
> - 256 bit FMA FBlend
> - 256 bit FMA ADD
> - 256 bit FShuffle FBlend
> - 256 bit VALU VBlend
> - 256 bit VALU VMUL VShift
> - 256 bit VALU Vshuffle
> 
> http://www.realworldtech.com/haswell-cpu/4/
> Where are our SIMD integer units? Obv if it has 10 pipelines, the 2nd FMAC should be a MMX unit instead. MMX = SIMD integer unit


It said FPU, not SIMD, though. That being said, the thing you were quoting is still incorrect.
As for Zen, the same FP pipelines seem to be used for both FP and MMX operations. Shared pipelines between units perhaps, similar to Intel?
Quote:


> Originally Posted by *looncraz*
> 
> I was trying to say something outlandish with the 4ghz 8/core @ 45W, give a loon a break
> 
> 
> 
> 
> 
> 
> 
> 
> 
> In any event, from what little information I have been able to gather, 14nm LPP is superior to Intel's 14nm process in transconductance (perhaps exclusively this) by a notable margin (some 40~50%). Of course, this is a direct consequence of being a node specifically designed for low power applications. I have never been able to get any information for the curve each follows, though, so it is very possible the characteristics change to favor Intel as voltage and frequency increase.
> 
> However, as for IPC, 40% over Excavator is nearly exactly even with Haswell in most cases, only halfway there in some, and slightly faster in others.
> 
> For example, a 40% increase, normalized, puts Zen at 93% of Haswell's performance in Cinebench R15 (single-threaded, of course). Due to certain weaknesses in Bulldozer, it gets worse as the Cinebench versions get older: R11.5 is 91% of Haswell, and R10 is a poor 86%. (Yes, I did all this math... and much, much more.)
> 
> However, that says nothing for what it would mean for overall performance. Cinebench is one app, so I went with every single reliable benchmark I could find (only about 15 of them), normalized the threads (4 vs 4, i5 4790k vs FX-4300, with a calculated improvement to represent Zen, taking into account known benchmark improvements for Steamroller and what could be found for Excavator directly). The spread was huge. 3DPM, multi-threaded, for example, favored Intel 2:1. But x264 encoding favored Zen with a healthy 20% margin, and WebXPRT had a 30% advantage for Zen.
> 
> Of course, this won't be reality. The performance profile will be quite different, so we can really only allow for the average. If you do a simple average, the difference is only about an 8% IPC deficit for Zen, putting it about on par with Ivy Bridge overall. However, when you do a proper average, by removing the outlier (only the 2:1 qualified, as there are 1.1, 1.2, and 1.27 as well, which keeps the score for WebXPRT in the mix), it jumps to 96%. Removing the WebXPRT drops it to 94%.
> 
> This puts it almost dead on with the Ivy Bridge results on the same tests (the entire spread between Sandy Bridge and Haswell is a mere 13%).
> 
> Haswell's IPC, interestingly, is only about 40% higher than Penryn's (Core 2 Quad), and Excavator is about equal in IPC with Penryn.
> 
> Given that my results are so close to Haswell with calculated results and just using a mean, and the fact that Bulldozer's performance profile is notoriously inferior in certain respects, I feel it is very much possible that Zen will match Haswell... or come very close to it.
> 
> That said, matching Sandy Bridge is good enough. I have a 2600k and I have no compelling upgrade path available to me - nor any need to upgrade (though I'm itching, naturally). I could, however, always use more cores


Quote:


> Originally Posted by *The Stilt*
> 
> In Cinebench R15 Haswell has ~65.4% higher IPC and Skylake around 76.4% higher IPC than Excavator.
> In similar FP critical workloads utilizing AVX2 (such as VP9 or X265), the difference will increase even further by 15-25% in favor of Haswell / Skylake.
> 
> With the projected increase of 40%, Zen would still be far, far behind.
> 
> AMD challenging Intel in performance is completely ridiculous and never was an option.
> It is all about staying alive.
> 
> Zen should be available in similar time frame as Cannonlake.
> Cannonlake will once again have a process advantage, more cores (over HW / SL / KL) and no doubt further IPC improvements.


You guys shouldn't confuse IPC with general performance; an increase in IPC will never translate into the same increase in overall performance. Look at Haswell, for example: technically it has 30% higher IPC, but that translated into what, 5% extra performance per clock? 10-15% if you count SMT? In Zen's case, that 40% IPC increase might translate into a bigger number, simply because the BD arch had bottlenecks in other places, like the L2 cache and the long pipeline. If they fix those things, it won't increase IPC, but it will increase performance. The final numbers still remain to be seen, though.
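To illustrate the distinction with a toy calculation (the IPC and clock values here are made-up illustrations, not Zen figures):

```python
# Toy model: realized performance = IPC x clock speed, so a per-clock (IPC)
# gain can be partly eaten by lower clocks. All numbers are illustrative.
def performance(ipc: float, clock_ghz: float) -> float:
    return ipc * clock_ghz

base = performance(ipc=1.00, clock_ghz=4.0)  # hypothetical baseline part
new = performance(ipc=1.40, clock_ghz=3.5)   # +40% IPC, but lower clocks

print(f"IPC gain: +40%, realized gain: {new / base - 1:+.1%}")  # +22.5%
```

The reverse also holds: fixing a cache or pipeline bottleneck raises delivered performance at the same nominal IPC figure.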


----------



## Jimbags

Quote:


> Originally Posted by *PiOfPie*
> 
> Because Sandy and Ivy are good performance yardsticks for the price/perf crowd; if AMD matches Sandy or Ivy's IPC but provides moar coars at a similar or lower price point, they'll have a "good enough" Phenom II-esque CPU on their hands that a lot of people will buy. If it matches Haswell, they're sitting pretty.
> 
> Outside of niche applications (Dolphin and other emulators; applications that make heavy use of AVX2), Haswell and onward don't really provide much of a performance uplift over Sandy and Ivy. In some cases, Skylake's IPC *regresses* slightly against Haswell at the same clocks. Kaby looks to be a Devil's Canyon-esque refresh that may provide slightly higher clocks. Cannonlake is the only thing that represents a threat if Intel actually does increase core counts.


Matching the performance of 3-5 year old chips? Just seems a bit weak to me, is all. Give Ivy Bridge-E a run, then boast about it. Even that is an old chip in terms of tech :-/ I'm not anti-AMD; I want Intel to have some proper competition so we can have some lower prices!


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> In Cinebench R15 Haswell has ~65.4% higher IPC and Skylake around 76.4% higher IPC than Excavator.
> In similar FP critical workloads utilizing AVX2 (such as VP9 or X265), the difference will increase even further by 15-25% in favor of Haswell / Skylake.
> 
> With the projected increase of 40%, Zen would still be far, far behind.
> 
> AMD challenging Intel in performance is completely ridiculous and never was an option.
> It is all about staying alive.
> 
> Zen should be available in similar time frame as Cannonlake.
> Cannonlake will once again have a process advantage, more cores (over HW / SL / KL) and no doubt further IPC improvements.


No, it's that much faster than Piledriver, not Excavator. Excavator is two generations removed from Piledriver.

Using measured performance, Excavator has a 9.85% higher IPC than Steamroller, and Steamroller is about 6.7% better than Piledriver. Interestingly, Steamroller and Piledriver perform effectively identically in Cinebench per clock, but differ notably in almost every other benchmark.

The end result is that Zen should be about 64% faster than Piledriver in single-threaded tasks if the performance difference is truly based upon Excavator. This will, of course, be a spread. It could be 75% faster in some tasks and only 30% faster in others. We can only be certain that the performance profile will change, which means we could see Zen win big in Cinebench and lose in many other places. With its design, it really seems like it could excel in Cinebench beyond the projections, provided it can genuinely keep its pipelines fed (which may be possible with enough FastPath utilization). I feel, though, that AMD is holding out on the full FastPath optimizations for Zen+, so I imagine we will see two or three pipelines empty every cycle, but potentially being filled every other cycle or two. A situation similar to K10...

I think the most interesting aspect of Zen's design's impact on its performance profile is that they went back to K10's integer/FPU performance decoupling, so its performance might be better modeled as a 52% increase over Phenom II. That would still give Haswell a narrow win over Zen in Cinebench, though, and a narrow loss to Zen in POV-Ray. Which is about as close to even as you get (trading blows).

Remember to account for Intel's Turbo. Multithreaded workloads are a difficult thing to compare because of it, but Intel CPUs are pretty much guaranteed to run at their max turbo clock in single-threaded workloads (I have yet to see a case where they didn't). Under a multi-core load, AMD's current CPUs barely manage to turbo at all, whereas Intel CPUs rarely fall back to their base clocks. All of my numbers are based on benchmarks with turbo disabled, or accounted for by assuming single-threaded loads on both brands operate at the full turbo clock speed (giving AMD a disadvantage vs the real world) and that multithreaded workloads keep none of the turbo (giving AMD another disadvantage vs the real world).

Zen will challenge Haswell. It WILL lose to Skylake and Cannonlake, no doubt. Zen+ should be a bit better than Skylake, but will lose to Cannonlake. Intel will have out Cannonlake's successor at that point.

Intel will stay ahead, just not 65% ahead like they are now (for desktop CPUs). It will be much more like the Phenom II vs Core 2 era, which kept AMD in the running.

AMD will probably make up for the IPC deficit by enabling SMT at a lower price point and having superior platform value. I fully expect Zen+ to be on AM4. Partly because AM4 is designed already with Zen+ in mind, and partly because AMD really can't afford the cost of designing a new socket so soon. That upgrade enticement drew a lot of sales to AMD over Intel when they had a 7% lower IPC (though they could make up for that with clock speeds, which I doubt they will be able to do with Zen).


----------



## looncraz

Quote:


> Originally Posted by *spurdomantbh*
> 
> You guys shouldn't confuse IPC with general performance, an increase in IPC will never translate into the same increase in performance. Look at Haswell, for example, technically it has 30% higher IPC, but that translated into what, 5% extra performance per clock? 10 ~ 15% if you look at SMT? In Zen that 40% IPC increase might translate into a bigger number, simply due to the fact that BD arch had bottlenecks in other places like L2 cache and long pipeline. If they fix those things, it'll not increase IPC, but it will increase performance. The final numbers still remain to be seen though.


IPC is usually directly related to performance when the IPC improvement is a general improvement, rather than a targeted improvement.

It all depends on which instructions benefit from the increase. With Zen, it seems clear that AMD was aiming for a 40% general increase over Excavator. Intel has been targeting only specific workload improvements for numerous generations now. Haswell is between 1% and 20% faster than Ivy Bridge, but it tends towards the lower end of that spectrum. I expect AMD's Zen will tend towards the upper end of the 64% improvement over Piledriver.

Statistically speaking, they tie in modeled numbers. But they are just that - models. And all models are, by definition, inaccurate.


----------



## spurdomantbh

Quote:


> Originally Posted by *looncraz*
> 
> IPC is usually directly related to performance when the IPC improvement is a general improvement, rather than a targeted improvement.
> 
> It all depends on which instructions benefit from the increase. With Zen, it seems clear that AMD was aiming for a 40% general increase over Excavator. Intel has been targeting only specific workload improvements for numerous generations now. Haswell is between 1% and 20% faster than Ivy Bridge, but it tends towards the lower end of that spectrum. I expect AMD's Zen will tend towards the upper end of the 64% improvement over Piledriver.
> 
> Statistically speaking, they tie in modeled numbers. But they are just that - models. And all models are, by definition, inaccurate.


Fair enough. 64% is still a huge number, tbh. Though in the end it's all going to come down to clocks anyway. IPC should be fairly close to Haswell/Skylake. I do suspect that AMD will have a more efficient SMT implementation, though: at the moment Intel's ALUs and FPUs share the same execution ports, while Zen seems to have them separate. Perhaps they'll be able to get some wins in multithreading.


----------



## looncraz

Quote:


> Originally Posted by *spurdomantbh*
> 
> Fair enough. 64% is still a huge number tbh. Though in the end it's all going to come down to clocks anyway. IPC should be fairly close to Haswell/Skylake. I do suspect that AMD will have a more efficient SMT implementation though, at the moment intel ALUs and FPU share the same execution ports, while Zen seems to have them separate. Perhaps they'll be able to get some wins in multi threading.


Yes, for clock speeds, I don't expect Zen to have parity.


----------



## The Stilt

Quote:


> Originally Posted by *looncraz*
> 
> No, it's that much faster than Piledriver, not Excavator. Excavator is two generations removed from Piledriver.


No, faster than Excavator (Gen. 4)


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> No, faster than Excavator (Gen. 4)


??

Code:


Piledriver:    100%
Steamroller:   106.7%
Excavator:     117.2%
Zen (est):     164.1%

That is +6.7%, +9.85%, +40%
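Those per-generation gains compound multiplicatively, which is where the table above comes from:

```python
# Compounding the gen-over-gen IPC gains quoted above (Piledriver = 100%).
gains = [("Steamroller", 1.067), ("Excavator", 1.0985), ("Zen (est)", 1.40)]

level = 1.0
for gen, gain in gains:
    level *= gain
    print(f"{gen}: {level:.1%}")
# Steamroller: 106.7%
# Excavator: 117.2%
# Zen (est): 164.1%
```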

When you compare with Intel, you must do so while taking turbo into account as well. Intel has only made small improvements, even by their own claims:



Though their claims are a bit much, they are generally accurate:


Here's the chart for AMD, starting with Bulldozer, with only Zen being calculated:


The benchmarks used are identical for both Intel and AMD, otherwise comparison would be a moot point. I tried to exclude benchmarks that benefited from a feature not present on all CPUs, in an effort to focus on core improvements over software improvements (even if as a result of hardware enablement).

I rarely speak lightly


----------



## Jimbags

Quote:


> Originally Posted by *looncraz*
> 
> ??
> 
> Code:
> 
> 
> Piledriver:    100%
> Steamroller:   106.7%
> Excavator:     117.2%
> Zen (est):     164.1%
> 
> That is +6.7%, +9.85%, +40%
> 
> When you compare with Intel, you must do so while taking turbo into account as well. Intel has only made small improvements, even by their own claims:
> 
> 
> 
> Though their claims are a bit much, they are generally accurate:
> 
> 
> Here's the chart for AMD, starting with Bulldozer, with only Zen being calculated:
> 
> 
> The benchmarks used are identical for both Intel and AMD, otherwise comparison would be a moot point. I tried to exclude benchmarks that benefited from a feature not present on all CPUs, in an effort to focus on core improvements over software improvements (even if as a result of hardware enablement).
> 
> I rarely speak lightly


All done at stock clocks? Some overclock better than others is all :-D


----------



## looncraz

Quote:


> Originally Posted by *Jimbags*
> 
> All done at stock clocks? Some overclock better than others is all :-D


Pure IPC comparisons, normalized clocks generation over generation.

The CPUs' ability to clock is a secondary discussion. I don't expect Zen to have 4GHz base clocks; probably around 3.5GHz, with turbo to 4GHz, and reliable overclocking barely exceeding the turbo. And that's at the high end, with four cores.

Zen could be a decent upgrade for anyone who doesn't overclock, however, and is on a Sandy Bridge or older system. Considering moving from any i7 to any AMD CPU has been a general downgrade since time immemorial, I'd say that is a drastic improvement.

Zen will, by no means, defeat Intel.

I expect them to use SMT to even the playing field, price-wise, with Intel's i5s, to add a couple more cores to compete with the i7s, and then have some eight-core parts that will compete with Intel's six-core CPUs.


----------



## Faithh

Quote:


> Originally Posted by *spurdomantbh*
> 
> It said FPU, not SIMD though. That being said the thing you were quoting is still incorrect.
> As for Zen, the same fp pipelines seem to be used in both fp and mmx operations. Shared pipeline between units perhaps, similar to intel?


If you'd read the Bulldozer part I marked in bold:
Quote:


> Bulldozer module:
> 
> Core1: 2 ALU + 2 AGU
> Core2: 2 ALU + 2 AGU
> *FPU*: 2 128bit FMAC + *2 MMX*
> Instruction Decode: 4 Wide
> L1 Cache: 64 KB Instruction per module + 16 KB data per core
> L2 Cache: 2 MB


The MMX units are SIMD integer units, not floating point units. They're including the SIMD integer units for Bulldozer, except they forgot to include the ones from Haswell, and Zen doesn't seem to have any of them, which is illogical.

"The other half of the floating point cluster's execution units actually have little to do with floating point data at all. Bulldozer has a pair of largely symmetric 128-bit integer SIMD ALUs (P2 and P3) that execute arithmetic and logical operations."

http://www.realworldtech.com/bulldozer/7/

For the last decade we've never seen an architecture with a single SIMD unit doing both floats and integers, AFAIK. There might be some, but they'd be really old. If you haven't noticed, Bulldozer uses a symmetric design, e.g. just 2x 128-bit MMX / 2x FMAC, unlike any other architecture (see my Haswell SIMD list); that is why you would classify BD's SIMD as a FlexFPU, which failed miserably. Whoever sketched this got the SIMD side completely wrong.

Zen is not supposed to have such a flawed SIMD design


----------



## looncraz

Quote:


> Originally Posted by *Faithh*
> 
> If you'd read the Bulldozer part I marked in bold:
> The MMX units are SIMD integer units, not floating point units. They're including the SIMD integer units for Bulldozer, except they forgot to include the ones from Haswell, and Zen doesn't seem to have any of them, which is illogical.
> 
> "The other half of the floating point cluster's execution units actually have little to do with floating point data at all. Bulldozer has a pair of largely symmetric 128-bit integer SIMD ALUs (P2 and P3) that execute arithmetic and logical operations."
> 
> http://www.realworldtech.com/bulldozer/7/
> 
> For the last decade we've never seen an architecture with a single SIMD unit doing both floats and integers, AFAIK. There might be some, but they'd be really old. If you haven't noticed, Bulldozer uses a symmetric design, e.g. just 2x 128-bit MMX / 2x FMAC, unlike any other architecture (see my Haswell SIMD list); that is why you would classify BD's SIMD as a FlexFPU, which failed miserably. Whoever sketched this got the SIMD side completely wrong.
> 
> Zen is not supposed to have such a flawed SIMD design


From the patch:

Integer SIMD (MMX)

Code:


+;; Currently blocking all decoders for vector path instructions as

+;; they are dispatched separetely as microcode sequence.

+;; Fix me: Need to revisit this.

+(define_reservation "znver1-vector" "znver1-decode0+znver1-decode1+znver1-decode2+znver1-decode3")

Combined with other entries in the gcc znver1.md file, it would seem to suggest that integer SIMD instructions are not necessarily handled by a dedicated unit at all, but are instead translated into microcode sequences whose dispatch blocks ALL of the decoders (which is beyond strange). However, the comment above it suggests this is just place-holder code that works, but isn't necessarily representative of the processor internals... which would be my guess.

For floating point vector instructions:

Code:


+(define_reservation "znver1-fvector" "znver1-fp0+znver1-fp1

+                                     +znver1-fp2+znver1-fp3

+                                     +znver1-agu0+znver1-agu1")

This, however, is exactly what we've suspected all along: that two of the pipes for the FPU merge together for vector loads. This also suggests that we are talking about a 4x64-bit FPU, much like that in Excavator. A few other entries suggest that the FPU may have its own load/store capability, as may one or more of the ALUs. This is almost certainly just a matter of logical access to the LSU from those units. However, each such entry shows units being ganged together unless an AGU is used:

Code:


+(define_insn_reservation "znver1_fp_mov_direct_load" 5

+                        (and (eq_attr "cpu" "znver1")

+                             (and (eq_attr "znver1_decode" "direct")

+                                  (and (eq_attr "type" "fmov")

+                                       (eq_attr "memory" "load"))))

+                        "znver1-direct,znver1-load,znver1-fp3|znver1-fp1")

+(define_insn_reservation "znver1_fp_mov_direct_store" 5

+                        (and (eq_attr "cpu" "znver1")

+                             (and (eq_attr "znver1_decode" "direct")

+                                  (and (eq_attr "type" "fmov")

+                                       (eq_attr "memory" "store"))))

+                        "znver1-direct,znver1-fp2|znver1-fp3,znver1-store")

Or I could be misreading what this means. My first foray into gcc's internals.


----------



## The Stilt

Did four tests on four different designs.

BD Gen. 2 (Piledriver) - A10-6800K
2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T

BD Gen. 3 (Steamroller) - A10-7870K
2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T

BD Gen. 4 (Excavator) - FX-8800P
2CU / 2T (no CMT penalty), 3.0GHz, 1.3GHz NCLK, 1600MHz DRAM 9-9-9-27-2T

Haswell - i5-4430
2C / 2T (no SMT penalty), 3.0GHz, 3.0GHz Cache / UCCLK, 1600MHz DRAM 9-10-9-24-2T

*C-Ray V1.1 (Raytracer)*
Compiler GCC 5.20 x86-64, CFlags = O3 & static
1600x1200 with 8 rays per pixel (15360000)

PD = 187155ms (82071.0pps) - 100.0%
SR = 184502ms (83251.1pps) - 101.438%
XV = 170368ms (90157.8pps) - 109.853%
HW = 116907ms (131386.5pps) - 160.089%

*Euler3D (CFD)*
Compiler GCC 5.20 x86-64, CFlags = O3 & static
NACA0012.097K air foil

PD = 177.076s (11.2946 IPS) - 100.0%
SR = 151.380s (13.2102 IPS) - 116.960%
XV = 135.674s (14.7412 IPS) - 130.515%
HW = 94.521s (21.1593 IPS) - 187.340%

*X265 (Encoder)*
Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
Version 1.7+512

PD = 225.25s (1.57 fps) - 100.0%
SR = 213.97s (1.65 fps) - 105.1%
XV = 204.81s (1.72 fps) - 109.554%
HW = 117.12s (3.01 fps) - 191.720%

*Cinebench R15*

PD = 71pts - 100.0%
SR = 72pts - 101.408%
XV = 75pts - 105.634%
HW = 119pts - 167.606%

These are all single-threaded results, naturally.
All of the systems had an additional core enabled in order to offload the operating system overhead.
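For anyone checking the math, the percentages are just ratios of the raw runtimes (lower time means faster). Taking C-Ray as the example:

```python
# Relative performance from the C-Ray runtimes above, Piledriver = 100%.
# Since these are completion times, relative speed is pd_time / time.
cray_ms = {"PD": 187155, "SR": 184502, "XV": 170368, "HW": 116907}

for chip, ms in cray_ms.items():
    print(f"{chip}: {cray_ms['PD'] / ms:.3%}")
# PD: 100.000%
# SR: 101.438%
# XV: 109.853%
# HW: 160.089%
```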

EDIT: Fixed the messed up results


----------



## SpeedyVT

Quote:


> Originally Posted by *The Stilt*
> 
> Did four tests on four different designs.
> 
> BD Gen. 2 (Piledriver) - A10-6800K
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> BD Gen. 3 (Steamroller) - A10-7870K
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> BD Gen. 4 (Excavator) - FX-8800P
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.3GHz NCLK, 1600MHz DRAM 9-9-9-27-2T
> 
> Haswell - i5-4430
> 2C / 2T (no SMT penalty), 3.0GHz, 3.0GHz Cache / UCCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> *C-Ray V1.1 (Raytracer)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> 1600x1200 with 8 rays per pixel (15360000)
> 
> PD = 187155ms (82071.0pps) - 100.0%
> SR = 184502ms (83251.1pps) - 101.438%
> XV = 151462ms (101411.6pps) - 123.566%
> HW = 116907ms (131386.5pps) - 160.089%
> 
> *Euler3D (CFD)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> NACA0012.097K air foil
> 
> PD = 177.076s (11.2946 IPS) - 100.0%
> SR = 151.380s (13.2102 IPS) - 116.960%
> XV = 122.726s (16.2965 IPS) - 144.286%
> HW = 94.521s (21.1593 IPS) - 187.340%
> 
> *X265 (Encoder)*
> Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
> Version 1.7+512
> 
> PD = 225.25s (1.57 fps) - 100.0%
> SR = 213.97s (1.65 fps) - 105.1%
> XV = 187.47s (1.88 fps) - 119.745%
> HW = 117.12s (3.01 fps) - 191.720%
> 
> *Cinebench R15*
> 
> PD = 71pts - 100.0%
> SR = 72pts - 101.408%
> XV = 75pts - 105.634%
> HW = 119pts - 167.606%
> 
> These are all naturally single threaded results.
> All of the systems had additional core enabled in order to offload the operating system overhead.


That raises questions. The FX-8800P is locked to its TDP, no? Although it totally slaughters the previous APUs.


----------



## The Stilt

Quote:


> Originally Posted by *SpeedyVT*
> 
> That raises questions. The FX-8800P is locked to its TDP, no? Although it totally slaughters the previous APUs.


I think the 75W TDP limit I used for the FX-8800P covers the two active units, with only one core under load.

Sufficient L1D and lower-latency L2 cache work pretty well.


----------



## spurdomantbh

Quote:


> Originally Posted by *Faithh*
> 
> If you'd read the Bulldozer part I marked in bold:
> The MMX units are SIMD integer units, not floating point units. They're including the SIMD integer units for Bulldozer, except they forgot to include the ones from Haswell, and Zen doesn't seem to have any of them, which is illogical.


Ahh, my bad, missed the MMX units in bulldozer part.
Quote:


> Originally Posted by *Faithh*
> 
> Zen is not supposed to have such a flawed SIMD design


Hopefully
Quote:


> Originally Posted by *looncraz*
> 
> Code:
> 
> 
> +;; Fix me: Need to revisit this.
> 
> However, the comment above it suggests this is just place-holder code that works, but isn't necessarily representative of the processor internals... which would be my guess.


Indeed, seems like it's too early to say how that will work.
Quote:


> Originally Posted by *looncraz*
> 
> This also suggests that we are talking about a 4x64bit FPU much like that in Excavator.


I believe it just suggests that all pipelines are used? Unless I'm missing something?
Quote:


> Originally Posted by *looncraz*
> 
> A few other entries suggests that the FPU may have its own load/store capabilities as well as the one or more of the ALUs having such a capability. However, this is almost certainly just a matter of logical access to the LSU from these units. However, each such entry shows units being ganged together unless an AGU is used:
> 
> Or I could be misreading what this means. My first foray into gcc's internals.


It's still using the 2 AGUs. "znver1-load" and "znver1-store" definitions:

Code:


+;; 2 AGU pipes.

+(define_cpu_unit "znver1-agu0" "znver1_agu")

+(define_cpu_unit "znver1-agu1" "znver1_agu")

+(define_reservation "znver1-agu-reserve" "znver1-agu0|znver1-agu1")

+

+(define_reservation "znver1-load" "znver1-agu-reserve")

+(define_reservation "znver1-store" "znver1-agu-reserve")


----------



## Olivon

For those interested, Hardware.fr published a 4GHz comparison of SB/IB/HW/BW/SL across multiple apps here:



http://www.hardware.fr/articles/940-6/cpu-sandy-bridge-vs-ivy-bridge-vs-haswell-vs-skylake-4-g.html


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> Did four tests on four different designs.
> 
> BD Gen. 2 (Piledriver) - A10-6800K
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> BD Gen. 3 (Steamroller) - A10-7870K
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> BD Gen. 4 (Excavator) - FX-8800P
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.3GHz NCLK, 1600MHz DRAM 9-9-9-27-2T
> 
> Haswell - i5-4430
> 2C / 2T (no SMT penalty), 3.0GHz, 3.0GHz Cache / UCCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> *C-Ray V1.1 (Raytracer)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> 1600x1200 with 8 rays per pixel (15360000)
> 
> PD = 187155ms (82071.0pps) - 100.0%
> SR = 184502ms (83251.1pps) - 101.438%
> XV = 151462ms (101411.6pps) - 123.566%
> HW = 116907ms (131386.5pps) - 160.089%
> 
> *Euler3D (CFD)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> NACA0012.097K air foil
> 
> PD = 177.076s (11.2946 IPS) - 100.0%
> SR = 151.380s (13.2102 IPS) - 116.960%
> XV = 122.726s (16.2965 IPS) - 144.286%
> HW = 94.521s (21.1593 IPS) - 187.340%
> 
> *X265 (Encoder)*
> Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
> Version 1.7+512
> 
> PD = 225.25s (1.57 fps) - 100.0%
> SR = 213.97s (1.65 fps) - 105.1%
> XV = 187.47s (1.88 fps) - 119.745%
> HW = 117.12s (3.01 fps) - 191.720%
> 
> *Cinebench R15*
> 
> PD = 71pts - 100.0%
> SR = 72pts - 101.408%
> XV = 75pts - 105.634%
> HW = 119pts - 167.606%
> 
> These are all naturally single threaded results.
> All of the systems had additional core enabled in order to offload the operating system overhead.


That's 6.22% for Steamroller vs my 6.7% (because you used fewer benchmarks).
You are showing a 15.79% improvement for Excavator, far more than my average 9.85%. For the same reason, of course.
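For transparency, here is one way those gen-over-gen averages fall out of the quoted (pre-edit) per-test percentages. I am assuming a plain arithmetic mean of the per-test ratios, since the exact averaging method isn't stated:

```python
# Quoted per-test results (C-Ray, Euler3D, x265, Cinebench R15), PD = 100.
sr = [101.438, 116.960, 105.1, 101.408]    # Steamroller
xv = [123.566, 144.286, 119.745, 105.634]  # Excavator (pre-edit figures)

def avg_gain(new, old):
    """Arithmetic mean of per-test new/old ratios, as a fractional gain."""
    ratios = [n / o for n, o in zip(new, old)]
    return sum(ratios) / len(ratios) - 1

print(f"Steamroller over Piledriver: {avg_gain(sr, [100.0] * 4):+.2%}")
print(f"Excavator over Steamroller: {avg_gain(xv, sr):+.2%}")
# Roughly +6.23% and +15.82%, matching the 6.22% / 15.79% figures above
# up to rounding / averaging-method differences.
```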

So, your numbers would be:

PileDriver: 100%
Steamroller: 106%
Excavator: 123%
Zen: 172%
Haswell: 176%

Which is higher than what I stated for Zen









Averages are beautiful, are they not? And once you expand to more tests, your numbers should converge with mine. And we have no reason to expect a uniform improvement from Zen. I expect its Cinebench performance to increase notably more than its Euler3D numbers, for example.

In any event, even your numbers put Zen within striking distance of Haswell; mine put it only a little closer (because Haswell doesn't perform as well in other benchmarks, such as ST 3DPM, ST WebXPRT, and ST Google Octane v2).

I would like to know how you set a high TDP for the FX-8800P... and why you have all this hardware available to you ;-). My numbers are mostly based on numbers I can find on the internet (and they sometimes even disagree, which is annoying). I had an FX-8350, and I've built systems using Haswell, so I got my numbers for those directly (left one core enabled, disabled turbo, set it to 4GHz, and went to town on a 4690k). In fact, I will be building another system with a 4690k in the next week or two (waiting for the rest of the money to show...), so I'll run some more tests and compare them against the Phenom II 955 my wife's computer still uses and my 2600k, all at 3GHz. More data, I love me some more data!


----------



## spurdomantbh

Quote:


> Originally Posted by *Olivon*
> 
> For those interested, Hardware.fr published a 4GHz comparison on SB/IB/HW/BW/SL and multiple apps here :
> 
> http://www.hardware.fr/articles/940-6/cpu-sandy-bridge-vs-ivy-bridge-vs-haswell-vs-skylake-4-g.html


Wow, look at that Broadwell performance. It seems like Skylake is actually worse than the previous generation in general-purpose applications. What's up with that? A bigger focus on HPC than GP, perhaps?


----------



## Dom-inator

Quote:


> Originally Posted by *Olivon*
> 
> For those interested, Hardware.fr published a 4GHz comparison on SB/IB/HW/BW/SL and multiple apps here :
> 
> 
> 
> http://www.hardware.fr/articles/940-6/cpu-sandy-bridge-vs-ivy-bridge-vs-haswell-vs-skylake-4-g.html


Broadwell does so well in CPU intensive games like Arma. Some eDRAM / L4 cache please AMD?


----------



## looncraz

Quote:


> Originally Posted by *spurdomantbh*
> 
> I believe it just suggests that all pipelines are used? Unless I'm missing something?


It uses the or operator, which is ganging the units, I believe (during the simulation which determines how to order relevant instructions).

_znver1-fp2_*|*_znver1-fp3_

Unless gcc means something else by this entirely, which it could. If it were an either/or situation, it seems you'd just use commas.

It is important to keep in mind, of course, that this is just to give gcc an idea of which instructions cost less than others. The instructions are fetched by the CPU; the code doesn't actually address the individual units beyond which registers to use, and the CPU can reinterpret even that however it pleases.

The patch could just as easily be saying: "We can only handle two SIMD instructions at once, so we should tell gcc that we're ganging these pipes so it knows to try and use more optimal paths if available."

Of course, you want to be somewhat accurate: you don't want gcc thinking that an execution unit is still busy when it is actually idle, because it would then emit otherwise less-than-ideal instructions for one of the other idle pipes when a better instruction could have made use of the pipes actually available at that time.

It does seem, though, that Zen has been targeted at executing legacy code much better than Bulldozer (which was notoriously a HUGE step in the opposite direction).


----------



## epic1337

Quote:


> Originally Posted by *looncraz*
> 
> That's 6.22% for Steamroller vs my 6.7% (because you used fewer benchmarks).
> You are showing a 15.79% improvement for Excavator, far more than my average 9.85%. For the same reason, of course.
> 
> So your numbers would be:
> 
> PileDriver: 100%
> Steamroller: 106%
> Excavator: 123%
> Zen: 172%
> Haswell: 176%
> 
> Which is higher than what I stated for Zen
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Averages are beautiful, are they not? And once you expand to more tests, your numbers should converge with mine. And we have no reason to expect a uniform improvement from Zen. I expect its Cinebench performance to increase notably more than its Euler3D numbers, for example.
> 
> In any event, even your numbers put Zen within scratching distance of Haswell; mine put it only a little closer (because Haswell doesn't perform as well in other benchmarks, such as ST 3DPM, ST WebXPRT, and ST Google Octane v2).


That's if the overall IPC improvement AMD mentioned is anywhere near accurate, but it should still be possible that the minimum is at Ivy Bridge IPC level or slightly above it,
much like how Skylake's IPC improvement "rumors" pointed anywhere from a 10% to a 30% increase over Haswell.
Quote:


> Originally Posted by *spurdomantbh*
> 
> Wow, look at that Broadwell performance. It seems like Skylake is actually worse than the previous generation in general-purpose applications. What's up with that? A bigger focus on HPC than GP, perhaps?


It varies wildly; Broadwell has a highly refined cache design to aid both its CPU and IGP.
And according to reviews of Skylake, its cache design is somewhat less effective than Broadwell's, although it's a tad faster than Haswell's.
On a side note, Skylake's IMC has some latency penalty due to supporting both DDR3 and DDR4, and DDR4 in general is slightly slower than DDR3 in terms of latency.

As for how cache can affect performance: it probably makes the front-end more efficient at keeping all the cores loaded by stalling less on data access,
much like how Broadwell is notably faster than Haswell even though it's the same architecture design, just die-shrunk and given a massive cache.

So in all likelihood, comparing Skylake to Broadwell isn't a good approach for seeing whether there's any architectural improvement in Skylake.
If Skylake were given a massive cache as well, that would be the best comparison to Broadwell; otherwise it's much fairer to compare it to Haswell instead.


----------



## ebduncan

I don't know why people spend so much effort trying to guess what a CPU's performance will be based on such vague information.

1. Zen is not Haswell
2. Zen is not Bulldozer
3. So stop using data for either of these iterations of cpus to estimate the performance of Zen.

Overall there are entirely too many variables to accurately calculate Zen's performance. AMD isn't going to be competing with Haswell, but with Skylake. My biggest concerns come in the form of chipset and motherboard solutions, as we have zero information on those other than AM4.

All I know is it's getting pretty close to upgrade time for me. Skylake-E will probably be at the heart of my next rig, but I hope AMD hits it big with Zen and I will gladly buy one of those.


----------



## spurdomantbh

Quote:


> Originally Posted by *looncraz*
> 
> It uses the or operator, which is ganging the units, I believe (during the simulation which determines how to order relevant instructions).
> 
> _znver1-fp2_*|*_znver1-fp3_
> 
> Unless gcc means something else by this entirely, which it could. If it were an either/or situation, it seems you'd just use commas.


Done some googling around, found this from GCC GNU website:
Quote:


> •',' is used for describing the start of the next cycle in the reservation.
> •'|' is used for describing a reservation described by the first regular expression or a reservation described by the second regular expression or etc.
> •'+' is used for describing a reservation described by the first regular expression and a reservation described by the second regular expression and etc.
> •'*' is used for convenience and simply means a sequence in which the regular expression are repeated number times with cycle advancing (see ',').


I believe these apply to the GCC patch as well
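As a rough illustration of what those quoted operators mean for the scheduler's resource model, here is a toy parser (my own sketch, not GCC internals; the `znver1-fpN` unit names follow the patch's naming):

```python
# Toy model of the quoted reservation operators (my sketch, not GCC code):
#   '|' separates alternatives (the uop may go to EITHER unit),
#   '+' gangs units within one alternative (BOTH are reserved at once).
# ',' (next cycle) and '*' (repetition) are omitted for brevity.

def alternatives(reservation):
    """Expand a one-cycle reservation string into its alternative unit sets."""
    return [alt.split("+") for alt in reservation.split("|")]

# Either fp2 OR fp3 may accept the instruction:
print(alternatives("znver1-fp2|znver1-fp3"))  # [['znver1-fp2'], ['znver1-fp3']]

# By contrast, '+' would reserve both units together:
print(alternatives("znver1-fp2+znver1-fp3"))  # [['znver1-fp2', 'znver1-fp3']]
```

On this reading, `znver1-fp2|znver1-fp3` in the patch would mean "either pipe can take the instruction," not that the pipes are ganged.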
Quote:


> Originally Posted by *ebduncan*
> 
> I don't know why people spend so much effort trying to guess what a Cpu's performance will be based on such vague information.
> 
> 1. Zen is not Haswell
> 2. Zen is not Bulldozer
> 3. So stop using data for either of these iterations of cpus to estimate the performance of Zen.


It's fun to speculate







Nobody is claiming exact performance figures, everything posted itt is just fun speculation.


----------



## EniGma1987

Quote:


> Originally Posted by *looncraz*
> 
> Here's the chart for AMD, starting with Bulldozer, with only Zen being calculated:
> 
> 
> The benchmarks used are identical for both Intel and AMD, otherwise comparison would be a moot point. I tried to exclude benchmarks that benefited from a feature not present on all CPUs, in an effort to focus on core improvements over software improvements (even if as a result of hardware enablement).
> 
> I rarely speak lightly


You should add an Athlon 64 X2 and a Phenom II into the AMD chart just for kicks.

Quote:


> Originally Posted by *Faithh*
> 
> The MMX units are SIMD integer units, not floating point units. They're including the SIMD integer units for Bulldozer, except they forgot to include the ones from Haswell, and Zen doesn't seem to have any of them, which is illogical.


I seem to remember reading quite a while ago that Zen was removing certain hardware blocks like MMX because they are used so little and there are other, more modern things that offer more performance. This was done to save die space and increase performance. I don't know if it's really true that MMX is hardly used anymore, but that was the reasoning.


----------



## The Stilt

Quote:


> Originally Posted by *The Stilt*
> 
> Did four tests on four different designs.
> 
> BD Gen. 2 (Piledriver) - A10-6800K
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> BD Gen. 3 (Steamroller) - A10-7870K
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.8GHz NCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> BD Gen. 4 (Excavator) - FX-8800P
> 2CU / 2T (no CMT penalty), 3.0GHz, 1.3GHz NCLK, 1600MHz DRAM 9-9-9-27-2T
> 
> Haswell - i5-4430
> 2C / 2T (no SMT penalty), 3.0GHz, 3.0GHz Cache / UCCLK, 1600MHz DRAM 9-10-9-24-2T
> 
> *C-Ray V1.1 (Raytracer)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> 1600x1200 with 8 rays per pixel (15360000)
> 
> PD = 187155ms (82071.0pps) - 100.0%
> SR = 184502ms (83251.1pps) - 101.438%
> XV = 151462ms (101411.6pps) - 123.566%
> HW = 116907ms (131386.5pps) - 160.089%
> 
> *Euler3D (CFD)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> NACA0012.097K air foil
> 
> PD = 177.076s (11.2946 IPS) - 100.0%
> SR = 151.380s (13.2102 IPS) - 116.960%
> XV = 122.726s (16.2965 IPS) - 144.286%
> HW = 94.521s (21.1593 IPS) - 187.340%
> 
> *X265 (Encoder)*
> Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
> Version 1.7+512
> 
> PD = 225.25s (1.57 fps) - 100.0%
> SR = 213.97s (1.65 fps) - 105.1%
> XV = 187.47s (1.88 fps) - 119.745%
> HW = 117.12s (3.01 fps) - 191.720%
> 
> *Cinebench R15*
> 
> PD = 71pts - 100.0%
> SR = 72pts - 101.408%
> XV = 75pts - 105.634%
> HW = 119pts - 167.606%
> 
> These are all naturally single threaded results.
> All of the systems had additional core enabled in order to offload the operating system overhead.


I feel like an idiot, but this must be corrected








I made a rookie mistake with Carrizo while reconfiguring the power management for the tests.

Carrizo requires some re-configuring in order to be able to run at static frequency.
While the power limits were configured correctly and all the power management features were fine (i.e. no throttling possible), I managed to screw up the frequency itself









The frequency was configured correctly to 3.0GHz; however, since I had disabled the power management in order to produce results at a static frequency, the new frequency was never actuated. When power management is enabled, manual actuation of the CorePLL is not required, since the power management will be changing the PLL FID several times per second. If the power management is disabled, the new frequency (which in this case was changed from 3.4GHz to 3.0GHz) must be refreshed by hand; otherwise the actual frequency won't change. For the first three results I forgot to do this procedure, but for Cinebench R15 it was done because the system was rebooted prior to running it.









Quote from BKDG: *"The PstateId field must be updated to cause a new CpuFid value to take effect."*

*C-Ray V1.1 (Raytracer)*
Compiler GCC 5.20 x86-64, CFlags = O3 & static
1600x1200 with 8 rays per pixel (15360000)

PD = 187155ms (82071.0pps) - 100.0%
SR = 184502ms (83251.1pps) - 101.438%
XV = 170368ms (90157.8pps) - 109.853%
HW = 116907ms (131386.5pps) - 160.089%

*Euler3D (CFD)*
Compiler GCC 5.20 x86-64, CFlags = O3 & static
NACA0012.097K air foil

PD = 177.076s (11.2946 IPS) - 100.0%
SR = 151.380s (13.2102 IPS) - 116.960%
XV = 135.674s (14.7412 IPS) - 130.515%
HW = 94.521s (21.1593 IPS) - 187.340%

*X265 (Encoder)*
Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
Version 1.7+512

PD = 225.25s (1.57 fps) - 100.0%
SR = 213.97s (1.65 fps) - 105.1%
XV = 204.81s (1.72 fps) - 109.554%
HW = 117.12s (3.01 fps) - 191.720%

*Cinebench R15*

PD = 71pts - 100.0%
SR = 72pts - 101.408%
XV = 75pts - 105.634%
HW = 119pts - 167.606%

If you quoted the original results, please correct them in your posts.

Sorry guys


----------



## epic1337

That puts Zen down to between Sandy and Ivy performance levels, much closer to Sandy. That's quite a huge drop.


----------



## 2010rig

Quote:


> Originally Posted by *The Stilt*
> 
> If you quoted the original results, please correct them in your posts.
> 
> Sorry guys


I had a feeling those were too good to be true, thanks for clarifying!

AMD has a lot of ground to cover.


----------



## Robenger

Quote:


> Originally Posted by *2010rig*
> 
> I had a feeling those were too good to be true, thanks for clarifying!
> 
> AMD has a lot of ground to cover.


What a surprise to find you here.


----------



## 2010rig

Quote:


> Originally Posted by *Robenger*
> 
> What a surprise to find you here.


Ditto


----------



## Disharmonic

Quote:


> Originally Posted by *Dom-inator*
> 
> Broadwell does so well in CPU intensive games like Arma. Some eDRAM / L4 cache please AMD?


Zen APUs with HBM2 should be appearing sometime in 2017.


----------



## epic1337

Quote:


> Originally Posted by *Disharmonic*
> 
> Zen APUs with HBM2 should be appearing sometime in 2017.


So long as it's a minimum of 512MB of on-die cache, or ideally 1GB.

With that large a cache, it'd be sufficient for the typical workloads that are fit for IGPs. Considering the IGP isn't sufficient for high resolutions, it's just extra cost to stuff it with anything larger; it'd be a miracle for an IGP to manage 60fps at 1080p Ultra with 8x AA.


----------



## The Stilt

Once the memory bandwidth limitations have been solved, the iGPU performance will ramp up rather quickly.
Currently there is absolutely no point in increasing the performance on the iGPU, since at least 25% of the current performance is already wasted by the bandwidth restriction. No conventional DRAM or compression technology can overcome the issue.

Until HBM2 is available to APU / SoCs it is game over for them.

I personally think that it is too early for HBM2 to appear in Raven Ridge (the Zen-based APU). The technology will be available at the time; however, whether it will be available at a reasonable cost is a whole other question. My guess is that it will not.


----------



## Faithh

Quote:


> Originally Posted by *EniGma1987*
> 
> I seem to remember reading quite a while ago that Zen was removing certain hardware blocks like MMX because it is used so little and there are other things that are more moderns and offer more performance. This is done to save die space and increase performance. IDK if it really is true that MMX is not used much at all anymore, but that's what the reasoning was.


They won't; the MMX unit is still there. I didn't notice it was Zen at first. I mean, why would you make your own AMD-like Zen pictures? It kind of feels official, which is why I didn't notice it in the first place: I thought it was Bulldozer or something.

The MMX units are used for integer calculations, stuff like AES/SHA and much more than that; considering Zen will be a server platform, it would be stupid to drop them. Space won't really be an issue, especially with the massive IPC boost and 14/16nm (whatever it is); they won't need that many cores to compete with Intel's mainstream line. It's not as if AMD needs 20 cores just to get on par with an i5's MT performance. Not sure if you noticed, but Intel cores are really big; I wouldn't be surprised if a Skylake core at 32nm were nearly as big as a BD module.


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> I feel like an idiot, but this must be corrected
> 
> 
> 
> 
> 
> 
> 
> 
> I made a rookie mistake with Carrizo while reconfiguring the power management for the tests.
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> Carrizo requires some re-configuring in order to be able to run at static frequency.
> While the power limits were configured correctly and all the power management features were fine (i.e. no throttling possible), I managed to screw up the frequency itself
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The frequency was configured correctly to 3.0GHz, however since I had disabled the power management in order to produce results at static frequency the new frequency was never actuated. When the power management is enabled manual actuation of the CorePLL is not required, since the power management will be changing the PLL FID several times per second. If the power management is disabled the new frequency (which in this case was changed from 3.4GHz to 3.0GHz) must be refreshed by hand. Otherwise the actual frequency won´t change. For the first three results I forgot to do this procedure, however for Cinebench R15 it was done because the system was rebooted prior running it
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Quote from BKDG: *"The PstateId field must be updated to cause a new CpuFid value to take effect."*
> 
> *C-Ray V1.1 (Raytracer)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> 1600x1200 with 8 rays per pixel (15360000)
> 
> PD = 187155ms (82071.0pps) - 100.0%
> SR = 184502ms (83251.1pps) - 101.438%
> XV = 170368ms (90157.8pps) - 109.853%
> HW = 116907ms (131386.5pps) - 160.089%
> 
> *Euler3D (CFD)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> NACA0012.097K air foil
> 
> PD = 177.076s (11.2946 IPS) - 100.0%
> SR = 151.380s (13.2102 IPS) - 116.960%
> XV = 135.674s (14.7412 IPS) - 130.515%
> HW = 94.521s (21.1593 IPS) - 187.340%
> 
> *X265 (Encoder)*
> Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
> Version 1.7+512
> 
> PD = 225.25s (1.57 fps) - 100.0%
> SR = 213.97s (1.65 fps) - 105.1%
> XV = 204.81s (1.72 fps) - 109.554%
> HW = 117.12s (3.01 fps) - 191.720%
> 
> *Cinebench R15*
> 
> PD = 71pts - 100.0%
> SR = 72pts - 101.408%
> XV = 75pts - 105.634%
> HW = 119pts - 167.606%
> 
> 
> If you quoted the original results, please correct them in your posts.
> 
> Sorry guys


You're still only using four benchmarks, but you're now much more in line with my results than you were.

My estimate sits at just a 64% improvement, while your older figures had it at 72% and your current ones put it at 59.44%.

Your real problem is that your particular selection of benchmarks heavily favors Haswell's improvements. You need to run some more benchmarks









My collection is spread around in too many spreadsheets to summarize right now (bed time), but I'll make a point to run a full suite of tests using the 4690k that should be coming in the next couple of weeks (hate building computers piecemeal for people, but when they don't have all the money... what's a loon to do?).

I will say, though, my Haswell score for CB15, equalized to 3GHz, is 123 vs your 119. My x265 and POVRay numbers don't seem to be compatible with yours at all.

Unadjusted 4.4GHz Haswell:

POVRay: 1834
x265: 1.75

An i7 2600k (my current CPU) will do:
POVRay: 1532
x265: 1.49

That said, I do have a spreadsheet with a range for Zen: it bottoms out just above Sandy Bridge and just breaks even with Haswell at the top; somewhere in the middle is quite likely.


----------



## The Stilt

Quote:


> Originally Posted by *looncraz*
> 
> You're still only using four benchmarks, but you're now much more in line with my results than you were.
> 
> My estimate sits at just a 64% improvement, while your older figures had it at 72% and your current ones put it at 59.44%.
> 
> Your real problem is that your particular selection of benchmarks heavily favors Haswell's improvements. You need to run some more benchmarks
> 
> 
> 
> 
> 
> 
> 
> 
> 
> My collection is spread around in too many spreadsheets to summarize right now (bed time), but I'll make a point to run a full suite of tests using the 4690k that should be coming in the next couple of weeks (hate building computers piecemeal for people, but when they don't have all the money... what's a loon to do?).
> 
> I will say, though, my Haswell score for CB15, equalized to 3GHz, is 123 vs your 119. My x265 and POVRay numbers don't seem to be compatible with yours at all.
> 
> Unadjusted 4.4GHz Haswell:
> 
> POVRay:1834
> x265: 1.75
> 
> An i7 2600k (my current CPU) will do
> POVRay: 1532
> x265: 1.49
> 
> That said, I do have a spreadsheet with a range for Zen, and it bottoms out just above Sandy Bridge, and just breaks even with Haswell as the top, somewhere in the middle is quite likely.


By saying "your particular selection of benchmarks heavily favors Haswell's improvements" you of course mean that these benchmarks are floating point heavy instead of being integer heavy.

15h CPUs never had real issues with integer performance, since they had sufficient resources built into them, which was never the case with FP.
It would be silly to use integer-heavy workloads to predict the performance of Zen, since the improvement in integer performance won't decide the fate of AMD; the floating point performance of Zen will. 15h CPUs are behind the competition in integer performance too, but the difference isn't even remotely as massive as in floating point.


----------



## spurdomantbh

Quote:


> Originally Posted by *Faithh*
> 
> They won't. The MMX unit is still there, didnt notice it was Zen. I mean why would you make your own "AMD"- like Zen pictures? Kinda feels like it's official, thats why I actually havent noticed in the first place because I thought it was Bulldozer orsomething.
> 
> 
> 
> The MMX units are used for integer calculations stuff like AES/SHA etc and much more than that, considering Zen will be a server platform it would be stupid dropping it.


That image was made by wccf. It's incorrect: it shows only 4 integer pipelines and 256-bit FMACs, which is wrong. MMX functions use the same pipelines as the FP functions. It's hard to say whether there is no MMX unit or whether it's sharing pipelines.


----------



## LuckyStarV

This is the supposed Zen; it came out before the actual reveal, but showing the six integer pipelines lends it some credibility.










It may well be labeled as 256-bit FMAC if AMD is planning on letting the 128-bit units merge or allowing a 256-bit unit to be split. Makes sense so they can get better legacy FPU performance.









This early slide even has the right L2 numbers and the move to inclusive cache.


----------



## spurdomantbh

Quote:


> Originally Posted by *LuckyStarV*
> 
> This is the supposed Zen; it came out before the actual reveal, but showing the six integer pipelines lends it some credibility.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> It may well be labeled as 256-bit FMAC if AMD is planning on letting the 128-bit units merge or allowing a 256-bit unit to be split. Makes sense so they can get better legacy FPU performance.


That image was proven fake. It's not 2x256bit FMAC, because FMAC requires both FADD and FMUL to work. So since it's 2x128bit FADD and 2x128bit FMUL, that results in 2x128bit FMAC, which can fuse into 1x256bit FMAC.
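A toy model of that fusing may help (my illustration, assuming fp32 lanes; the actual datapath details are unknown): a 256-bit FMA is serviced by two 128-bit FMAC halves, one per 128-bit lane.

```python
# Toy model of fusing two 128-bit FMAC units into one 256-bit FMA
# (illustration only, not AMD's actual datapath). Each 128-bit lane
# holds four fp32 elements and computes a*b + c element-wise.

def fmac_128(a, b, c):
    """One 128-bit FMAC: fused multiply-add on a 4-element fp32 lane."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def fmac_256(a, b, c):
    """A 256-bit FMA split across two 128-bit units: low and high lanes."""
    lo = fmac_128(a[:4], b[:4], c[:4])
    hi = fmac_128(a[4:], b[4:], c[4:])
    return lo + hi

a, b, c = [1.0] * 8, [2.0] * 8, [0.5] * 8
print(fmac_256(a, b, c))  # eight fp32 results from two fused 128-bit ops
```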


----------



## Seronx

Quote:


> Originally Posted by *spurdomantbh*
> 
> That image was proven fake. It's not 2x256bit FMAC, because FMAC requires both FADD and FMUL to work. So since it's 2x128bit FADD and 2x128bit FMUL, that results in 2x128bit FMAC, which can fuse into 1x256bit FMAC.


Well actually the design states it can only execute one 128-bit FMAC per cycle.

128-bit FADD/FMUL/FMAC{iADD/iMUL} + 128-bit FADD/FMUL/FMAC{iADD/SHUF} + 128-bit FADD/SHUF + 128-bit FADD/FMAC{iADD} if you check out the execution allocation.

FP0 needs FP3 to execute FMA and FP1 needs FP3 to execute FMA. The integer/memory side is lackluster, and the FPU side is a clusterhump.

BD/PD: 1 x 128-bit FMAC/iMAC + 1 x 128-bit FMAC/XBAR + 128-bit iADD + 128-bit iADD
SR/XV: 1 x 128-bit FMAC/iADD/iMAC + 1 x 128-bit FMAC/XBAR + 128-bit iADD/SHUF
p.s. Packed Integer MAC = XOP , XBAR = Super Shuf

Zen FPU = {Mind-blown} // BD/SR FPU = {Oh that is pretty simple.}


----------



## spurdomantbh

Quote:


> Originally Posted by *Seronx*
> 
> FP0 needs FP3 to execute FMA and FP1 needs FP3 to execute FMA. The integer/memory side is lackluster, and the FPU side is a clusterhump.


Indeed, I noticed it, but as I said previously in this thread, I think it might be a typo, since FP2 is fully used in other operations. I don't see how crippling that one unit for that one instruction makes sense in any way. Who knows though, just seems too weird of a design choice IMO.


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> By saying "your particular selection of benchmarks heavily favors Haswell's improvements" you of course mean that these benchmarks are floating point heavy instead of being integer heavy.
> 
> 15h CPUs never had real issues with integer performance since they had sufficient resources built into them, which never was the case with FP.
> It would be silly to use integer heavy workloads to predict the performance of Zen, since the improvement in integer performance won´t decide the fate of AMD. The floating point performance of Zen will. 15h CPUs are behind the competition in integer performance too, but the difference isn´t even remotely as massive as in floating point.


By which I mean that your ENTIRE selection of benchmarks is known to heavily favor one genre of instructions over the entire spectrum, and in fact to favor instructions that are not the most heavily used in the majority of applications; i.e., it is not a representative sample. You need to add more to this.

WebXPRT, 3dPM, Google Octane V2, and many other real-world benchmarks don't show anywhere near the improvements on the Intel side (intel over intel).

Benefits over Sandy Bridge for Haswell:

Single Threaded (using forced affinity, or single threaded benchmarks):
CB-10:ST: 19%
CB-15 ST: 16%
CB-11.5 S:14%
3dPM ST: 5%
7-zip: 0%*
WebXPRT: -1.5%*
Octane V2: -3%*

Multi-threaded (i7 2600k @ 4.4 vs i7 4790k @ 4.4):
HB-4K: 29%
Agisoft: 28%
HB-LQ: 19%
x265: 17%
x264-P2: 16%
CB-R10 MT: 16%
CB-15 MT: 15%
CB-R11.5 MT: 12%
x264: P1: 9%
3dPM: 6%
7-Zip: 4%*
WebXPT: 3%*
Octane V2: -5%*

* I'll be re-running these benchmarks soon to verify, most of these are just collected from the web

In the end, that is a 13% improvement over Sandy Bridge, which is also in line with Intel's claims and most, if not all, reviews.
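For what it's worth, that 13% figure checks out as a plain arithmetic mean of the multi-threaded deltas listed above:

```python
# Sanity check on the ~13% figure: plain arithmetic mean of the listed
# multi-threaded Haswell-over-Sandy improvements (percent deltas).
mt_gains = [29, 28, 19, 17, 16, 16, 15, 12, 9, 6, 4, 3, -5]
mean_gain = sum(mt_gains) / len(mt_gains)
print(f"{mean_gain:.0f}%")  # -> 13%
```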

Notice that your benchmark choices are entirely at the top end of the spectrum? You're showing Haswell in the best possible light, and also focusing on the one area that will show 15h CPUs in their worst light. Using just the above benchmarks (and POV-Ray, whose results seem to be in yet another spreadsheet...) puts Zen in the position of matching Haswell.

I have another collection of tests which compares Intel CPUs from Penryn to Skylake. The benchmarks aren't fully uniform, but they are as close as I could manage with the time spread. The full spread of benchmarks (as many as 20 comparing adjacent iterations) shows the following:

Penryn: 100%
Nehalem: 109%
Sandy: 118.8%
Ivy: 125.94%
Haswell: 137.27%
Skylake: 149.63%

Intel's claims differ very little. In fact, Intel's claimed performance increase for Haswell over Penryn is just 39.39%, so my results are beautifully in line with Intel's.
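Mapping those cumulative figures back to per-generation steps (a quick sketch using only the numbers above) shows a fairly steady cadence:

```python
# Converting the cumulative IPC figures above (Penryn = 100%) into
# per-generation steps, each generation relative to its predecessor.
cumulative = {
    "Penryn": 100.0, "Nehalem": 109.0, "Sandy": 118.8,
    "Ivy": 125.94, "Haswell": 137.27, "Skylake": 149.63,
}
names = list(cumulative)
for prev, cur in zip(names, names[1:]):
    step = cumulative[cur] / cumulative[prev] - 1
    print(f"{cur} over {prev}: {step:+.1%}")
# Roughly +9% per generation, except Ivy over Sandy at about +6%.
```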

Excavator is effectively tied with Penryn, though it undoubtedly excels in some areas where I am unable to do a direct comparison, so I assume them to be equal. This method, too, shows Zen and Haswell almost exactly even.

Again, of course, provided Zen actually gives us a 40% boost.

A 38% boost changes the story, as does a 42% boost. But that's just because Intel has almost exclusively focused on improvements that help sell certain classes of CPU.

Skylake is out of reach for Zen, but Zen+ should match or exceed it. At which point in time Intel will already have Kaby or Cannon out, and will maintain a 10~15% or greater IPC lead.

Zen gets AMD closer, but it does not deliver them an IPC win.

I expect AMD to make up for that deficit using SMT and more cores at lower price points. If they can't reach 4.5GHz, having Haswell IPC would be a moot point... since Haswell CAN reach 4.5GHz.


----------



## st0necold

Godbless you guys.

Still wish I went to school.


----------



## guttheslayer

I always thought the gap between Nehalem and SB should be quite big: a possible 25% improvement in IPC. On the other hand, Skylake seems like too much of an improvement over Haswell, as I recall it was only a 4% IPC improvement at best.

If the above comparison between different generations of Intel CPUs holds true, then it's sad to see AMD at least five generations behind.

Let's just hope DX12 can give AMD more of an edge for Zen in gaming (due to the higher core count). Even if their IPC only matches Haswell, the extra cores could offset the lower-IPC disadvantage against Kaby Lake.


----------



## variant

Quote:


> Originally Posted by *looncraz*
> 
> Again, of course, provided Zen actually gives us a 40% boost.
> 
> A 38% boost changes the story, as does a 42% boost. But that's just because Intel has almost exclusively focused on improvement that help sell certain classes of CPU.
> 
> Skylake is out of reach for Zen, but Zen+ should match or exceed it. At which point in time Intel will already have Kaby or Cannon out, and will maintain a 10~15% or greater IPC lead.
> 
> Zen gets AMD closer, but it does not deliver them an IPC win.
> 
> I expect AMD to make up for that deficit using SMT and more cores at lower price points. If they can't reach 4.5GHz, having Haswell IPC would be a moot point... since Haswell CAN reach 4.5GHz.


Zen's 40% boost over Excavator is supposed to be independent of the process shrink. It's impossible to determine where Zen will land in comparison to Skylake, because none of the Intel benchmarks really separate improvement due to architecture from improvement due to the process shrink.


----------



## Themisseble

Quote:


> Originally Posted by *spurdomantbh*
> 
> That image was proven fake. It's not 2x256bit FMAC, because FMAC requires both FADD and FMUL to work. So since it's 2x128bit FADD and 2x128bit FMUL, that results in 2x128bit FMAC, which can fuse into 1x256bit FMAC.


Not entirely fake.


----------



## looncraz

Quote:


> Originally Posted by *st0necold*
> 
> Godbless you guys.
> 
> Still wish I went to school.


You don't have to go to school (assuming you mean college) to learn it, you just have to be at least as insane as I am









Of course, it isn't something you can figure out in a year or two on your own, either. Guidance is extremely helpful.


----------



## looncraz

Quote:


> Originally Posted by *variant*
> 
> Zen's 40% boost over Excavator is suppose to be independent of process shrinkage. It's impossible to determine where Zen will be in comparison to Skylake because none of the Intel benchmarks really account for improvement because of the architecture or improvement because the process shrunk.


This is true; we don't know what Intel did to reach Skylake's performance levels. If the transistors switch fast enough and all they did was simplify a couple of stages as a result, then this is easily something AMD could have done independent of the larger architecture as well (or, perhaps, something they expect to do for Zen+).

Maybe this is why Intel won't disclose what they have done? Because, in essence, they did nothing.


----------



## variant

Quote:


> Originally Posted by *looncraz*
> 
> This is true, we don't know what Intel did to reach Skylake's performance levels. If the transistors switch fast enough and all they did was simplify a couple stages as a result, then this is easily something that AMD could have done independent of the larger architecture as well (or, perhaps, something they expect to do for Zen+)
> 
> Maybe this is why Intel won't disclose what they have done? Because, in essence, they did nothing.


Considering Penryn and Nehalem were 45nm and Skylake is now at 14nm, there should have been large IPC gains from the process shrink alone, and 8 cores should probably be standard for the top-of-the-line consumer CPU. Yet we don't even see 50% gains, and we are still only getting 4 cores. Instead they've added an integrated GPU, and we have seen much larger increases in the power of the iGPU with each generation. We even know Kaby Lake, which will be competing with Zen+, is basically Skylake with a better iGPU. Intel's focus simply does not appear to have been on providing a better CPU, but on providing a better APU.


----------



## looncraz

Quote:


> Originally Posted by *variant*
> 
> Considering Penryn and Nehalem were 45nm and Skylake is now at 14nm, there should have been large IPC gains from process shrinkage alone, and 8 cores should probably be standard for the top-of-the-line consumer CPU. Yet we don't even see 50% gains, and we are still only getting 4 cores. Instead, they've added an integrated GPU, and we have seen much larger increases in the power of the iGPU with each generation. We even know Kaby Lake, which will be competing with Zen+, is basically Skylake with a better iGPU. Intel's focus simply does not appear to have been on providing a better CPU, but on providing a better APU.


Skylake is right at 50% faster than Penryn by my estimates, but, no doubt, 8 cores should be pretty standard. Obviously AMD believed this as well.

The inclusion of a GPU, however, does throw something of a wrench into the simple anticipated progression as two fronts are now important. With Intel now holding the APU crown as well, AMD holds no more cards in any market.

They will need to regain one, or both, of those markets to survive to Zen+. The problem is that people are poisoned against AMD. AMD really has no choice but to push for the top. They have to get back to making Intel and nVidia look like the evil corporations they really are (not that AMD is actually better). One thing that happened beyond their control was a shift in public mentality. Computers and their parts were usually something we'd keep for six months, maybe a year, and then we'd move on. With that throw-away mentality in mind, we'd seek out value. Today, however, people who once upgraded that quickly don't upgrade for years at a time. When they do, they want something that will last them. With lower prices and an increased willingness to spend more, people will eschew the value proposition by default.

What AMD needs, though, is to cover both types of people by creating an alternative luxury brand (not a branding, an actual brand). AMD for the mainstream, low end stuff, and the new brand for upper-mainstream and the high end. Same tech, but the best stuff goes to the other brand, and the cheap stuff keeps the AMD brand. Then motherboard makers can proudly state: "Compatible with XXX Luxury CPUs" "... oh, and also those AMD CPUs as well."


----------



## variant

Quote:


> Originally Posted by *looncraz*
> 
> Skylake is right at 50% faster than Penryn from my estimates, but, no doubt, 8-cores should be pretty standard. Obviously AMD believed this as well.


I mean 50% from the shrinkage of the process. Penryn and Nehalem were both 45nm, and there were gains from Penryn to Nehalem, so there were certainly some architectural advances made that were independent of the process. Yes, there was a total of 50% from Penryn to Skylake, but I think we should be seeing a whole lot more, including more improvements from just the process shrink. Instead of utilizing the greater space available to them to improve the CPU, they used it to improve the GPU.
Quote:


> Originally Posted by *looncraz*
> 
> The inclusion of a GPU, however, does throw something of a wrench into the simple anticipated progression as two fronts are now important. With Intel now holding the APU crown as well, AMD holds no more cards in any market.
> 
> They will need to regain one, or both, of those markets to survive to Zen+. The problem is that people are poisoned against AMD. AMD really has no choice but to push for the top. They have to get back to making Intel and nVidia look like the evil corporations they really are (not that AMD is actually better). One thing that happened beyond their control was a shift in public mentality. Computers and their parts were usually something we'd keep for six months, maybe a year, and then we'd move on. With that throw-away mentality in mind, we'd seek out value. Today, however, people who once upgraded that quickly don't upgrade for years at a time. When they do, they want something that will last them. With lower prices and an increased willingness to spend more, people will eschew the value proposition by default.
> 
> What AMD needs, though, is to cover both types of people by creating an alternative luxury brand (not a branding, an actual brand). AMD for the mainstream, low end stuff, and the new brand for upper-mainstream and the high end. Same tech, but the best stuff goes to the other brand, and the cheap stuff keeps the AMD brand. Then motherboard makers can proudly state: "Compatible with XXX Luxury CPUs" "... oh, and also those AMD CPUs as well."


There has never been a time when AMD controlled any major CPU market. That said, AMD has been trying to differentiate themselves by doing custom processors for companies, consoles being the first. If they can expand that business and have technology that at least matches Intel's, they could easily carve out a niche. We know AMD has at least 3 other contracts to create custom processors.

I don't think it's a change in mentality so much as it simply not being worthwhile to upgrade your CPU. Intel simply isn't producing processors with a large enough leap, at low enough prices, to justify upgrading as often. The enthusiast market is probably one of the easiest to capture share in if AMD is competitive in performance and price. Many enthusiasts simply have no brand loyalty; they will buy whatever has the highest benchmarks for what they can afford. It's also a market where nothing ties anyone to their CPU, since they end up having to buy a whole new motherboard anyway, given how quickly Intel switches sockets. If AMD can produce an 8-core CPU that is as powerful or more powerful than Intel's equivalent 4-core, they could capture a large part of the enthusiast market, since DirectX 12 and Vulkan are going to make multicore worth owning.

It's the prebuilt PC market that AMD has the most problems with, and that has always been the case. Even when AMD had a performance advantage over Intel, we saw what Intel will do to maintain their marketshare. The biggest hope for AMD here is custom processors, as it's rumored Apple is having AMD create one for 2017-2018 iMacs and MacBooks. If Apple devices start using AMD, it may bump AMD's prestige a little higher in the prebuilt market.


----------



## flopper

Quote:


> Originally Posted by *variant*
> 
> Considering Penryn and Nehalem were 45nm and Skylake is now at 14nm, there should have been large IPC gains from process shrinkage alone, and 8 cores should probably be standard for the top-of-the-line consumer CPU. Yet we don't even see 50% gains, and we are still only getting 4 cores. Instead, they've added an integrated GPU, and we have seen much larger increases in the power of the iGPU with each generation. We even know Kaby Lake, which will be competing with Zen+, is basically Skylake with a better iGPU. Intel's focus simply does not appear to have been on providing a better CPU, but on providing a better APU.


It's like the pole vault: the world record gets raised one centimeter at a time, because the bonus is for setting the record, not for how high you can actually jump.
Intel has had no reason to produce better IPC, since even a 3% gain sells new processors.
It's great to have AMD back in the game next year.


----------



## looncraz

Quote:


> Originally Posted by *variant*
> 
> I mean 50% from the shrinkage of the process. [snip] Yes, there was a total of 50% from Penryn to Skylake, but I think we should be seeing a whole lot more, including more improvements from just the process shrink.


Yes, they no doubt have been resting on their laurels. But this was expected since AMD stopped being competitive.
Quote:


> Originally Posted by *variant*
> 
> There has never been a time when AMD controlled any major CPU market.


AMD/ATI actually had the dominant graphics market share for a very long time; they were just outmaneuvered by a smaller company marketing inferior products as if they were superior. nVidia focused on the most user-visible issues at the same time AMD was trying to figure out how to survive after spending WAY too much for ATI.

http://regmedia.co.uk/2007/10/29/gpu_q3_1.png

In addition, for years they had the most powerful CPUs, which were in the highest demand, including overtaking Intel's sales for a few non-consecutive quarters once they got their production capabilities in order (which cost them massive amounts of money).

http://www.extremetech.com/wp-content/uploads/2014/06/chart_amd.gif

The problem is that they were not on top long enough to make back the money they invested in expansion.
Quote:


> Originally Posted by *variant*
> 
> I don't think it's a change in mentality so much as it's simply not worthwhile to upgrade your CPU. Intel simply isn't producing processors with a large enough leap and low enough prices to justify upgrades as often.


Partly that, but also Windows has become smoother on slower hardware, and the biggest performance jumps you can feel are from SSDs. The secret to Windows feeling smoother was desktop GPU acceleration. We can no longer feel small increases in performance because the desktop runs at a framerate governed by demand. This is part of why so much of Intel's focus has been on GPUs. Once you get to the enthusiast sockets you ditch the GPU and gain more cores, but you're going to pay... and you will likely be behind the curve of desktop parts, since the enthusiast parts are just server parts, which require more time for validation than desktop parts.
Quote:


> Originally Posted by *variant*
> 
> The enthusiast marketshare is probably one of the easiest ones to capture marketshare in if AMD is competitive in performance and price.


Yes, certainly. Intel has a stranglehold on every other market and will only be letting go strategically.
Quote:


> Originally Posted by *variant*
> 
> It's also a market where there is nothing holding someone to their CPU since they end up having to buy a whole new motherboard anyway since Intel switches sockets so quickly.


This very much.
Quote:


> Originally Posted by *variant*
> 
> If AMD can produce an 8 core CPU that is as powerful or more powerful than the 4 core equivalent Intel, they could capture a large part of the enthusiast marketshare since DirectX 12 and Vulkan are going to make multicore worth owning.


They actually did this. Bulldozer was every bit as powerful, overall, as an i7 2600K, and the FX-8350 was more so (I know - I switched from an FX-8350 to Sandy Bridge). But you couldn't get much more out of it than it had stock, and you could easily get more out of Sandy Bridge. I upgraded to an i5 2500K known to do 5GHz at stock voltage (and ran it at 4.5GHz for comfort) and was wowed by the difference in single-threaded loads, but the FX-8350 was still faster in a few things I cared about, such as rendering. When I went to an i7 2600K clocked at 4.5GHz, all bets were off.

Zen will need to be scratching at Haswell or better in single-threaded workloads, and be able to clock similarly, to gain any love. Even then, Intel already has Skylake, which Zen will not match, let alone beat, unless the 40% number involved some form of trickery (which might even be illegal).
Quote:


> Originally Posted by *variant*
> 
> The biggest hope for AMD here is custom processors as it's rumored Apple is having AMD create one for 2017-2018 iMacs and MacBooks. If Apple devices start using AMD, it may bump AMD's prestige a little higher in the prebuilt market.


This is very true. I hate Apple, I really do. I hated them when they were a tiny company barely able to sustain their existence. I hate them now that they are huge. But they can be a force to drive markets. If AMD makes some killer APU for it (which is what I suspect they will be doing), I think other OEMs will want to duplicate Apple's usage of it, and that is all good for AMD.


----------



## AmericanLoco

Quote:


> Originally Posted by *variant*
> 
> There has never been a time when AMD controlled any major CPU market. That said, AMD has been trying to differentiate themselves by doing custom processors for companies, consoles being their first ones. If they can expand that business, and have the technology that at least matches Intel, they could easily create themselves a niche. We know AMD has at least 3 other contracts to create custom processors.


In late '04/early '05, AMD actually had over 50% of consumer PC desktop sales. They were absolutely on fire with the original single core Athlon 64. They probably would have been near 75% if it weren't for Dell's exclusive use of Intel processors for an extremely long time. They nailed it again with the Athlon 64 X2.

The problem is they had no real answer for Conroe, and they wasted a year trying to be the first to come out with a monolithic quad-core. AMD got their wish, and had the first monolithic quad core - complete with crippling TLB bug setting them back another couple months. So for the entire *year* AMD had been wasting time developing their quad, Intel had been churning out millions of dual-die Core 2 Quads.

AMD basically vanished from high-performance and enthusiast builds overnight. IMO, it was the Agena quad-core flop that really did AMD in. Bulldozer was just icing on the cake.


----------



## SpeedyVT

Quote:


> Originally Posted by *AmericanLoco*
> 
> In late '04/early '05, AMD actually had over 50% of consumer PC desktop sales. They were absolutely on fire with the original single core Athlon 64. They probably would have been near 75% if it weren't for Dell's exclusive use of Intel processors for an extremely long time. They nailed it again with the Athlon 64 X2.
> 
> The problem is they had no real answer for Conroe, and they wasted a year trying to be the first to come out with a monolithic quad-core. AMD got their wish, and had the first monolithic quad core - complete with crippling TLB bug setting them back another couple months. So for the entire *year* AMD had been wasting time developing their quad, Intel had been churning out millions of dual-die Core 2 Quads.
> 
> AMD basically vanished from high-performance and enthusiast builds overnight. IMO, it was the Agena quad-core flop that really did AMD in. Bulldozer was just icing on the cake.


This is true.


----------



## escksu

This is their last chance; if Zen is not competitive, AMD will be gone or split up. Only their embedded division is making money; they'd be better off closing down the GPU and CPU sides.


----------



## SpeedyVT

Quote:


> Originally Posted by *escksu*
> 
> This is their last chance; if Zen is not competitive, AMD will be gone or split up. Only their embedded division is making money; they'd be better off closing down the GPU and CPU sides.


Very pessimistic, aren't you?

This is corporate merry-go-round bull; companies live and die. AMD has become vital to Intel: without a competitor, there is no way in hell Intel avoids being sanctioned by the law and forced to divide its enterprise into smaller companies.

Intel, through another company or some intermediary, will pump money into AMD without raising red flags. And if another company sees value in AMD's patents, they may simply want a majority of the shares and annex control.


----------



## guttheslayer

It is thanks to AMD's poor performance that we are seeing 3-5% IPC gains every generation; the only notable jump was Westmere to Sandy Bridge, and Intel regretted it straightaway.

No good CPUs are coming in the future until Zen really kicks Intel's butt...

Every year is wasted on 40-50% iGPU improvements when it's supposed to be CPU performance.

F the iGPU, really.


----------



## looncraz

Quote:


> Originally Posted by *guttheslayer*
> 
> It is thanks to AMD's poor performance that we are seeing 3-5% IPC gains every generation; the only notable jump was Westmere to Sandy Bridge, and Intel regretted it straightaway.


Actually, except for Ivy Bridge, Intel has made more significant changes than that. It's just that software has become increasingly GPU-accelerated, has better threading, and has its performance benefits hidden behind a composited windowing system.

That last part is one of the biggest areas. Once upon a time, you could feel a 20% increase in performance very easily. Every menu, window, and action would be that much faster. Then we offloaded drawing almost entirely to the GPU. The GPU is so fast that we actually have to slow the drawing down and limit it to 60fps. That means we no longer see performance changes in everything we do; we pretty much have to measure them. We might notice that we spent a minute less encoding a video, but it won't impress us because of how much time went by anyway.

Once the computer is running smoothly, more performance can't really be felt. An SSD, enough memory, and a decent GPU can make even old systems feel very fast. My wife's Phenom II X4 955 boots as fast as my i7 2600K @ 4.5GHz, feels as fast on the desktop, and runs just as smoothly (sometimes better - she has less junk running). In day-to-day activities, you can't tell that my computer is drastically faster than hers. You can barely tell when playing games.

Of course, she lacks SATA3, but I RAIDed two 64GB m4 SSDs, so the performance is actually very similar (most apps go to a hard drive, but critical ones such as Firefox are on the SSD - using junctions and links).
Quote:


> Originally Posted by *guttheslayer*
> 
> Every year is wasted on 40-50% iGPU improvements when it's supposed to be CPU performance.
> 
> F the iGPU, really.


Agreed. I don't want my CPU to have a GPU, except for my HTPC. And even then, I have a dGPU so I can do some serious gaming on my 65" TV.

That said, I'd like to take the dGPU out of my HTPC; it pulls 40W more while watching DVDs or BluRays than the iGPU in the A8-7600. I'm eagerly awaiting AM4 APUs. I won't mind upgrading the HTPC in late 2017 or early 2018.


----------



## Cursedqt

Quote:


> Originally Posted by *AmericanLoco*
> 
> In late '04/early '05, AMD actually had over 50% of consumer PC desktop sales. They were absolutely on fire with the original single core Athlon 64. They probably would have been near 75% if it weren't for Dell's exclusive use of Intel processors for an extremely long time. They nailed it again with the Athlon 64 X2.
> 
> The problem is they had no real answer for Conroe, and they wasted a year trying to be the first to come out with a monolithic quad-core. AMD got their wish, and had the first monolithic quad core - complete with crippling TLB bug setting them back another couple months. So for the entire *year* AMD had been wasting time developing their quad, Intel had been churning out millions of dual-die Core 2 Quads.
> 
> AMD basically vanished from high-performance and enthusiast builds overnight. IMO, it was the Agena quad-core flop that really did AMD in. Bulldozer was just icing on the cake.


Completely true, and you forgot to add Lenovo, HP, NEC, Toshiba, Fujitsu, Acer, Sony, and a couple more OEMs. If you guys want to check, see the report:
https://www.ftc.gov/sites/default/files/documents/cases/091216intelcmpt.pdf


----------



## guttheslayer

Quote:


> Originally Posted by *looncraz*
> 
> Actually, except for Ivy Bridge, Intel has made more significant changes than that. It's just that software has become increasingly GPU-accelerated, has better threading, and has its performance benefits hidden behind a composited windowing system.
> 
> That last part is one of the biggest areas. Once upon a time, you could feel a 20% increase in performance very easily. Every menu, window, and action would be that much faster. Then we offloaded drawing almost entirely to the GPU. The GPU is so fast that we actually have to slow the drawing down and limit it to 60fps. That means we no longer see performance changes in everything we do; we pretty much have to measure them. We might notice that we spent a minute less encoding a video, but it won't impress us because of how much time went by anyway.
> 
> Once the computer is running smoothly, more performance can't really be felt. An SSD, enough memory, and a decent GPU can make even old systems feel very fast. My wife's Phenom II X4 955 boots as fast as my i7 2600K @ 4.5GHz, feels as fast on the desktop, and runs just as smoothly (sometimes better - she has less junk running). In day-to-day activities, you can't tell that my computer is drastically faster than hers. You can barely tell when playing games.
> 
> Of course, she lacks SATA3, but I RAIDed two 64GB m4 SSDs, so the performance is actually very similar (most apps go to a hard drive, but critical ones such as Firefox are on the SSD - using junctions and links).
> 
> Agreed. I don't want my CPU to have a GPU, except for my HTPC. And even then, I have a dGPU so I can do some serious gaming on my 65" TV.
> 
> That said, I'd like to take the dGPU out of my HTPC; it pulls 40W more while watching DVDs or BluRays than the iGPU in the A8-7600. I'm eagerly awaiting AM4 APUs. I won't mind upgrading the HTPC in late 2017 or early 2018.


I believe you can still make computation and workload processing faster...

AVX-512, L4 eDRAM, 6 mainstream cores (not to mention the removal of direct die soldering) - all of this was supposed to be integrated into Skylake by now... but Intel isn't interested, because there is no benefit for them in pushing for such an instant 20% performance gain. They'd rather milk that 20% at 3-5% per year, which will last them 4-5 years, and spend the R&D budget on something else.


----------



## looncraz

Quote:


> Originally Posted by *guttheslayer*
> 
> I believe you can still make computation and workload processing faster...
> 
> AVX-512, L4 eDRAM, 6 mainstream cores (not to mention the removal of direct die soldering) - all of this was supposed to be integrated into Skylake by now... but Intel isn't interested, because there is no benefit for them in pushing for such an instant 20% performance gain. They'd rather milk that 20% at 3-5% per year, which will last them 4-5 years, and spend the R&D budget on something else.


Indeed, we should have advanced more. The lack of solder, though, is because FinFETs (as employed by Intel) are created by etching away some of the wafer's thickness and then filling in some of the spacing with an insulator. This dramatically weakens the die, and it can't handle a pool of hot solder attaching it to the IHS. There may also be thermal expansion considerations that require the die to be, effectively, flexible in its mounting.

A 14nm FinFet wafer is actually floppy.


----------



## guttheslayer

Quote:


> Originally Posted by *looncraz*
> 
> Indeed, we should have advanced more. The lack of solder, though, is because FinFETs (as employed by Intel) are created by etching away some of the wafer's thickness and then filling in some of the spacing with an insulator. This dramatically weakens the die, and it can't handle a pool of hot solder attaching it to the IHS. There may also be thermal expansion considerations that require the die to be, effectively, flexible in its mounting.
> 
> A 14nm FinFet wafer is actually floppy.


In that case, why is the more expensive HEDT line able to endure soldering?


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> Indeed, we should have advanced more. The lack of solder, though, is because FinFETs (as employed by Intel) are created by etching away some of the wafer's thickness and then filling in some of the spacing with an insulator. This dramatically weakens the die, and it can't handle a pool of hot solder attaching it to the IHS. There may also be thermal expansion considerations that require the die to be, effectively, flexible in its mounting.
> 
> A 14nm FinFet wafer is actually floppy.


Would that imply they are easier to go lidless?


----------



## svenge

Quote:


> Originally Posted by *guttheslayer*
> 
> In that case, why is the more expensive HEDT line able to endure soldering?


It's possible that the heat density is more uniform in the HEDT lineup, as there's no die area allotted to the integrated GPU.


----------



## looncraz

Quote:


> Originally Posted by *guttheslayer*
> 
> In that case, why is the more expensive HEDT line able to endure soldering?


You can use a more expensive, lower-temperature solder and put it in the oven longer to reduce the thermal stress. It genuinely increases processing and material costs.

I don't expect Zen to have a soldered lid for that very reason.


----------



## looncraz

Quote:


> Originally Posted by *SpeedyVT*
> 
> Would that imply they are easier to go lidless?


Exactly the opposite. The core is more fragile than ever.


----------



## dave12

I'm too dumb as hell to understand this thread. Does this mean Zen is good on paper?


----------



## looncraz

Quote:


> Originally Posted by *dave12*
> 
> I'm too dumb as hell to understand this thread. Does this mean Zen is good on paper?


On paper, all else being equal, it will, at the very least, trade shots with Haswell in integer and spank it in floating point.

The reality is that there are too many unknowns to really know what is on the paper. We have one corner of the paper, and it's a delicious corner.


----------



## christoph

Quote:


> Originally Posted by *guttheslayer*
> 
> It is thanks to AMD's poor performance that we are seeing 3-5% IPC gains every generation; the only notable jump was Westmere to Sandy Bridge, and Intel regretted it straightaway.
> 
> No good CPUs are coming in the future until Zen really kicks Intel's butt...
> 
> Every year is wasted on 40-50% iGPU improvements when it's supposed to be CPU performance.
> 
> F the iGPU, really.


Is it?

Isn't it rather that not only are you getting 3-5% jumps in performance, you also have to buy another motherboard just to get that 5% upgrade? And isn't it the fault of the people who buy Intel no matter what, because Intel is still the king and/or AMD must never surpass Intel's performance?

I really don't care what people do with their money, but I sure don't have 1000 bucks just for the sake of upgrading 5% in performance...


----------



## prjindigo

Quote:


> Originally Posted by *BiG StroOnZ*
> 
> I agree with this, if it becomes a 980 Ti vs Fury X with Zen vs Skylake-E/Kaby Lake then yes it will definitely be problematic as you see with the 980 Ti vs Fury X. The majority of people simply just buy a 980 Ti because it is clearly faster and overclocks better, therefore allowing more for your $650. Whereas the Fury X performs worse than a 980 Ti and actually doesn't really overclock at all, but yet still maintains a $650 price tag.
> 
> If AMD does this again with their Zen CPUs, they will have a problem moving units as seen with their Fury lineup.


yeah.... about that.

Fury X is kicking the Titan-X in the boot right now and getting better. You keep perpetuating something you heard from nVidia before the Fury released...


----------



## Clocknut

Quote:


> Originally Posted by *dave12*
> 
> I'm too dumb as hell to understand this thread. Does this mean Zen is good on paper?


Assuming AMD is overclaiming their performance, take half of what they claim - a 20% IPC improvement over Excavator. Where would that land on Intel's CPU performance ladder?


----------



## BiG StroOnZ

Quote:


> Originally Posted by *prjindigo*
> 
> yeah.... about that.
> 
> Fury X is kicking the Titan-X in the boot right now and getting better. You keep perpetuating something you heard from nVidia before the Fury released...


So the Fury X suddenly overclocks better? Because, while they might have improved driver performance, it still doesn't overclock well, which means the 980 Ti and Titan X still win once overclocked. Both can easily reach 1400MHz without hesitation, 1450MHz seems pretty easy to attain, and 1480-1500MHz is also pretty common with a non-reference 980 Ti (or with a Titan X under water).


----------



## Cyro999

Quote:


> Originally Posted by *Clocknut*
> 
> Assuming AMD is overclaiming their performance, take half of what they claim - a 20% IPC improvement over Excavator. Where would that land on Intel's CPU performance ladder?


If that was the case, it would be DOA

Remember that they're trading MT scaling for ST performance. If they lost the MT scaling without dramatic improvements in ST perf, it would make a worse CPU. There's no reason to think they can't make decent ST performance as Skylake has huge IPC advantages (as much as ~80-90% for video encoding, for example) vs Piledriver, and even their claimed 40% over excavator would fall short of that (with better MT scaling and/or more cores?)


----------



## The Stilt

The sTIM used in CPUs is basically pure indium, which melts at 156.7°C. All semiconductor dies, FinFET or not, can easily withstand that temperature when powered off.


----------



## chrisjames61

Quote:


> Originally Posted by *Cursedqt*
> 
> Okay can you link AMD reporting this?
> 
> It peeked my interest


If you are going to bash people for discussing this kind of speculation, at least learn how to spell: it's "piqued my interest".


----------



## Clocknut

Quote:


> Originally Posted by *Cyro999*
> 
> If that was the case, it would be DOA
> 
> Remember that they're trading MT scaling for ST performance. If they lost the MT scaling without dramatic improvements in ST perf, it would make a worse CPU. There's no reason to think they can't make decent ST performance as Skylake has huge IPC advantages (as much as ~80-90% for video encoding, for example) vs Piledriver, and even their claimed 40% over excavator would fall short of that (with better MT scaling and/or more cores?)


I hate to say this, but I take whatever the chip maker claims and then subtract half.

Haswell can perform 30% quicker than Sandy Bridge, but in the real world it is roughly half of that. The same could be said for AMD: their 40% claim might have relied on benchmarks using the new instruction sets to hit that number, but real-world results may look different from benchmarks built around those instruction-set enhancements.

Let's hope the 14nm FinFET process can actually make up the rest of the gain.


----------



## Asus11

AMD's answer to Ivy Bridge.

Knew they had it in them.


----------



## 2010rig

Quote:


> Originally Posted by *Asus11*
> 
> AMD's answer to Ivy Bridge.
> 
> Knew they had it in them.


What is their answer for Skylake?


----------



## Seraphic

If this is true, what does it mean for AMD's Zen?
Seeing as there was talk it would be 8 cores / 16 threads?

Since Zen is already taped out, is it too late for them to increase their core count?


----------



## Cyro999

Quote:


> Haswell can perform 30% quicker than Sandy Bridge, but in the real world it is roughly half of that.


Sure, I don't ever go by what the chip maker says. I run and triple-check benchmarks; they say that Skylake is about 35-38% faster than SB for x264 encoding.


----------



## Ashura

Quote:


> Originally Posted by *Asus11*
> 
> AMD's answer to Ivy Bridge.
> 
> Knew they had it in them.


Hopefully.
Quote:


> Originally Posted by *2010rig*
> 
> What is their answer for Skylake?


Baby steps...


----------



## ebduncan

Why do people keep mentioning Haswell?

Haswell is old news; no one in their right mind would purchase a Haswell CPU right now, because Skylake is out. So why make comparisons to Haswell?

Skylake is on the market now. By the time AMD gets Zen out the door, they will be lucky to be competing with just Skylake - more likely with Intel's next chips. The point being, AMD needs to perform at the Skylake level, not Haswell's.

All I know is I've been itching for an upgrade lately. I'm trying to hold off on a Skylake 6700K build until Zen comes out, but I'll probably just go with Skylake.


----------



## Seronx

Quote:


> Originally Posted by *2010rig*
> 
> What is their answer for Skylake?


Harvester // 3rd Gen Bulldozer.
---
1st Gen -> Bulldozer
Enhanced 1st Gen -> Piledriver
2nd Gen -> Steamroller
Enhanced 2nd Gen -> Excavator
3rd Gen -> Harvester

Generations are determined by large architectural changes. New fetch units, new decode units, new FPU arrangement, new Integer/Memory execution arrangement, etc

Just to note: http://www.ponsse.com/products/harvesters // which is why it can also be hinted that the architecture might be called Crane instead. [Forestry or Skyscraper]


----------



## Clocknut

Quote:


> Originally Posted by *ebduncan*
> 
> why do people keep mentioning haswell?
> 
> haswell is old news, no one in their right mind would purchase a haswell cpu right now, because skylake is out. So why are you making comparisons to haswell?
> 
> Skylake is on the market now, by the time AMD gets Zen out the door they would be lucky to be competing with just skylake, but likely intel's next chips. Point being AMD needs to perform at the Skylake level, not haswell.
> 
> All I know is I've been itching for a upgrading here lately. Trying to hold off on a Skylake 6700k build, until Zen comes out, but probably just gonna go with the Skylake.


Because we don't think it will beat Haswell. In fact, if AMD matches Haswell, that is already considered very good; Haswell IPC is within 10% of Skylake.


----------



## Cyro999

Quote:


> Haswell IPC is within 10% of skylake.


Not always, it's generally about 10%


----------



## looncraz

Quote:


> Originally Posted by *Clocknut*
> 
> Assuming AMD is over claim their performance,
> 
> take half of what AMD claim = 20% improve IPC over excavator = where it will land on Intel's CPU performance?


It would be right between Nehalem and Sandy Bridge at that point.


----------



## looncraz

Quote:


> Originally Posted by *Cyro999*
> 
> If that was the case, it would be DOA
> 
> Remember that they're trading MT scaling for ST performance. If they lost the MT scaling without dramatic improvements in ST perf, it would make a worse CPU. There's no reason to think they can't make decent ST performance as Skylake has huge IPC advantages (as much as ~80-90% for video encoding, for example) vs Piledriver, and even their claimed 40% over excavator would fall short of that (with better MT scaling and/or more cores?)


40% over Excavator is about 64% over Piledriver.

Code:


Benchmark-derived relative performance, multi-benchmark means.
Zen values inferred from blind +40% jump.
Excavator is generally within 1% of Penryn (Core 2).

Bulldozer  Piledriver   Steamroller  Excavator   Zen
   --          +9%         +6.7%      +9.85%     +40%   
100.00%      109.00%      116.30%     127.76%   178.86%    Bulldozer
             100.00%      106.70%     117.21%   164.09%    Piledriver
                          100.00%     109.85%   153.79%    Steamroller
                                      100.00%   140.00%    Excavator
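For reference, the compounding in the table can be reproduced with a short script (the per-generation gains are the benchmark-derived estimates above plus AMD's +40% claim, not official measurements):

```python
# Compound the per-generation IPC gains used in the table above.
# Gains are benchmark-derived estimates (and AMD's claimed +40%), not measurements.
gains = [
    ("Piledriver", 1.09),    # +9% over Bulldozer
    ("Steamroller", 1.067),  # +6.7% over Piledriver
    ("Excavator", 1.0985),   # +9.85% over Steamroller
    ("Zen", 1.40),           # claimed +40% over Excavator
]

relative = {"Bulldozer": 100.0}  # everything normalized to Bulldozer = 100%
level = 1.0
for gen, gain in gains:
    level *= gain
    relative[gen] = round(level * 100, 2)

# Zen vs Piledriver works out to roughly +64%:
zen_vs_piledriver = relative["Zen"] / relative["Piledriver"] - 1
```

Running it reproduces the 178.86/164.09-style figures, and `zen_vs_piledriver` lands near 0.64, matching the "40% over Excavator is about 64% over Piledriver" line.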


----------



## looncraz

Quote:


> Originally Posted by *christoph*
> 
> is it????
> 
> is it not that, not only you're getting 3-5% jumps in performance but you also have to by another motherboard so you can upgrade 5% in performance, but peoples faults that buys Intel's no matter what, or because Intel's still the king and/or AMD should never surpass Intel's performance
> 
> I really don't care what people do with their money but I sure don't have 1000 bucks just for the sake of upgrading 5% in performance...


The good side of this is that my Sandy Bridge hardware is still quite valuable. The downside to this is that replacement Sandy Bridge hardware is still valuable.


----------



## looncraz

Quote:


> Originally Posted by *Clocknut*
> 
> I hate to say this, but I take whatever the chip maker claim and then minus half of it.
> 
> Haswell can perform 30% quicker than Sandy bridge, but in real world, it is roughly about half of that. The same could be said for AMD, their 40% claim might have cover the bench with those new instructions sets to get that 40% number, but real world bench might be different than what it is bench with those instruction set enhancement.
> 
> Lets hope the 14nm finfet process can actually make up the rest of the gain.


Intel claims are actually pretty close to reality:


(the above results are from an average of 24 benchmarks from generation to generation)

Peak IPC increase is 30%, but Intel only claimed a small average increase.


----------



## looncraz

Quote:


> Originally Posted by *ebduncan*
> 
> why do people keep mentioning haswell?
> 
> haswell is old news, no one in their right mind would purchase a haswell cpu right now, because skylake is out. So why are you making comparisons to haswell?
> 
> Skylake is on the market now, by the time AMD gets Zen out the door they would be lucky to be competing with just skylake, but likely intel's next chips. Point being AMD needs to perform at the Skylake level, not haswell.
> 
> All I know is I've been itching for a upgrading here lately. Trying to hold off on a Skylake 6700k build, until Zen comes out, but probably just gonna go with the Skylake.


A few reasons:

1. Intel has released no architectural details about Skylake, so we can only compare with Haswell's design.
2. Most CPUs currently available are Haswell; Skylake, while available, has limited SKUs.
3. Zen will not match Skylake; it will be around Haswell performance, which is still a drastic improvement.
4. Imagine a mainstream 8-core CPU with Haswell IPC vs a 4-core Skylake CPU...

My motherboard just went all wonky on me, won't let me overclock my i7 2600k anymore, and only allows me to use two DIMMs at once. I swapped CPUs and RAM with the same results, so I had to buy a new motherboard. I went through the benchmarks, the overclocking capabilities, and the prices, and it made more sense to just keep what I have. I could easily use more cores, but the prices are unsupportable - even for used. In fact, I paid nearly the same price for a used version of my exact same motherboard as I paid for mine new three years ago. The CPU is only about $100 cheaper, four years later. That tells you how little incentive there is to upgrade to anything Intel is offering.

I can handle Sandy Bridge IPC at 4.5GHz or thereabouts, no problem. I'd be willing to spend more to get 6 or 8 cores at that same performance level, but not as much as Intel charges to do it; the benefit just isn't there. My next upgrade cycle also happens to be spring of 2017 (excluding the GPU, which is purely performance- and need-based: +80% or better, stock vs stock, and no GPU currently offers that).


----------



## iLeakStuff

The good thing about AMD staying so far behind Intel in CPUs for so long is that even if they only get Sandy Bridge performance out of their Zen CPUs, it will be good enough to recapture a ton of market share. That would be a good start for them.

It will be enough for people to think, "OK, I can support the underdog, since the CPUs won't bottleneck my dGPUs and it's not so far behind Intel's Skylake chips."
This would make AMD's CPU division profitable again, which would allow them to hire more engineers and spend more money on R&D, creating better potential for the future.

I think they had a lot of momentum before Bulldozer, but it ended up so far behind Intel that people just couldn't support them, and it almost killed AMD. God knows how much cash they spent on that catastrophic project.


----------



## SpeedyVT

Quote:


> Originally Posted by *iLeakStuff*
> 
> The good thing for AMD for staying so far behind Intel in CPUs for so long, is that if even they get Sandy Bridge performance out of their Zen CPUs, it will be good enough to recapture a ton of market share. That will be a good start for them.
> 
> It will be enough for people to think "OK, I can support the underdog since the CPUs wont bottleneck my dGPUs and its not so far behind Intel`s Skylake chips"
> This will make AMD CPU division profitable again, which allow them to hire more engineers and spend more money on R&D which create a better potential for the future.
> 
> I think they had a lot of momentum before Bulldozer, but it was eventually so far behind Intel, that people just couldnt support them so it almost killed AMD. God knows how much cash they spent on that catastropic project.


It's an entirely new chip design instead of Intel's progressive rehash. This is groundbreaking! It's worth the experience, if you know what I mean, and that's enough to capture a lot of clients.


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> Intel claims are actually pretty close to reality:
> 
> 
> (the above results are from an average of 24 benchmarks from generation to generation)
> 
> Peak IPC increase is 30%, but Intel only claimed a small average increase.


IPC is a blunt metric; it's only relative to the performance of the application being measured. It's comparable across the x86 instruction architecture purely through applications with comparable usage, so it's not entirely useless; it just reflects our everyday tasks in Windows. However, if a company commissions its own proprietary x86 chips from AMD, those could carry custom instructions that run certain workloads dramatically faster than standard x86 designs. Custom instruction sets. Sorry, this isn't easy to explain here; it's far too complicated.


----------



## Themisseble

Quote:


> Originally Posted by *Seronx*
> 
> Harvester // 3rd Gen Bulldozer.
> ---
> 1st Gen -> Bulldozer
> Enhanced 1st Gen -> Piledriver
> 2nd Gen -> Steamroller
> Enhanced 2nd Gen -> Excavator
> 3rd Gen -> Harvester
> 
> Generations are determined by large architectural changes. New fetch units, new decode units, new FPU arrangement, new Integer/Memory execution arrangement, etc
> 
> Just to note: http://www.ponsse.com/products/harvesters // Why it can also be hinted that the architecture might be called Crane instead. [Forestry or Skyscraper]


Isn't AMD killing the Bulldozer arch?


----------



## SpeedyVT

Quote:


> Originally Posted by *Themisseble*
> 
> isnt AMD killing Bulldozer arch?


Yes


----------



## Themisseble

Quote:


> Originally Posted by *SpeedyVT*
> 
> Yes


Then why is he talking about a next gen of Bulldozer? 3rd Gen -> Harvester?


----------



## geoxile

Quote:


> Originally Posted by *Themisseble*
> 
> Why is he saying about next gen of bulldozers? 3rd Gen -> Harvester?


Just ignore him. He's never once been right about anything in the years he's been handing out these "premonitions" like candy.


----------



## SpeedyVT

Quote:


> Originally Posted by *Themisseble*
> 
> Why is he saying about next gen of bulldozers? 3rd Gen -> Harvester?


Just a typo. It's something else entirely, but it has to fit the naming system they're going with, so it's going to be harvesters instead of bulldozing equipment.

I would've gone with mythological gods.


----------



## Seronx

Quote:


> Originally Posted by *Themisseble*
> 
> isnt AMD killing Bulldozer arch?


Never put all your eggs in one basket. AMD has a Bulldozer architecture in development side by side with Zen.

Stuff I have caught: a lot of the 22nm PDSOI (22SHP) and 14nm FinFET (14XM) features were fused and melded.

Level 2 Branch Predictor // Predicts branches hitting simultaneously // Module has one of these.
Level 1 Branch Predictor // Predicts branches hitting separately // Module has two of these.
Dual-ported L1 Instruction Cache
Instruction Fetch 0 // 32B Fetch for Core 0 or Core 0 + Core 1
Instruction Fetch 1 // 32B Fetch for Core 1 or Core 0 + Core 1
Unified Pick // IFU01 stores instructions to be decoded; identify bits are 00 -> Core 0, 01 -> Core 1, 11 -> Core 0 + Core 1 (has to be decoded by both decoders)
Decode 0 for Core 0 + FPU Context 0 // Small Decode -> 64b/128b ops -- Big Decode -> 64b/128b/256b ops // 2x SDEs(FP128 only) -- 2x BDEs(FP128/FP256)
Decode 1 for Core 1 + FPU Context 1 // Small Decode -> 64b/128b ops -- Big Decode -> 64b/128b/256b ops // 2x SDEs(FP128 only) -- 2x BDEs(FP128/FP256)
Each core has a Dispatch and Macro-op cache

No real large changes to the front-end of the integer cores other than the size of queues // Retirement Queue, Scheduler Queue, etc.
Execution side -> LD/MUL AGLU + LD/DIV AGLU + LD/ST/Branch AGLU + LD/ST/Popcnt AGLU, All capable of executing ALU and AGU ops.
The L1D Cache is 2R2RW ported.

The L2 Cache is partitioned and the L2 interface has a data deduplication unit.
512KB/1 MB L2_0 -> L1i cache(R), L1d for Core 0(RW), L1d for Core 1(RW), Outbound(RW)
512KB/1 MB L2_1 -> L1i cache(R), L1d for Core 1(RW), L1d for Core 0(RW), Outbound(RW)
Write Coalescing Cache is not needed anymore as the fusion of WT is done by the L2 interface.

The FPU has a particularly weird evolution.
4x LD/EP FMACs + 4x EP PRFs
4x LD/ST/DP FMACs + 4x DP PRFs
Since there are no MMX units, that area has been absorbed into a multi-PRF interconnect, much like the one on the integer side. Oh wait, I didn't mention that above; oops. Each AGLU has its own PRF there.

So far, based on the usual suspects (AMD contractors), that processor is most likely being built on 22FDX.


----------



## SpeedyVT

Quote:


> Originally Posted by *Seronx*
> 
> Never have all your eggs in one basket. AMD has a Bulldozer architecture in development side by side with Zen.
> 
> Stuff I have caught.. a lot of the 22nm PDSOI(22SHP) and 14nm FinFET(14XM) features were fused and melded.
> 
> Level 2 Branch Predictor // Predicts branches hitting simultaneously // Module has one of these.
> Level 1 Branch Predictor // Predicts branches hitting separately per thread. // Module has two of these.
> Dual-ported L1 Instruction Cache
> Instruction Fetch 0 // 32B Fetch for Core 0 or Core 0 + Core 1
> Instruction Fetch 1 // 32B Fetch for Core 1 or Core 0 + Core 1
> Unified Pick // IFU01 stores instructions to be decode, identify bits are 00 -> Core 0, 01 -> Core 1, 11 -> Core 0 + Core 1
> Decode 0 // Small Decode -> 64b/128b ops -- Big Decode -> 64b/128b/256b ops // 2x SDEs -- 2x BDEs
> Decode 1 // Small Decode -> 64b/128b ops -- Big Decode -> 64b/128b/256b ops // 2x SDEs -- 2x BDEs
> Each core has a Dispatch and Macro-op cache
> 
> No real large changes to the Front-end of the Integer cores other than size of queues // Retirement Queue, Schedular Queue, etc.
> Execution side -> LD/MUL AGLU + LD/DIV AGLU + LD/ST/Branch AGLU + LD/ST/Popcnt AGLU
> The L1D Cache is 2R2RW ported.
> 
> The L2 Cache is partitioned and the L2 interface has a data deduplication unit.
> 512KB/1 MB L2_0 -> L1i cache(R), L1d for Core 0(RW), L1d for Core 1(RW), Outbound(RW)
> 512KB/1 MB L2_1 -> L1i cache(R), L1d for Core 1(RW), L1d for Core 0(RW), Outbound(RW)
> Write Coalescing Cache is not needed anymore as the fusion of WT is done by the L2 interface.
> 
> The FPU has a particularly weird evolution.
> 4x LD/EP FMACs + 4x EP PRFs
> 4x LD/ST/DP FMACs + 4x DP PRFs
> Since there is no MMX units, that area has been digested into a multi-PRF interconnect. Much like the one in the integer side, oh wait I didn't mention that above oops. Each AGLU has its own PRF, there.
> 
> So, far based on the usual suspects that processor is most likely being built on 22FDX.


This is true, but for marketing purposes, what they're going to be selling on the consumer end is, as we know it, dead. Till revitalized, of course.

I was looking at the Zen design and thought it was reminiscent of two Jaguar cores literally stacked into each other. Jaguar, as you know, is pretty powerful for its low-TDP design, and I'm sure it would be a lot more powerful than Bulldozer if actually scaled up.

AMD is always working on a series of CPU designs.


----------



## iLeakStuff

Quote:


> Originally Posted by *SpeedyVT*
> 
> IPC is stupid, it's only relative to the performance of the application it utilizes. It's comparable on the x86 instruction architecture purely by the use of application that compares in usage. So it's not entirely useless it just compares for our everyday tasks in Windows. However if a company buys up it's own proprietary x86 chips from AMD it could hold instructions that are exploited exponentially fast or translate better than the chips designed for the x86 instruction architecture. Custom instruction sets. Sorry not easy to explain here, far too complicated.


IPC is one thing. Having more cores, and software that can utilize those extra cores, is another. How high the cores can clock can also make up for lower IPC. The TDP target helps too, i.e. allowing a higher TDP than Intel's chips. On the software side, there are instructions and features that can boost applications.

There are a ton of unknowns here that make it impossible to guess where Zen will land performance-wise.
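Those tradeoffs can at least be framed with a toy model. This sketch is purely illustrative: every number in it is invented, and the linear core-scaling term is a deliberate simplification.

```python
def relative_throughput(ipc, clock_ghz, cores, mt_scaling=0.8):
    """Toy model: ST perf = IPC * clock; each extra core adds
    mt_scaling of one core's worth of throughput. Illustrative only."""
    single_thread = ipc * clock_ghz
    return single_thread * (1 + mt_scaling * (cores - 1))

# Hypothetical: a lower-IPC 8-core vs a higher-IPC quad at the same clock.
eight_core = relative_throughput(ipc=0.85, clock_ghz=4.0, cores=8)
quad_core = relative_throughput(ipc=1.00, clock_ghz=4.0, cores=4)
# In MT workloads the core count can outweigh the IPC deficit;
# with cores=1 the higher-IPC chip still wins in ST.
```

The point of the sketch is only that core count, clocks, IPC, and MT scaling all feed the same total, which is why guessing Zen's position from any single number is hopeless.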


----------



## SpeedyVT

Quote:


> Originally Posted by *iLeakStuff*
> 
> IPC is one thing. Having more cores and using software that can utilize these extra cores is another. How high the cores can clock does also make up for less IPC. TDP target, ie having higher TDP vs Intel chips does also help. Software side we have instructions and features that can boost applications.
> 
> A ton of unknowns here that makes it impossible to guess where Zen will be performance wise


Exactly!

It's why all the PCMRs say the PS4 is super weak. I'm like, da faq? The PS4 is not a PC x86 design, and it could potentially be exploited more than the XB1. Hardware exploitation is far more complicated than IPC, because even though it's designed around x86, it could be using a complete set of Sony's own proprietary instructions that give a 10-20% CPU efficiency advantage over Microsoft's.


----------



## iLeakStuff

If anyone wonders about 14nm/16nm and all the BS that's been floating around lately: I see no point in making a new thread about it, since it would be moved to the dead CPU subforum anyway.
Notice that the work on 16nm Zen ended before the summer of 2015. We know it taped out shortly after. The 14nm SoCs are currently under development (work began in May this year).

You're welcome











14nm SOC design/Cheetah is this one


----------



## delboy67

Quote:


> Originally Posted by *SpeedyVT*
> 
> It's an entirely new chip design instead of the progressive rehash of Intel. This is ground breaking! It's worth the experience if you know what I mean and that's enough to capture a lot of clients.


It looks like Zen could be on time and on a good process node; anyone older will notice that has been a long time arriving. AMD being on time could throw a surprise, it's been that long! Remember, the FX4*** was supposed to be 45nm in 2008, and the FX8*** in 2009.


----------



## SpeedyVT

Quote:


> Originally Posted by *delboy67*
> 
> It looks like zen could be on time and on a good process node, as anyone older will notice that has been a long time arriving. Amd on time could throw a surprise, its been that long! Remember, fx4*** was supposed to be 45nm and in 2008, fx8*** in 2009.


I don't expect it to defeat Intel. However, considering every module contains 4 cores with SMT (hyper-threading), we're likely to get 8 cores/16 threads for less than what Intel charges. This being the current release also doesn't mean there isn't potential for more modules to be stacked, maybe twice as many (16 cores in the future). If so, that's obviously good not just for the consumer market but also for servers.


----------



## delboy67

Quote:


> Originally Posted by *SpeedyVT*
> 
> *I don't hold such an expectation that it will defeat an Intel.* However considering every module contains 4 cores in the SMT design with Hyper-threading we're likely to get 8 cores for less than what Intel charges and get 16 threads. This being the current release doesn't also mean there isn't potential for more modules to be stacked like twice more (16 cores in the future). If this is so this is obviously not just good for consumer market but also server.


In outright ST performance? Me neither, tbh, but I hope it's at least competitive enough to push everything on a bit more.


----------



## SpeedyVT

Quote:


> Originally Posted by *delboy67*
> 
> In outright st performance? Me neither tbh but I hope its at least competitive enough to push everything on a bit more.


Regardless it'll be a good upgrade for my computer.

I'm not biased toward Intel. I'd just rather use AMD if I could.

If someone said "I'll build you an Intel i7 computer with such-and-such specs," I would probably take it and use it.


----------



## Themisseble

Something good from AMD would be a good replacement for my FX 6300, although it serves me well. Looking at some games, even where ST is very important, Piledriver actually does very well (comparing an OC'd FX 6300, FX 4300, or FX 8320 to an i3 or non-K i5).
I think the main problem with AMD CPUs is the balance of FPU and integer performance... especially when a game uses instructions that lean on the FPU, like AVX (SC2).

https://www.youtube.com/watch?v=MGfyCkH4vpw

Where is the first DX12-only MP game? MMO or FPS? 2016, 2017, or later?


----------



## SpeedyVT

Quote:


> Originally Posted by *Themisseble*
> 
> Something good from AMD would, be good replacement for my FX 6300, although it serves me well. Looking at some games actually even, if ST is very important piledriver does very well. (comapring oc-ed FX 6300 or FX 4300 or FX 8320 to i3 or i5 non K)
> I think main problem with AMD CPU is balance of FPU and INTEGER performance... specially when game uses instructions which are more based on FPU.. like AVX (SC2).
> 
> https://www.youtube.com/watch?v=MGfyCkH4vpw
> 
> Where is first DX12 (ONLY) MP game? MMO or FPS?... 2016 or 2017 or later?


A lot of those FPU instructions could be offloaded, but they aren't.


----------



## Kuivamaa

Quote:


> Originally Posted by *Seraphic*
> 
> If this is true, what does it mean for AMD's Xen?
> Seeing there was talk it would be 8cores/16threads?
> 
> Since Zen is already "taped" out, is it too late for them to increase their core count?


Zen's building block is reported to be 4 cores. This means that technically they could easily release a 12C/24T unit by using three blocks. It depends on clocks/thermals and expected ROI, really. I think for desktop, at least, they will stick with 8/16.

Quote:


> Originally Posted by *iLeakStuff*
> 
> The good thing for AMD for staying so far behind Intel in CPUs for so long, is that if even they get Sandy Bridge performance out of their Zen CPUs, it will be good enough to recapture a ton of market share. That will be a good start for them.


SB-level performance in late 2016 will be the death of them. A 4C/8T CPU would even be a sidegrade from Vishera in MT workloads; good luck selling such a chip in big quantities for more than $150. A 6C/12T chip would probably just match Kaby Lake i7 quads as well. This would bring them back to 2012, having to undercut Intel massively to get sales.


----------



## looncraz

Quote:


> Originally Posted by *Kuivamaa*
> 
> SBlevel if performance in late 2016 will be the death of them. A 4C/8T CPU would even be a sidetrack to vishera in MT workloads , good luck selling such a chip in big quantities for more than 150. A 6C/12T chip would probably just match Kaby lake i7 quads as well. This would bring them back to 2012 , having to undercut Intel massively to get sales.


Better than < Core 2 performance in 2015 and most of 2016









You have to look at where they are to consider the value of an improvement. Intel is ahead and will stay ahead. Big whoop, AMD can be an alternative in many other parts of the market... except not without getting closer to Intel's performance.

If they have Sandy Bridge IPC and Haswell clocks, they will need to add SMT for free to compete with the i5s, while also throwing in the highest clocks and features they can manage. Sandy Bridge IPC, 6 cores, and SMT will be used against Skylake i7 quads. They will then have the 8-core to go against the hexa-core i7s.

Right now, the six-core FX-6300 competes, barely, against a Pentium G3xxx. A $60 CPU.


----------



## warpuck

Quote:


> Originally Posted by *chrisjames61*
> 
> If you are going to bash people for discussing this kind of speculation at least learn how to spell. "piqued my interest"


It has peaked my interest, but it is still far from the summit


----------



## Clocknut

Quote:


> Originally Posted by *iLeakStuff*
> 
> IPC is one thing. Having more cores and using software that can utilize these extra cores is another. How high the cores can clock does also make up for less IPC. TDP target, ie having higher TDP vs Intel chips does also help. Software side we have instructions and features that can boost applications.
> 
> A ton of unknowns here that makes it impossible to guess where Zen will be performance wise


Without an iGPU, 14nm should allow AMD to clock Zen high enough to make up for the inferior IPC and close the gap with Skylake.

Knowing that the Zen design was started around the Haswell era, their engineers are probably aiming at Haswell as their target. I still think that in the worst case Zen's IPC should land somewhere around Ivy Bridge.
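As a back-of-the-envelope illustration of clocks compensating for IPC (all figures here are hypothetical):

```python
# Toy calculation: the clock needed to match a rival chip's per-core
# performance when your IPC is lower. Assumes perf ~= IPC * clock.
def clock_to_match(rival_ipc, rival_clock_ghz, own_ipc):
    return (rival_ipc * rival_clock_ghz) / own_ipc

# Hypothetical ~10% IPC deficit against a 4.0GHz part:
needed = clock_to_match(rival_ipc=1.0, rival_clock_ghz=4.0, own_ipc=0.9)
# roughly 4.44GHz to break even
```

Which is why a ~10% IPC gap is closable with clocks, but a Piledriver-sized gap is not.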


----------



## epic1337

There's a brick wall at the 5GHz range; it's quite unlikely we'd see a stock processor exceed that clock speed out of the box.

Even if Zen were clocked at 5GHz out of the box, we'd end up with no headroom for OC and a much more expensive chip altogether due to binning issues.
The FX-9xxx is one such example.

Though I'd have to point out, if they manage to make it dramatically more power efficient than the Bulldozer series, then we could get consistently high clocks without blowing up the VRMs.
As such, even if IPC were at Ivy Bridge level, so long as it's priced right and clocks well, it could match Skylake's upfront cost efficiency while barely losing on performance.


----------



## KarathKasun

Quote:


> Originally Posted by *warpuck*
> 
> It has peaked my interest, but it is still far from the summit


No, it has peeked at my interest. It needs to quit spying on me.


----------



## Kuivamaa

Quote:


> Originally Posted by *iLeakStuff*
> 
> The good thing for AMD for staying so far behind Intel in CPUs for so long, is that if even they get Sandy Bridge performance out of their Zen CPUs, it will be good enough to recapture a ton of market share. That will be a good start for them.


Quote:


> Originally Posted by *looncraz*
> 
> Better than < Core 2 performance in 2015 and most of 2016
> 
> 
> 
> 
> 
> 
> 
> 
> 
> You have to look at where they are to consider the value of an improvement. Intel is ahead and will stay ahead. Big whoop, AMD can be an alternative in many other parts of the market... except not without getting closer to Intel's performance.
> 
> If they have Sandy Bridge IPC and Haswell clocks, they will need to add SMT in for free to compete with the i5s, while also throwing in the highest clocks and features they can manage. Sandy Bridge IPC, 6 threads, and SMT will be used against Skylake i7 Quads. They will then have the 8 core to against the Hexa core i7s.
> 
> Right now, the six-core FX-6300 competes, barely, against a Pentium G3xxx. A $60 CPU.


The point is the FX is soundly beating the Pentium in MT workloads, and in many games even, because there are quite a few these days that do not like 2-thread processors. And Vishera is very outdated at this point. Going back to the 2012 situation is hardly an improvement, really.


----------



## Themisseble

Quote:


> Originally Posted by *Kuivamaa*
> 
> Point is the FX is soundly beating the pentium in MT workloads. And many games even, because there are quite a few these days that do not like 2 threads processors. And Vishera is very outdated at this point. Going back to 2012 situation is hardly an improvement really.


The FX 6300 is destroying the i3 in most games...


----------



## SpeedyVT

If this is the repeat argument over whether an Intel i3 or Pentium can beat an FX processor, the answer is always going to be the same: only in multi-threaded environments does the FX win. However, there is still a fair number of single-threaded games we can't get past, and there the i3 or the Pentium display their potency. While I would never in my right mind recommend a dual core for the demands of today, I also would not recommend a processor as old as the FX series. Perhaps last year I would have, but with AMD's and Intel's new chips just around the corner, it seems more promising to hold off.

Also, if DX12 picks up, additional threads rather than raw IPC power will benefit the user.


----------



## EniGma1987

Quote:


> Originally Posted by *Themisseble*
> 
> Looking at some games actually even, if ST is very important piledriver does very well. (comapring oc-ed FX 6300 or FX 4300 or FX 8320 to i3 or i5 non K)


No, it does not. Piledriver has significantly worse minimum FPS than the i3+ processors, and in particularly heavy games it shows even more.

Quote:


> Originally Posted by *SpeedyVT*
> 
> While I would never in my right mind recommend a dual core for the demands of today, I would not recommend a processor as old as the FX series. Perhaps last year I would've, but with things just around the corner with AMD's and Intel's new chips it's seems more promising to hold off.


I'd definitely recommend a dual-core, 4-thread Pentium G4400 for someone doing a really budget build. You can't really beat a 4-thread Skylake for $65 in price-to-performance right now.


----------



## Themisseble

Quote:


> Originally Posted by *EniGma1987*
> 
> No it does not. Piledriver has significantly worse minimum FPS than the i3+ processors, and on particularly heavy games it shows even more.
> Id definitely recommend a dual core, 4 thread Pentium G4400 for someone doing a really budget build. You cant really beat a 4 thread Skylake for $65 in price to performance right now.


The Pentium is 2C/2T.
- In well-threaded games, more cores = more stable FPS.

Please don't look at benchmarks that are useless. Like I said in the past, I have an i7 Ivy and an FX 6300, but I also tested an i3 Skylake and an i5 Haswell. The i5 is for gamers, and the Athlon X4 860K is for budget gamers.

I am not making blind statements like you do... I actually did some benchmarks, and in GW2/BF4 and a few other games an FX 6300 at 4.5GHz completely destroyed an i3 Ivy at 3.8GHz.

The biggest problems are games with AVX instructions... SC2 or Diablo... The Bulldozer core has a good design for integer performance, but the FPU is the main problem (some FPU calculations could be offloaded to the GPU).

Enough of this...

AMD needs to make a core that has strong FPU and strong integer performance... smaller cache sizes and larger cores. I hope Zen will surprise me...


----------



## EniGma1987

Quote:


> Originally Posted by *Themisseble*
> 
> pentium is 2C/2T
> - In well threaded games more cores = more stable fps
> 
> Please dont look at becnhmarks which are useless.. Like I said in a past I have i7 ivy and FX 6300 but I also tested i3 skylake and i5 haswell. I5 is for gamers and athlon x4 860K is for budget gamers.
> 
> I am not making blind statements like you do... I actually did some benchmarks... and in GW2/BF4 and few another games FX 6300 4.5GHz completely destroyed i3 ivy at 3.8GHz


Hmm, coulda sworn it was listed with 4 threads on Newegg earlier today, but now it isn't even available at Newegg. Weird.

Nice how you assume everyone makes blind statements besides you. I too have had an FX 6300, 8350, Sandy, Ivy, Haswell, and Skylake and have tested them all. I still have the 8350, 4790K, and 6700K.
Maybe you are simply mistaken in your previous posts, since you were specifically saying single-threaded games do better on the FX, and then you just listed games that use multithreading...


----------



## Themisseble

Quote:


> Originally Posted by *EniGma1987*
> 
> Hmm, I coulda sworn it was listed with 4 threads on Newegg earlier today, but now it isn't even available at Newegg. Weird.
> 
> Nice how you assume everyone makes blind statements besides you. I too have had an FX6300, 8350, Sandy, Ivy, Haswell, and Skylake and have tested them all. I still have the 8350, 4790K and 6700K.
> Maybe you are simply mistaken in your previous posts, since you were specifically saying single threaded games do better on the FX, and then you just now listed games that use multithreading...


Who said that? The point is that in some games that are single threaded an OCed FX can match Haswell, but some games use different instructions and the FX might fall way behind. Just look at benchmarks.

CB vs. POV-Ray:
In POV-Ray an Athlon X4 860K will be just about 15% slower than an i5 2500K... in CB that will never happen.


----------



## hojnikb

Quote:


> Originally Posted by *Themisseble*
> 
> *Who said that? The point is that in some games that are single threaded an OCed FX can match Haswell,* but some games use different instructions and the FX might fall way behind. Just look at benchmarks.


i bet those games are gpu limited anyway...


----------



## hojnikb

Quote:


> Originally Posted by *Themisseble*
> 
> The FX 6300 is destroying the i3 in most games...


Please give us a list of those "most" games. And with a proper lead, not just an fps or two difference (that's not destroying).


----------



## flopper

Zen me up


----------



## KyadCK

Quote:


> Originally Posted by *EniGma1987*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Themisseble*
> 
> Looking at some games, even if ST is very important, Piledriver actually does very well (comparing an OCed FX 6300, FX 4300 or FX 8320 to an i3 or non-K i5).
> 
> 
> 
> No it does not. Piledriver has significantly worse minimum FPS than the i3+ processors, and on particularly heavy games it shows even more.
> 
> Quote:
> 
> 
> 
> Originally Posted by *SpeedyVT*
> 
> While I would never in my right mind recommend a dual core for the demands of today, I would not recommend a processor as old as the FX series either. Perhaps last year I would've, but with AMD's and Intel's new chips just around the corner it seems more promising to hold off.
> 
> 
> I'd definitely recommend a dual core, 4 thread Pentium G4400 for someone doing a really budget build. You can't really beat a 4 thread Skylake for $65 in price to performance right now.

No Pentium has HyperThreading. It is 2c/2t.

Celeron = 2c/2t "slow"
Pentium = 2c/2t "fast"
Core i3 = 2c/4t
Core i5 = 4c/4t
Core i7 = 4c/8t
Core i7-E = 4c/8t - 8c/16t

This is true for every desktop CPU. On laptops Intel starts bending the rules.

And sure you can. There are games that simply will not run on dual-cores now. As a result, they are unacceptable for gaming rigs if anyone wants to play said games.
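The c/t labels above are just logical vs physical CPU counts, which you can check on whatever box you're sitting at. A minimal Python sketch (an assumption-laden illustration, not a robust tool: physical-core detection parses Linux's `/proc/cpuinfo` and simply assumes no SMT on other platforms):

```python
import os

def cpu_topology():
    """Return (physical_cores, logical_cpus); SMT/HT is on when they differ."""
    logical = os.cpu_count() or 1
    cores = set()
    phys_id = core_id = None
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    phys_id = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    core_id = line.split(":")[1].strip()
                elif not line.strip():  # blank line ends one processor entry
                    if phys_id is not None and core_id is not None:
                        cores.add((phys_id, core_id))
                    phys_id = core_id = None
        if phys_id is not None and core_id is not None:
            cores.add((phys_id, core_id))
    except OSError:
        pass  # not Linux: fall through to the no-SMT assumption
    physical = len(cores) or logical
    return physical, logical

physical, logical = cpu_topology()
print(f"{physical}c/{logical}t, SMT {'on' if logical > physical else 'off'}")
```

On an i3 this would report 2c/4t (SMT on), on a non-T i5 4c/4t (SMT off).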


----------



## Redwoodz

Quote:


> Originally Posted by *ebduncan*
> 
> why do people keep mentioning haswell?
> 
> haswell is old news, no one in their right mind would purchase a haswell cpu right now, because skylake is out. So why are you making comparisons to haswell?
> 
> Skylake is on the market now, by the time AMD gets Zen out the door they would be lucky to be competing with just skylake, but likely intel's next chips. Point being AMD needs to perform at the Skylake level, not haswell.
> 
> All I know is I've been itching for an upgrade lately. I'm trying to hold off for a Skylake 6700K build until Zen comes out, but I'll probably just go with the Skylake.


A Skylake CPU is $300+, besides the dual core i3 or quad core i5. I don't feel the need to pay that much for a desktop processor, nor does probably 98% of the market. AMD does not have to beat a 6700K to be profitable. Not that hard to understand.


----------



## Themisseble




Quote:


> Originally Posted by *hojnikb*
> 
> Please give us a list of those "most" games. And with a proper lead, not just an fps or two difference (that's not destroying).


I gave it already... I don't want to hijack this topic for FX vs i3. It's clear that some games will use more threads and different game code... but look at the game engine. So I am not going to do any sort of test again, and please don't argue; there were plenty of benchmarks and plenty of conversation... The final conclusion was i7 > i5 > FX 8/6 > i3, OCed Athlon X4 860K > Athlon X4 860K > Pentium/OC. Intel has much higher IPC than AMD, but AMD is not as bad as some people make it out to be.



... How much will a quad or six core Zen cost? What do you think? Will AMD unlock a dual core with SMT?


----------



## variant

Quote:


> Originally Posted by *Themisseble*
> 
> 
> 
> 
> 
> 
> I gave it already... I don't want to hijack this topic for FX vs i3. It's clear that some games will use more threads and different game code... but look at the game engine. So I am not going to do any sort of test again, and please don't argue; there were plenty of benchmarks and plenty of conversation... The final conclusion was i7 > i5 > FX 8/6 > i3, OCed Athlon X4 860K > Athlon X4 860K > Pentium/OC. Intel has much higher IPC than AMD, but AMD is not as bad as some people make it out to be.
> 
> 
> 
> ... How much will a quad or six core Zen cost? What do you think? Will AMD unlock a dual core with SMT?


There may not even be a quad or six core Zen. Zen modules are 8 cores from what I understand.


----------



## looncraz

Quote:


> Originally Posted by *Kuivamaa*
> 
> The point is the FX is soundly beating the Pentium in MT workloads. And in many games even, because there are quite a few these days that do not like 2-thread processors. And Vishera is very outdated at this point. Going back to a 2012 situation is hardly an improvement really.


I suggest you look again.

Pentium G3258 vs FX-6350

http://www.anandtech.com/bench/product/1281?vs=1265

The Pentium wins most of the match-ups. And it's only a dual core, 3.2GHz, no turbo, and no hyper-threading. The extra cores on the FX help with some of the newer games, no doubt, and you can get better results in certain (but not all) multithreaded workloads.

And the Pentium is HALF the cost and draws half the power. There are times where that six core CPU is twice as fast, but usually that is not the case. If you want an upgrade from an old Core 2 Duo system, AMD offers nothing better for the average user.

We have to compare CPUs, of course, on price. So, let's move up to a ~$125 Intel CPU. The competition is the Core i3-3220: 3.3GHz, with Hyper-Threading. The FX-6350 beats it in the same places, but with much less headroom, to the point that the game lead almost certainly vanishes. And the i3 has 30% faster single threaded performance, and only that much worse multi-threaded performance. Most of the time ST > MT.

http://www.anandtech.com/bench/product/1281?vs=677

If you're coming from an overclocked Core 2 Duo:

http://www.anandtech.com/bench/product/1281?vs=54

The FX-6350 doesn't offer anything for single threaded performance... and, usually, you want to notice some improvement all around.

The i3, or even the Pentium, does that nicely:

http://www.anandtech.com/bench/product/677?vs=54


----------



## Themisseble

looncraz, never ever compare CPUs on AnandTech again. And stop with the madness... nonsense.


----------



## Robenger

Quote:


> Originally Posted by *Themisseble*
> 
> looncraz, never ever compare CPUs on AnandTech again. And stop with the madness... nonsense.


Excellent retort.


----------



## Themisseble




Quote:


> Originally Posted by *looncraz*
> 
> I suggest you look again.
> 
> Pentium G3258 vs FX-6350
> 
> http://www.anandtech.com/bench/product/1281?vs=1265
> 
> The Pentium wins most of the match-ups. And it's only a dual core, 3.2GHz, no turbo, and no hyper-threading. The extra cores on the FX help with some of the newer games, no doubt, and you can get better results in certain (but not all) multithreaded workloads.
> 
> And the Pentium is HALF the cost and draws half the power. There are times where that six core CPU is twice as fast, but usually that is not the case. If you want an upgrade from an old Core 2 Duo system, AMD offers nothing better for the average user.
> 
> We have to compare CPUs, of course, on price. So, let's move up to a ~$125 Intel CPU. The competition is the Core i3-3220: 3.3GHz, with Hyper-Threading. The FX-6350 beats it in the same places, but with much less headroom, to the point that the game lead almost certainly vanishes. And the i3 has 30% faster single threaded performance, and only that much worse multi-threaded performance. Most of the time ST > MT.
> 
> http://www.anandtech.com/bench/product/1281?vs=677
> 
> If you're coming from an overclocked Core 2 Duo:
> 
> http://www.anandtech.com/bench/product/1281?vs=54
> 
> The FX-6350 doesn't offer anything for single threaded performance... and, usually, you want to notice some improvement all around.
> 
> The i3, or even the Pentium, does that nicely:
> 
> http://www.anandtech.com/bench/product/677?vs=54


We had that video with the Pentium G3258 where the guy was lying about performance... In my experience playing a lot of BF4, Witcher 3, plus PvZ, GW2 and WoW, I haven't seen the FX slower than the i3 in any case.

GW2 and WoW will make the best of 3-4 cores, and so will BF4... Mantle mode was weird: an OCed FX almost matched a stock i5 (1-3%) in DX11 on a Radeon GPU... yet the i5 pulled away by at least 20% comparing OC vs stock. Seriously... FX frames are much more stable... especially when you use Skype, TeamSpeak or even a recording program... Just stop with the nonsense. Yes, everybody supports Intel and we know it.
No need to talk such madness... the i5 4460 is not that great either, but it is a better option than any FX for gaming.

Is there any bottleneck... or let me put it this way: is there any GPU bottleneck?
https://www.youtube.com/watch?v=8CaqBRMFJlQ

And yes, an OCed FX 8320 will do better than this, and even an FX 6300 might be a little faster... I always tested an empty map.

Please stop with FX vs Intel... AMD already confirmed that they lost the fight with Bulldozer. Is that not enough humiliation? Or do you want to show benchmarks which show nothing but nonsense, to make Bulldozer look even worse? C'mon.

Please be happy that we will get DX12 games... that will actually make a difference in MP.


----------



## KarathKasun

Quote:


> Originally Posted by *Themisseble*
> 
> Pentium is 2C/2T
> - In well threaded games more cores = more stable fps
> 
> Please don't look at benchmarks which are useless... Like I said in the past, I have an i7 Ivy and an FX 6300, but I also tested an i3 Skylake and an i5 Haswell. The i5 is for gamers and the Athlon X4 860K is for budget gamers.
> 
> I am not making blind statements like you do... I actually did some benchmarks... and in GW2/BF4 and a few other games an FX 6300 at 4.5GHz completely destroyed an i3 Ivy at 3.8GHz.
> 
> The biggest problems are games with AVX instructions... SC2 or Diablo... The Bulldozer core has a good design for INTEGER performance, but the FPU is the main problem (some FPU calculations could be offloaded to the GPU).
> 
> Enough of this....
> 
> AMD needs to make a core that has a strong FPU and strong INTEGER performance... smaller cache sizes and larger cores. I hope that Zen will surprise me...


Those games do not have AVX instructions, PERIOD.
They are bound to one primary thread and use x87 floating point instructions. This is why they perform terribly on AMD hardware.

The FX series is at a 50% deficit per core in integer math, and that gap is even wider when it comes to overall floating point math (especially with legacy instructions, as in SC2/D3).
ARK is another good example of how bad the FX series can be at gaming. It's limited to two threads; AMD CPUs at 5GHz have a hard time pushing a solid 60FPS where a Sandy Bridge CPU at ~4GHz has no problem.

I have both, but the AMD systems have not aged as well as the Intel systems of the same age. At least this time around.


----------



## Themisseble

Quote:


> Originally Posted by *variant*
> 
> There may not even be a quad or six core Zen. Zen modules are 8 cores from what I understand.


Well... modules?

CMT and SMT... Intel cores share L3 and are not recognized as modules. The CMT design is much different from SMT... CMT cores are mostly smaller... but there are so many other things.

If the AMD Zen core comes with GOOD IPC (and float and integer on the same level)... then an unlocked dual core with 4 threads could be around $100...

PS: Yeah, the FPUs are terrible in Bulldozer... I think AMD had a vision of putting most FPU tasks on the GPU. As we remember from Froblins six years ago it's hard to code... but AMD has pushed low level APIs... maybe HSA... you never know.


----------



## KarathKasun

They are in "modules" because AMD's design philosophy is to make things easily expandable. They are not modules in the sense that BD had shared components. And the last I had seen, Zen is supposed to come in "modules" of 4 fully independent cores.


----------



## Seraphic

Quote:


> Originally Posted by *Kuivamaa*
> 
> The Zen building block is reported to be 4 cores. This means that technically they can easily release a 12C/24T unit by using three blocks. It depends on clocks/thermals and expected ROI really. I think for desktop at least, they will stick with 8/16.


If Intel is rocking 10c/20t, I think they need to at least match that if not beat it by offering a 12c/24t model.


----------



## Kuivamaa

Quote:


> Originally Posted by *looncraz*
> 
> I suggest you look again.
> 
> Pentium G3258 vs FX-6350
> 
> http://www.anandtech.com/bench/product/1281?vs=1265
> 
> The Pentium wins most of the match-ups. And it's only a dual core, 3.2GHz, no turbo, and no hyper-threading. The extra cores on the FX help with some of the newer games, no doubt, and you can get better results in certain (but not all) multithreaded workloads.
> 
> And the Pentium is HALF the cost and draws half the power. There are times where that six core CPU is twice as fast, but usually that is not the case. If you want an upgrade from an old Core 2 Duo system, AMD offers nothing better for the average user.
> 
> We have to compare CPUs, of course, on price. So, let's move up to a ~$125 Intel CPU. The competition is the Core i3-3220: 3.3GHz, with Hyper-Threading. The FX-6350 beats it in the same places, but with much less headroom, to the point that the game lead almost certainly vanishes. And the i3 has 30% faster single threaded performance, and only that much worse multi-threaded performance. Most of the time ST > MT.
> 
> http://www.anandtech.com/bench/product/1281?vs=677
> 
> If you're coming from an overclocked Core 2 Duo:
> 
> http://www.anandtech.com/bench/product/1281?vs=54
> 
> The FX-6350 doesn't offer anything for single threaded performance... and, usually, you want to notice some improvement all around.
> 
> The i3, or even the Pentium, does that nicely:
> 
> http://www.anandtech.com/bench/product/677?vs=54


Come again? I stated that the FX-6300 is handily beating Haswell Pentiums in MT workloads. From your own link I can see the FX performing roughly 100% better in the typical MT stuff, like x264, 7zip, CB etc. Where I live the FX is about 30 euros more expensive, not double the price, so, yeah, for this type of thing it is a much better solution. It is really more comparable to an i3, so here we agree, but the question was Pentium vs FX and my point stands. As for games, yes, these days there are plenty that do not work well at all with 2T processors: Crysis 3, BF4 multiplayer, DA:I, Fallout 4, The Witcher 3, etc. Pentiums are a gamble at this point; you never know if they are going to be able to perform well enough with the next AAA game or not.


----------



## Tojara

Quote:


> Originally Posted by *KarathKasun*
> 
> They are in "modules" because AMD's design philosophy is to make things easily expandable. They are not modules in the sense that BD had shared components. And the last I had seen, _Zen is supposed to come in "modules" of 4 fully independent cores._


This was from one of the supposed "leaked" slides that are not by any means confirmed. Take it with a grain of salt.


----------



## Kuivamaa

Quote:


> Originally Posted by *Tojara*
> 
> This was from one of the supposed "leaked" slides that are not by any means confirmed. Take it with a grain of salt.


No, the info about the Zen building blocks being groups of 4 cores that share only L3 cache (like Intel's, more or less) came from official AMD documentation.


----------



## warpuck

When I bought the 9590 instead of an i5K, it was because the total system price for an i5K or an i7 that would do better cost more. Really, 8 FX cores running at 5.0GHz have enough Gittix to keep a pair of R9 285s happy. Even if you are playing a game that starts with the Nvidia logo, it can still do it and run PhysX on the CPU. Intel dropped the price of the i5K because AMD is selling a lot of 83XX and 93/9590s.
Yes, it does do better on Win 10 when M$ is not updating.
It even does better in Firestrike with the new beta AMD CCC release.

As for Zen, yes, if it runs DDR4. I think one of the problems with AMD is the 16 bit memory clogger. You know that part in the BIOS where you choose between 8 and 16? A 128 bit path between the system memory and the CPU would speed things up quite a bit. I may go for it if they package an AM3+ version, maybe.
My mental image of this may be wrong, but that appears to be one of the things where Intel has a big advantage.


----------



## Tojara

Quote:


> Originally Posted by *Kuivamaa*
> 
> No, the info about Zen building blocks being in groups of 4 cores that share only L3 cache (like intel ones more or less) came from the official AMD documentation.


I hope you find an official source for it then, since I can't. It certainly wasn't displayed at FAD. The only sources which have it say it's a leaked slide, hardly anything to rely on.


----------



## KarathKasun

Quote:


> Originally Posted by *warpuck*
> 
> When I bought the 9590 instead of an i5K, it was because the total system price for an i5K or an i7 that would do better cost more. Really, 8 FX cores running at 5.0GHz have enough Gittix to keep a pair of R9 285s happy. Even if you are playing a game that starts with the Nvidia logo, it can still do it and run PhysX on the CPU. Intel dropped the price of the i5K because AMD is selling a lot of 83XX and 93/9590s.
> Yes, it does do better on Win 10 when M$ is not updating.
> It even does better in Firestrike with the new beta AMD CCC release.
> 
> As for Zen, yes, if it runs DDR4. I think one of the problems with AMD is the 16 bit memory clogger. You know that part in the BIOS where you choose between 8 and 16? A 128 bit path between the system memory and the CPU would speed things up quite a bit. I may go for it if they package an AM3+ version, maybe.
> My mental image of this may be wrong, but that appears to be one of the things where Intel has a big advantage.


That 8/16 bit setting is for HyperTransport, not memory. AMD uses 2x64 bit memory controllers like Intel.

HyperTransport is the link to the chipset, which does PCIe duties.
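For reference, those dual 64-bit channels translate to peak bandwidth as channels × bus width × transfer rate. A back-of-the-envelope sketch (the DDR3-1866 figure is an assumed example, not from the thread):

```python
def peak_bandwidth_gb_s(channels, bus_width_bits, transfer_mt_s):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfer_mt_s * 1e6 / 1e9

# Dual-channel DDR3-1866: 2 channels x 64 bits x 1866 MT/s
print(round(peak_bandwidth_gb_s(2, 64, 1866), 1))  # ~29.9 GB/s
```

Widening the path to 128 bits per channel would double that figure, which is the intuition behind the post above.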


----------



## Faithh

Quote:


> Originally Posted by *Kuivamaa*
> 
> Come again? I stated that the FX-6300 is handily beating Haswell Pentiums in MT workloads.


Even in an MT workload, one too-slow thread can slow the rest down to the point that core count is almost irrelevant. You have to be more specific about multithreaded workloads; multi means 2 or more. The Pentium will mostly be better up to 4 threads.

Benchmarks like CB are near-perfectly multithreaded; they're not a good indication of real world use, IMO.
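That "one slow thread drags everything down" effect is Amdahl's law in action; a small sketch (the fractions are made-up illustrations):

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_threads)

# A near-perfectly parallel benchmark (CB-style): 6 threads -> 6x
print(amdahl_speedup(1.0, 6))            # 6.0
# A workload that is only 50% parallel: 6 threads buy you barely 1.7x
print(round(amdahl_speedup(0.5, 6), 2))  # 1.71
```

Which is why two fast cores can beat six slower ones in poorly threaded games, and lose badly in x264/7zip-style workloads.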
Quote:


> Originally Posted by *warpuck*
> 
> When I bought the 9590 instead of an i5K, it was because the total system price for an i5K or an i7 that would do better cost more.


For the price of the board + cooler a 9590 needs, you could get an i5 4430 + a motherboard, which is and has been faster for games and most multithreaded workloads.
Quote:


> Originally Posted by *warpuck*
> 
> Really, 8 FX cores running at 5.0GHz have enough Gittix to keep a pair of R9 285s happy.


8 of them work at 4.7GHz, sadly.
Quote:


> Originally Posted by *warpuck*
> 
> Intel dropped the price of the i5K because AMD is selling a lot of 83XX and 93/9590s.


They never dropped their prices because AMD has been a ghost to them.


----------



## The Stilt

The slide showing Zen's core layout is obviously fake. All slides presented at the Financial Analyst Day are available online.

So far AMD has revealed only a few details about Zen:

- FinFET manufacturing process
- SMT instead of CMT
- 40% IPC improvement over Family 15h 60h-7Fh "Excavator" cores
- 2016 availability (not specified if sampling or retail)


----------



## WhiteCrane

Can't delete my own post.


----------



## hojnikb

Quote:


> Originally Posted by *Themisseble*
> 
> 
> 
> 
> 
> 
> I gave it already... I don't want to hijack this topic for FX vs i3. It's clear that some games will use more threads and different game code... but look at the game engine. So I am not going to do any sort of test again, and please don't argue; there were plenty of benchmarks and plenty of conversation... The final conclusion was i7 > i5 > FX 8/6 > i3, OCed Athlon X4 860K > Athlon X4 860K > Pentium/OC. Intel has much higher IPC than AMD, but AMD is not as bad as some people make it out to be.
> 
> 
> 
> ... How much will a quad or six core Zen cost? What do you think? Will AMD unlock a dual core with SMT?


That highly depends on how they perform compared to Intel's stuff. They ain't gonna give away performance chips for pennies.
It's always been like that.

As for unlocking, there could be different SKUs for that (K or Black Edition), just like with the APUs.


----------



## 0razor1

Quote:


> Originally Posted by *KyadCK*
> 
> No Pentium has HyperThreading. It is 2c/2t.
> 
> Celeron = 2c/2t "slow"
> Pentium = 2c/2t "fast"
> Core i3 = 2c/4t
> *Core i5 = 4c/4t*
> Core i7 = 4c/8t
> Core i7-E = 4c/8t - 8c/16t
> 
> *This is true for every desktop CPU*. On laptops Intel starts bending the rules.
> 
> And sure you can. There are games that simply will not run on dual-cores now. As a result, they are unacceptable for gaming rigs if anyone wants to play said games.


Not so.

We got crappy 4570Ts at work: an i5 that is 2c/4t with HT on.


----------



## 2010rig

Quote:


> Originally Posted by *0razor1*
> 
> Not so.
> 
> We got crappy 4570Ts at work: an i5 that is 2c/4t with HT on.


4670K, 4670 are 4/4.


----------



## looncraz

What a terrible CPU for the money!

It's basically an underclocked i3-4170, with a turbo not even as high as the 4170's base clock. The only things it offers are +1MB L3 (big whoop) and VT-d (usually also a big whoop). And it costs $70 more...

An AMD eight-core makes more sense at that point.


----------



## 0razor1

Quote:


> Originally Posted by *2010rig*
> 
> 4670K, 4670 are 4/4.


Well aware

I recently had a 4670k and now a 4790k

I was disgruntled to find out the reason my i5 desktop at work is abysmal at Excel spreadsheets: I realized it's not four cores!!! Task Manager behaved like this i5 had HT on!

I Googled it, and would you believe it!
Quote:


> Originally Posted by *looncraz*
> 
> What a terrible CPU for the money!
> 
> It's basically an underclocked i3-4170, with a turbo not even as high as the 4170's base clock. The only things it offers are +1MB L3 (big whoop) and VT-d (usually also a big whoop). And it costs $70 more...
> 
> An AMD eight-core makes more sense at that point.


Yup, enterprises will shop for cheap i5s, and Intel serves them special dual core i5s with HT on, so everyone's happy but the guy whose Excel sheets take forever!!!!

VT-d would make sense if the IT guy really knew what it did. We use cloud products, so VT-d is wasted on us.


----------



## 2010rig

For around $20 more they could've gotten 4670s, which come with VT-d support if you guys need it, but it sounds like it's not even being used.


----------



## ZenFX

Design Engineer
AMD
June 2007 - Present (8 years 6 months)

Verification Engineer (2007-2011)
RTL Design Engineer (2012-present)

Successfully implemented the Zen (znver1) hybrid branch prediction algorithm. The algorithm works by keeping a Smith counter to choose between stable quantum bogosort and PigeonRank algorithms.



https://www.linkedin.com/in/steve-havlir-714b337


----------



## MapRef41N93W

Quote:


> Originally Posted by *CataclysmZA*
> 
> I think a lot of people are assuming that AMD will simply price themselves out of competition by keeping some kind of parity with Intel. I'm not sure if they're basing this solely off the value of the brand, or the public's reception of the FX-9000 chips.
> 
> *AMD did the smart thing by pricing Fury X and Nano like they did. There's nothing out there like these products, and they priced them at $650 because they expected consumers to snap them up despite that price tag. What ended up happening in the launch week? All available Fiji-based cards got snapped up within hours. The market supported that $650 price point and so long as AMD continues to sell out on every Fiji chip they make, they're perfectly happy with not dropping the price.
> *
> If Zen costs $1000 for a dual-threaded, eight-core chip that keeps up with the Core i7-5960X, who are we to deny them the right to begin pricing it like that? If the market doesn't support that price point, AMD will notice it in their partner's reported sales figures, and drop the price accordingly in markets where it isn't selling. Tom Peterson, in a PCPer interview, told the audience that this is why they priced the GTX 980 like they did, and why it is sold as a high-end product - consumers were buying it up in droves, so why sell themselves short on a product they're seeing healthy margins on? Intel does the exact same thing with the Xeon family.
> 
> Zen's going to be good, that's what I believe through following all the leaks and hints about the chip's structure. It'll be a strong contender for multi-threaded performance, it has more of a focus on compatibility with what Intel's doing rather than trying to play their own game, and it's going to be the first new CPU architecture from AMD in almost five years. A lot has changed in that time and they're not going to throw away good opportunities now like previous management did in the past. This is going to be exciting stuff to play with.


Nope, not true at all, but good job trying to spin a terribly negative AMD situation into a pro-AMD one; you should do PR work. Fury X cards were "off the shelves" when they released because they didn't send any to the shelves. The Fury X owners club was a literal ghost town for months after its release (and is still a ghost town compared to even mid-range NVIDIA GPU threads on an enthusiast board). If you only ship 4 units to Newegg and all of them sell, that doesn't make your product in demand.
Quote:


> Originally Posted by *warpuck*
> 
> Intel dropped the price of the i5K because AMD is selling a lot of 83XX and 93/9590s.


Hence their single digit desktop market share that has been in free fall for years. Like with the guy quoted above, "selling a lot of units" to this guy apparently means selling a few thousand a year.


----------



## SpeedyVT

Quote:


> Originally Posted by *0razor1*
> 
> Well aware
> 
> I recently had a 4670k and now a 4790k
> 
> I was disgruntled to find out the reason my i5 desktop at work is abysmal at Excel spreadsheets: I realized it's not four cores!!! Task Manager behaved like this i5 had HT on!
> 
> I Googled it, and would you believe it!
> Yup, enterprises will shop for cheap i5s, and Intel serves them special dual core i5s with HT on, so everyone's happy but the guy whose Excel sheets take forever!!!!
> 
> VT-d would make sense if the IT guy really knew what it did. We use cloud products, so VT-d is wasted on us.


In your environment, if you're doing cloud, your best processor is an Athlon 5350, not some expensive i5. You pour the money into a hardened, powerful server with light, low power terminals that access it. The beauty of the Athlon 5350 is that it's 4 integer units with 4 FPUs.

Correct me if I'm wrong, but HSA with supported applications can accelerate Excel-style documents, e.g. LibreOffice.


----------



## Kuivamaa

Quote:


> Originally Posted by *MapRef41N93W*
> 
> Hence their single digit desktop market share that has been in free fall for years. Like the guy quoted above "selling a lot of units" to this guy is apparently selling a few thousand a year.


It is the server and laptop markets (which are actually more important than desktop) where AMD tanked hard. On the desktop they are still selling relatively well (for such an antiquated design), exactly because the BD family's greatest shortcomings (efficiency and die size to perf ratio) are less relevant on desktop.


----------



## SpeedyVT

Quote:


> Originally Posted by *Kuivamaa*
> 
> It is the server and laptop markets (which are actually more important than desktop) where AMD tanked hard. On the desktop they are still selling relatively well (for such an antiquated design), exactly because the BD family's greatest shortcomings (efficiency and die size to perf ratio) are less relevant on desktop.


Honestly, it's a retailer issue. HP, Dell, Toshiba, etc. don't put out many AMD processors because they're getting kickbacks from Intel.

Intel laptop processors are horrible compared to their desktop counterparts unless you buy the most expensive i7.


----------



## epic1337

Quote:


> Originally Posted by *hojnikb*
> 
> Please give us a list of those "most" games. And with a proper lead, not just a fps or two differenc (thats not destroying):


it actually does have a point though: well threaded games would prefer being on an FX6 instead of an i3.

though as someone mentioned, piledriver's inferior IPC causes lower FPS than a mere i3 in most scenarios.
BUT, the lack of overclockability on an i3 gives the FX6 an edge; regardless of the low IPC, once the FX6 is overclocked enough it'll catch up in single-thread performance.
now if we consider that the FX6 has 6 total threads instead of the i3's 4 threads, the FX6 will pull ahead of the i3 in overall usage.
even in games that are mostly single threaded, you'd have to take note that the OS and the API aren't simply stuffed into a single core.
more threads means the work is load-balanced across them, or in other words, less pipeline overhead.

although regarding that, it made sense with ivy bridge and to an extent the early haswell i3s, the ones that had less than a 3.5GHz base clock.
but the skylake i3 on the other hand has a generous base clock on top of its "a bit" better IPC, which doesn't leave piledriver cores enough overclocking headroom to cover the gap.

speaking of "more cores" for "load-balancing", i wonder if ARM has any patent restrictions regarding the big.LITTLE architecture.
from what i can see, if intel were to merge 2 big cores and 2 atom cores, we'd have 4 cores with better throughput than 2 cores + hyperthreading.
as for the load-balancing, i'm pretty sure microsoft is smart enough to abuse the heck out of it, to offload all the meager tasks onto the atom cores, and even put the entire OS on them during idle to conserve even more power.
of course, it's not only intel that can apply this strategy; if AMD hadn't been so stuck on their bulldozer cores, they could've merged the big cats with bulldozer for the same results.
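The load-balancing idea above can be sketched with a plain thread pool: split the work into independent chunks and let the scheduler spread them over whatever cores exist. A toy Python illustration (note Python's GIL means threads won't actually speed up CPU-bound math; this only shows the work-splitting pattern):

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk):
    """One independent unit of work; the pool assigns it to any free thread."""
    return sum(x * x for x in chunk)

data = list(range(10_000))
chunks = [data[i:i + 1000] for i in range(0, len(data), 1000)]

# 6 workers (an FX6-like setup): the OS balances the chunks across them.
with ThreadPoolExecutor(max_workers=6) as pool:
    parallel_total = sum(pool.map(chunk_sum, chunks))

# Same answer as the serial computation, regardless of how chunks were scheduled.
assert parallel_total == sum(x * x for x in data)
print(parallel_total)
```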


----------



## Quantum Reality

Quote:


> Originally Posted by *Noufel*
> 
> If the price is right AMD could have a hit, especially with an 8 core, 16 thread Zen CPU to compete with the i7, and in the end we'll hopefully see price drops.


I'm just hoping it won't be another Bulldozer flop.

As I said on another thread somewhere, my expectations for AMD's new CPU are pretty modest: if it can trade blows with stock Haswell CPUs without needing an absurdly huge nominal TDP to match Intel's lines (for example, I noted that AMD's flagship CPU, the FX-9590, is 220W), then it may well be able to force Intel to slash prices, a win-win all around.
Quote:


> Originally Posted by *The Stilt*
> 
> *C-Ray V1.1 (Raytracer)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> 1600x1200 with 8 rays per pixel (15360000)
> 
> PD = 187155ms (82071.0pps) - 100.0%
> SR = 184502ms (83251.1pps) - 101.438%
> XV = 170368ms (90157.8pps) - 109.853%
> HW = 116907ms (131386.5pps) - 160.089%
> 
> *Euler3D (CFD)*
> Compiler GCC 5.20 x86-64, CFlags = O3 & static
> NACA0012.097K air foil
> 
> PD = 177.076s (11.2946 IPS) - 100.0%
> SR = 151.380s (13.2102 IPS) - 116.960%
> XV = 135.674s (14.7412 IPS) - 130.515%
> HW = 94.521s (21.1593 IPS) - 187.340%
> 
> *X265 (Encoder)*
> Compiler GCC 5.20 x86-64 / YASM 1.30 (default flags)
> Version 1.7+512
> 
> PD = 225.25s (1.57 fps) - 100.0%
> SR = 213.97s (1.65 fps) - 105.1%
> XV = 204.81s (1.72 fps) - 109.554%
> HW = 117.12s (3.01 fps) - 191.720%
> 
> *Cinebench R15*
> 
> PD = 71pts - 100.0%
> SR = 72pts - 101.408%
> XV = 75pts - 105.634%
> HW = 119pts - 167.606%
> 
> If you quoted the original results, please correct them in your posts.
> 
> Sorry guys


The interesting thing about this is how neatly they line up with the IPC graphs I saw on this thread. In effect then, my hope that Zen will match Haswell might pan out!


----------



## 364901

Quote:


> Originally Posted by *MapRef41N93W*
> 
> Nope not true at all, but good job trying to spin a terribly negative AMD situation into a pro AMD situation, you should do PR work. Fury X were "off the shelves" when they released because they didn't send any to the shelves. The Fury X owners club was a literal ghost town for months (and is still a ghost town compared to even mid-range NVIDIA GPU threads on an enthusiast board) after it's release. If you only ship 4 units to Newegg and all of them sell, that doesn't make your product in demand.


I'm not sure how that's a "terribly negative situation" for AMD. They sell out of Fury and Fury X stock pretty often. If they're selling every chip they can make (and let's be honest here, HBM isn't a piece of cake), and their AIB partners sell every GPU they can make, and both are making a profit off these things, then that's a good product as far as AMD is concerned. They're not sitting with mountains of stock in the channel, and they've priced it high enough that it doesn't incur them a huge loss. They learned a lot from the Llano launch and how much stock they had to write off because their brand wasn't big enough to shift APUs on its own.

And specifically for this line: "The Fury X owners club was a literal ghost town for months (and is still a ghost town compared to even mid-range NVIDIA GPU threads on an enthusiast board) after it's release." Don't you think that's a bit obvious? Not everyone is going out to snap up a $650 GPU. Considerably more people are snapping up a $250-$350 card by comparison. That's just how the market currently works, and both AMD and NVIDIA make sure they have more mid-range stock in the channel to support this.

Seriously, the GTX 970 has a 3.45% share in the Steam survey looking at overall GPU numbers, the second-highest after Intel HD 4000. Contrast that with 0.91% for GTX 980 owners. People just aren't buying those high-end cards in droves, and AMD knows it; that's why they're not trying to sell HBM to everyone and their mother.


----------



## warpuck

I don't know what the relationship here is, but:

http://www.newegg.com/Processors-Desktops/SubCategory/ID-343?Order=REVIEWS

Good indicator for AMD, or just a total sales ranking? Unless AMD users are more likely to review than intel users? Bet ya the Egg has sold more units than the total number of reviews in the past year. How much more? Take note that the first 15 CPUs are unlocked.
The DIY and corner-shop build market is much smaller than brand name.
Your corner store builder is not likely to buy a K model intel or AMD, because who is going to waste the time to tune it? Just slap a fixed-frequency intel in it and forget about it. Besides, it's got intel inside.
Of the 180 units in my apartment building, I only know of 10 PCs, including those in the leasing office (2). 4 of those were AMD. One of them was an 8-year-old AMD 2-core hp box. I replaced that with a 4-core 95 watt Thuban with a conservative OC of 3.7 GHz.
iPads, Galaxys, laptops, notebooks and tablets are what the majority have. The only people I know that buy desktops are businesses (and that is because a desktop won't fit in your briefcase and walk out the door) or gamers. Gaming laptop? The desktop still has the advantage there in price and performance.

I can see Zen for my next build, selling off the FX-8350 to someone who needs a better box. But what do I know? My last intel build was a Win 2000, Supermicro, dual 1.3 GHz P3 with GeForce video. I am not going to spend that much again.


----------



## Cyro999

Quote:


> Originally Posted by *Themisseble*
> 
> Something good from AMD would, be good replacement for my FX 6300, although it serves me well. Looking at some games actually even, if ST is very important piledriver does very well. (comapring oc-ed FX 6300 or FX 4300 or FX 8320 to i3 or i5 non K)
> I think main problem with AMD CPU is balance of FPU and INTEGER performance... specially when game uses instructions which are more based on FPU.. like AVX (SC2).
> 
> https://www.youtube.com/watch?v=MGfyCkH4vpw
> 
> Where is first DX12 (ONLY) MP game? MMO or FPS?... 2016 or 2017 or later?


SC2 is not heavily based on AVX. In fact i accidentally had AVX disabled while benchmarking it once and the FPS didn't change by even a fraction of a percent.


----------



## Kuivamaa

SC2 as in Starcraft? Not a chance in hell it's using AVX; it is a DX9 game, and adding AVX support would be akin to equipping a 1400cc sedan with semi-slick tires. As a matter of fact I don't know a game besides GRID 2 that uses AVX, and that one was Intel sponsored.


----------



## Themisseble

Yep, I made a mistake... But someone already explained it. Still, SC2 was crippled on AMD CPUs.


----------



## Cyro999

Quote:


> Originally Posted by *Themisseble*
> 
> Yep I made mistake... But someone already explained it. But SC2 was crippled on AMD CPUs.


sc2 relies heavily on cache and memory performance as well as ST perf; three of the biggest weaknesses of bulldozer and its derivatives


----------



## ebduncan

Quote:


> Originally Posted by *SpeedyVT*
> 
> In your environment if you're doing cloud your best processor is an Athlon 5350 not some expensive i5. You pour the money in a hardened server and powerful server with light and low power terminals that access it. The beauty of the Athlon 5350 is that it's 4 Int Units with 4 FPUs.
> 
> Correct me if I'm wrong but HSA with supported applications can accelerate excel documents. LibreOffice.


if you're doing a network-based VM, why even have a workstation? seems like a waste of power to have even an Athlon 5350; you just need something to put out a picture and plug a mouse and keyboard into.

No one uses LibreOffice in a professional environment; everyone uses Microsoft Office products. It does support HSA acceleration, but again, no one uses it.
Quote:


> though as someone had mentioned, piledriver's inferior IPC causes lower FPS than a mere i3 in most scenario.
> BUT, the lack of overclockability on an i3 gives FX6 an edge, regardless of the low IPC, once the FX6 is overclocked enough it'll catch up in single-thread performance.
> now if we consider that FX6 has 6 total threads, instead of the i3's 4threads, the FX6 will pull an advantage over the i3 in overall usage.
> even in games that are mostly single threaded, you'd have to take note that the OS and the API isn't simply stuffed into a single core.
> the more threads means they're load-balanced across them, or in other words, less pipeline overhead.


You cannot clock an FX chip high enough on water or air to reach the single-thread performance of the new i3s. Most games now are well threaded thanks to the new consoles, so in most newer games the FX6XXX, FX8XXX, and FX9XXX CPUs are actually faster than the i3s. There are games, though, where the i3s destroy the FX chips in gaming performance, yes, destroy. Load balancing is a load of crock; quad-core Intel CPUs have been beating AMD 8-core chips for a while now.


----------



## looncraz

Quote:


> Originally Posted by *ZenFX*
> 
> Design Engineer
> AMD
> June 2007 - Present (8 years 6 months)
> 
> Verification Engineer (2007-2011)
> RTL Design Engineer (2012-present)
> 
> Successfully implemented Zen(znver1) hybrid branch prediction algorithm. Algorithm works by keeping smith counter to choose between stable quantum bogosort and PigeonRank algorithms.
> 
> 
> 
> https://www.linkedin.com/in/steve-havlir-714b337


Quantum bogosort is O(n). PigeonRank would be of no use, except when you really don't want to create and destroy a massive number of universes.


----------



## Majin SSJ Eric

Quote:


> Originally Posted by *Redwoodz*
> 
> Skylake cpu is $300+ dollars,besides the dual core i3 or 4 core i5. I don't feel the need to pay that much for a desktop processor,nor does probably 98% of the market. AMD does not have to beat a 6700K *to be profitable.* Not that hard to understand.


Certainly they don't have to beat Skylake to be profitable but they desperately need the PR to prove to people that they are at least legitimate competition for Intel. I've said before that Zen really needs to at least match IB or Haswell (with comparable overclocking) to be considered a success in my book. Matching Skylake would be a seismic shift in the perception of the CPU market. For the record I have a lot of optimism for Zen as I believe AMD are swinging for the fences this time around (because they know this is likely their last chance). Keller designing the thing gives me hope at least...


----------



## raghu78

Quote:


> Originally Posted by *Majin SSJ Eric*
> 
> Certainly they don't have to beat Skylake to be profitable but they desperately need the PR to prove to people that they are at least legitimate competition for Intel. I've said before that Zen really needs to at least match IB or Haswell (with comparable overclocking) to be considered a success in my book. Matching Skylake would be a seismic shift in the perception of the CPU market. For the record I have a lot of optimism for Zen as I believe AMD are swinging for the fences this time around (because they know this is likely their last chance). Keller designing the thing gives me hope at least...


Right now, the less we expect from AMD the better. AMD's falling revenues and R&D spending over the past few years don't give me a lot of hope for Zen. Moreover, Zen will also be constrained by the foundry FinFET processes. Here is how I see it:

1. Zen IPC = Sandy Bridge. Base/Turbo clocks 3.0/3.5 GHz (8C/16T, 95W), max clocks 4.0-4.2 GHz. Mild success. Will help them regain market share provided pricing is very aggressive: 8C/16T top-end SKU at USD 350. Most likely.
2. Zen IPC = Haswell. Base/Turbo clocks 3.0/3.5 GHz (8C/16T, 95W), max clocks 4.0-4.2 GHz. Decent success. Will help them regain market share provided pricing is really good: 8C/16T top-end SKU at USD 400-450. Less likely.
3. Zen IPC = Skylake. Base/Turbo clocks 3.0/3.5 GHz (8C/16T, 95W), max clocks 4.5 GHz. Hugely successful. Very unlikely.


----------



## Iwamotto Tetsuz

Quote:


> Originally Posted by *Noufel*
> 
> If the price is right AMD could have a hit especialy with an 8 cores 16 threads zen cpu to compete with the i7, and in the end we'll see price drops hopefully


I'd say 16 cores is already old hat now.
They should release a 5GHz or higher Zen with 20 cores or more.


----------



## PiOfPie

Quote:


> Originally Posted by *Cyro999*
> 
> sc2 relies heavily on cache and memory performance as well as ST perf; three of the biggest weaknesses of bulldozer and its derivatives


This is half or most of it; there is also some weak evidence that SC2 was compiled with ICC, and spoofing the CPU ID in a VM appears to improve performance on AMD processors by about 10%. Sadly, the executable can't be patched using the ICP, and the person testing performance in the VM couldn't get the same performance boost out of spoofing the CPU in Dota 2 (although Dota 2 dropped in 2013, after AMD v. Intel).


----------



## Dapman02

Everyone keeps thinking AMD is focused on matching Intel's IPC; AMD is trying to get the best performance per dollar they can. AMD's cores are fine for gaming even today, for better or worse.


----------



## Robenger

Quote:


> Originally Posted by *PiOfPie*
> 
> This is half or most of it; there is also some weak evidence that SC2 was compiled with ICC, and spoofing the CPU ID in a VM appears to improve performance on AMD processors by about 10%. Sadly, the executable can't be patched using the ICP, and the person testing performance in the VM couldn't get the same performance boost out of spoofing the CPU in Dota 2 (although Dota 2 dropped in 2013, after AMD v. Intel).


Not just SC2, this has also been confirmed in WoW.


----------



## Iwamotto Tetsuz

I think you guys are into benchmarks per dollar more than real world performance per $.


----------



## christoph

Quote:


> Originally Posted by *Iwamotto Tetsuz*
> 
> I think guys are into Benchmark per dollar. More than real wolrd performance per $


always


----------



## TranquilTempest

Quote:


> Originally Posted by *Iwamotto Tetsuz*
> 
> I think guys are into Benchmark per dollar. More than real wolrd performance per $


The first consideration for CPU is pure perf/$, but once you have a basic outline you look at your bottleneck and consider alternate parts as a proportion of total system cost. Most of the time that means picking a 4 core chip with the best single thread performance the budget allows.


----------



## imran27

Quote:


> Originally Posted by *TranquilTempest*
> 
> The first consideration for CPU is pure perf/$, but once you have a basic outline you look at your bottleneck and consider alternate parts as a proportion of total system cost. Most of the time that means picking a 4 core chip with the best single thread performance the budget allows.


Perf/$/watt.

Better perf/$ but with higher power consumption ultimately erodes the value of your build every month and year. That is where Intel leads: they have worse perf/$ but pretty low wattage/TDP. As a result, [perf/dollar/watt] = [perf/dollar]/watts comes out much higher overall. That quantifies what matters to us most.
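As a quick sketch of how that metric plays out (the chip prices, scores, and wattages below are invented for illustration, not real benchmark data):

```python
def perf_per_dollar_per_watt(perf, price_usd, watts):
    # The metric above: (perf / dollar), further divided by power draw
    return perf / price_usd / watts

# Hypothetical chips: A is cheaper but hotter, B is pricier but cooler.
chip_a = perf_per_dollar_per_watt(100, 200, 65)
chip_b = perf_per_dollar_per_watt(100, 250, 45)
print(chip_a < chip_b)  # True: B wins overall despite the worse perf/$
```

The point of dividing by watts is exactly this reversal: the pricier, cooler chip can come out ahead on the combined metric even though its raw perf/$ is worse.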


----------



## Cyro999

Wattage doesn't matter very much. You can have a Skylake quad core at 4GHz, running 6h a day at 100% load and eating half a kilowatt-hour (~83W loaded).

Run like that for a year, the electricity cost is low: at 15p per kWh, that's £27 of electricity in a year, total, for the whole CPU. And that's assuming you're at full load the whole time; idle power is an order of magnitude lower.

The CPU power is a pretty small part of total system power right now. My first, second and third thoughts when people talk about power are of the heat output resulting from it, PSU compatibility etc; not electricity cost.


----------



## imran27

Quote:


> Originally Posted by *Cyro999*
> 
> Wattage doesn't matter very much. You can have a Skylake quad core at 4ghz, running 6h a day at 100% load and eating half a kilowatt hour(~83w loaded)
> 
> Ran like that for a year, electricity cost is low. @ 15p per kwh, that's £27 of electricity in a year - total for the whole CPU. That's assuming you're at full load for the whole time, idle power is an order of magnitude lower.
> 
> The CPU power is a pretty small part of total system power right now. My first, second and third thoughts when people talk about power are of the heat output resulting from it, PSU compatibility etc; not electricity cost.


My mistake.

I was not referring to the power bill, actually. Power consumption matters when overclocking, since a lower-power CPU can be overclocked with a humble cooling setup. Also, if we have a good cooling system, then lower power consumption or TDP provides more OC headroom.


----------



## ebduncan

Quote:


> Originally Posted by *Cyro999*
> 
> Wattage doesn't matter very much. You can have a Skylake quad core at 4ghz, running 6h a day at 100% load and eating half a kilowatt hour(~83w loaded)
> 
> Ran like that for a year, electricity cost is low. @ 15p per kwh, that's £27 of electricity in a year - total for the whole CPU. That's assuming you're at full load for the whole time, idle power is an order of magnitude lower.
> 
> The CPU power is a pretty small part of total system power right now. My first, second and third thoughts when people talk about power are of the heat output resulting from it, PSU compatibility etc; not electricity cost.


First, E(kWh/day) = P(W) × t(h/day) / 1000 (W/kW). Second, power matters more than you give it credit for. Start thinking of your home as a complete system: every 100 watts you add to it is 100 watts you also have to cool. So yes, the cpu may only use say 200 watts at full load, but then your cooling system has to kick on and cool the now-warm room. Of course the opposite is true in the winter.

Power costs vary wildly based on location. You may be blessed with cheap power, but others are not.
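Spelling out that arithmetic with Cyro999's own numbers from earlier (83 W, 6 h/day at full load, 15p per kWh):

```python
def yearly_cost(watts, hours_per_day, price_per_kwh):
    # E(kWh/day) = P(W) * t(h/day) / 1000, then scaled to a year
    kwh_per_day = watts * hours_per_day / 1000
    return kwh_per_day * 365 * price_per_kwh

print(yearly_cost(83, 6, 0.15))  # ≈ £27 a year, matching the figure above
```

Swap in your own local rate (some posters here quote $0.40-0.85/kWh) and the same formula shows how widely the real cost varies by location.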

Quote:


> Originally Posted by *imran27*
> 
> My mistake.
> 
> I was not referring to power bill actually. Power consumption matters when we go into overclocking that CPU since it can then OC with a humble cooling setup. Also, if we have a good cooling system then lower power consumption or TDP provides more OC headroom.


Just because power consumption is lower doesn't mean it will overclock better. Yes, I get what you're saying about it being easier to cool, but typically power consumption has no impact on the ability to overclock; it's more about the Fmax of the cpu's transistors.


----------



## KyadCK

Quote:


> Originally Posted by *ebduncan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *Cyro999*
> 
> Wattage doesn't matter very much. You can have a Skylake quad core at 4ghz, running 6h a day at 100% load and eating half a kilowatt hour(~83w loaded)
> 
> Ran like that for a year, electricity cost is low. @ 15p per kwh, that's £27 of electricity in a year - total for the whole CPU. That's assuming you're at full load for the whole time, idle power is an order of magnitude lower.
> 
> The CPU power is a pretty part of total system power right now. My first, second and third thoughts when people talk about power are of the heat output resulting from it, PSU compatibility etc; not electricity cost.
> 
> 
> 
> E(kWh/day) = P(W) × t(h/day) / 1000(W/kW) first. Second power does matter more than you give it credit. Start thinking of your home as a complete system. Every 100 watts you add to it means to also have to cool. So yes the cpu can only use say 200 watts at full load, but then your cooling system has to kick on and cool the now warm room. Of course the opposite is true in the winter time.
> 
> Power costs vary wildly based on location. You may be blessed with cheap power, but others are not.
> 
> Quote:
> 
> 
> 
> Originally Posted by *imran27*
> 
> My mistake.
> 
> I was not referring to power bill actually. Power consumption matters when we go into overclocking that CPU since it can then OC with a humble cooling setup. Also, if we have a good cooling system then lower power consumption or TDP provides more OC headroom.
> 
> 
> Just because the power consumption is lower doesn't mean it will overclock better. Yes, I get what you're saying about it being easier to cool. Typically power consumption has no impact on the ability to overclock. It's more about the Fmax of the cpu's transistors.


His electric costs are actually expensive, at least compared to us. At 83W load 4 hours per day, as he says, it would cost him $13.33 USD per year in most of the US ($0.11/kWh) to use his CPU every single day for 4 hours. +50% for AC in hot months and no discount since we mostly use natural gas for heat, which is much cheaper than electric = ~$20 USD.

He's paying $40.83 ($0.33 USD/kWh) for the CPU alone, though using pounds says he's in a place where he really doesn't need AC and may actually welcome the heat.

You can't even use yourself as a comparison, Alabama averages out well below the usual $0.11. Closer to $0.07. In fact, including the CPU-related AC, you probably average out to ~$12-14USD for his scenario.

tl;dr, your attempt to correct him failed since he was right. You failed to take into consideration his conditions.


----------



## ebduncan

Quote:


> Originally Posted by *KyadCK*
> 
> 
> His electric costs are actually expensive, at least compared to us. At 83w load 4 hours per day, as he says, it would cost him $13.33 USD per year in most of the US ($0.11/KwH) to use his CPU every single day for 4 hours. +50% for AC in hot months and no discount since we mostly use natural gas for heat, which is much cheaper than electric = ~$20USD.
> 
> He's paying $40.83 ($0.33 USD/KwH) for the CPU alone, though using Pounds says he's in a place where he really doesn't need AC and may actually welcome the heat.
> 
> You can't even use yourself as a comparison, Alabama averages out well below the usual $0.11. Closer to $0.07. In fact, including the CPU-related AC, you probably average out to ~$12-14USD for his scenario.
> 
> tl;dr, your attempt to correct him failed since he was right. You failed to take into consideration his conditions.


what are you talking about? his electric is cheap; in some parts of the world it can be $0.40-0.50 per kWh. I made no comparison of his cost vs mine either! I was just stating he vastly underestimated the impact of his cpu. If you want to dig into it deeper, his system probably pulls 250-300 watts from the wall at full load. And cooling is more like 300%-400%, not 50%. This is what i was referring to, until you unexpectedly joined our conversation.


----------



## warpuck

My GPUs use more power than my CPU when playing games; when idle the CPU uses more. I play games 10-12 hours a week, and if I am not using it, I turn it off. Not very expensive compared to pay-per-view or purchasing the whatever channel bundle. My internet access for one month costs more than the electricity for one year.
Folding is another story; yes, I can pull 600-700 watts from the wall that way. There are not many days I need to produce heat like that.

Corporations look at this if you've got 1000 seats, but for an individual, if that is a factor, you might want to look into solar power.

Usually the person gauging this is looking at the power consumed and the system purchase price, not how much the person waiting on the CPU to churn out an answer gets paid to wait for it.


----------



## christoph

Quote:


> Originally Posted by *ebduncan*
> 
> what are you talking about? his electric is cheap, some parts of the world it can be .40-.50 cents per kwh. I made no comparison to his cost vs mine either! I was just stating he vastly underestimated the impact of his cpu. If you want to dig down into it deeper, his system would probably pull 250-300 watts from the wall at full load. Cooling is more like 300%-400% not 50%, This is what i was referring to, until you unexpectedly joined our conversation.


what??? hell noooooooooo, in my country it's like $0.85, so we have to consider every single thing; cheaper is always good


----------



## STEvil

Hot PC? vent heat out window... or enjoy less use of furnace in winter.


----------



## Dom-inator

Quote:


> Originally Posted by *STEvil*
> 
> Hot PC? vent heat out window... or enjoy less use of furnace in winter.


Depends where you live, e.g. in Australia we pretty much have no winter. If I vent the PC out the window in summer, I'll just be letting 40c air come inside!


----------



## warr10r

In South Africa, we both have no winter and are home to expensive power (Thanks for that by the way, President Jacob "I can't read big numbers, heh-heh-hehh" Zuma).

A hot, power-hungry CPU is useless to me.


----------



## epic1337

Quote:


> Originally Posted by *Dom-inator*
> 
> Depends where you live, e.g. in Australia we pretty much have no winter. If I vent the PC out the window in summer, I'll just be letting 40c air come inside!


this exactly. in places like Egypt and other parts of Africa, even leaving a candle on the window sill for a few minutes would turn it into molten wax.

and people tend to forget that heat dumped into the room needs to be cooled by something, like an air conditioner.
as such, 300W of heat means an additional 300W for your air conditioner.
depending on the AC's efficiency, you could pretty much double electricity cost.

otherwise, you could just simply imagine how a slow-cook oven roasts human meat.
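One nuance worth noting: an air conditioner moves several watts of heat per watt of electricity it draws; the ratio is its COP (coefficient of performance). A rough sketch, where the COP of 3 is an assumed typical value rather than any specific unit:

```python
def ac_electrical_draw(heat_watts, cop=3.0):
    # Electrical power the AC needs in order to pump heat_watts of heat
    # back outside. COP ~2-4 is typical for consumer units; 3.0 here is
    # an assumed example value, not a measured one.
    return heat_watts / cop

print(ac_electrical_draw(300))  # 100.0 W of AC draw to remove 300 W of heat
```

So the AC surcharge for a 300 W PC is real but usually well under an extra 300 W, unless the unit's COP is very poor.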


----------



## Cyro999

Quote:


> Second power does matter more than you give it credit. Start thinking of your home as a complete system. Every 100 watts you add to it means to also have to cool. So yes the cpu can only use say 200 watts at full load, but then your cooling system has to kick on and cool the now warm room. Of course the opposite is true in the winter time.


I live in Scotland at the moment so our ambient room temperatures can be held at ~18c or lower probably 95-97% of the time for free (only exception during the sunny hours of some rare days in summer). Nobody here has any kind of AC or system to cool down their buildings, only heating! In this case, your point works against you and almost every watt that goes in is a watt that doesn't have to be spent heating the room. Power's not really cheap here either; it's far more expensive than some highly populated areas of the world.

It varies place to place, there is no problem at all with effects on room temperature here (that could even be considered a positive thing) but that's a big factor in purchasing if you're in australia or africa as mentioned. They also have to buy parts that can be cooled effectively with a much smaller delta-over-ambient temp
Quote:


> and people tend to forget that heat dumped into the room needs to be cooled by something, like an air conditioner.


I don't think they forget that often, it's just that only a fraction of the world actually has air conditioning. I'm not sure how big the fraction is but it's there - and i think most people not part of that will save money sometimes, waste it other times.


----------



## KyadCK

Quote:


> Originally Posted by *ebduncan*
> 
> Quote:
> 
> 
> 
> Originally Posted by *KyadCK*
> 
> 
> His electric costs are actually expensive, at least compared to us. At 83w load 4 hours per day, as he says, it would cost him $13.33 USD per year in most of the US ($0.11/KwH) to use his CPU every single day for 4 hours. +50% for AC in hot months and no discount since we mostly use natural gas for heat, which is much cheaper than electric = ~$20USD.
> 
> He's paying $40.83 ($0.33 USD/KwH) for the CPU alone, though using Pounds says he's in a place where he really doesn't need AC and may actually welcome the heat.
> 
> You can't even use yourself as a comparison, Alabama averages out well below the usual $0.11. Closer to $0.07. In fact, including the CPU-related AC, you probably average out to ~$12-14USD for his scenario.
> 
> tl;dr, your attempt to correct him failed since he was right. You failed to take into consideration his conditions.
> 
> 
> 
> what are you talking about? his electric is cheap, some parts of the world it can be .40-.50 cents per kwh. I made no comparison to his cost vs mine either! I was just stating he vastly underestimated the impact of his cpu. If you want to dig down into it deeper, his system would probably pull 250-300 watts from the wall at full load. Cooling is more like 300%-400% not 50%, This is what i was referring to, until you unexpectedly joined our conversation.

Quote:


> Originally Posted by *Cyro999*
> 
> Quote:
> 
> 
> 
> Second power does matter more than you give it credit. Start thinking of your home as a complete system. Every 100 watts you add to it means to also have to cool. So yes the cpu can only use say 200 watts at full load, but then your cooling system has to kick on and cool the now warm room. Of course the opposite is true in the winter time.
> 
> 
> 
> I live in Scotland at the moment so our ambient room temperatures can be held at ~18c or lower probably 95-97% of the time for free (only exception during the sunny hours of some rare days in summer). Nobody here has any kind of AC or system to cool down their buildings, only heating! In this case, your point works against you and almost every watt that goes in is a watt that doesn't have to be spent heating the room. Power's not really cheap here either; it's far more expensive than some highly populated areas of the world.
> 
> It varies place to place, there is no problem at all with effects on room temperature here (that could even be considered a positive thing) but that's a big factor in purchasing if you're in australia or africa as mentioned. They also have to buy parts that can be cooled effectively with a much smaller delta-over-ambient temp
> Quote:
> 
> 
> 
> and people tend to forget that heat dumped into the room needs to be cooled by something, like an air conditioner.
> 
> 
> I don't think they forget that often, it's just that only a fraction of the world actually has air conditioning. I'm not sure how big the fraction is but it's there - and i think most people not part of that will save money sometimes, waste it other times.

Oh look, I was right.

And boo hoo, someone joined your conversation. Bring it to PMs.

It absolutely does not take 500w of AC to compete with just 100w of PC. Ever. The only reason this would be the case is if it's also cooling other things, like a house getting beat on by a hot sun, which is irrelevant to the calculations.


----------



## STEvil

Quote:


> Originally Posted by *Dom-inator*
> 
> Depends where you live, e.g. in Australia we pretty much have no winter. If I vent the PC out the window in summer, I'll just be letting 40c air come inside!


Quote:


> Originally Posted by *epic1337*
> 
> this exactly, places like Egypt and Africa, even leaving a candle on the window sill for a few minutes would turn it into molten wax.
> 
> and people tend to forget that heat dumped into the room needs to be cooled by something, like an air conditioner.
> as such, 300W of heat means an additional 300W for your air conditioner.
> depending on the AC's efficiency, you could pretty much double electricity cost.
> 
> otherwise, you could just simply imagine how a slow-cook oven roasts human meat.


Which is why I said dump it outside. I didn't say open the window to let the heat in.


----------



## Dom-inator

Quote:


> Originally Posted by *STEvil*
> 
> Which is why I said dump it outside. I didnt say open the window to let the heat in.


Yeah, with a custom 120mm hole in the window/wall and a flexi pipe running from my PC? No thanks lol


----------



## epic1337

Quote:


> Originally Posted by *STEvil*
> 
> Which is why I said dump it outside. I didnt say open the window to let the heat in.


your practice works if you source the air from the outside as well, e.g. a closed loop liquid with the radiator physically outside the house.
otherwise just "dumping" the exhaust outside would cause the entire room to have a negative pressure, slowly pulling the "hot air" from the outside as a result.


----------



## AmericanLoco

Quote:


> Originally Posted by *STEvil*
> 
> Which is why I said dump it outside. I didnt say open the window to let the heat in.


Houses "breathe". Any air you pump outside the house will be pulled in through the various gaps and vents every house is built with. So every cubic foot of air you duct outside from your PC will be matched with a cubic foot of 40*C outside air being pulled into your air conditioned home. The only way to combat that would be to have a convoluted duct system that lets your PC intake air from outside. However you're then feeding your PC with 40*C air...

It's a big enough problem that most new oil/gas furnaces in the U.S. have a duct system that draws fresh combustion air directly from outside vs. ambient air in the house.


----------



## STEvil

Quote:


> Originally Posted by *Dom-inator*
> 
> Yeah with a custom 120mm hole in the window/wall with a flexi pipe running from my PC no thanks lol


That would be one way, yes.
Quote:


> Originally Posted by *epic1337*
> 
> your practice works if you source the air from the outside as well, e.g. a closed loop liquid with the radiator physically outside the house.
> otherwise just "dumping" the exhaust outside would cause the entire room to have a negative pressure, slowly pulling the "hot air" from the outside as a result.


Doesn't necessarily need to be closed, but yes, that would be the best way. Some negative air pressure would be OK as you have air circulation through the room to begin with in a properly done house, though to do it best you would probably vent into the room's venting system to keep with the circulation pattern of the house and reduce your impact on the intake of hot air from outside that the AC would have to deal with.
Quote:


> Originally Posted by *AmericanLoco*
> 
> Houses "breathe". Any air you pump outside the house will be pulled in through the various gaps and vents every house is built with. So every cubic foot of air you duct outside from your PC will be matched with a cubic foot of 40*C outside air being pulled into your air conditioned home. The only way to combat that would be to have a convoluted duct system that lets your PC intake air from outside. However you're then feeding your PC with 40*C air...
> 
> It's a big enough problem that most new oil/gas furnaces in the U.S. have a duct system that draws fresh combustion air directly from outside vs. ambient air in the house.


See above. Also feeding 40c air would be fine depending on your objective.


----------



## KarathKasun

You do know that when you vent the PC exhaust out the window you are still pulling in outside air into the house? Still drives up house temps and costs money, possibly more than not venting to the outside.


----------



## AmericanLoco

Quote:


> Originally Posted by *STEvil*
> 
> That would be one way, yes.
> Doesnt necessarily need to be closed, but yes that would be the best way. Some negative air pressure would be ok as you have air circulation through the room to begin with in a properly done house, though to be best done you would probably vent into the venting system of the room to keep with the circulation pattern of the house and reduce your impact on the intake of hot air from outside that the AC would have to deal with.
> See above. Also feeding 40c air would be fine depending on your objective.


Doesn't matter what the "natural circulation of the house/room" is. Whatever air you pump outside of the house, has to come back in from outside the house.


----------



## STEvil

Quote:


> Originally Posted by *KarathKasun*
> 
> You do know that when you vent the PC exhaust out the window you are still pulling in outside air into the house? Still drives up house temps and costs money, possibly more than not venting to the outside.


Quote:


> Originally Posted by *AmericanLoco*
> 
> Doesn't matter what the "natural circulation of the house/room" is. Whatever air you pump outside of the house, has to come back in from outside the house.


Yes, which is natural circulation. Houses are typically built to induce airflow to take fresh air from outside and replace the air in the house over time. Yes, some positive/negative pressure would influence this to be slightly (negligibly) higher, assuming you only induce what is needed (very little).

If you want more or a better setup you use one of the other options mentioned.


----------



## Cyro999

Quote:


> You do know that when you vent the PC exhaust out the window you are still pulling in outside air into the house? Still drives up house temps


It drops the house temperature if the air outside is cooler than inside, or keeps it roughly the same if it's a similar temp inside and out before the PC is running

During summer day that's usually about 60-70f here. During a not-particularly-cold winter day right now, it's 40f.

You're failing to consider the vast majority of the world that lives in a climate quite different than yours (hot, cold, more or less volatile etc)


----------



## AmericanLoco

Quote:


> Originally Posted by *STEvil*
> 
> Yes, which is natural circulation. Houses are typically built to induce airflow to take fresh air from outside and replace the air in the house over time. Yes, some positive/negative pressure would influence this to be slightly (negligibly) higher, assuming you only induce what is needed (very little).
> 
> If you want more or a better setup you use one of the other options mentioned.


The natural circulation of a house is actually generally very low. Adding 100-150 CFM of discharge air can be enough to induce a noticeable draft in a particularly "leaky" room. By venting outside, you're not using the natural circulation already there - you're adding to it.

Anyways, the power usage of a PC at idle is pretty negligible - typically <75 watts. Yeah, some monster rigs can push 500-600 watts while gaming, but let's be honest:

1. Unless you're gaming 12 hours a day every day, 300-500 watts of heat output for 0.5-2 hours a few times a week isn't likely to matter much
2. If you can afford a gaming PC that consumes 500+ watts of power, you can probably afford the air conditioning to cool it.
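To put rough numbers on that, here's a back-of-envelope sketch. The rate ($0.12/kWh) and the AC coefficient of performance (COP of 3, i.e. each watt of AC electricity removes roughly three watts of heat) are assumptions for illustration, not figures from anyone in this thread:

```python
# Rough monthly cost of a gaming PC's power draw, plus the extra AC energy
# needed to pump that heat back out of the house.
# Assumed: $0.12/kWh electricity, AC coefficient of performance (COP) of 3.

def monthly_cost_usd(pc_watts, hours_per_week, rate_per_kwh=0.12, ac_cop=3.0):
    kwh_pc = pc_watts / 1000 * hours_per_week * 4.33  # ~4.33 weeks per month
    kwh_ac = kwh_pc / ac_cop                          # AC energy to reject that heat
    return round((kwh_pc + kwh_ac) * rate_per_kwh, 2)

# A 500 W rig gamed ~2 hours a day (14 h/week):
print(monthly_cost_usd(500, 14))
```

At typical rates even a heavy gaming habit works out to a few dollars a month of electricity plus AC load, which is why this mostly matters in extreme climates or with very high rates.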


----------



## KarathKasun

Quote:


> Originally Posted by *Cyro999*
> 
> It drops house temperature if the air outside is cooler than inside or keeps it roughly the same if it's similar temp inside and out before having PC running
> 
> During summer day that's usually about 60-70f here. During a not-particularly-cold winter day right now, it's 40f.
> 
> You're failing to consider the vast majority of the world that lives in a climate quite different than yours (hot, cold, more or less volatile etc)


We get temps into the 80f-90f range all the way into the first of winter here. Most of the world sees at least 80f-90f in the summer during the day. And the average house temperature is in the 70-75 range from what I have seen.

Then you have humidity in coastal areas, where the AC is often times just serving de-humidification duty.
Quote:


> Originally Posted by *STEvil*
> 
> Yes, which is natural circulation. Houses are typically built to induce airflow to take fresh air from outside and replace the air in the house over time. Yes, some positive/negative pressure would influence this to be slightly (negligibly) higher, assuming you only induce what is needed (very little).
> 
> If you want more or a better setup you use one of the other options mentioned.


This is actually not a thing in a well insulated house. The end goal is to have the house very nearly sealed and to have enough air volume that the oxygen concentration isn't a problem. The air handling equipment in most houses with central air does not have mixing valves or the like to pull in new air. They rely on external doorway traffic to do that. I've done work on residential central air units and have seen the design of many houses; the net internal/external airflow is null (zero effective static pressure/vacuum). When you create negative static pressure in a room you will probably draw air in from the attic, which can be 10 degrees over (or more) outside temperatures.


----------



## Cyro999

Quote:


> We get temps into the 80f-90f range all the way into the first of winter here


But not everybody does
Quote:


> Most of the world sees at least 80f-90f in the summer during the day


That's like 15% of the year if it's even correct as a number

the point is, it really depends. Some people pay over double the cost of the power, some people get it for free. Most people are somewhere in between so it's a bit of an awkward point to bring up


----------



## Iwamotto Tetsuz

Quote:


> Originally Posted by *AmericanLoco*
> 
> The natural circulation of a house is actually generally very low. Adding 100-150 CFM of discharge air can be enough to induce a noticeable draft in a particularly "leaky" room. By venting outside, you're not using the natural circulation already there - you're adding to it.
> 
> Anyways, the power usage of a PC at idle is pretty negligible - typically <75 watts. Yeah, some monster rigs can push 500-600 watts while gaming, but let's be honest:
> 
> 1. Unless you're gaming 12 hours a day every day, 300-500 watts of heat output for 0.5-2 hours a few times a week isn't likely to matter much
> 2. If you can afford a gaming PC that consumes 500+ watts of power, you can probably afford the air conditioning to cool it.


If you can afford a 16 core you probably wouldn't care about turning on an AC to cool it.


----------



## STEvil

Quote:


> Originally Posted by *KarathKasun*
> 
> We get temps into the 80f-90f range all the way into the first of winter here. Most of the world sees at least 80f-90f in the summer during the day. And the average house temperature is in the 70-75 range from what I have seen.
> 
> Then you have humidity in coastal areas, where the AC is often times just serving de-humidification duty.
> This is actually not a thing in a well insulated house. The end goal is to have the house very nearly sealed and to have enough air volume that the oxygen concentration isnt a problem. The air handling equipment in most houses with central air does not have mixing valves or the like to pull in new air. They rely on external doorway traffic to do that. Ive done work on residential central air units and have seen the design of many houses, the net internal/external airflow is null (zero effective static pressure/vacuum). When you create negative static pressure in a room you will probably draw air in from the attic, which can be 10 degrees over (or more) outside temperatures.


If your attic is hotter and leaking into the house you've probably got some work to do that could reduce your AC use.

That aside, you don't need to use a lot of airflow, and you could use two fans in a two-mode setup to reduce the needed airflow during idle times and increase it during high use, if heat is really that much of an issue in your house/area/PC that this would be beneficial to you.


----------



## yawa

So weird to see people on an overclocking enthusiast forum being weirded out about power consumption?

I can't imagine either company producing a chip that satisfies our NEED for overclocking, benching, gaming, and multi-tasking caring too much about power draw. I can't imagine most of us doing so either.

All I know is that I'm so excited to see this chip leaked into the wild and get a real grasp of IPC improvements. That's the next leak that will have my full attention.


----------



## STEvil

Power use is a concern for some people, not everyone runs 100% load all day every day.


----------



## BulletBait

Quote:


> Originally Posted by *yawa*
> 
> So weird to see people on an overclocking enthusiast forum being weirded out about power consumption


Like going to a car show and worrying about their gas mileage, eh?

Although, there is a legitimate concern associated with heat production. So... you know, there is that as well.


----------



## epic1337

Quote:


> Originally Posted by *BulletBait*
> 
> Like going to a car show and worrying about their gas mileage, eh?


no actually that makes sense, people nowadays go to car shows with an emphasis on high mileage, see hybrid and electric cars.


----------



## BulletBait

Quote:


> Originally Posted by *epic1337*
> 
> no actually that makes sense, people now a days go on a car show with emphasis on high mileage, see hybrid and electric cars.


I meant high performance cars... I don't see GM, Ford, Dodge, Ferrari, Porsche, etc. trumpeting their gas mileage on their sports models all too often... or at all.


----------



## Dom-inator

Quote:


> Originally Posted by *BulletBait*
> 
> I meant high performance cars... I don't see GM, Ford, Dodge, Ferrari, Porsche, ect. trumpeting their gas mileage on their sports models all to often... or at all.


However, being behind the wheel of a powerful car is a lot more exhilarating than getting a good result in a benchmark or being able to render faster. Well, I hope everyone feels that way, or else this is a sad, sad place


----------



## BulletBait

Quote:


> Originally Posted by *Dom-inator*
> 
> However, being behind the wheel of powerful car is a lot more exhilarating than getting a good result in a benchmark or being able to render faster. Well I hope everyone feels that way, or else this is a sad, sad place


Different feelings I suppose. Although they do coincide for me sometimes, since I put as much care into getting all I can out of my car and computer both. Being able to squeeze that extra 100MHz or finally inch over 5GHz stable is like squeezing an extra 10HP or finally inching over 600HP on your car. You're not worried about power/gas consumption at that point, it's the thrill of the 'chase' I suppose is a way to put it.


----------



## epic1337

Quote:


> Originally Posted by *BulletBait*
> 
> I meant high performance cars... I don't see GM, Ford, Dodge, Ferrari, Porsche, ect. trumpeting their gas mileage on their sports models all to often... or at all.


maybe none of those manufacturers, but Tesla is getting popular.


----------



## warpuck

When I lived in the desert in Californy, it would get up to 45C in the daytime and 4C at night.
If you had a "swamp cooler" it would keep the inside temps down to 25C in the daytime.
Those don't work well if the humidity is above 50%. Plus you have to keep adding water to them and clean them. The fan on it was not all that big.


----------



## chrisjames61

I will be happy when ZEN is released so the AMD forum gets more activity. All the tech site AMD cpu forums are like ghost towns.


----------



## svenge

Quote:


> Originally Posted by *chrisjames61*
> 
> I will be happy when ZEN is released so the AMD forum gets more activity. All the tech site AMD cpu forums are like ghost towns.


That is the natural consequence of AMD not introducing any ostensibly high-performance (read: non-APU) silicon for more than three years to date, a gap which will extend to a full fourth year by the time Zen is out...


----------



## Dom-inator

Still unconfirmed by AMD, but a Q4 2016 release date for Zen FX CPUs is rumoured, followed by APUs in 2017

http://wccftech.com/amd-zen-launch-q4-2016/


----------



## epic1337

Quote:


> Originally Posted by *Dom-inator*
> 
> Still unconfirmed by AMD but a Q4 2016 release date for Zen FX CPU's is rumoured, followed by APUs in 2017
> 
> http://wccftech.com/amd-zen-launch-q4-2016/


wa... that's quite a ways off, good luck to AMD competing with kabylake and cannonlake.

and i wanted to see how far their APUs would go though, finally 4~6 cores and DDR4 + a 1024SP iGP?
if they're delaying their APUs that far though, they'd have a hard time selling their existing APUs even in the mobile segment.
if anyone had noticed, skylake's iGP is nearly on-par with AMD's fastest APUs.

by the way, is there any chance with AMD doing a revision of Carrizo into a 14nm or 10nm node with a better iGP and DDR4?
e.g. 14nm 4core, DDR4, 768:48:16 iGP, 95W TDP.


----------



## svenge

Quote:


> Originally Posted by *epic1337*
> 
> by the way, is there any chance with AMD doing a revision of Carrizo into a 14nm or 10nm node with a better iGP and DDR4?
> e.g. 14nm 4core, DDR4, 768:48:16 iGP, 95W TDP.


Construction core (e.g. Bulldozer, Steamroller, etc.) derivatives are all dead going forwards IIRC.


----------



## epic1337

Quote:


> Originally Posted by *svenge*
> 
> Construction core (e.g. Bulldozer, Steamroller, etc.) derivatives are all dead going forwards IIRC.


yes i know, but that doesn't mean they can't change their minds and tweak it as they wait for Q4 2016, that's a very long wait.


----------



## Dom-inator

Quote:


> Originally Posted by *epic1337*
> 
> wa... thats quite far ways off, good luck with AMD competing with kabylake and cannonlake.


Hopefully they get zen out by Q4. Kaby is scheduled for around the same time, and cannonlake at least a year if not longer after that. I'm optimistic for AMD. If all intel can deliver with kaby is a quad-core with a better igpu, then zen's in for a good chance. Of course there are many other variables involved like how AMD price zen and whether performance is at the level we expect.


----------



## Cursedqt

This thread has been on the news section of OCN for long enough. Please, I would rather see some mod close it than see more speculation, thank you


----------



## svenge

Quote:


> Originally Posted by *epic1337*
> 
> yes i know, but that doesn't mean they can't change their minds and tweak it as they wait for Q4 2016, thats a very long wait.


They really don't have the resources to do a die-shrink of Excavator down to 14/16nm, and if AMD could've made a desktop-class (greater than 35w) APU with Excavator using GloFo's 28nm process they would have done so already.

It's not like AMD's not used to spending entire years without anything competitive in the CPU marketplace, so for them 2016 won't be a new experience. It'll be just like 2015 but slightly worse due to Kaby Lake.


----------



## ebduncan

Quote:


> Originally Posted by *epic1337*
> 
> wa... thats quite far ways off, good luck with AMD competing with kabylake and cannonlake.
> 
> and i wanted to see how far their APUs would go though, finally 4~6cores and DDR4 + 1024SP iGP?
> if they're delaying their APUs that far though, they'd have a hard time selling their existing APUs even in the mobile segment.
> if anyone had noticed, skylake's IGP is nearly on-par with AMD's fastest APUs.
> 
> by the way, is there any chance with AMD doing a revision of Carrizo into a 14nm or 10nm node with a better iGP and DDR4?
> e.g. 14nm 4core, DDR4, 768:48:16 iGP, 95W TDP.


I am pretty sure they have an Excavator APU coming on an AM4 chip. Should be available late 2016. The first APU, code named Summit Ridge, will then be replaced in 2017 by the Zen core version.

Quote:


> Originally Posted by *svenge*
> 
> Construction core (e.g. Bulldozer, Steamroller, etc.) derivatives are all dead going forwards IIRC.


pretty sure they have a final APU based on a cat core coming to the first AM4 platform.


----------



## epic1337

Quote:


> Originally Posted by *Dom-inator*
> 
> Hopefully they get zen out by Q4. Kaby is scheduled for around the same time, and cannonlake at least a year if not longer after that. I'm optimistic for AMD. If all intel can deliver with kaby is a quad-core with a better igpu, then zen's in for a good chance. Of course there are many other variables involved like how AMD price zen and whether performance is at the level we expect.


if we get optimistic about kabylake, it'd be the IMC, DDR4 with much better latency, on top of some cache fine-tunings and a slightly beefier iGP, so we'd get performance quite close to broadwell-C.
the combination of that has some effect on CPU performance, more towards ram intensive tasks of course, and the iGP gets the best boost out of it.
if such a case were to happen, we can expect iGP-based intel laptops to be quite a bit better than carrizo in both CPU and GPU performance, even the mobile i3s might be the better value.

on the other hand, AMD can just drop prices like their usual approach, and suddenly we'd see $300~$500 laptops directly competing against baytrail/cherrytrail based laptops.
then if by some stroke of luck AMD were to have a flawless release and a non-disappointing chip, they'd get a kick-start on their Zen, i just hope they don't price it as if it were the 2nd coming of the FX-9000 series.

Quote:


> Originally Posted by *svenge*
> 
> They really don't have the resources to do a die-shrink of Excavator down to 14/16nm, and if AMD could've made a desktop-class (greater than 35w) APU with Excavator using GloFo's 28nm process they would have done so already.
> 
> It's not like AMD's not used to spending entire years without anything competitive in the CPU marketplace, so for them 2016 won't be a new experience. It'll be just like 2015 but slightly worse due to Kaby Lake.


wait, so they hadn't gotten out of that dilemma? that's quite harmful to technological progress...

Quote:


> Originally Posted by *ebduncan*
> 
> I am pretty sure they have a excavator apu based on am4 chip coming. Should be available late 2016. The first APU code named summit ridge, then in 2017 will be replaced by the zen core version.


just using DDR4 on their carrizo chip could at least guarantee some life for that uarch, its not like its slow or anything, its just... old? no, more like unrefined.


----------



## MonarchX

This means it's going to be fast, isn't it? It should be, especially when games start using DirectX 12, which utilizes multi-threading a lot better.


----------



## delboy67

Quote:


> Originally Posted by *svenge*
> 
> They really don't have the resources to do a die-shrink of Excavator down to 14/16nm, and if AMD could've made a desktop-class (greater than 35w) APU with Excavator using GloFo's 28nm process they would have done so already.
> 
> It's not like AMD's not used to spending entire years without anything competitive in the CPU marketplace, so for them 2016 won't be a new experience. It'll be just like 2015 but slightly worse due to Kaby Lake.


It's called Bristol Ridge and iirc mid 2016 on AM4, unless it's been canceled?


----------



## epic1337

Quote:


> Originally Posted by *delboy67*
> 
> Its called bristol ridge and iirc mid 2016 on am4, unless its been canceled?


i somewhat recall that, but haven't heard of it for such a long time i entirely forgot about it.


----------



## Themisseble

some say it will be on excavator cores?.. but i think that this will be on ZEN cores 28nm for budget ...


----------



## warpuck

I have not tried watching a Blu-ray on my 17" 5750M APU notebook to see if the movie finishes before the battery dies. It works OK and did not cost that much for a 17". I don't think one stored on an SDHC flash card would be much of a problem. I do find the touch pad to be annoying. hp Pavillion

This is where AMD needs to get: notebooks, tablets, portable devices and efficient desktops for all those cubicles. Most DIY builders know how to get the most for their bucks. But we are the side show. AMD needs to do something good enough not to be pooh-poohed as a featured main attraction in the ready-rolled market


----------



## ZenFX

Expertise in Synthesis, Physical Design, Timing Closure, Formal Equivalence and Power estimation of High Speed X86 processors and ASICs. Currently working in physical design for scheduler block in CPU team. This CPU is for the next generation X86 AMD processor on 14nm.

https://www.linkedin.com/in/chakradhar-tallury-aa310323


----------



## looncraz

Quote:


> Originally Posted by *ZenFX*
> 
> Expertise in Synthesis, Physical Design, Timing Closure, Formal Equivalence and Power estimation of High Speed X86 processors and ASICs. Currently working in physical design for scheduler block in CPU team. This CPU is for the next generation X86 AMD processor on 14nm.
> 
> https://www.linkedin.com/in/chakradhar-tallury-aa310323


Well, Zen's obvious greatest challenge would come from scheduling, so I'd suspect that most future enhancements would center around scheduling and queuing, so this fits well. Also, isn't this the second time an alleged AMD employee has stated 14nm? ;-)


----------



## Quantum Reality

Quote:


> Originally Posted by *looncraz*
> 
> Quote:
> 
> 
> 
> Originally Posted by *ZenFX*
> 
> Expertise in Synthesis, Physical Design, Timing Closure, Formal Equivalence and Power estimation of High Speed X86 processors and ASICs. Currently working in physical design for scheduler block in CPU team. This CPU is for the next generation X86 AMD processor on 14nm.
> 
> https://www.linkedin.com/in/chakradhar-tallury-aa310323
> 
> 
> 
> Well, Zen's obvious greatest challenge would come from scheduling, so I'd suspect that most future enhancements would center around scheduling and queuing, so this fits well. Also, isn't this the second time an alleged AMD employee has stated 14m? ;-)
Click to expand...

I hope you mean 14 nm 'cause 14 m would be one huge piece of silicon


----------



## looncraz

Quote:


> Originally Posted by *Quantum Reality*
> 
> I hope you mean 14 nm 'cause 14 m would be one huge piece of silicon


LOL! Yeah, that would be impressive! We'd need a heck of a crew


----------



## ZenFX

Architecture Simulators
October 2014

Developed the following architecture simulators in C/C++ and used them to measure and analyze the performance for various configurations in each context

A Cache simulator consisting of a L1 cache augmented with a victim cache and a L2 cache

Extended the above simulator to support Symmetric Multiprocessor (SMP) systems by implementing bus based cache coherence protocols (MSI,MESI and Dragon)

A hybrid (bimodal + gshare) branch predictor simulator, supported by a Branch Target Buffer

A dynamic instruction scheduling simulator using the Tomasulo algorithm for superscalar pipelines
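For anyone curious what a bimodal + gshare hybrid actually looks like, here's a toy sketch in Python. This is purely illustrative and not the simulator described above; the table sizes, 2-bit counters, and the chooser policy are all my own assumptions:

```python
# Toy hybrid (bimodal + gshare) branch predictor. A bimodal table indexes
# 2-bit counters by PC; gshare XORs the PC with global branch history; a
# chooser table learns, per branch, which component to trust.

class HybridPredictor:
    def __init__(self, bits=10):
        size = 1 << bits
        self.mask = size - 1
        self.bimodal = [1] * size   # 2-bit counters, start weakly not-taken
        self.gshare = [1] * size
        self.chooser = [1] * size   # < 2 -> use bimodal, >= 2 -> use gshare
        self.ghist = 0              # global history register

    def predict(self, pc):
        bi = self.bimodal[pc & self.mask] >= 2
        gs = self.gshare[(pc ^ self.ghist) & self.mask] >= 2
        return gs if self.chooser[pc & self.mask] >= 2 else bi

    def update(self, pc, taken):
        i, g = pc & self.mask, (pc ^ self.ghist) & self.mask
        bi_ok = (self.bimodal[i] >= 2) == taken
        gs_ok = (self.gshare[g] >= 2) == taken
        # Train the chooser toward whichever component was right.
        if gs_ok != bi_ok:
            delta = 1 if gs_ok else -1
            self.chooser[i] = min(3, max(0, self.chooser[i] + delta))
        # Saturating-counter update for both component tables.
        for table, j in ((self.bimodal, i), (self.gshare, g)):
            table[j] = min(3, table[j] + 1) if taken else max(0, table[j] - 1)
        self.ghist = ((self.ghist << 1) | taken) & self.mask
```

A real simulator would drive this from a branch trace and report misprediction rate per configuration, which is presumably what the coursework above measured.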

https://www.linkedin.com/in/pradan



http://web.mit.edu/~bdaya/www/Optimization%20Paper.pdf



Worked with the RTL, verification, and DFT teams to define and build a verified, testable design that meets the unit timing, power, IPC, and area constraints for the Dispatch block in an x86-based 14 nm processor. (16 months)

https://www.linkedin.com/profile/view?id=AAEAAAHRAdUBM0Dk9qUuCQehtmDgwH_hoBiwOo8&authType=name&authToken=JFFX&trk=prof-sb-browse_map-name

Spotlight Award
AMD
June 2015

For significant contributions to solve the timing problems in EX for project Zen .

https://www.linkedin.com/profile/view?id=AAEAAAK9UGQBb5miH4qMcrGN7hqW5vZr4P3Y-Z8&authType=name&authToken=o5qS&trk=prof-sb-browse_map-name


----------



## rtikphox

Zen looks pretty much doomed. They had such a good pedigree too. First Athlon, then 64-bit cpu, heck even the APU lineup. Only way I can see them survive is making a 1090T have a baby with a Fury X and ddr4 ramdisk all in 1 chip APU. Imagine ddr4 timings of 1-2-2-4 instead of 15-15-15-35.


----------



## epic1337

Quote:


> Originally Posted by *rtikphox*
> 
> Zen looks pretty much doomed. They had such a good pedigree too. First Athlon, then 64-bit cpu, heck even the APU lineup. Only way I can see them survive is making a 1090T have a baby with a Fury X and ddr4 ramdisk all in 1 chip APU. Imagine ddr4 timings of 1-2-2-4 instead of 15-15-15-35.


doomed by what standard? it's not like everyone needs a super-fast CPU, by your logic ARM is doomed as well.

for all we care, they could just abandon the desktop altogether, and develop a chip to obliterate intel's atom in all fields.
that market is literally magnitudes bigger than the desktop market, i'm talking about laptops, tablets, phones, consoles and embedded devices all combined.


----------



## Kuivamaa

If zen delivers the goodies (~Haswell level of perf) and intel is indeed bringing HEDT decacores with broadwell-E, then the enthusiast market will start to look pretty good. Intel will probably still offer the absolute best performance and demand the usual premium, but AMD could respond with ST performance that is very, very close to intel's, offer a couple more cores at the same price, and pair this with a cheaper platform overall (X99 boards are expensive). Right now they are selling their octocores at 150-250 a piece; managing to double this and selling more of them would be a huge boon for AMD. As long as they can achieve this performance and execute in time.


----------



## Hueristic

Quote:


> Originally Posted by *looncraz*
> 
> LOL! Yeah, that would be impressive! We'd need a heck of a crew



Not quite this size, but only 15k taking up some real estate.


----------



## ZenFX

https://www.linkedin.com/profile/view?id=AAEAAACXFvQBqktWYQxFaq5EFCejPKjbW1dgEPE&authType=name&authToken=F6RM&trk=prof-sb-browse_map-name

I am a tech lead designing hybrid main memories at AMD research. I have several years of experience as a research scientist specializing in emerging computer architecture, system software and technologies for high performance machines. Prior to AMD I worked as a senior computational scientist in the performance modeling lab at the San Diego Supercomputing Center.

My areas of focus have been:
* Design of hardware and runtime systems using non-volatile memories and stacked memories
* Performance tuning of systems and software for hardware multi-threaded cpu processors
* Performance modeling and tuning of high-end applications on emerging architectures
* Tools development for performance modeling, workload characterization, simulation and instrumentation


----------



## Quantum Reality

The way ZenFX writes, I keep wondering if they're practicing their resume writing on OCN


----------



## ZenFX

Working in Core CPU Circuit Design Team

-Currently working on designing AVFS version of 2way Read, 1way Write L1 cache macro for next generation x86 Microprocessor , which helps the AMD processors in self monitoring and adapting voltage and frequency to achieve performance and power efficiency across varying operating conditions.

-Designing clock macros for Core of CPU
- Hspice simulations for Insertion delay and skew measurement of L2 clock macro against core clock macro

https://www.linkedin.com/profile/view?id=AAEAAASGaFkBh1XMGE_rN1DPuwNiaEItf90RUPA&authType=name&authToken=vKVb&trk=prof-sb-browse_map-name

Currently working in RTL/Micro Architecture team developing the next-generation high-performance SOC interconnect fabric (memory subsystem).

RTL-level performance evaluation and debug for critical metrics like interconnect Bandwidth and Latency

Developing micro-benchmarks and related measurement infrastructure, and identifying performance enhancements and eliminating performance bottlenecks.

Working closely with RTL designers on enhancing the micro-architecture and also closely interacting with Physical designers on the Area/Power implications of the enhancements

Developing custom tools for visualization and debug of performance issues

https://www.linkedin.com/profile/view?id=AAEAAAOr3m8BTHUdJApEGQNVyCJSu19cU7GgQ1M&authType=name&authToken=0CTn&offset=2&trk=prof-sb-pdm-similar-name

A member of circuit technology (CT) verification and design-for-test (DFT) team.
Run Automatic Test Pattern Generation (ATPG) tool on Analog-Mixed-Signal (AMS) high-speed IPs;
IPs include MIPI CSI (camera serial interface), PCIe2/3, Global Memory Interface (GMI, AMD internal IP), DFT gasket, etc;
Synthesize RTL netlist, insert scan logic to gate level netlist, model analog portion of macros for ATPG purposes;
Handle multiple clock domains, generate and deliver test patterns to SoC;
Test in boundary scan, DC scan, AC scan, EDT compression mode, single chain mode and flyover mode;
Work closely with design team to debug design flaw related to stuck-at/ transition scan failures;
GF20nm/TSMC16nm/GF14nm technology.

https://www.linkedin.com/profile/view?id=AAEAAAzL85cBjoagtPbVtBC_TCGSI145HIRMVpo&authType=name&authToken=jxo9&trk=prof-sb-browse_map-name


----------



## ku4eto

Quote:


> Originally Posted by *Quantum Reality*
> 
> The way ZenFX writes I keep wondering if they're practicing their resume writing on OCN


Well, people who work at AMD are posting exactly what they are working on for Zen. ZenFX is just copy/pasting (and searching for) this. It's not bad; it actually gives us insight into what we can expect, if we have enough knowledge of what the previous gens had.


----------



## WhiteCrane

So... Does it even look competitive? Is anyone hopeful?


----------



## PiOfPie

Quote:


> Originally Posted by *WhiteCrane*
> 
> So... Does it even look competitive? Is anyone hopeful?


Define "competitive." If it has Ivy Bridge or Haswell levels of IPC and is priced 30-50 dollars cheaper than an Intel chip with the same number of cores, AMD will be in a good price/performance spot again, like during the Phenom II days. Matching or exceeding Intel's current offerings in raw performance? Almost certainly not going to happen.

On paper, at the same clocks, it looks like Zen should be able to compete with Haswell in 128-bit floating point (good news for gamers), but will probably fall behind in anything that uses 256-bit AVX, because Zen is 2x128-bit and looks like it needs to fuse its pipes for an AVX instruction, while Haswell is 2x256. Fortunately, AVX is mostly relegated to HPC applications.

The big variables are:
1) Will the fab responsible for production of Zen be able to deliver good silicon? (This was about half of Bulldozer's problem.)
2) How much has AMD improved their cache latencies? (They have been working on this since the first revision of Steamroller and its dynamically-resizing L2.)
3) How much has AMD improved their IMC? (One of the things Jim Keller is said to enjoy doing is tuning IMCs, and AMD is moving to a new socket, so their hands won't be as tied.)


----------



## WhiteCrane

Thanks for breaking it down for me. In your opinion, is AMD a full generation behind Intel? That's what it looks like to me if they're targeting Haswell performance levels. If they can match Haswell for less money, I'll buy that and I am sure many budget gamers will do the same.


----------



## looncraz

Quote:


> Originally Posted by *WhiteCrane*
> 
> Thanks for breaking it down for me. In your opinion, is AMD a full generation behind Intel? That's what it looks like to me if they're targeting Haswell performance levels. If they can match Haswell for less money, I'll buy that and I am sure many budget gamers will do the same.


AMD is currently five generations behind Intel, but that's just because Intel hasn't really pushed the performance envelope much.

However, that's not taking the entire ecosystem into account. In that regard, they are not really behind anywhere other than price/performance, power efficiency (and they aren't THAT far behind any more), and single-threaded performance. Their chipsets (on FM2+) are updated, capable, and mature. The situation looks far worse if you just compare the desktop platform, though, where the offerings are generations behind AMD's own tech. Desktop APUs are a generation behind AMD's own tech as well, and they don't have DDR4 support yet.

This coming year will hopefully see all of this remedied.


----------



## BiG StroOnZ

Quote:


> Originally Posted by *ZenFX*
> 
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> https://www.linkedin.com/profile/view?id=AAEAAACXFvQBqktWYQxFaq5EFCejPKjbW1dgEPE&authType=name&authToken=F6RM&trk=prof-sb-browse_map-name
> 
> I am a tech lead designing hybrid main memories at AMD research. I have several years of experience as a research scientist specializing in emerging computer architecture, system software and technologies for high performance machines. Prior to AMD I worked as a senior computational scientist in the performance modeling lab at the San Diego Supercomputing Center.
> 
> My areas of focus has been in :
> * Design of hardware and runtime systems using non-volatile memories and stacked memories
> * Performance tuning of systems and software for hardware multi-threaded cpu processors
> * Performance modeling and tuning of high-end applications on emerging architectures
> * Tools development for performance modeling, workload characterization, simulation and instrumentation


Is this relevant to the OP?


----------



## ebduncan

I think most folks in this thread will be surprised by Zen's performance if they expect Haswell-level performance. You have to figure these chips won't be available until late 2016, around the same time Kaby Lake makes it to market. I think the Zen core will be closer to Skylake than Haswell, while getting beaten by Kaby Lake.

I suspect a lot will depend on what sort of clock speeds AMD will be able to achieve with the 14nm node.


----------



## STEvil

15-40% performance boost for an A10-5800K if it used pure HBM... hmmm.


----------



## Cyro999

Quote:


> I suspect a lot will depend on what sort of clock speeds AMD will be able to achieve with the 14nm node.


4.2 vs. 4.6 GHz is less than a 10% performance change, not much compared to the overall performance gap
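For reference, the clock-speed contribution alone can be sketched like this (my own back-of-the-envelope arithmetic, assuming performance scales linearly with frequency, which is optimistic):

```python
# Performance change attributable to clock speed alone, assuming
# perfect linear scaling with frequency (an optimistic upper bound).
def clock_uplift_pct(base_ghz, target_ghz):
    return (target_ghz / base_ghz - 1) * 100

print(f"{clock_uplift_pct(4.2, 4.6):.1f}%")  # under 10%
```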


----------



## The Stilt

Quote:


> Originally Posted by *ebduncan*
> 
> I think most folks in this thread will be surprised by Zen's performance if they expect Haswell level performance. You have to figure these chips won't be available until late 2016. This is around the same time as Kaby Lake will make it to market. I think the Zen core will be closer to skylake than haswell, while getting beat out by Kaby Lake.
> 
> I suspect a lot will depend on what sort of clock speeds AMD will be able to achieve with the 14nm node.


In that case Zen would have over 80% higher IPC than Excavator. AMD has stated 40%. Given the accuracy of their recent statements, I wouldn't be surprised if the stated 40% is already slightly optimistic...


----------



## Themisseble

THE STILT

How much higher IPC does Skylake have over Sandy Bridge?

What exactly is IPC? Don't tell me instructions per clock...


----------



## The Stilt

Quote:


> Originally Posted by *Themisseble*
> 
> THE STILT
> 
> How much higher IPC that skylake have over sandy bridge?
> 
> What exactly is IPC? Dont tell me instructions per clock...


Skylake has a ~25% IPC advantage over Sandy Bridge in FP scenarios where Skylake cannot utilize the newer instructions (AVX2). In workloads which support AVX2 (e.g. x265), Skylake gains around 25% additional advantage over Sandy Bridge.

In Cinebench R15 (ST) for example:

Excavator = 24.705 points per GHz (100.00%)
Sandy Bridge = 34.737 points per GHz (140.60%)
Skylake = 43.333 points per GHz (175.40%)

Only a 40% IPC improvement over Excavator in Zen is a disaster, since the frequencies are expected to drop significantly compared to the current desktop line-up (Excavator isn't THAT drastic an improvement over Piledriver). If AMD had announced 40% improved single thread *performance* over Excavator, then everything would be great.
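Those points-per-GHz figures are just score divided by clock; a small sketch reproducing the relative percentages, using only the numbers quoted above:

```python
# Cinebench R15 single-thread score normalized to points per GHz,
# then expressed relative to Excavator as the 100% baseline.
points_per_ghz = {
    "Excavator":    24.705,
    "Sandy Bridge": 34.737,
    "Skylake":      43.333,
}

base = points_per_ghz["Excavator"]
for name, ppg in points_per_ghz.items():
    print(f"{name:12s} {ppg / base * 100:6.2f}%")
```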


----------



## looncraz

Quote:


> Originally Posted by *ebduncan*
> 
> I think most folks in this thread will be surprised by Zen's performance if they expect Haswell level performance. You have to figure these chips won't be available until late 2016. This is around the same time as Kaby Lake will make it to market. I think the Zen core will be closer to skylake than haswell, while getting beat out by Kaby Lake.
> 
> I suspect a lot will depend on what sort of clock speeds AMD will be able to achieve with the 14nm node.


Haswell has almost exactly 40% higher IPC (in terms of average single threaded performance per clock) than Excavator.

Zen is supposed to have 40% higher IPC than Excavator.

Do the math.

Of course, Zen is 40% faster by design, which supposedly doesn't include the process improvements... so we might get some added bonus there.


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> In that case Zen would have over 80% higher IPC than Excavator. AMD has stated 40%. Given the accuracy of their recent statements, I wouldn't be surprised if the stated 40% is already slightly optimistic...


No, it wouldn't. 40% over Excavator is Haswell IPC almost on the nose. Another 15% gets you to Skylake, but I doubt AMD will extract 15% from the process alone (probably more like 5%).

Like you, though, I would not be surprised if the 40% was a little optimistic. Still, the design lends itself well to providing even more than Intel's latest, except in FPU multiplication.


----------



## looncraz

Quote:


> Originally Posted by *Themisseble*
> 
> THE STILT
> 
> How much higher IPC that skylake have over sandy bridge?
> 
> What exactly is IPC? Dont tell me instructions per clock...




Excavator's IPC is almost a dead-even tie with Penryn, to give you an idea of where AMD currently stands.


----------



## Themisseble

Quote:


> Originally Posted by *The Stilt*
> 
> Skylake has ~ 25% IPC advantage over Sandy Bridge, in FP scenarios where Skylake cannot utilize the newer instructions (AVX2). In workloads which support AVX2 (e.g X265) Skylake gains around 25% additional advantage over Sandy Bridge.
> 
> In Cinebench R15 (ST) for example:
> 
> Excavator = 24.705 points per GHz (100.00%)
> Sandy Bridge = 34.737 points per GHz (140.60%)
> Skylake = 43.333 points per GHz (175.40%)
> 
> Only a 40% IPC improvement over Excavator in Zen is a disaster, since the frequencies are expected to drop significantly compared to the current desktop line-up (Excavator isn't THAT drastic an improvement over Piledriver). If AMD had announced 40% improved single thread *performance* over Excavator, then everything would be great.


Wrong answer.
What is IPC?

Skylake has a much better FPU, but... is it really that much better, or is it just a better compiler?


----------



## The Stilt

Quote:


> Originally Posted by *Themisseble*
> 
> wrong answer.
> 
> Skylake has much better FPU but .... is it really that better or is just better compiler?


Right


----------



## Themisseble

Quote:


> Originally Posted by *The Stilt*
> 
> Right


I asked you what IPC is...

AMD said 40% higher IPC... does that mean more focus on FPU or integer? What does it actually mean?

40% better performance in every single-threaded application?


----------



## The Stilt

Quote:


> Originally Posted by *Themisseble*
> 
> I have asked you what is IPC...
> 
> AMD said 40% higher IPC... does that mean more focus on FPU or INTEGER? what does actually means?
> 
> 40% better performance in every ST application?


Instructions per clock, as always.
Or do you claim AMD suddenly changed its meaning with Zen? AMD used the IPC metric for Excavator too, prior to its release, and their statement was somewhat accurate.

Is your last name Fruehe by any chance?


----------



## Themisseble

Quote:


> Originally Posted by *The Stilt*
> 
> Instructions per clock, as always.
> Or do you claim AMD suddenly changed its meaning with Zen? AMD used the IPC metric for Excavator too, prior to its release, and their statement was somewhat accurate.
> 
> Is your last name Fruehe by any chance?


No, it's not.

Actually, I respect your work on OC.net.

I know IPC is instructions per clock... but what should we expect? A better FPU, better integer, or both? And by how much?

Say AMD puts in a 2x faster FPU but only improves integer by 20%; does that count as 40% better IPC?


----------



## looncraz

Quote:


> Originally Posted by *Themisseble*
> 
> THE STILT
> 
> How much higher IPC that skylake have over sandy bridge?
> 
> What exactly is IPC? Dont tell me instructions per clock...


IPC, in its purest form, IS how many instructions can be performed in a clock cycle.

Without significant insider knowledge, however, we can't directly equate that to real-world performance. There are, in fact, times where a 10% IPC increase can lead to a 20% performance increase; it all depends on which instructions were improved and how important they were for the logic flow. Generally speaking, a 10% IPC bump is a 9~11% single-threaded performance increase.

When talking about one CPU versus another in a generic performance sense, we have to use AVERAGE IPC. That includes *ALL* instructions and workload types that are comparable between the CPUs. However, the only way we can figure out avgIPC is by a multibench mean - or observed performance over a wide range of benchmarks, per thread, and per clock.
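As a sketch of that methodology (the benchmark names and scores here are made-up placeholders, and the geometric mean is one reasonable choice for the average):

```python
from math import prod

# Average IPC ratio of CPU A over CPU B across a benchmark suite:
# normalize each score per GHz, take per-benchmark ratios, then the
# geometric mean so no single benchmark dominates.
def avg_ipc_ratio(scores_a, scores_b, clock_a, clock_b):
    ratios = [
        (scores_a[bench] / clock_a) / (scores_b[bench] / clock_b)
        for bench in scores_a
    ]
    return prod(ratios) ** (1 / len(ratios))

# Hypothetical single-threaded scores (higher is better):
cpu_a = {"cinebench": 130, "x264_fps": 40, "compress_mips": 5200}
cpu_b = {"cinebench": 100, "x264_fps": 35, "compress_mips": 4000}

# Both CPUs at 4.0 GHz, so this is a pure per-clock comparison:
print(avg_ipc_ratio(cpu_a, cpu_b, clock_a=4.0, clock_b=4.0))
```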

My numbers are a multibench mean from one CPU generation to the next. My numbers match up VERY well with Intel's and every review, showing both that Intel's claims and my methodology are valid and compatible. I use the same methodology to determine that Zen should have Haswell-level performance, per clock, per thread.

I'm also working on an architecture simulator on the side to model the instruction flow on Haswell vs Zen and have broken down their instruction assignments in that effort.

http://looncraz.net/ZenAssignments.html
http://looncraz.net/HswAssignments.htm

At first glance, Zen has quite a few advantages. Sadly, it will almost certainly be hindered by a slower cache system than Haswell's.


----------



## Themisseble

Its hard to predict anything... at least its not hyped.


----------



## EniGma1987

Quote:


> Originally Posted by *svenge*
> 
> Construction core (e.g. Bulldozer, Steamroller, etc.) derivatives are all dead going forwards IIRC.


Quote:


> Originally Posted by *svenge*
> 
> They really don't have the resources to do a die-shrink of Excavator down to 14/16nm


AMD themselves said they are porting Excavator to 14nm for release first, before Zen. Just FYI.
But yeah, AMD isn't doing a whole new arch based on those cores, just a shrink.


----------



## Themisseble

Quote:


> Originally Posted by *EniGma1987*
> 
> AMD themselves said they are porting Excavator to 14nm for release first before Zen. Just FYI.
> But ya, AMD isnt doing a whole new arch based on those cores, just a shrink.


Yeah but for desktop?


----------



## EniGma1987

Quote:


> Originally Posted by *Themisseble*
> 
> Yeah but for desktop?


Who cares? The argument was that there would be nothing more from the Cat cores at all and AMD said otherwise. Whether it is on desktop or mobile no one but AMD knows. If it does make it to desktop it isn't like anyone here will want to buy it anyway with Zen coming out 6 months later according to AMD.


----------



## Themisseble

Anyway I would love to see 28nm ZEN cores for budget builders...


----------



## The Stilt

Quote:


> Originally Posted by *EniGma1987*
> 
> AMD themselves said they are porting Excavator to 14nm for release first before Zen. Just FYI.
> But ya, AMD isnt doing a whole new arch based on those cores, just a shrink.


Source for this? AFAIK Bristol Ridge and Stoney Ridge are still 28nm HPP from GF. Stoney Ridge is the last family-15h design from AMD, while Bristol is physically identical to Carrizo. I can't quite believe AMD would keep the 15h family alive after Zen has emerged.


----------



## Cyro999

Quote:


> Haswell has almost exactly 40% higher IPC (in terms of average single threaded performance per clock) than Excavator.


For x264, Haswell has ~70% over Piledriver and Skylake has ~90% - is the Piledriver-to-Excavator gap that big?


----------



## Tivan

Quote:


> Originally Posted by *Cyro999*
> 
> For x264, Haswell has ~70% over Piledriver and skylake has ~90% - is the piledriver to excavator gap that big?


Not sure, but a fair bit of the x264 performance is due to x264 utilizing most of the available useful instruction sets. Piledriver is still on AVX1, like Sandy/Ivy were.


----------



## Cyro999

Quote:


> Originally Posted by *Tivan*
> 
> Not sure, but a fair bit of the x264 performance is due to x264 utilizing most available useful instruction sets. Piledriver is still on avx1 like sandy/ivy were.


A notable amount, but only a ~5% boost, going by a comment I saw a while ago from the devs


----------



## EniGma1987

Quote:


> Originally Posted by *The Stilt*
> 
> Source for this? AFAIK Bristol Ridge and Stoney Ridge are still 28nm HPP from GF. Stoney Ridge is the last 15h design from AMD, while Bristol is identical to Carrizo physically. Cannot quite believe AMD would keep the 15h family alive after Zen has emerged.


Maybe I mistook what they said about Excavator on AM4. It could be that it's still 28nm, but I swear during the webcast I heard one of the presenters say Excavator was going to 14nm so AMD could gain experience on the node before Zen. Either way I don't really care, but it would be nice to see how the new node clocks compared to 28nm on the same arch


----------



## looncraz

Quote:


> Originally Posted by *Cyro999*
> 
> For x264, Haswell has ~70% over Piledriver and skylake has ~90% - is the piledriver to excavator gap that big?


My numbers have Excavator at an average of about 28% over Bulldozer. Zen should, then, be about 80% faster than Bulldozer. On average. In x264, my numbers put Zen with 95% of Haswell's performance at the high end. However, we don't know the performance profile of the core, though I am doing a workup (core simulations) to see if I can quantify the most likely areas of improvement based on known AMD latencies and Zen's pipeline assignments.

Zen should be dramatically better with branch-heavy code compared to Excavator, for example, but will only be better in division or multiplication when certain instruction mixes are at play (it should never be worse, though). Zen should, by most measures, match Haswell at worst in integer, but should outperform it with LEA, shift, rotate, and division-heavy code. I see no areas where Zen is lacking in integer performance versus Haswell. Floating point is more tricky, and I have a feeling that multiplication heavy floating point code will favor Haswell, but everything else will be close, or even favor Zen (fdiv may be a notable benefit of Zen over Haswell).

All this just means that it's necessary to profile the instructions in the x264 benchmark to determine how Zen may actually behave with it. From what I've seen of other media de/encoder code, I'd expect it to be favoring Haswell still, though the control logic may run better on Zen.


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> My numbers have Excavator at an average of about 28% over Bulldozer. Zen should, then, be about 80% faster than Bulldozer. On average. In x264, my numbers put Zen with 95% of Haswell's performance at the high end. However, we don't know the performance profile of the core, though I am doing a workup (core simulations) to see if I can quantify the most likely areas of improvement based on known AMD latencies and Zen's pipeline assignments.
> 
> Zen should be dramatically better with branch-heavy code compared to Excavator, for example, but will only be better in division or multiplication when certain instruction mixes are at play (it should never be worse, though). Zen should, by most measures, match Haswell at worst in integer, but should outperform it with LEA, shift, rotate, and division-heavy code. I see no areas where Zen is lacking in integer performance versus Haswell. Floating point is more tricky, and I have a feeling that multiplication heavy floating point code will favor Haswell, but everything else will be close, or even favor Zen (fdiv may be a notable benefit of Zen over Haswell).
> 
> All this just means that it's necessary to profile the instructions in the x264 benchmark to determine how Zen may actually behave with it. From what I've seen of other media de/encoder code, I'd expect it to be favoring Haswell still, though the control logic may run better on Zen.


Your math is off there, bud. The math doesn't work because the percentages are skewed by the average rather than the absolute, which gives a truncated perspective of how much better a processor is. If Zen is 50% better than Bulldozer it'll equal Haswell. This is speculative, and the way technology works isn't as simple as IPC. There may be reasons Zen, or vice versa, is better than the competition due to its features and other performance behaviors. It's why even Bulldozer is still favored in environments other than our own personal interests.

The reason IPC is weird is that the application itself is the means of comparing it. What I am saying is that as soon as a processor exceeds the requirements of the test, it'll gain dramatically in score. It's why old GPU benchmarks are irrelevant to modern hardware: run 3DMark 07 or 09 and you'll appear literally 20 times more powerful while really only being 3-5 times as powerful. Even Cinebench can't escape that form of entropy in benchmarking.


----------



## KyadCK

Quote:


> Originally Posted by *Cyro999*
> 
> Quote:
> 
> 
> 
> Haswell has almost exactly 40% higher IPC (in terms of average single threaded performance per clock) than Excavator.
> 
> 
> 
> For x264, Haswell has ~70% over Piledriver and skylake has ~90% - is the piledriver to excavator gap that big?
Click to expand...

Single or multi-thread?

SR/EX make up for a lot of MT drawbacks that otherwise would heavily contribute to that 70-90%


----------



## looncraz

Quote:


> Originally Posted by *SpeedyVT*
> 
> Your math is off there bud. The only reason the math isn't work is because the way the percentages are done are skewed by the average and not the absolute. This gives a cut off perspective of how much better a processor is. If Zen is 50% better than Bulldozer it'll equal Haswell. This is speculative and the way technology works isn't as simple as IPC. There may be reasons the Zen processor or vice-versa, is better than the competition due to it's features and other performance behaviors. It's why even Bulldozer is still favored under other environments than our own personal interests.
> 
> The reason IPC is weird is because the application itself is the means of compare it. What I am saying is that as soon as a processor exceeds the requirements to test it'll dramatically gain in score. It's why old GPU benchmarks are irrelevant to modern, you do 3dmark 07 or 09 and you'll literally be 20 times more powerful but really only be 3-5 times the power. Even Cinebench can't escape that form of entropy in benchmarking.


My math is cross-validated and is performed by spreadsheets ;-)

Much of your comment seems entirely nonsensical to me. 50% better than Bulldozer will not reach Haswell. Haswell has nearly double the IPC of Bulldozer, more than that in many benchmarks. The difference is exaggerated with multi-threaded benchmarks due to the module overhead, but the single threaded benchmarks illustrate the delta clearly.

http://www.anandtech.com/bench/product/1280?vs=1261

I think people forget just how bad Bulldozer was


Piledriver helped quite a bit:

http://www.anandtech.com/bench/product/700?vs=1261

Steamroller was a decent improvement, but all examples of it lack an L3 cache, which reduces the observed performance increase to about 6.7% over Piledriver (with an L3, it would probably be a jump similar to Piledriver over Bulldozer, but my math does not assume that at all).

http://www.anandtech.com/bench/product/1282?vs=1261

Excavator numbers are harder to come by, but it seems that it is just shy of a 10% improvement over Steamroller.

When calculating for Zen, I add 40% over Excavator in every benchmark. It beats Ivy Bridge and, on average, matches Haswell. In no scenario does it actually beat Haswell in average IPC, other than a few pure-integer workloads where Excavator excels. The performance profile of Zen, as I mentioned, will not be the same, however, so we can't add 40% to a benchmark and expect that result from Zen. We can only work with averages until we have a profile to expect.

Zen should be about 80% faster than the original Bulldozer (all of my numbers are proper averages, with outliers removed). Take into account that the numbers for two generations on AMD's side lack an L3, and we will see closer to 85% over Bulldozer; add the absence of module penalties (worth another 10~20% in multithreaded workloads) and we have a statistical tie.

The reality is that Zen will improve well over 100% in some areas, and quite possibly barely at all in others (such as AVX or instruction latencies) when compared to Bulldozer. This, of course, does not take into account clock speeds or any process-related effects (Excavator is on a slow, low power, node, Zen will be on a much superior node, even if it is still a low power node).


----------



## looncraz

Quote:


> Originally Posted by *Cyro999*
> 
> For x264, Haswell has ~70% over Piledriver and skylake has ~90% - is the piledriver to excavator gap that big?


I guess a more simple way to answer this:

Code:

Multi-bench mean:

Piledriver : 100.0%
Steamroller: 106.7%
Excavator  : 117.2%
Zen        : 164.1% (calculated Excavator + 40%)

So, in theory, Zen should be closing that gap quite well... and, remember, Steamroller and Excavator don't have L3 caches, but Zen will. You can add another 3~6% or so for that, but make sure you add it at Steamroller... here, I add just 3.3% (and only at Steamroller) to account for the missing L3.

Code:

Piledriver : 100.0%
Steamroller: 110.0%
Excavator  : 120.8%
Zen        : 169.2%

And that's yet another way to show that Zen will probably be quite similar to Haswell
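The generational uplifts in those tables compound multiplicatively; a small script reproducing both chains (the +9.84% Excavator step is the one implied by 106.7% -> 117.2%):

```python
# Compound per-generation uplifts, starting from Piledriver = 100%.
def compound(base, *uplifts_pct):
    value = base
    out = [value]
    for pct in uplifts_pct:
        value *= 1 + pct / 100
        out.append(value)
    return [round(v, 1) for v in out]

# Piledriver -> Steamroller (+6.7%) -> Excavator (+9.84%) -> Zen (+40%)
print(compound(100.0, 6.7, 9.84, 40))   # matches the first table

# Same chain with +3.3 points granted at Steamroller for the missing
# L3, i.e. a +10.0% first step:
print(compound(100.0, 10.0, 9.84, 40))  # matches the second table
```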


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> My math is cross-validated and is performed by spreadsheets ;-)
> 
> Much of your comment seems entirely nonsensical to me. 50% better than Bulldozer will not reach Haswell. Haswell has nearly double the IPC of Bulldozer, more than that in many benchmarks. The difference is exaggerated with multi-threaded benchmarks due to the module overhead, but the single threaded benchmarks illustrate the delta clearly.
> 
> http://www.anandtech.com/bench/product/1280?vs=1261
> 
> I think people forget just how bad Bulldozer was
> 
> 
> Piledriver helped quite a bit:
> 
> http://www.anandtech.com/bench/product/700?vs=1261
> 
> Steamroller was a decent improvement, but all examples of it lack an L3 cache, which reduces the observed performance increase to about 6.7% over Piledriver (with an L3, it would probably be a jump similar to Piledriver over Bulldozer, but my math does not assume that at all).
> 
> http://www.anandtech.com/bench/product/1282?vs=1261
> 
> Excavator numbers are harder to come by, but it seems that it is just shy of a 10% improvement over Steamroller.
> 
> When calculating for Zen, I add 40% over Excavator in every benchmark. It beats Ivy bridge, and, on average, matches Haswell. In no scenario does it actually beat out Haswell in average IPC other than a few pure-integer workloads where Excavator excels. The performance profile of Zen, as I mentioned, however, will not be the same, so we can't add 40% to a benchmark and expect that result from Zen. We can only work with averages until such a time that we have a profile to expect.
> 
> Zen should be about 80% faster than the original Bulldozer (all of my numbers are proper averages - with outliers removed). Take into account the fact that we don't have an L3 for the numbers on two generations on AMD's side and we will see closer to 85% over Bulldozer, and also no module penalties (which will add 10~20% more to multithreaded workloads) and we have a statistical tie.
> 
> The reality is that Zen will improve well over 100% in some areas, and quite possibly barely at all in others (such as AVX or instruction latencies) when compared to Bulldozer. This, of course, does not take into account clock speeds or any process-related effects (Excavator is on a slow, low power, node, Zen will be on a much superior node, even if it is still a low power node).


Actually, I got BD mixed up with PD. Still, benchmarks skew. It's late for me.

The second we get a benchmark intended for this generation of CPUs the skew pattern will shrink again. Cinebench won't cut it this time around.


----------



## epic1337

Quote:


> Originally Posted by *looncraz*
> 
> I guess a more simple way to answer this:
> 
> Code:
> 
> 
> Multi-bench mean:
> 
> Piledriver : 100.0%
> Steamroller: 106.7%
> Excavator  : 117.2%
> Zen        : 164.1% (calculated Excavator + 40%)
> 
> So, in theory, Zen should be closing that gap quite well... and, remember, Steamroller and Excavator don't have L3 caches, but Zen will. You can add another 3~6% or so for that, but make sure you add it at Steamroller... here, I add just 3.3% (and only at Steamroller) to account for the missing L3.
> 
> Code:
> 
> 
> Piledriver : 100.0%
> Steamroller: 110.0%
> Excavator  : 120.8%
> Zen        : 169.2%
> 
> And that's yet another way to show that Zen will probably be quite similar to Haswell


in terms of single-thread performance, where a single module takes on the entire workload, that is the case.
but there are certain ways the BD series differs from Zen: it doesn't scale well in multi-threaded workloads due to the limited shared resources per module.

if you factor in Zen cores being individually fully functional, then we can expect it to scale far better than the BD series.
in this case, if Zen is +64% over piledriver and scales far better in MT workloads, then the MT performance won't just be +64% better than piledriver.
it would be as if an FX-6300 could easily surpass a hexa-core i7-4960X in multi-threaded workloads.
and it wouldn't be weird if AMD put a 6-core Zen at $300~$400, since it could easily surpass a 4C/8T mainstream i7.

as such, it doesn't matter how accurate the "+40% IPC over excavator" is; it's still extremely potent when priced right.
also, DX12 and Vulkan will pave the way to highly multi-threaded games, while intel is still struggling to give us low-cost 4-core/6-core/8-core processors.
and on that point, it wouldn't even matter if it's barely faster than Sandy Bridge or Ivy Bridge, since AMD can just slowly catch up in IPC like they're doing with the BD series.


----------



## Pro3ootector

Makes me wonder whether they'll release Zen Opterons first. Same scenario as the Barcelona architecture?


----------



## dsmwookie

Has any announcement been made for a launch?


----------



## epic1337

Quote:


> Originally Posted by *dsmwookie*
> 
> Has any announcement been made for a launch?


too early for that, it's still 6~12 months until official release.


----------



## Iwamotto Tetsuz

I want to say: who cares about those optimizations? Get a better core design that does more GHz with fewer watts, not a higher-efficiency core design.
5GHz 16 cores will never be able to go far, but 10GHz 16 cores will.

Just like the internet: a slow connection with a "net killer extreme" tweak will never beat a connection that is 2x faster without optimizations.


----------



## TranquilTempest

Quote:


> Originally Posted by *Iwamotto Tetsuz*
> 
> I want to say: who cares about those optimizations? Get a better core design that does more GHz with fewer watts, not a higher-efficiency core design.
> 5GHz 16 cores will never be able to go far, but 10GHz 16 cores will.


You have no clue what you're talking about. More performance with less power is EXACTLY why CPU architectures are moving to IPC-optimized designs instead of clockspeed-optimized designs. Intel figured that out the hard way with NetBurst, 15 years ago.


----------



## Quantum Reality

Quote:


> Originally Posted by *Iwamotto Tetsuz*
> 
> I want to say: who cares about those optimizations? Get a better core design that does more GHz with fewer watts, not a higher-efficiency core design.
> 5GHz 16 cores will never be able to go far, but 10GHz 16 cores will.
> 
> Just like the internet: a slow connection with a "net killer extreme" tweak will never beat a connection that is 2x faster without optimizations.


The whole point is that code that runs in a CPU can be set up to run best a certain way based on its design specifications.

To use a simple example from the 1980s: on the 6502 CPU it turned out to be computationally a bit faster to decrement a counter when testing it (a compare against zero is all that's involved), so people would write their code to count down instead of up where feasible.

In the modern era of course things are more complicated, but the basic principle remains that each CPU has its own particular strengths and weaknesses and code should use the strengths and avoid the weaknesses.


----------



## PiOfPie

Quote:


> Originally Posted by *WhiteCrane*
> 
> Thanks for breaking it down for me. In your opinion, is AMD a full generation behind Intel? That's what it looks like to me if they're targeting Haswell performance levels. If they can match Haswell for less money, I'll buy that and I am sure many budget gamers will do the same.


Sorry that I missed your post! Looncraz gave a pretty good answer.
Quote:


> Originally Posted by *Pro3ootector*
> 
> Makes me wonder will they relase ZEN Opterons first. Same scenario like Barcelona architecture?


Bigger margins to be made in the server/HPC market, so I would think so.

Quote:


> Originally Posted by *dsmwookie*
> 
> Has any announcement been made for a launch?


The chip is, per AMD, "on track for a late 2016" launch, with a full year of revenue from Zen in 2017. Whether that means 2016 will just be for enterprise, a paper launch, or a full launch for consumers remains to be seen.

Quote:


> Originally Posted by *TranquilTempest*
> 
> You have no clue what you're talking about. More performance with less power is EXACTLY the reason CPU architectures are moving to IPC optimized designs instead of clockspeed optimized designs. Intel figured that out the hard way with netburst, 15 years ago.


For the most part, yes, high performance-per-watt is king (which usually means high IPC/low clock speed), but mid-to-high clock speeds are more tolerable from AMD's point of view because of their access to a resonant clock mesh, something Intel didn't have back in the NetBurst days. This naturally assumes that they can hit those clock speeds while staying in their desired power envelope.


----------



## Redwoodz

IPC is not the bottleneck, to quote many on this forum. Scheduling, branch prediction, cache latency, and interconnect speeds can make a faster CPU with the same "IPC". If you look at the info ZenFX has hinted at, AMD has a whole new way of achieving low-latency, high-speed cache. This will be a game changer if they can make it work.


----------



## SpeedyVT

Quote:


> Originally Posted by *Redwoodz*
> 
> IPC is not the bottleneck, to quote many on this forum. Scheduling,branch prediction,cache latency and interconnect speeds can make a faster cpu with the same "IPC". If you look at the info ZenFX has hinted at AMD has a whole new way of achieving low latency,high speed cache. This will be a game changer if they can make it work.


This is absolutely correct. People think and assume IPC is everything, but it's only part of the situation.


----------



## looncraz

Quote:


> Originally Posted by *Redwoodz*
> 
> IPC is not the bottleneck, to quote many on this forum. Scheduling,branch prediction,cache latency and interconnect speeds can make a faster cpu with the same "IPC". If you look at the info ZenFX has hinted at AMD has a whole new way of achieving low latency,high speed cache. This will be a game changer if they can make it work.


Not at all, everything you mentioned affects the net IPC - IPC being a meaningful measure of how effectively the supporting architecture is extracting the core's theoretical performance capabilities.

Theoretically speaking, we should expect Zen to lay the smack-down on Haswell and readily match or exceed Skylake. AMD's core hardware is, in some very important ways, superior to Intel's. Intel's domination is largely a result of their vastly superior cache systems (hard to compete when you are using an L2 cache that is nearly as slow as the competition's L3). In theory, Zen can retire 10 instructions per cycle, peak, whereas Haswell will peak at 7 (the store data port is always bound with an AGU). The mix, of course, is also very important, with Haswell not being able to issue certain integer and floating point instructions in the same cycle, whereas Zen can do four integer, four floating point, and two memory instructions every clock cycle (at least it seems so). This, alone, would indicate that Zen should lay waste to Haswell in terms of theoretical core instruction throughput. But everything else AMD has is quite inferior to Intel. They may even fall back to using Stars caches; AMD just lacks the cache technology to keep the core fully fed.


----------



## Iwamotto Tetsuz

There are limits on the conductivity of materials preventing chips from reaching 10GHz. If you eliminate those and get a cheap, good material, you can boost any chip's clocks higher, and no doubt on LN2 it will beat any other chip at clocks.


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> Not at all, everything you mentioned affects the net IPC - IPC being a meaningful measure of how effectively the supporting architecture is extracting the core's theoretical performance capabilities.
> 
> Theoretically speaking, we should expect Zen to lay the smack-down on Haswell and readily match or exceed Skylake. AMD's core hardware is, in some very important ways, superior to Intel's. Intel's domination is largely a result of their vastly superior cache systems (hard to compete when you are using an L2 cache that is nearly as slow as the competition's L3). In theory, Zen can retire 10 instructions per cycle, peak, whereas Haswell will peak at 7 (the store data port is always bound with an AGU). The mix, of course, is also very important, with Haswell not being able to issue certain integer and floating point instructions in the same cycle, whereas Zen can do four integer, four floating point, and two memory instructions every clock cycle (at least it seems so). This, alone, would indicate that Zen should lay waste to Haswell in terms of theoretical core instruction throughput. But everything else AMD has is quite inferior to Intel. They may even fall back to using Stars caches, AMD just lacks the cache technology to keep the core fully fed.


IPC means nothing without the instruction set being utilized, and by extension the compilers that target it. IPC is a product of current software, not something directly relevant to actual performance. Benchmarks invalidate themselves as they age: newer hardware doesn't just get faster, it exceeds the performance envelope the benchmark was designed around, so IPC is never measured on equal terms. There is no valid way to compare performance other than asking whether a chip is faster at task A.


----------



## heyplati

Currently it looks like it's using 1.34v @ 4.2GHz (so with turbo), but I'll get some more data.


----------



## looncraz

Quote:


> Originally Posted by *SpeedyVT*
> 
> IPC is nothing without the instruction set utilized and so follows the compilers that use them. IPC is a product of current software not relevant to actual performance. Benchmarks invalidate themselves as they age as newer hardware doesn't become just faster but because the hardware exceeds the performance the benchmark was designed around and so IPC is never equal. There is no valid way to compare performance other than is it faster at task A.


IPC is everything when comparing the same instruction set, as we are. IPC is everything when comparing the same benchmarks, as we are. IPC isn't directly the equivalent of performance per clock, no, but an average IPC increase will result in a similar average performance increase. If I can push 40% more LEA instructions, but can't push any more mul, div, add, etc, then performance will only improve in those few cases where LEA instructions matter most. However, if, as Zen appears to be capable of doing, I am pushing 40% more total instructions, without much regard to their type, then a general performance increase of 40% is all we can expect. In some cases it will be more, some cases it will be less, but it will center around the IPC increase.

Where the benefits will be seen most is somewhat obvious (to me anyway). Excavator can only do one other integer op during the window where a multiplication is happening, Zen can do three, two of those being branches. Code that is multiplication heavy will see a sizable improvement, even if the execution of the multiplication itself takes the exact same number of cycles to complete. Same applies for division. Addition/subtraction heavy code, however, will really only increase relative to its dependency chain, which means that it will be more affected by the cache subsystem's improvements than multiplications. Loops with many independent additions or subtractions will operate dramatically faster, but those with dependent operations may not be faster at all.

However, when you take the average instruction retirement rate improvement and compare it to the average improvement in performance, they will be quite similar. This absolutely breaks down when you have only targeted improvements, like we see with Haswell over Ivy Bridge - where it has a 30% IPC increase in certain areas (which also directly results in a 30% observed performance increase).


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> IPC is everything when comparing the same instruction set, as we are. IPC is everything when comparing the same benchmarks, as we are. IPC isn't directly the equivalent of performance per clock, no, but an average IPC increase will result in a similar average performance increase. If I can push 40% more LEA instructions, but can't push any more mul, div, add, etc, then performance will only improve in those few cases where LEA instructions matter most. However, if, as Zen appears to be capable of doing, I am pushing 40% more total instructions, without much regard to their type, then a general performance increase of 40% is all we can expect. In some cases it will be more, some cases it will be less, but it will center around the IPC increase.
> 
> Where the benefits will be seen most is somewhat obvious (to me anyway). Excavator can only do one other integer op during the window where a multiplication is happening, Zen can do three, two of those being branches. Code that is multiplication heavy will see a sizable improvement, even if the execution of the multiplication itself takes the exact same number of cycles to complete. Same applies for division. Addition/subtraction heavy code, however, will really only increase relative to its dependency chain, which means that it will be more affected by the cache subsystem's improvements than multiplications. Loops with many independent additions or subtractions will operate dramatically faster, but those with dependent operations may not be faster at all.
> 
> However, when you take the average instruction retirement rate improvement and compare it to the average improvement in performance, they will be quite similar. This absolutely breaks down when you have only targeted improvements, like we see with Haswell over Ivy Bridge - where it has a 30% IPC increase in certain areas (which also directly results in a 30% observed performance increase).


You misunderstand: all the talk of IPC itself is irrelevant, because most benchmarks fail to capture that performance. Once a processor design exceeds the expectations of the application it was written for, the benchmark's scoring skews.

So as long as this processor provides ample performance in real life tasks and is able to feed hardware efficiently like GPUs I'll definitely buy it.

It's why HWBot has separate categories purely on the different hardware. IPC =/= Performance.


----------



## looncraz

Quote:


> Originally Posted by *SpeedyVT*
> 
> You misunderstand that all the talk of IPC itself is irrelevant as most benchmarks fail to capture that said performance. Once a processor design exceeds the expectations of the written application the benchmark scoring skews.
> 
> So as long as this processor provides ample performance in real life tasks and is able to feed hardware efficiently like GPUs I'll definitely buy it.
> 
> It's why HWBot has separate categories purely on the different hardware. IPC =/= Performance.


No, I understand the concept of bottlenecks perfectly, however most benchmark performance can be rather accurately predicted by a deceptively simple (eIPC * ILP * clockRate) / latency equation. With all else being equal, increasing any one of those values (except latency, which must be reduced) will result in higher performance in general agreement with its own increase. Latency is driven by the code and the memory subsystems as well as the architecture. ILP is driven by CPU design, code order, instruction dependencies, and so on. eIPC here is IPC in its purest form, the entire core's ability to decode, schedule, execute, and retire instructions for a single execution unit on average.

Now, the real trick with the equation (which I just made up) is that you have to accurately calculate the values. eIPC, ILP, and latency all vary by benchmark... and, today, so does clock rate, so the equation must be run for every benchmark.

You can increase eIPC by decreasing specific latencies within the CPU, you can increase ILP by adding more execution units or better utilizing your existing resources. Zen looks to have gone both routes.

The equation for layman's IPC is (eIPC * ILP) / latency. That's really just performance per clock, in proper terms. I make the assumption that Lisa Su's comment was regarding this more general definition (the whole-core IPC), as does everyone else, of course. In this definition, IPC * clockRate = performance. Bottlenecks are an entirely different discussion that is a moot point until such a time as we have more details about Zen's cache and pipelines. As it stands, I don't much expect Zen to have many fewer pipeline stages than Excavator, but I do expect it to have caches much more similar to the Stars cores in performance, but probably a bit better. That will not bottleneck many applications popularly used for benchmarks in the way Excavator's cache system currently does.


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> No, I understand the concept of bottlenecks perfectly, however most benchmark performance can be rather accurately predicted by a deceptively simple (eIPC * ILP * clockRate) / latency equation. With all else being equal, increasing any one of those values (except latency, which must be reduced) will result in higher performance in general agreement with its own increase. Latency is driven by the code and the memory subsystems as well as the architecture. ILP is driven by CPU design, code order, instruction dependencies, and so on. eIPC here is IPC in its purist form, the entire core's ability to decode, schedule, execute, and retire instructions for a single execution unit on average.
> 
> Now, the real trick with the equation (which I just made up) is that you have to accurately calculate the values. eIPC, ILP, and latency all vary by benchmark... and, today, so does clock rate, so the equation must be run for every benchmark.
> 
> You can increase eIPC by decreasing specific latencies within the CPU, you can increase ILP by adding more execution units or better utilizing your existing resources. Zen looks to have gone both routes.
> 
> The equation for layman's IPC is (eIPC * ILP) / latency. That's really just performance per clock, in proper terms. I make the assumption that Lisa Su's comment was regarding this more general definition (the whole-core IPC), as does everyone else, of course. In this definition, IPC * clockRate = performance. Bottlenecks are an entirely different discussion that is a moot point until such a time as we have more details about Zen's cache and pipelines. As it stands, I don't much expect Zen to have many fewer pipeline stages than Excavator, but I do expect it to have caches much more similar to the Stars cores in performance, but probably a bit better. That will not bottleneck many applications popularly used for benchmarks in the way Excavator's cache system currently does.


Lisa Su may be using the generalized meaning of IPC. However, the generalized interpretation among the masses is absolutely single-minded, and it differs from Lisa's. Nothing against you; IPC is less important than people (in the mass interpretation) emphasize.


----------



## The Stilt

AMD is most definitely using "the generalized" meaning of IPC. For example in Carrizo's ISSCC 2015 presentation they said: *"Excavator cores: 5% more IPC at 40% less power and 23% less area"*. That's pretty much the average "generalized" IPC improvement you see on Carrizo. Naturally in some cases the IPC increased more than 5% and in some cases less, but the average is pretty much 5%.

So if the "40% more instructions per clock" is in fact the same "generalized" IPC they used for Carrizo, then...


----------



## ZenFX

Cache Simulator
September 2014

Designed a single level L1, Write Back/Write Allocate (WBWA) cache structure with Victim Cache and Strided Prefetcher implementation.
The cache simulator designed was capable of executing any user designed trace on any cache structure containing user defined specifications and reporting detailed
performance results.

https://www.linkedin.com/in/shrutimishra25

Tracking and eliminating bad prefetches generated by a stride prefetcher
US 20140237212 A1

https://www.google.com/patents/US20140237212


----------



## FlanK3r

I still hope Zen FX will be a solid CPU (I don't believe it will be better than Skylake, but close overall: similar or slightly better multithreaded and worse single-threaded performance, a bit like the Phenom IIs against Lynnfield).


----------



## Cyro999

Quote:


> Originally Posted by *KyadCK*
> 
> Single or multi-thread?
> 
> SR/EX make up for a lot of MT drawbacks that otherwise would heavily contribute to that 70-90%


ST; The MT scaling for 2 threads on a module is about ~1.7x in x264, which makes Piledriver roughly neck and neck with Haswell i5 at the same frequency. Skylake i5 wins with about a 500mhz clock deficit.
Quote:


> In x264, my numbers put Zen with 95% of Haswell's performance at the high end


Nobody will care about Haswell in late 2016, though; hardly anybody cares about Haswell already. Your numbers put Skylake ~20% ahead of Zen, core for core (without taking any SMT scaling into account on either side).

Why is it only people in AMD threads who compare buying new tech against Sandy/Ivy in 2014-2015 and Haswell in 2016? SB/IB-level performance dates to the start of 2011. Haswell was mid-2013. Skylake launched four months ago, even though Zen is still the better part of a year out. Stop settling for less for no reason and demand more.


----------



## ZenFX

Bandwidth increase in branch prediction unit and level 1 instruction cache
WO 2015061648 A1

https://www.google.com/patents/WO2015061648A1?cl=en&dq=inassignee:%22Advanced+Micro+Devices,+Inc.%22++smt&hl=en&sa=X&ved=0ahUKEwjEvJXyzL3JAhVBc44KHZFrAuw4MhDoAQhaMAk

[0046] All the prediction, tag, and cache pipelines handle simultaneous multithreading (SMT) by interleaving accesses from the two threads based on a thread prioritization algorithm. In general, the thread scheduling is performed independently within the BP, IT, and IC pipelines using a round robin technique. In a given cycle, if one of the threads is blocked and the other thread is available to be picked, the other thread will be picked in that cycle.

[0045] The IC pipeline is a three stage pipeline that can fetch 32 bytes of instruction data per cycle. Each address in the PRQ, depending on the predicted start and end location within the 64 byte prediction window, needs either one or two flows down the IC pipeline to forward all of the data to the DE. A returning L2 cache miss that the oldest PRQ entry is waiting for can wake up the IC pipeline for that entry, and the L2 fill data can be bypassed directly to the DE while the data arrays are being updated.

[0073] This is a significant improvement in the throughput. Without the BIP 536, there will be a bubble in the pipeline. If the front end of the branch predictor is limiting the throughput of the machine, there would be a bubble every cycle. Using the BIP 536 plugs the holes, so there is a continuous stream of instructions and there are fewer front end bubbles. The value of using the BIP increases as the machine gets wider, as it attempts to process more instructions each cycle, by helping to keep the machine full.

[0006] Several different types of branch predictors have been used. A bimodal predictor makes a prediction based on recent history of a particular branch's execution, and provides a prediction of taken or not-taken. A global predictor makes a prediction based upon recent history of all the branches' execution, not just the particular branch of interest. A two-level adaptive predictor with a globally shared history buffer, a pattern history table, and an additional local saturating counter may also be used, such that the outputs of the local predictor and the global predictor are exclusive ORed with each other to provide a final prediction. More than one prediction mechanism may be used simultaneously, and a final prediction is made based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past, or a majority vote function based on an odd number of different predictors.

[0004] To speed up the operation of the processor, it is desirable to have a full pipeline. One way of filling the pipeline is to fetch subsequent instructions while previous instructions are being processed. To be able to fetch ahead several instructions, a branch predictor may be used. A branch predictor predicts the direction of a branch instruction (i.e., taken or not-taken) and the branch target address before the branch instruction reaches the execution stage in the pipeline.


----------



## KarathKasun

Quote:


> Originally Posted by *Cyro999*
> 
> ST; The MT scaling for 2 threads on a module is about ~1.7x in x264, which makes Piledriver roughly neck and neck with Haswell i5 at the same frequency. Skylake i5 wins with about a 500mhz clock deficit.
> Nobody cares about Haswell though in late 2016. Nobody cares about Haswell already. Your numbers put Skylake ~20% ahead of Zen, core for core (without taking any SMT scaling into account on either side).
> 
> Why is it only the people in AMD threads comparing buying new tech against sandy-ivy in 2014-2015 and Haswell in 2016? SB/IB performance was start of 2011. Haswell was mid 2013. Skylake was four months ago even though Zen is the better part of a year out. Stop settling for less for no reason and demand more.


It will probably compete on price; right now Vishera is worthless in the market. This looks worth 80%-90% of a quad-core i7.


----------



## looncraz

Quote:


> Originally Posted by *Cyro999*
> 
> Nobody cares about Haswell though in late 2016. Nobody cares about Haswell already. Your numbers put Skylake ~20% ahead of Zen, core for core (without taking any SMT scaling into account on either side).
> 
> Why is it only the people in AMD threads comparing buying new tech against sandy-ivy in 2014-2015 and Haswell in 2016? SB/IB performance was start of 2011. Haswell was mid 2013. Skylake was four months ago even though Zen is the better part of a year out. Stop settling for less for no reason and demand more.


That number is specifically for x264. I have some numbers that put it at or above Skylake, but only in three benchmarks:

Code:


Zen (theoretical) vs Skylake
Showing Zen >= Skylake only

Bench     (Zen > Skylake)
Agisoft      + 0.4%
WebXPRT      + 5.7%
x264 1st     + 7.7%

These are also areas where Skylake doesn't walk away from Haswell, obviously.

And being a synthetic result using a performance profile derived from Excavator, it is unlikely to match reality.

I should also note that I did not include SunSpider results in my breakdown due to performance regressions on Intel's side and poor scaling predictability. Zen technically wipes the floor with Skylake in that benchmark, but then Haswell beats Skylake there as well.

As to why we use Haswell: it's because Intel hasn't released meaningful core data about Skylake. We don't know why it's faster, though we can wager a few guesses. For all we know, Skylake is a 10-pipeline monster with 6 ALUs, 3 AGUs, and 1 store data pipe. Or it's just the same thing they've been doing, with increased buffer sizes, fetch strides (line width), and so on. You are right that most on the high end will not settle for Haswell-like performance, but most people aren't on the high end and will buy what works for their money. Zen looks to be a return to the Phenom II vs Core 2 era for AMD, with a Zen+ CPU promising another 15% jump, which should keep AMD in that position for several years. This is important because it keeps AMD around to drive innovation and compete with Intel. That, and Excavator + 40% is pretty much spot-on Haswell.


----------



## looncraz

Quote:


> Originally Posted by *ZenFX*
> 
> https://www.google.com/patents/US20140237212


I love that I can replace all of that with:

Code:


// Pseudocode summary of the patent's prefetch-suppression loop
while (true) {
        auto entry = GetNextTableEntry();

        // Only prefetch streams whose stride has been confirmed (locked)
        if (!entry->Stride()->IsLocked())
                continue;

        // Track bad prefetches; suppress the stream if too many occur
        UpdatePrefetchSuppression(entry);
        if (entry->IsSuppressed())
                RemovePrefetch(entry);
        else
                IssuePrefetch(entry);
}


----------



## SpeedyVT

Quote:


> Originally Posted by *The Stilt*
> 
> AMD is most definitely using "the generalized" meaning of IPC. For example in Carrizo´s ISSCC 2015 presentation they said: *"Excavator cores: 5% more IPC at 40% less power and 23% less area"*. That´s pretty much the average "generalized" IPC improvement you see on Carrizo. Naturally in some cases the IPC increased more than 5% and in some cases less, but the average is pretty much 5%.
> 
> So if the "40% more instructions per clock" is in fact the same "generalized" IPC they used for Carrizo, then...


It's not just about IPC, though. With beefier cores and more dedicated resources than BD/PD, it'll be able to feed a dGPU better. The question is whether it can feed it enough to beat Intel's frame rates. AMD is also trying to win back its server share. I imagine the cores are smaller than BD/PD cores, so it's possible to stack many more per die. I believe its die/module configuration resembles Jaguar or Carrizo-L. I would have really loved to see an eight-core AM1 or a multi-socket AM1 board when that platform was released.


----------



## Heuchler

Rumor from Planet3DNow regarding a Zen-based motherboard.

Since Zen is based on a new CPU socket with DDR4 support, new motherboards logically have to be introduced as well. We therefore asked motherboard manufacturers whether reliable information about the eagerly awaited new product generation is already available. One manufacturer's reply contained an interesting detail: the vendor's internal schedule assumes a launch in March 2016. Depending on how you count, that would have Zen appearing between six months and a whole year sooner than currently anticipated.

What sounds almost too good to be true at first sight could have a simple explanation if the rumor is confirmed: Zen might first appear as an Opteron for servers, with the desktop market served later. It is also conceivable that the manufacturer's internal schedule is simply outdated. On the one hand, Zen had originally been expected in early 2016; on the other hand, we have received this "good news" from more than just one motherboard manufacturer. In that respect, all variants remain within the realm of possibility.

http://www.planet3dnow.de/cms/21487-geruecht-launcht-amd-zen-deutlich-eher-als-bisher-angenommen/


----------



## EniGma1987

Quote:


> Originally Posted by *Heuchler*
> 
> The manufacturer of a reply containing an interesting information: The internal schedule of the provider assumes a launch in March 2016.


Would make sense in the timeline if AMD is still planning on releasing Excavator 6 months ahead of Zen on AM4. A March 2016 platform launch with EX would mean a September(ish) launch of Zen, unless AMD's plans for product launches have changed.


----------



## Redwoodz

Quote:


> Originally Posted by *EniGma1987*
> 
> Would make sense in the timeline if AMD is still planning on releasing Excavator 6 months ahead of Zen on AM4. A March 2016 platform launch with EX would mean a September(ish) launch of Zen unless AMDs plans have changed for product launches.


There may be a "Carrizo/DDR4" variant coming on AM4, I believe.
Quote:


> Originally Posted by *looncraz*
> 
> Not at all, everything you mentioned effects the net IPC - IPC being a meaningful measure of how effectively the supporting architecture is extracting the core's theoretical performance capabilities.
> 
> Theoretically speaking, we should expect Zen to lay the smack-down on Haswell and readily match or exceed Skylake. AMD's core hardware is, in some very important ways, superior to Intel's. Intel's domination is largely a result of their vastly superior cache systems (hard to compete when you are using an L2 cache that is nearly as slow as the competition's L3). In theory, Zen can retire 10 instructions per cycle, peak, whereas Haswell will peak at 7 (the store data port is always bound with an AGU). The mix, of course, is also very important, with Haswell not being able to issue certain integer and floating point instructions in the same cycle, whereas Zen can do four integer, four floating point, and two memory instructions every clock cycle (at least it seems so). This, alone, would indicate that Zen should lay waste to Haswell in terms of theoretical core instruction throughput. But everything else AMD has is quite inferior to Intel. They may even fall back to using Stars caches, AMD just lacks the cache technology to keep the core fully fed.


You just said what I said in 100+ more words. AMD's cache and memory performance has previously been the biggest bottleneck. As I stated, they may have a new approach that is vastly different from anything in the past. When I referred to "IPC" I meant the actual instruction capabilities of the core.


----------



## Death Saved

Quote:


> Originally Posted by *PiOfPie*
> 
> The chip is, per AMD, "on track for a late 2016" launch, with a full year of revenue from Zen in 2017. Whether that means 2016 will just be for enterprise, a paper launch, or a full launch for consumers remains to be seen.


So October 1st, as that's the start of fiscal year 2017?


----------



## looncraz

Quote:


> Originally Posted by *Redwoodz*
> 
> You just said what I said in 100+ more words. AMD's cache and memory performance has been previously the biggest bottleneck.As I stated they may have a new approach which is vastly different than anything in the past. When I refered to "IPC" I meant the actual instruction capabilities of the core.


Greater precision requires words.

If you are referring to the internal execution IPC, which is purely bound by instruction latencies, then your prior statements are correct, but also completely irrelevant. That's an engineering detail. IPC, as most commonly used in forums such as this, equates to the entire core's capabilities, including all attached subsystems. In addition, it is the most common definition used in all marketing, as the internal execution IPC (eIPC) is usually just referred to as instruction latencies and IPC always equates to actual realized performance per clock. As such, a statement from AMD claiming 40% higher IPC is more concordant with a claim of approximately 40% higher performance.


----------



## PiOfPie

Quote:


> Originally Posted by *Death Saved*
> 
> so october first as thats the start of fiscal year 2017?


Maybe. We don't know whether they were referring to their own fiscal years or vanilla years.

Both Zambezi and Vishera dropped in October, so it would fit the desktop CPU pattern.


----------



## ZenFX

Efficient tag storage for large data caches

we will suppose that L4 is configured as a 256MB, 32-way, DRAM cache with 256B cache blocks stored in 2KB DRAM pages

[0041] Suppose for purposes of our running example (256MB, 32-way, 256B block, 2KB DRAM page L4; 28-bit tag structures), that L3 cache 110 is a 16MB, 32-way cache with 64B cache lines

[0027] One difficulty in building large, stacked DRAM caches is that the size of the tag array needed to support such a cache can consume significant die area.



https://www.google.com/patents/WO2012154895A1?cl=en&dq=inassignee:%22Advanced+Micro+Devices,+Inc.%22+L1+victim+cache&hl=en&sa=X&ved=0ahUKEwj_0a6g8sTJAhWOB44KHYXABeM4ChDoAQhAMAU
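To see why the figures in [0041] make [0027] an issue, here is a back-of-the-envelope sketch using the quoted example numbers (the one-tag-per-block layout is an assumption for illustration, not from the patent text):

```python
# Rough tag-storage estimate for the patent's running example: a 256MB,
# 32-way DRAM L4 with 256B blocks and 28-bit tag structures. Illustrative
# arithmetic only; real tag arrays also carry state bits and ECC.

def tag_array_bytes(cache_bytes: int, block_bytes: int, tag_bits: int) -> float:
    """Total tag storage, assuming one tag entry per cache block."""
    blocks = cache_bytes // block_bytes
    return blocks * tag_bits / 8

l4_tags = tag_array_bytes(256 * 2**20, 256, 28)
print(f"L4 tag storage: {l4_tags / 2**20:.2f} MiB")  # 3.50 MiB of SRAM just for tags
```

At ~3.5MiB, the tag array alone rivals an L2 cache in size, which is presumably what the patent's "efficient tag storage" scheme is attacking.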


----------



## warpuck

I have not sat down and figured out the time it takes for data to travel from the memory to the CPU for quite a few years. I wonder if AMD is going to push the CPU speed significantly faster than the SDRAM that is placed the 120mm or so distance away. With propagation delays, can it effectively keep up? With more cores (12-16-20) the current external bus width and HT link are too small. More speed? At 10GHz that would be about 30mm of travel for data (somewhat less with delays factored in). At some point the width of the die has to figure in. The SDRAM internal clock for a 3200-speed stick should be 667MHz. Data travel gets pretty tricky above 1GHz. Even at current CPU speeds, the placement of just 1-2GB of RAM running at 1GHz on the CPU die would be a large improvement. Is there room for that with a 14nm process inside the current size lid? I don't think that external RAM will become obsolete for quite some time.

Sometimes you have to proof read
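A quick sanity check on the signal-travel numbers above (a sketch; the 0.5 velocity factor for real interconnect is an assumed rough value, so these are upper bounds):

```python
# Back-of-the-envelope check of the signal-travel numbers above. C is the
# vacuum speed of light; real on-die/PCB signals propagate slower than c,
# so these distances are optimistic.

C = 299_792_458  # m/s

def distance_per_cycle_mm(freq_hz: float, velocity_factor: float = 1.0) -> float:
    """How far a signal can travel in one clock period, in millimetres."""
    return C * velocity_factor / freq_hz * 1000

print(f"{distance_per_cycle_mm(10e9):.0f} mm per cycle at 10GHz")       # 30 mm in vacuum
print(f"{distance_per_cycle_mm(10e9, 0.5):.0f} mm at ~0.5c in copper")  # 15 mm
```

So at 10GHz, a 120mm round trip to the DIMMs would cost several clock cycles in wire delay alone, which is the point the post is making.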


----------



## Cyro999

There's a larger cache on the CPU package (~128MB?) on some recent Intel CPUs. It actually seemed to help quite a bit, but not revolutionize performance.


----------



## ku4eto

Quote:


> Originally Posted by *Cyro999*
> 
> There's a larger cache on the CPU package (~128MB?) on some recent Intel CPUs. It actually seemed to help quite a bit, but not revolutionize performance.


Well, it can be said it brought a huge performance boost to the iGPU on the 5xxx series. Then they removed said cache from Skylake.


----------



## KyadCK

Quote:


> Originally Posted by *warpuck*
> 
> I have not sat down and figured out the time it takes for data to travel from the memory to the CPU for quite a few years. I wonder if AMD is going to push the CPU speed significantly faster than the SDRAM that is placed the 120mm or so distance away. With propagation delays, can it effectively keep up? With more cores (12-16-20) the current external bus width and HT link are too small. More speed? At 10GHz that would be about 30mm of travel for data (somewhat less with delays factored in). At some point the width of the die has to figure in. The SDRAM internal clock for a 3200-speed stick should be 667MHz. Data travel gets pretty tricky above 1GHz. Even at current CPU speeds, the placement of just 1-2GB of RAM running at 1GHz on the CPU die would be a large improvement. Is there room for that with a 14nm process inside the current size lid? I don't think that external RAM will become obsolete for quite some time.
> 
> Sometimes you have to proof read


HT Link is completely irrelevant to anything and everything RAM related; it is only for the chipset.

It is also no longer used on any CPU post-PD.


----------



## Cyro999

Quote:


> Originally Posted by *ku4eto*
> 
> Well, it can be said it brought a huge performance boost to the iGPU on the 5xxx series. Then they removed said cache from Skylake.


It's still used in Skylake, just not in the 6600K/6700K. And of course it brings a huge boost to the iGPU, which is heavily bottlenecked by the ~25-50GB/s of DDR4 memory bandwidth that the CPU largely needs for itself.


----------



## tcclaviger

When AMD can match Sandy Bridge E IPC in 6 or more cores for both ALU and FPU without using an integrated GPU/APU it matters, until then, sit down AMD.


----------



## looncraz

Quote:


> Originally Posted by *tcclaviger*
> 
> When AMD can match Sandy Bridge E IPC in 6 or more cores for both ALU and FPU without using an integrated GPU/APU it matters, until then, sit down AMD.


That is actually about the worst-case estimate for Zen's performance assuming they screw up across the board but still manage to deliver a product.


----------



## Vesku

Quote:


> Originally Posted by *svenge*
> 
> They really don't have the resources to do a die-shrink of Excavator down to 14/16nm, and if AMD could've made a desktop-class (greater than 35w) APU with Excavator using GloFo's 28nm process they would have done so already.
> 
> It's not like AMD's not used to spending entire years without anything competitive in the CPU marketplace, so for them 2016 won't be a new experience. It'll be just like 2015 but slightly worse due to Kaby Lake.


AMD has 28nm Bristol Ridge including 65W Excavator desktop APUs set for early-ish next year.


----------



## Vesku

Quote:


> Originally Posted by *Pro3ootector*
> 
> Makes me wonder, will they release Zen Opterons first? Same scenario as the Barcelona architecture?


Last I read from executive interviews, it will be high-end AM4 desktop > server > the various APUs.


----------



## Themisseble

Quote:


> Originally Posted by *tcclaviger*
> 
> When AMD can match Sandy Bridge E IPC in 6 or more cores for both ALU and FPU without using an integrated GPU/APU it matters, until then, sit down AMD.


Not only the arch, you also need compilers.

ZEN will probably come out first on desktop, then on server.


----------



## The Stilt

Just a hypothesis / poll: How would you personally react if the top desktop Zen SKU would match this exactly in terms of clocks, IPC and TDP?

http://ark.intel.com/products/75269/Intel-Xeon-Processor-E5-2650-v2-20M-Cache-2_60-GHz


----------



## Cyro999

Quote:


> Originally Posted by *tcclaviger*
> 
> When AMD can match Sandy Bridge E IPC in 6 or more cores for both ALU and FPU without using an integrated GPU/APU it matters, until then, sit down AMD.


Sandy IPC is pretty meh - Skylake is ~35-40% higher for encoding already. That's a lot! It only takes a 50% IPC advantage for a 4-core to match a 6-core in a 100% parallel task. A 30-40% gain is usually enough, due to Amdahl's law, when parallelism isn't very close to 100%.
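The core-count arithmetic above follows directly from Amdahl's law; a minimal sketch:

```python
# Amdahl's-law sketch of the claim above: how much per-core (IPC x clock)
# advantage a 4-core needs to match a 6-core at a given parallel fraction.

def speedup(cores: int, parallel: float) -> float:
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    return 1 / ((1 - parallel) + parallel / cores)

def per_core_advantage(small: int, big: int, parallel: float) -> float:
    """Per-core performance ratio needed for `small` cores to match `big` cores."""
    return speedup(big, parallel) / speedup(small, parallel)

print(round(per_core_advantage(4, 6, 1.00), 2))  # 1.5 -> 50% needed at perfect scaling
print(round(per_core_advantage(4, 6, 0.90), 2))  # 1.3 -> only ~30% at 90% parallelism
```

At 90% parallelism the 6-core's advantage shrinks, which is why a 30-40% IPC gain is usually enough.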
Quote:


> Just a hypothesis / poll: How would you personally react if the top desktop Zen SKU would match this exactly in terms of clocks, IPC and TDP?


Badly; it'd get beaten by a 6700K for encoding and by ~70% for ST perf.

2011-era IPC at 2.6-3.4GHz is pretty useless for consumer applications.


----------



## Kuivamaa

Quote:


> Originally Posted by *The Stilt*
> 
> Just a hypothesis / poll: How would you personally react if the top desktop Zen SKU would match this exactly in terms of clocks, IPC and TDP?
> 
> http://ark.intel.com/products/75269/Intel-Xeon-Processor-E5-2650-v2-20M-Cache-2_60-GHz


This is the CPU of my non-gaming rig. It can game at 3.4GHz with poorly threaded titles, but overall it's pretty meh, as at full turbo it has the ST perf of my 4.7GHz 8320, more or less. If it clocks around 4GHz it would be OK-ish as a PD upgrade, but if this is the best AMD can do they are doomed, as it brings them square back to where they were in 2012: losing in ST perf vs the latest Intel core, beating their mainstream offerings in MT, but losing across the board vs their HEDT platform.






There it is, my Ivy Xeon vs my Vishera. A 9590, which turbos at 5.0GHz, should slightly beat the Ivy at 3.4GHz single core, but lose in full MT (this is the CPU-Z 1.73 bench). If Zen 8C/16T is a replica of my Ivy, I may get one to replace the PD, but only if it reliably clocks at 4.0 just as easily as PD clocks at 4.6, and only if it is not more expensive than a 5820K. This is the sad truth: Ivy IPC and low clocks won't cut it.


----------



## FlanK3r

Quote:


> Originally Posted by *The Stilt*
> 
> Just a hypothesis / poll: How would you personally react if the top desktop Zen SKU would match this exactly in terms of clocks, IPC and TDP?
> 
> http://ark.intel.com/products/75269/Intel-Xeon-Processor-E5-2650-v2-20M-Cache-2_60-GHz


In multithread? If the top Zen is clocked around 3.5GHz with turbo up to 3.9GHz, the pure single-thread performance will not be bad compared to a 3GHz Ivy Bridge. The E5-2650 is 8c/16t with a "low clock"; could its performance in rendering/video encoding be similar to a 4c/8t 6700K?


----------



## The Stilt

Quote:


> Originally Posted by *FlanK3r*
> 
> In multithread? If the top Zen is clocked around 3.5GHz with turbo up to 3.9GHz, the pure single-thread performance will not be bad compared to a 3GHz Ivy Bridge. The E5-2650 is 8c/16t with a "low clock"; could its performance in rendering/video encoding be similar to a 4c/8t 6700K?


In both. I can't really see the 14nm LPP process going much past 3GHz, based on the little information currently available. Maybe 3.4GHz - 3.5GHz for one or two cores, but that probably already puts the process WAY outside its efficient frequency region. I would really hope there is a catch in that "40% IPC improvement over Excavator" statement, since it is obvious that they cannot reach a high enough frequency to make up for the lower IPC while using the 14nm LPP process. A 40% IPC improvement over Excavator would make Zen match Ivy Bridge pretty closely in FP IPC.









Also it is yet to be seen how efficient the SMT implementation will be. By the time Zen arrives Intel will have around 15 years of experience with SMT. With Broadwell & Skylake their SMT yield is around 26.5% (in relation to native core performance). While I don't expect AMD to be able to match that with their first design using it, I still hope they will have figured out something new...
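For scale, here is what a ~26.5% SMT yield means for whole-chip throughput, assuming the figure is the extra throughput the second thread adds relative to a native core (my reading of the number, not an official definition):

```python
# Whole-chip throughput implied by an SMT yield figure, under the assumed
# definition above: second thread adds `smt_yield` of a native core.

SMT_YIELD = 0.265  # Broadwell/Skylake figure quoted in the post

def chip_throughput(cores: int, smt_yield: float = SMT_YIELD) -> float:
    """Relative throughput of a fully loaded SMT chip vs. the same cores without SMT."""
    return cores * (1 + smt_yield)

# A 4-core/8-thread chip behaves like ~5.06 plain cores when fully loaded:
print(round(chip_throughput(4), 2))  # 5.06
```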


----------



## FlanK3r

I'm still confused whether the top desktop model will be 8c/16t or 4c/8t.


----------



## Themisseble

Quote:


> Originally Posted by *FlanK3r*
> 
> I'm still confused whether the top desktop model will be 8c/16t or 4c/8t.


I believe they said 8 cores = 16 threads


----------



## The Stilt

With SMT you cannot get away with calling it "8 cores" if you have four cores which can each execute two threads. In Intel's SMT implementation there is basically no resource multiplication.

Intel SMT (In Broadwell / Skylake) resource doubling:

- Architectural state (GPR & CR block)
- APIC

AMD CMT (Excavator) resource doubling:

- Instruction decoder
- Integer datapath
- Integer scheduler
- 32KB L1 Data Cache
- LSU
- 128-bit FMAC
- 64 * 8K L2 Cache (512KB priority)

Until AMD reveals more information about Zen nobody can say anything certain really.


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> In both. I can't really see the 14nm LPP process going much past 3GHz, based on the little information currently available. Maybe 3.4GHz - 3.5GHz for one or two cores, but that probably already puts the process WAY outside its efficient frequency region. I would really hope there is a catch in that "40% IPC improvement over Excavator" statement, since it is obvious that they cannot reach a high enough frequency to make up for the lower IPC while using the 14nm LPP process. A 40% IPC improvement over Excavator would make Zen match Ivy Bridge pretty closely in FP IPC.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Also it is yet to be seen how efficient the SMT implementation will be. By the time Zen arrives Intel will have around 15 years of experience with SMT. With Broadwell & Skylake their SMT yield is around 26.5% (in relation to native core performance). While I don't expect AMD to be able to match that with their first design using it, I still hope they will have figured out something new...


Very little about 14nm LPP makes it less capable of high clocks than 28nm SHP. In fact, it has faster, more efficient transistor switching, resulting in shorter delays, so it should be capable of higher clocks at lower power. What we don't know is the response curve from more power at the gate and the FinFET uniformity (which is a HUGE factor for power handling).

14nm LPP is specifically designed to increase the minimum size of the FinFETs by some 10% at the top without growing the overall footprint much, which should result in an even higher current handling capability, which translates into higher clocks. FinFETs change the game a bit.

From there, it's all about design. We already know AMD knows how to make CPUs that can reach high clock speeds, so if AMD keeps the pipeline stages simple, doesn't increase the time-wise complexity of anything, and ensures that power and clock deliveries are clean, then they should be able to exceed 4GHz. Albeit probably not with a 95W TDP and 8 cores.









Zen's performance profile, of course, will likely be quite different than Excavator's. There will be performance regressions and some areas that are twice as fast. Ivy level floating point and Haswell level integer is quite possible, but it will depend on the workload. Zen's FPU has some advantages over Haswell's for ILP, but I've yet to compare the latencies of Haswell vs Excavator, so I don't really have a good way to estimate the end performance of the FPU at this time. Integer, however, should, almost at worst, tie with Haswell. Memory operand heavy code with minimal code-local address reuse would still remain in favor of Haswell.

SMT scaling, clockspeeds, and cache performance are things we will have no way to estimate adequately for some time.


----------



## looncraz

Quote:


> Originally Posted by *The Stilt*
> 
> With SMT you cannot get away with calling it "8 cores" if you have four cores which can each execute two threads. In Intel's SMT implementation there is basically no resource multiplication.
> [snip...]
> Until AMD reveals more information about Zen nobody can say anything certain really.


Indeed so, Zen's SMT design is something I really would love to know about. I have a few designs in my mind that I'd love to see tried out, will be nice to see if AMD went with anything even remotely similar.


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> Indeed so, Zen's SMT design is something I really would love to know about. I have a few designs in my mind that I'd love to see tried out, will be nice to see if AMD went with anything even remotely similar.


Rumor is that Zen's core design is a morph-core design, as in it can (or will, in future designs) be able to make multiple cores on one module become one large core for larger tasks, or split up into fewer, smaller cores. This is an old rumor.


----------



## MadRabbit

Quote:


> Originally Posted by *SpeedyVT*
> 
> Rumor is that Zen's core design is a morph-core design, as in it can (or will, in future designs) be able to make multiple cores on one module become one large core for larger tasks, or split up into fewer, smaller cores. This is an old rumor.


Could you link that rumor? Never even seen that...and don't tell me it's from a site that begins with W.


----------



## SpeedyVT

Quote:


> Originally Posted by *MadRabbit*
> 
> Could you link that rumor? Never even seen that...and don't tell me it's from a site that begins with W.


Too old to link; not sure where I can dig it up again. It was the other notoriously bad page.

Rumor doesn't mean I'm implying anything; it's just one of them fabulous rumors.


----------



## looncraz

Quote:


> Originally Posted by *SpeedyVT*
> 
> Rumor is that Zen's core design is a morph-core design, as in it can (or will, in future designs) be able to make multiple cores on one module become one large core for larger tasks, or split up into fewer, smaller cores. This is an old rumor.


I'm very very much aware of this technology and I still believe AMD was originally trying to accomplish this with Bulldozer but faced unexpected issues. I first discussed this idea in the late 90s or early 00s and may even have been one of the first to envision it.









The idea is to use all the execution units of multiple cores to execute a single thread. The issue is, and has always been, how to handle the scheduling. When you have a fixed number of units with fixed capabilities scheduling is relatively simple, but as soon as you start creating variable availability you no longer have that simplicity. The only viable option, it seems, is to spread branches between the cores speculatively, such that you execute multiple possible branches concurrently. In the end, it just seems better to have a massive macro core with a single scheduler per execution resource type...

As in:












The brown lines on the integer/FPU groups denote a functional block which can be powered down at will. The white spacing in the mem ops group denotes power and clock gating. Every color has a different capability. Any thread can use all of the resources available and instructions are fetched in the largest groups possible. The design assumes thread prioritization awareness, register renaming, concurrent SMT (thread count is limited by register count and each thread can execute concurrently using available resources), and much more. Low power states will shut down sections of the core as well as reducing the clock rate.

You can see there are 20 ALU/FPU pipes, 9+9 AGUs, and various specialized memory units mostly designed to allow the SIMD array (GPU) to act as a first-class citizen for compute, with the integer and floating point connections available for executing certain SIMD instructions for multiple threads (this reduces the complexity of each execution unit in the core, which is vital for enabling such a wide design).

Mind you, this is the entire CPU, and will favor single threaded performance, with more threads slowly reducing the performance of other threads. Operating system support is, quite obviously, critical.

EDIT:

In case it isn't clear, this is meant to treat graphics as a full-time first class citizen. The group of three multi-colored memory op pipelines are always on (except in standby, obviously). The three units (from left to right): memory stream in, memory stream out, full AGU. This allows basic operation (near-idle) of a computer with the remainder of the memory resources turned off (including all other AGUs). The big group to the right contains a load AGU, load and store streaming, and two AGLUs. These are active next and can handle basic massively repetitive tasks such as watching videos online and supporting minor thread tasks.

Hope that makes sense ;-)


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> I'm very very much aware of this technology and I still believe AMD was originally trying to accomplish this with Bulldozer but faced unexpected issues. I first discussed this idea in the late 90s or early 00s and may even have been one of the first to envision it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The idea is to use all the execution units of multiple cores to execute a single thread. The issue is, and has always been, how to handle the scheduling. When you have a fixed number of units with fixed capabilities scheduling is relatively simple, but as soon as you start creating variable availability you no longer have that simplicity. The only viable option, it seems, is to spread branches between the cores speculatively, such that you execute multiple possible branches concurrently. In the end, it just seems better to have a massive macro core with a single scheduler per execution resource type...
> 
> As in:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The brown lines on the integer/FPU groups denote a functional block which can be powered down at will. The white spacing in the mem ops group denotes power and clock gating. Every color has a different capability. Any thread can use all of the resources available and instructions are fetched in the largest groups possible. The design assumes thread prioritization awareness, register renaming, concurrent SMT (thread count is limited by register count and each thread can execute concurrently using available resources), and much more. Low power states will shut down sections of the core as well as reducing the clock rate.
> 
> You can see there are 20 ALU/FPU pipes, 9+9 AGUs, and various specialized memory units mostly designed to allow the SIMD array (GPU) to act as a first-class citizen for compute, with the integer and floating point connections available for executing certain SIMD instructions for multiple threads (this reduces the complexity of each execution unit in the core, which is vital for enabling such a wide design).
> 
> Mind you, this is the entire CPU, and will favor single threaded performance, with more threads slowly reducing the performance of other threads. Operating system support is, quite obviously, critical.


As long as threading is in control of the OS or compiler I don't think we'll see it rise to popular use. Too bad we can't make the processor determine how it handles instructions rather than how a compiler does.


----------



## christoph

Spoiler: Warning: Spoiler!



Quote:


> Originally Posted by *looncraz*
> 
> I'm very very much aware of this technology and I still believe AMD was originally trying to accomplish this with Bulldozer but faced unexpected issues. I first discussed this idea in the late 90s or early 00s and may even have been one of the first to envision it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The idea is to use all the execution units of multiple cores to execute a single thread. The issue is, and has always been, how to handle the scheduling. When you have a fixed number of units with fixed capabilities scheduling is relatively simple, but as soon as you start creating variable availability you no longer have that simplicity. The only viable option, it seems, is to spread branches between the cores speculatively, such that you execute multiple possible branches concurrently. In the end, it just seems better to have a massive macro core with a single scheduler per execution resource type...
> 
> As in:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The brown lines on the integer/FPU groups denote a functional block which can be powered down at will. The white spacing in the mem ops group denotes power and clock gating. Every color has a different capability. Any thread can use all of the resources available and instructions are fetched in the largest groups possible. The design assumes thread prioritization awareness, register renaming, concurrent SMT (thread count is limited by register count and each thread can execute concurrently using available resources), and much more. Low power states will shut down sections of the core as well as reducing the clock rate.
> 
> You can see there are 20 ALU/FPU pipes, 9+9 AGUs, and various specialized memory units mostly designed to allow the SIMD array (GPU) to act as a first-class citizen for compute, with the ineger and floating point connections available for executing certain SIMD instructions for multiple threads (this reduces the complexity of each execution unit in the core, which is vital for enabling such a wide design).
> 
> Mind you, this is the entire CPU, and will favor single threaded performance, with more threads slowly reducing the performance of other threads. Operating system support is, quite obviously, critical.
> 
> EDIT:
> 
> In case it isn't clear, this is meant to treat graphics as a full-time first class citizen. The group of three multi-colored memory op pipelines are always on (except in standby, obviously). The three units (from left to right): memory stream in, memory stream out, full AGU. This allows basic operation (near-idle) of a computer with the remainder of the memory resources turned off (including all other AGUs). The big group to the right contains a load AGU, load and store streaming, and two AGLUs. These are active next and can handle basic massively repetitive tasks such as watching videos online and supporting minor thread tasks.
> 
> Hope that makes sense ;-)






I'm sure that was the original idea since the beginning


----------



## CrazyElf

If you think about it, we seem to be hitting the limits of silicon here. That is why Intel cannot improve the way it used to:


It costs more and more per new node, plus to design a new architecture.
At the same time, the marginal benefit per new node or architecture seems to have drastically declined.
Performance per watt has continued though to see some decent increases.
I think that with Zen, we will see AMD make this great leap forward, and then only ~10% gains per generation like Intel. On paper at least, they have addressed the weak floating-point performance of Bulldozer (and its successive architectures). That was a huge weak point.

Whether or not something like III-V materials can buy some more time, or "stacked transistors" (not sure how they are going to deal with the heat) or graphene in the long run remains to be seen.

All AMD has to do is make the leap and then they will have closed most of the gap. (Okay, it's more complex than "all", as this is a remarkably difficult step.) I have confidence that Keller designed a solid CPU before he left, but I am less comfortable with how good the 14/16nm process will be.

Another big question I have is the "cache gap", for lack of a better term. Intel's cache is much, much better than AMD's. We'll need more details about this.

Quote:


> Originally Posted by *looncraz*
> 
> I'm very very much aware of this technology and I still believe AMD was originally trying to accomplish this with Bulldozer but faced unexpected issues. I first discussed this idea in the late 90s or early 00s and may even have been one of the first to envision it.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The idea is to use all the execution units of multiple cores to execute a single thread. The issue is, and has always been, how to handle the scheduling. When you have a fixed number of units with fixed capabilities scheduling is relatively simple, but as soon as you start creating variable availability you no longer have that simplicity. The only viable option, it seems, is to spread branches between the cores speculatively, such that you execute multiple possible branches concurrently. In the end, it just seems better to have a massive macro core with a single scheduler per execution resource type...
> 
> 
> Spoiler: Warning: Spoiler!
> 
> 
> 
> As in:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> The brown lines on the integer/FPU groups denote a functional block which can be powered down at will. The white spacing in the mem ops group denotes power and clock gating. Every color has a different capability. Any thread can use all of the resources available and instructions are fetched in the largest groups possible. The design assumes thread prioritization awareness, register renaming, concurrent SMT (thread count is limited by register count and each thread can execute concurrently using available resources), and much more. Low power states will shut down sections of the core as well as reducing the clock rate.
> 
> You can see there are 20 ALU/FPU pipes, 9+9 AGUs, and various specialized memory units mostly designed to allow the SIMD array (GPU) to act as a first-class citizen for compute, with the integer and floating point connections available for executing certain SIMD instructions for multiple threads (this reduces the complexity of each execution unit in the core, which is vital for enabling such a wide design).
> 
> Mind you, this is the entire CPU, and will favor single threaded performance, with more threads slowly reducing the performance of other threads. Operating system support is, quite obviously, critical.
> 
> EDIT:
> 
> 
> 
> In case it isn't clear, this is meant to treat graphics as a full-time first class citizen. The group of three multi-colored memory op pipelines are always on (except in standby, obviously). The three units (from left to right): memory stream in, memory stream out, full AGU. This allows basic operation (near-idle) of a computer with the remainder of the memory resources turned off (including all other AGUs). The big group to the right contains a load AGU, load and store streaming, and two AGLUs. These are active next and can handle basic massively repetitive tasks such as watching videos online and supporting minor thread tasks.
> 
> Hope that makes sense ;-)


Outstanding ideas. +Rep

The question I have for you is whether AMD's CMT concept was fundamentally bad, or whether it was simply a terrible execution of a fundamentally sound concept. Judging by what you have typed, I suspect you think the latter? I agree though that favoring single-threaded performance is the optimal idea.

I suppose for your ideas, you'd want ideally something open source so that it can be modified to support your custom design or else it will not perform as well as desired. What I'd love to know from AMD is the details of the challenges that they encountered. That might also give us an idea of what kinds of challenges your CPU idea might see in the real world (were it ever designed).

Actually, as an enthusiast if you think about it, a "giant die" would work out well for us.

This is my thought:

Design for maximum performance per clock
Design for as high power efficiency as possible
Could use giant dies (this will be very inefficient from a performance per mm^2 POV)
Large die, but with few cores (alternatively a heterogeneous design with 2-4 really fast single threaded cores and the rest multithreaded cores)
Die size will be at reticle limit (kind of like how the current GPUs are at 600mm^2); I believe the largest die ever made was Tukwila at close to 700mm^2
It would not be cheap, but with CPUs slowing down in improvement, we would keep it for a long time, knowing that next generation the improvement will only be 10%. I'm thinking that as we hit the limits of silicon, that 10% will become much smaller and drop below single digits (it already has in some benchmarks, really).

At some point, we'll reach a point where:

The laws of physics do not allow for smaller nodes or faster
We are clockspeed bottlenecked
Too expensive to make a new architecture
At that point, we should go for the biggest die possible and see if the reticle limit can be raised.

I wonder how long GPUs can continue to improve for before it slows down too. Granted the nature of GPUs makes it easier, but at some point, I think they will hit a limit too.

Edit: I wonder if my "big die" CPU could somehow be combined with, say, HBM on the same package and soldered on? The reason I got the idea was that Broadwell saw some very impressive gains with the eDRAM, which acted like an on-package L4 cache.


----------



## Cyro999

Quote:


> On paper at least, they have addressed the weak Floating Point performance of the Bulldozer (and successive architectures). That was a huge weak point.


The cache, memory and ST performance too
Quote:


> alternatively a heterogeneous design with 2-4 really fast single threaded cores and the rest multithreaded cores


Those designs are amazing for many workloads due to Amdahl's law, but harder to design and scale. You could have, for example, a 5-core design (with core 1 being ~40% faster than the others) instead of a 6-core design where all cores have equal performance.

In the worst-case scenario (100% parallelism) this is 10% slower than using the die area for 6 equal cores (you have the performance of 5.4 cores), but for many tasks it gives far more substantial performance advantages. It'd be godly as a gaming CPU.
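The 5.4-core arithmetic above, spelled out (the 40% figure and the 1 fast + 4 small split are the post's illustrative numbers, not real SKUs):

```python
# The heterogeneous trade-off above in numbers: one core ~40% faster than
# the rest vs. six equal cores in the same die area.

def hetero_equiv_cores(fast_bonus: float = 0.4, small_cores: int = 4) -> float:
    """Equivalent equal-core count of 1 fast + N small cores, fully parallel."""
    return (1 + fast_bonus) + small_cores

def mt_deficit_vs(equal_cores: int, fast_bonus: float = 0.4, small_cores: int = 4) -> float:
    """Fractional multithreaded deficit vs. an all-equal-core design."""
    return 1 - hetero_equiv_cores(fast_bonus, small_cores) / equal_cores

print(round(hetero_equiv_cores(), 1))  # 5.4 "cores" worth of MT throughput
print(round(mt_deficit_vs(6), 2))      # 0.1 -> the ~10% worst case above
```

The upside, of course, is a 1.4x advantage on whatever serial bottleneck Amdahl's law exposes.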


----------



## Olivon

Quote:


> Originally Posted by *The Stilt*
> 
> Just a hypothesis / poll: How would you personally react if the top desktop Zen SKU would match this exactly in terms of clocks, IPC and TDP?
> 
> http://ark.intel.com/products/75269/Intel-Xeon-Processor-E5-2650-v2-20M-Cache-2_60-GHz


I will say that ST performance would be too low compared to Intel's offerings. More interesting for MT tasks, but we're comparing a 2013 CPU against a 2017 one.


----------



## The Stilt

Quote:


> Originally Posted by *Olivon*
> 
> I will say that ST performance would be too low compared to Intel's offerings. More interesting for MT tasks, but we're comparing a 2013 CPU against a 2017 one.


Indeed. In this case let's keep our fingers and toes crossed that the "40% IPC improvement" is the understatement of the century.


----------



## Ultracarpet

Quote:


> Originally Posted by *The Stilt*
> 
> Indeed. In this case let's keep our fingers and toes crossed that the "40% IPC improvement" is the understatement of the century.


That, or by some grace of absolute magic the chips can reach decent clock speeds.


----------



## The Stilt

Quote:


> Originally Posted by *Ultracarpet*
> 
> That, or by some grace of absolute magic the chips can reach decent clock speeds.


It would need to reach a ~45% higher clock than Skylake to match its FP performance (if the 40% statement is accurate), which obviously is not going to happen. I'd put my money on Broadwell-ish (stock) clocks in the very best case.


----------



## Cyro999

I don't think the process will be that bad. If you compare to Skylake, Intel's 14nm is most efficient at very low clocks too (~2GHz?). Going from 3.5GHz to 4.5GHz costs a huge amount of power and is extremely inefficient. Inefficient 14nm is still worlds better than 32nm, though.

I don't think they would even bother with the Zen design if there were going to be a hard clock wall anywhere near 3GHz.


----------



## The Stilt

For that reason Intel has two different variants of the 14nm process (P1272 & P1273).







One has been optimized for maximum power efficiency (at the expense of performance; SoC / mobile / embedded), while the other has been optimized for maximum performance (at the expense of efficiency; CPU). Performance vs. efficiency / area is always a trade-off you make based on your design goals.

It is not a question of whether the 14nm LPP process AMD will be using is better than the 32nm SHP SOI or 28nm xHP / HPP bulk they use now, because it is in nearly every aspect. The question is whether 14nm LPP is up to the job AMD needs it to do. If AMD hasn't been as incompetent this time as they have been during the past few years, they've made themselves an "escape plan" involving TSMC 16nm FF+. It should be obvious to everyone that if Zen in fact fails, it will be the end of AMD.


----------



## christoph

nooooo, I really think that we've hit a wall here

as our friend above said, a larger die size would be better, and who knows when graphene is gonna come along, which supposedly allows for a much, much smaller architecture with no heating problems

think about it:

a larger die size, smaller architecture, bigger cache, higher clocks and no heating problems


----------



## looncraz

Quote:


> Originally Posted by *CrazyElf*
> 
> 
> 
> If you think about it, we seem to be hitting the limits of silicon here. That is why Intel cannot improve the way it used to:
> 
> 
> It costs more and more per new node, plus to design a new architecture.
> At the same time, the marginal benefit per new node or architecture seems to have drastically declined.
> Performance per watt has continued though to see some decent increases.
> I think that with Zen, we will see AMD make this great leap forward, then only ~10% per generation, like Intel. On paper at least, they have addressed the weak floating-point performance of Bulldozer (and successive architectures). That was a huge weak point.
> 
> Whether or not something like III-V materials can buy some more time, or "stacked transistors" (not sure how they are going to deal with the heat) or graphene in the long run remains to be seen.
> 
> All AMD has to do is to make the leap and then they will have closed most of the gap. (Okay it's more complex than "all" as this is a remarkably difficult step). I have confidence that Keller designed a solid CPU before he left, but I am less comfortable with how well the 14/16nm process will be.
> 
> 
> Another big question I have is the "cache gap" for lack of a better term. Intel's cache is much, much better than AMDs. We'll need more details about this.
> Outstanding ideas. +Rep
> 
> The question I have for you is whether or not AMD's CMT concept was fundamentally bad or whether it was simply a terrible execution of a fundamentally sound concept. Judging by what you have typed I suspect that you seem to think the later? I agree though that single-threaded performance is the optimal idea.
> 
> I suppose for your ideas, you'd want ideally something open source so that it can be modified to support your custom design or else it will not perform as well as desired. What I'd love to know from AMD is the details of the challenges that they encountered. That might also give us an idea of what kinds of challenges your CPU idea might see in the real world (were it ever designed).
> 
> 
> Actually, as an enthusiast if you think about it, a "giant die" would work out well for us.
> 
> This is my thought:
> 
> Design for maximum performance per clock
> Design for as high power efficiency as possible
> Could use giant dies (this will be very inefficient from a performance per mm^2 POV)
> Large die, but with few cores (alternatively a heterogeneous design with 2-4 really fast single threaded cores and the rest multithreaded cores)
> Die size will be at reticle limit (kind of like how the current GPUs are at 600mm^2); I believe the largest die ever made was Tukwila at close to 700mm^2
> It would not be cheap, but with CPUs slowing down in improvement, we would keep it for a long time, knowing that next generation, the improvement will only be 10%. I'm thinking that as we hit the limits of silicon, 10% will be much smaller and will drop to below the single digits (they already have in some benchmarks really).
> 
> At some point, we'll reach a point where:
> 
> The laws of physics do not allow for smaller nodes or faster
> We are clockspeed bottlenecked
> Too expensive to make a new architecture
> At that point, we should go for the biggest die possible and see if the reticle limit can be raised.
> 
> I wonder how long GPUs can continue to improve for before it slows down too. Granted the nature of GPUs makes it easier, but at some point, I think they will hit a limit too.
> 
> Edit: I wonder if my "big die" CPU could somehow be combined with, say, HBM on the same package and soldered on? I got the idea because Broadwell saw very impressive gains from the eDRAM, which acted like an on-package L4 cache.


I don't know what AMD will do about the cache gap, but they certainly went the wrong way with Bulldozer. Having an L2 that is as slow as the competition's L3 is not how you win a performance battle. Of course, AMD made up for some of that deficit with capacity, so you are more likely to find the data there, but even when you do, it takes longer to get it. Most programs won't benefit from a massive L2, as their working data set is much too small.

As for what went wrong with Bulldozer, I can only imagine that they couldn't find a way to schedule the same thread's instructions across two schedulers. You can't really do that without a unified scheduler (or schedule buffer) per execution-resource type and still expect an improvement - unless you are decoding far, far faster than the core can manage to execute.

Even my design has quite a few issues. The massive schedulers, for example, guarantee added pipeline stages. This will probably be a 20+ stage design, though I think the added ILP will reduce the harm that might cause. This is partly remedied by the divisions within the schedulers, where a single thread will attempt to execute on the nearest available resource once its data is available. The great thing is that non-dependent executions can cascade to other execution units, and you get access to the pointer for address calculation before the linear progression of retired instructions would dictate, which will greatly hide cache latencies since the data will tend to be available many cycles sooner.

And, of course, each unit is simpler as a result. The five ALUs per segment must be able to handle the full breadth of instructions, but they need not have any ILP considerations at all. So you can dedicate one to addition, one to multiplication, one to division, one to branches, and so on, knowing that you have those same resources repeated and accessible for performance-critical code (one reason why the CPU will need to know thread priorities).

This CPU could be made today, no problem, but it would be more difficult to synthesize than a single-core design you copy & paste, though I've taken some of that into account. Operating-system support would be ideal, but not required. Running Windows XP on it, for example, would still see massive gains; it's just that the OS needs to see a configuration it is accustomed to, so the CPU would have to present itself as, say, an octo-core and treat each thread as being just as important as any other. This won't be a major issue until some heavy-hitting low-priority background thread sucks up an unwanted share of resources (albeit limited to 1/5th per "core"). In a properly updated operating system, each core would receive a priority-setting instruction on context switches, so higher priorities get more resources and lower priorities can be strictly limited to just one execution group, with other threads able to jump in at any time.

There's a lot of metadata attached to each instruction in this design, of course. Each entry needs the thread it belongs to, the priority of that thread, its instruction group, its position in that group, its reordering data, and then, finally, the instruction itself... This is required for scheduling and retirement reordering, so some of this is already done today (and probably in a much better way than the following).

Code:

/* C sketch: "uint4" isn't a real type, so the 4-bit fields become bitfields.
   XInstruction stands in for the decoded instruction payload, defined elsewhere. */
struct XInstructionEntry {
        uint8_t      thread;                  /* owning hardware thread */
        uint8_t      priority            : 4; /* thread priority */
        uint8_t      instructionGroup    : 4;
        uint8_t      instructionPosition : 4;
        uint8_t      dependencyTree      : 4;
        XInstruction instruction;             /* the instruction itself */
};


----------



## paulerxx

I really want AMD to destroy Intel.... I can't be the only one?


----------



## ubbernewb

Quote:


> Originally Posted by *paulerxx*
> 
> I really want AMD to destroy Intel.... I can't be the only one?


Won't happen, or if it did, it would be for about 30 seconds. I strongly suspect Intel could bring out a much faster chip if they had to, but since AMD has been no threat for a while, they release chips that are just slightly faster than their last chip.


----------



## Amphibian

Quote:


> Originally Posted by *ubbernewb*
> 
> Won't happen, or if it did, it would be for about 30 seconds. I strongly suspect Intel could bring out a much faster chip if they had to, but since AMD has been no threat for a while, they release chips that are just slightly faster than their last chip.


I would be OK with Intel stepping up its game. If AMD can become a strong competitor, consumers will benefit - or at least we can hope.


----------



## ubbernewb

Quote:


> Originally Posted by *Amphibian*
> 
> I would be OK with Intel stepping up its game. If AMD can become a strong competitor, consumers will benefit - or at least we can hope.


EXACTLY! I'm praying the AMD chip is good. Competition is good for us, the consumers.


----------



## Dom-inator

Quote:


> Originally Posted by *christoph*
> 
> nooooo, I really think that we've hit a wall here
> 
> as our friend above said, a larger die size would be better, and who knows when graphene is gonna come along, which supposedly allows for a much, much smaller architecture with no heating problems
> 
> think about it:
> 
> a larger die size, smaller architecture, bigger cache, higher clocks and no heating problems


What about power consumption? This is a major consideration since most of the chips produced will end up in laptops, which is probably where AMD will make the most profit or aim to anyway. The high end desktop user-base is a lot smaller in comparison.


----------



## Themisseble

Quote:


> Originally Posted by *Dom-inator*
> 
> What about power consumption? This is a major consideration since most of the chips produced will end up in laptops, which is probably where AMD will make the most profit or aim to anyway. The high end desktop user-base is a lot smaller in comparison.


Shouldn't be a problem. Look at Carrizo at 28nm.


----------



## christoph

Quote:


> Originally Posted by *Dom-inator*
> 
> What about power consumption? This is a major consideration since most of the chips produced will end up in laptops, which is probably where AMD will make the most profit or aim to anyway. The high end desktop user-base is a lot smaller in comparison.


that's correct...

but then why hasn't Intel killed AMD? As our friend said above, Intel has been releasing just slightly updated chips at every step without making ANY REAL architecture change. Can't they actually make a very good chip?

everyone is now focusing on power consumption, which is really good, no question there, but both AMD and Intel are putting out chips with what, really? 30-40% better performance, but at what power consumption?

the AMD chip for laptops (don't remember which one it was) claimed a 5% upgrade in performance but with 60% (was it?) lower power consumption. Really good.

I don't know what Intel is playing at, but can't they make a really big upgrade in their chips?

what about the article that said Intel's 22 and 28nm processes were not real? That they were more like 24 and 30nm? I don't know if that statement is true, but if it is, WHY?

what about overclockability? Nowadays there's no chip you can overclock the heck out of. Sure, you can obtain good overclocks, but those are just decent overclocks, and that's it.


----------



## Catscratch

Quote:


> Originally Posted by *christoph*
> 
> that's correct...
> 
> but then why hasn't Intel killed AMD? As our friend said above, Intel has been releasing just slightly updated chips at every step without making ANY REAL architecture change. Can't they actually make a very good chip?
> 
> everyone is now focusing on power consumption, which is really good, no question there, but both AMD and Intel are putting out chips with what, really? 30-40% better performance, but at what power consumption?
> 
> the AMD chip for laptops (don't remember which one it was) claimed a 5% upgrade in performance but with 60% (was it?) lower power consumption. Really good.
> 
> I don't know what Intel is playing at, but can't they make a really big upgrade in their chips?
> 
> what about the article that said Intel's 22 and 28nm processes were not real? That they were more like 24 and 30nm? I don't know if that statement is true, but if it is, WHY?
> 
> what about overclockability? Nowadays there's no chip you can overclock the heck out of. Sure, you can obtain good overclocks, but those are just decent overclocks, and that's it.


Can they really make a new chip from scratch and leave their current design in the dust? Or are they just playing along because killing AMD would mean someone buying them and becoming a real threat?


----------



## PriestOfSin

Quote:


> Originally Posted by *Catscratch*
> 
> Can they really make a new chip from scratch and leave their current design in the dust? Or are they just playing along because killing AMD would mean someone buying them and becoming a real threat?


Intel is in a super good position right now. They release small, incremental updates to their chips at a low cost to them. Meanwhile they wait for AMD to catch up, and as soon as they do, they'll release something that's slightly faster. No reason to kill AMD....

Best case, AMD's Zen will match Skylake, maybe even beat it in a few scenarios. At that point, we'll see something from Intel that is slightly faster. Then we wait for Zen's successor, and the cycle will continue. Either way consumers are doing well, since we'd actually have a choice in things.


----------



## Kuivamaa

Quote:


> Originally Posted by *PriestOfSin*
> 
> Intel is in a super good position right now. They release small, incremental updates to their chips at a low cost to them. Meanwhile they wait for AMD to catch up, and as soon as they do, they'll release something that's slightly faster. No reason to kill AMD....
> 
> Best case, AMD's Zen will match Skylake, maybe even beat it in a few scenarios. At that point, we'll see something from Intel that is slightly faster. Then we wait for Zen's successor, and the cycle will continue. Either way consumers are doing well, since we'd actually have a choice in things.


I don't know why people think Intel is holding back. That would be suicidal: at this point Intel's products compete only with their predecessors, and you cannot sell to your own customers unless you offer real improvements. So for the most part Intel is going full speed, or near it (they might take a few more design risks if AMD becomes competitive in performance again, but no guarantees). What will most likely happen in that case is Intel reducing some SKU prices.


----------



## Ultracarpet

Quote:


> Originally Posted by *Kuivamaa*
> 
> I don't know why people think intel is holding back. This would be suicidal because at this point intel products compete only with their predecessors and you cannot sell to your own customers unless you offer real improvements. So for the most part intel is going full speed , or near that (they might take a few more design risks If AMD becomes competitive in performance again, but no guarantees). What will most likely happen in this case is intel reducing some SKU prices.


Ivy and Haswell weren't exactly groundbreakers, but they still sold well. I think all that breathing room definitely lets them slow-play the hand they have, much more than if they were in a fast-paced competitive market. Also, if I'm not mistaken, they spend a crapload on R&D, but a big part of it has been focused on efficiency and mobile x86 chips, not really pushing the top-end performance front.


----------



## Kuivamaa

Quote:


> Originally Posted by *Ultracarpet*
> 
> Ivy and Haswell weren't exactly ground breakers, but they still sold well. I think that all of the breathing room definitely allows them to slow play the hand they have, much more so than if they were in a super fast paced competitive market. Also, if I'm not mistaken, they spend a crap load on R&D but a big part of it has been focused on efficiency and mobile x86 chips, not really pushing the top end power front.


Desktop sales are in steep decline. It is both a vicious circle and a self-fulfilling prophecy. The general crowd, especially in developing markets, avoids desktops or even laptops in favor of tablets; Intel focuses on tablets and power savings, putting less effort into performance, which is a major selling point for desktops. Customers cannot find worthwhile upgrades for their old CPUs, which leads to a bigger decline in desktop sales, and so on. And it is not as if we have plenty of processing power available: game engines could have been much more advanced, with real-time ray tracing etc., but they are being held back by hardware, GPU and CPU alike.


----------



## Tivan

Quote:


> Originally Posted by *Kuivamaa*
> 
> Desktop sales are in steep decline. It is both a vicious circle and a self-fulfilling prophecy. The general crowd, especially in developing markets, avoids desktops or even laptops in favor of tablets


The sales statistics I've seen don't really support this, at least when accounting for self-assembled PCs (and considering that every new PC sold is a potential second-hand PC moved onto the second-hand market).

But yeah, please share your sources; I'm very interested in changing my view on this. As for people avoiding PCs in favor of tablets: I see that the market is harsh for selling new PCs (especially prebuilt), considering the installed base and stagnating progress. I just don't see the trends you blame weighing very heavily on the market, while other trends have a stronger impact. I basically don't see a general trend of consumers avoiding PCs - more of a market-saturation issue, plus ease of upgradability (remember when you had to get a new motherboard to upgrade a GPU?).

At least high-end gaming components/PCs are on a rising sales trend... Of course the market would prefer to sell new PCs to _everyone_, considering the huge R&D costs. Selling as much as possible to as many people as possible is pretty much the core principle of the business - hence the huge push for mobile and alternative devices.


----------



## Kuivamaa

Quote:


> Originally Posted by *Tivan*
> 
> The sales statistics I've seen don't really support this, at least if accounting for self assembled PCs (and while considering that every new pc sold is a potential second hand pc moved on the second hand market).
> 
> But yeah please share your sources, I'm very interested in changing my view on this. With regard to people avoiding PC in favor of tablet. I see that the market is harsh for selling new PCs (especially prebuilt), considering installation base and stagnating progress. I just don't see the trends you blame weigh very heavily on the market, while seeing other trends have a stronger impact. I basically don't see a general trend of the consumer avoiding PCs. More of a market saturation issue. +ease of upgradability. (remember when you had to get a new motherboard to upgrade GPU?)
> 
> At least high end gaming components/PCs are on a rising sales trend... Of course the market would prefer to sell new PCs to _everyone_, considering the huge RnD costs. Selling as much as possible to as many people as possible is pretty much the core principle of the business. Hence the huge push for mobile and alternative devices.


Desktop has been declining for years, originally in favor of laptops - and the rise of mobile means that laptops are losing market share too.

https://www.gartner.com/newsroom/id/3090817

Home productivity (word processing, spreadsheets), multimedia consumption, internet etc meant that desktops and laptops were a necessity back in the day. Nowadays smartphones, phablets and tablets are taking over this area of utility fast.

http://www.wired.com/2015/04/googles-search-change-isnt-mobilegeddon-shows-google-works/

Browsing is mostly done on mobile devices now. This is stronger in emerging markets like India.

"According to a vision report prepared by the Internet and Mobile Association of India (IAMAI), total internet users in India are expected to be 500 million by 2017, of which smartphone data users would be 490 million."
http://www.business-standard.com/article/companies/google-says-india-mobile-search-supersedes-desktop-queries-115082400486_1.html



The only healthy PC desktop niche is indeed high-end computers, driven by the resurgence of PC gaming. Which threatens to put PC gaming in an odd position it has never been in before: that of the prime motivator for buying a desktop. For decades PC gaming was done on the home PC; since most houses needed and had one (see figures above), game development was naturally done for it.

Saturation and stagnation happen because people outside enthusiast gamers and professionals who need workstations have no compelling reason to buy new desktops. There is no software running on the desktop that offers a unique "must have" experience. There are ideas in this direction, but we do not have the hardware and we aren't getting it any time soon. So the masses get mobile devices because they offer the important stuff they need from a computer: browsing, Skype, Netflix etc. Things would have been different if Microsoft, Intel etc. had not sat on their laurels and milked the market for years. Just imagine advanced graphical reproductions, almost lifelike, of city landmarks, holiday resorts, hotels, and areas of interest in general, available on your screen with a simple request - what Google Earth / Street View tries to do (and fails) the ridiculously hard way, for example. Only a cutting-edge PC could offer that, but there is no affordable hardware available in the market.


----------



## warpuck

As far as desktops go, I think this is a situation much like the 50s and 60s were for the US auto industry: new skins on old hardware. There were so few changes that many of the basic parts were interchangeable for decades. No real changes - then fuel prices went up.
Something has to come along that requires non-gamers to need more processing power and gigabytes of access, not gigabits.
Actually, the current crop of AMD and Intel low-power chips can handle two-way 8K conversations on a 50" display. The limit is that Comcast, AT&T and the other providers are not finished milking their latest equipment investments for the last drop of profit.
I don't know if your service is like mine. I involuntarily pay for 16 and get 2 or a little less. BUT when I check it, the speed site says I got 16. If I call for a service rep to investigate, I get 16 on all connections during the 4-hour window of the tech's scheduled arrival.
Really, what is the point in having a more powerful video card and CPU if your connection is already struggling to keep up with your favorite online game? I have noticed the power drawn by the video card differs between the online and single-player versions of the same game.
The only real change in the last 10 years has been the iPhone and iPad.


----------



## christoph

that's right guys

Intel's releasing slightly updated, low-performance-gain CPUs at (as our friend said above) a low cost TO THEM...

and I sure as hell am not gonna spend 400 dollars or even much, much more to upgrade for a 10-20% increase in performance. Yes, the chips sold well, very well in my opinion, but that is the problem for us consumers: Intel sells at any price they want and people buy...

but that's why I overclock in the first place: to have an "upgraded" chip at a low price that can last longer without the need to buy another chip - to seek performance, not to seek somewhere to spend my money...


----------



## Themisseble

Hmm, I think that DX12 will give old CPUs enough power.

AMD needs to improve IPC, and AMD needs to improve compilers... more balance between integer and floating point.


----------



## Tivan

Quote:


> Originally Posted by *Kuivamaa*
> 
> ...


Where are tablets in third-world countries a real alternative to the PC? I don't see it. Phones are really inexpensive, so people use those, but a tablet is just a phone with a bigger screen and less functionality.
Quote:


> Saturation and stagnation happens because people outside enthusiast gamers and professionals that need workstations, have no compelling reason to buy new desktops.


Indeed. I think we mostly agree on everything, aside from the part where PCs are still an extremely attractive platform for consumers - just not so much new prebuilt systems.


----------



## SCollins

Quote:


> Originally Posted by *Kuivamaa*
> 
> Desktop has been declining for years in favor of laptops originally- the rise of mobile meant that laptops in terms of market share are declining too.
> 
> https://www.gartner.com/newsroom/id/3090817
> 
> Home productivity (word processing, spreadsheets), multimedia consumption, internet etc meant that desktops and laptops were a necessity back in the day. Nowadays smartphones, phablets and tablets are taking over this area of utility fast.
> 
> http://www.wired.com/2015/04/googles-search-change-isnt-mobilegeddon-shows-google-works/
> 
> Browsing is mostly done on mobile devices now. This is stronger in emerging markets like India.
> 
> "According to a vision report prepared by the Internet and Mobile Association of India (IAMAI), total internet users in India are expected to be 500 million by 2017, of which smartphone data users would be 490 million."
> http://www.business-standard.com/article/companies/google-says-india-mobile-search-supersedes-desktop-queries-115082400486_1.html
> 
> 
> 
> The only healthy PC desktop niche is indeed, hi end computers, driven by the resurgence of PC gaming. Which threatens to put PC gaming in a odd position it has never been before - that of the prime motivator for buying a desktop. For decades PC gaming was done in the home PC, since most houses needed and had one (see figures above) game development was done for it naturally.
> 
> Saturation and stagnation happens because people outside enthusiast gamers and professionals that need workstations, have no compelling reason to buy new desktops. There is no software running on desktop that offers unique ."must have" experience . There are ideas towards this direction but we do not have the hardware and we aren't getting it any time soon.So the masses get mobile devices because they offer the important stuff they need from their computer. Browsing, skype, netflix etc. Things would have been different if microsoft, intel etc would not have sat on their laurels and milked the market for years. Just imagine advanced graphical reproductions ,almost real life like, of city landmarks, holiday resorts ,hotels ,and generally areas of interest available to your screen with a simple request - what google earth/street view tries to do (and fails) the ridiculously hard way, for example. Only a cutting edge PC could offer that, but there is no affordable hardware available in the market.
> .


According to your own chart, sales of units per year are actually steady; it is other, more mobile devices that are growing. You aren't going to do any serious work like recording music, using a drawing tablet, or editing pictures on tablets, and even laptops suck for these types of uses.

Look at your chart: PC sales are just flat, they are not in decline. The argument is simply a red herring.


----------



## Kuivamaa

Quote:


> Originally Posted by *SCollins*
> 
> According to your own chart, sales of units per year are actually steady; it is other, more mobile devices that are growing. You aren't going to do any serious work like recording music, using a drawing tablet, or editing pictures on tablets, and even laptops suck for these types of uses.
> 
> Look at your chart: PC sales are just flat, they are not in decline. The argument is simply a red herring.


Well, let me rephrase so it becomes clearer. Anything related to the PC ecosystem is losing market share rapidly. From prebuilt systems to x86-64 OS shipments, the PC space is getting trampled by Android/iOS devices running on ARM ISA hardware. There is a reason Intel is struggling to feed its fabs, you know: the computation market has boomed over the last 20 years, but x86 saw little to no increase. As for the "serious work", let me tell you this. I am in my mid-thirties and I cannot be as efficient or as productive in general with these types of devices as I am with a PC. Hell, I even text faster on old-style cell phones than on touchscreens. I am not a dinosaur by any means, as I work in mobile games development. Still, any teenager out there can type, work, sketch, game and in general be more productive than me on a phone or a tablet.
Quote:


> Originally Posted by *Tivan*
> 
> Where's tablets in third world countries as a real alternative to PC? I don't see it. Phones are really inexpensive so people would use that, but a tablet is just a phone with a bigger screen and less functionality.
> Indeed. I think we agree mostly about everything, aside from the part where PCs are still an extremely attractive platform to consumers, just not so much new prebuilt systems.


We agree in principle; whether it's a phone or a tablet is more of a semantics/definition thing. I use a phablet (iPhone 6+) and an Apple fan friend of mine insists that "the bloody thing is a tablet".


----------



## Ultracarpet

I thought I remembered reading somewhere that tablets have actually started to decline HEAVILY in sales. They are definitely in an awkward position. A large phone can do all the socializing, browsing, and media consumption... and then a laptop/desktop can actually get real work done with full blown x86 programs and full sized keyboards, mice etc... I haven't seen a tablet being used by an adult for quite a while, it's usually just children.


----------



## Quantum Reality

Quote:


> Originally Posted by *Ultracarpet*
> 
> I thought I remembered reading somewhere that tablets have actually started to decline HEAVILY in sales. They are definitely in an awkward position. A large phone can do all the socializing, browsing, and media consumption... and then a laptop/desktop can actually get real work done with full blown x86 programs and full sized keyboards, mice etc... I haven't seen a tablet being used by an adult for quite a while, it's usually just children.


In my case, my iPad Mini is a really good TV show watching and PDF reading and occasional gameplaying + webbrowsing + SSHing device.







It has its uses, but definitely for any serious web browsing or Unix shell account work you need a laptop or desktop.


----------



## Kuivamaa

Quote:


> Originally Posted by *Ultracarpet*
> 
> I thought I remembered reading somewhere that tablets have actually started to decline HEAVILY in sales. They are definitely in an awkward position. A large phone can do all the socializing, browsing, and media consumption... and then a laptop/desktop can actually get real work done with full blown x86 programs and full sized keyboards, mice etc... I haven't seen a tablet being used by an adult for quite a while, it's usually just children.


What happened is that 7-8" tablets face steep competition from phablets, that is, 5.5-6.5" phones approaching tablet size. A more complete product, in other words. Big tablets (9"+) are still inexorably eating away at laptop market share.


----------



## epic1337

Quote:


> Originally Posted by *Kuivamaa*
> 
> What happened is that 7-8" tablets face steep competition from phablets, that is, 5.5-6.5" phones approaching tablet size. A more complete product, in other words. Big tablets (9"+) are still inexorably eating away at laptop market share.


indeed, ever since 8"~10" tablets gained the capabilities of a full laptop in an even more portable package, they became the go-to for most who wish for ease of use.
laptops are cumbersome to carry around; even though these tablets are inferior in raw performance, the advantages of tablets still outweigh general laptops.

on the other hand, premium laptops that entirely outperform the atom baytrail-based tablets cost quite a bit more than said tablets.


----------



## Ultracarpet

Quote:


> Originally Posted by *epic1337*
> 
> indeed, ever since 8"~10" tablets gained the capabilities of a full laptop in an even more portable package, they became the go-to for most who wish for ease of use.
> laptops are cumbersome to carry around; even though these tablets are inferior in raw performance, the advantages of tablets still outweigh general laptops.
> 
> on the other hand, premium laptops that entirely outperform the atom baytrail-based tablets cost quite a bit more than said tablets.


I don't get who is using tablets to replace laptops. I don't think that is actually a thing. The tablet and laptop do different things. For people who go to school, or have a job that needs a computer, or just have a computer for their house, a tablet will not replace the functionality. You have only apps, not full-blown x86 programs. You have limited browser support; I don't know how many times I have heard people trying to do stuff on their phone/tablet and end up just saying "screw it, I'm just going to get my laptop". Lugging around a massive tablet is no less cumbersome than bringing around a 13-15 inch laptop, and you can just do more with the laptop.

It's the type of thing where the phone and tablet do some things pretty well, but in terms of efficiency, productivity and power, they aren't even in the same universe. Phones in my mind have created a very strong foothold due to them not only having every functionality that a tablet has, but also possessing the ability to be a PHONE. It's the tablet that is in such a weird position. It tries to be a laptop, but it falls short in many categories compared to them, and it's just a bigger smartphone, without the phone capability. Also, people are like, "yeah, but you can just buy a keyboard attachment for them". Oh, so it's easier to carry around a tablet AND a keyboard than a device that, even despite all of its other obvious advantages, comes with a keyboard to BEGIN with?

What I'm getting at is that people buy tablets to do the same stuff they do on their phone, just on a bigger screen. When they get fed up with their tablet trying to buy something, order pizza, change their router settings or whatever have you, they go crawling back to the big cheese, grandmaster big papi, the good ol' PC.

Also, here, I found what I was talking about. http://www.businessinsider.com/tablet-sales-down-in-addition-to-ipad-2015-10

Good article: http://www.howtogeek.com/199483/tablets-arent-killing-laptops-but-smartphones-are-killing-tablets/


----------



## epic1337

Quote:


> Originally Posted by *Ultracarpet*
> 
> I don't get who is using tablets to replace laptops. I don't think that is actually a thing. The tablet and laptop do different things. For people who go to school, or have a job that needs a computer, or just have a computer for their house, a tablet will not replace the functionality. You have only apps, not full-blown x86 programs. You have limited browser support; I don't know how many times I have heard people trying to do stuff on their phone/tablet and end up just saying "screw it, I'm just going to get my laptop". Lugging around a massive tablet is no less cumbersome than bringing around a 13-15 inch laptop, and you can just do more with the laptop.


that's a false assessment; back in the days of early atom chips that couldn't even outperform the ancient core2duo, users of those "netbooks" were even trying to run Crysis on them.
while atom baytrail may be slower than an intel i5 or i3 (2C/4T) mobile processor with a dGPU, it can handle anything, even AutoCAD.

the fact that it can function as a full PC makes it far more functional than a WinRT or Android based device, and said tablets are quickly replacing the bottom end of the laptop segment.
plus you're using the wrong example: the iPad isn't a full-OS PC. the Windows Surface series, Asus's T100 and the like are what you should be looking into.
Quote:


> Originally Posted by *Ultracarpet*
> 
> Oh, so it's easier to carry around a tablet AND a keyboard than a device that, even despite all of its other obvious advantages, comes with a keyboard to BEGIN with?


the fact that you can hide the keyboard in your backpack is a BIG plus.

you're looking at this in the wrong way: people who buy tablets like these aren't after raw performance.
they much prefer the advantage of portability and the ease of use, which can even affect productivity positively despite the slower hardware.


----------



## Ultracarpet

Quote:


> Originally Posted by *epic1337*
> 
> that's a false assessment; back in the days of early atom chips that couldn't even outperform the ancient core2duo, users of those "netbooks" were even trying to run Crysis on them.
> while atom baytrail may be slower than an intel i5 or i3 (2C/4T) mobile processor with a dGPU, it can handle anything, even AutoCAD.
> 
> the fact that it can function as a full PC makes it far more functional than a WinRT or Android based device, and said tablets are quickly replacing the bottom end of the laptop segment.
> plus you're using the wrong example: the iPad isn't a full-OS PC. the Windows Surface series, Asus's T100 and the like are what you should be looking into.
> the fact that you can hide the keyboard in your backpack is a BIG plus.
> 
> you're looking at this in the wrong way: people who buy tablets like these aren't after raw performance.
> they much prefer the advantage of portability and the ease of use, which can even affect productivity positively despite the slower hardware.


The point is, tablets are not replacing anything. Tablet shipments have been steadily decreasing, and PC shipments had a slight bump up in 2014 and are back to decreasing again slightly alongside tablets. If the two were so correlated, then when tablet shipments drop, PC shipments should rise, and vice versa; instead, they move independently.

The 2-in-1s are hard to place in either category. To me, if it's x86, that's more of a laptop than a tablet. Also, I think those devices too shall see a severe halt to shipments in the near future and enter maturity rather quickly. They will find themselves competing only with the low-end laptops, and they may champion that position, but their design of having the main components screen-side is an inherent flaw that will ultimately not allow them to ever reach the speeds of a high-end laptop. Docking technology would really be the only saving grace for them, i.e., having a GPU, extra battery, and other stuff in the base that is activated once connected.

Phones, low-end laptops, tablets, 2-in-1s... they are all used for very simple tasks: browsing the web, watching videos, and reading stuff, pretty much. In the future, I see phones dominating that segment, but that's just me I guess. Well-spec'd laptops and desktops aren't going anywhere for a long time. Until something absolutely revolutionary comes about that completely changes the way we communicate and interface with machines, they are here to stay in my mind.

This all started because I was mentioning how Intel has managed to spend a ton on R&D in the last ~5-8 years, and it more so had to do with them trying to penetrate the mobile market. This has been somewhat successful, though, as the numbers show, shipments of EVERY computing device are down. It seems that above all, stagnation in the performance of hardware and software is more to blame than anything else. People have no reason to upgrade anything. First-gen tablets still work fine for 99% of things. Laptops/desktops with sandy bridge chips are still running strong for pretty much every use outside of the bleeding edge. The only people who keep upgrading are pretty much enthusiasts and gamers, because things like games keep becoming more and more demanding every year... this is in no small part why DIY PC sales are not struggling nearly as badly as PCs in general.


----------



## KarathKasun

I have an 8 inch BayTrail tablet with 32GB eMMC + 64GB microSD and 2GB of RAM. It does everything a low-end laptop does and even has HDMI output. It can play games like Fallout 3 with a bit of tweaking.

I have a Bluetooth KB and mouse to go with it. With those it's effectively a netbook, or a net-top when plugged into a monitor/TV.
This is what the KB looks like...

I can leave the KB/mouse in my bag and use the tablet for music in my car, Skype calls, Netflix, web browsing or anything else that works well with touch input. Oh, I need to SSH into a client's equipment? I grab the KB out of my bag and go to town.

It works for me, and I'm sure the 10" space would probably be better for my usage scenario, but 10" tablets were kinda crap when I got the 8". I really dig the new low-end Surface, but the Type Cover is good and bad. With my setup I don't have to put it on a desk, whereas you do with the Type Cover.


----------



## Dom-inator

Quote:


> Originally Posted by *Kuivamaa*
> 
> I don't know why people think intel is holding back. This would be suicidal, because at this point intel products compete only with their predecessors, and you cannot sell to your own customers unless you offer real improvements. So for the most part intel is going full speed, or near it (they might take a few more design risks if AMD becomes competitive in performance again, but no guarantees). What will most likely happen in this case is intel reducing some SKU prices.


The reality is, though, people upgrade because they "want a new laptop": perhaps their battery doesn't hold a charge like it used to, or it boots slowly because they've never defragged the HDD, or simply because "yeah, it's 3 years old, probably time to upgrade soon".

Intel could sell the same CPUs to the mainstream over and over again, as long as the OEMs package them well with a better battery, an SSD, and an HD screen.


----------



## Kuivamaa

Quote:


> Originally Posted by *Dom-inator*
> 
> The reality is though, people upgrade because they "want a new laptop" because perhaps their battery doesn't hold charge like it used to, or it boots slow because they've never defragged the HDD, or simply because "yeah it's 3 years old, probably time to upgrade soon".
> 
> Intel could sell the same CPUs to the main stream over and over again, as long as the OEMs package it well with a better battery, an ssd, and HD screen.


This could never happen, for a number of reasons. First, intel laptops use the same dies as their desktops: the efficient ones become mobile, the leaky ones become desktop, more or less. If they were to keep rebadging the same CPUs for laptops, they would have to sell the same ones in desktops and lose a huge amount of revenue anyway, even in the impossible scenario of mobile sales remaining the same. Another reason this is impossible is that intel is not yet another fabless CPU company like AMD; it is also a foundry. If they were to stagnate on a certain CPU, they would stagnate on a certain node too, and that would kill their business model. No new nodes mean no new core designs, and therefore no advancement at the server/HPC level. This is their most lucrative market, and such a practice would kill it in the long run.


----------



## svenge

Quote:


> Originally Posted by *Dom-inator*
> 
> Intel could sell the same CPUs to the main stream over and over again, as long as the OEMs package it well with a better battery, an ssd, and HD screen.


You fail to realize that current battery technology is much slower at gaining energy _density_ than Intel is at gaining energy _efficiency_ with their newer CPUs.


----------



## TriWheel

Quote:


> Originally Posted by *Cybertox*
> 
> Despite all that, this upcoming AMD CPU line-up still won't be able to beat Intel in terms of raw performance; it will just provide a better price-to-performance ratio up to a certain point.


And with that, they get my money.


----------



## Themisseble

So AMD will release APUs first.

So Carrizo is 6th gen; on their slide, the 7th gen of APUs will be released for AM4.

Please remember that AMD gained 10% FPU performance in CB15 over Piledriver with Excavator, and a 26% improvement in CB 11.5.

http://www.planet3dnow.de/cms/18564-amd-piledriver-vs-steamroller-vs-excavator-leistungsvergleich-der-architekturen/subpage-aida-cpu/

AIDA CPU/FPU is showing great improvement.
http://www.planet3dnow.de/cms/18564-amd-piledriver-vs-steamroller-vs-excavator-leistungsvergleich-der-architekturen/subpage-aida-fpu/

Excavator @1.6GHz is sometimes matching or beating a stock FX-4300.
http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/57615-amd-vishera-fx-6300-fx-4300-review-3.html


----------



## The Stilt

Quote:


> Originally Posted by *Themisseble*
> 
> So AMD will release APUs first.
> 
> So carrizo is 6th gen, on their slide 7th gen of APU will be release for AM4


"7th Gen." APUs are Carrizo (Bristol Ridge), they are not Zen based.


----------



## Themisseble

Quote:


> Originally Posted by *The Stilt*
> 
> "7th Gen." APUs are Carrizo (Bristol Ridge), they are not Zen based.


And also, they are not Carrizo based.

Yes, Excavator of course, but optimized for high performance, and we can also expect at least GCN 1.2 or newer.

Anyway, we can see some serious improvement in some benchmarks.


----------



## The Stilt

Quote:


> Originally Posted by *Themisseble*
> 
> And also they are not carrizo based.
> 
> Yes excavator of course, but optimized for high performance and we can also expect at least GCN1.2 or newer.
> 
> Anyway we can see some serious improvement in some benchmarks.


They are different to Carrizo in the same way Richland was different to Trinity, or Godavari was different to Kaveri:

== badge tuning & fuse config.


----------



## ebduncan

Quote:


> Originally Posted by *svenge*
> 
> You fail to realize that current battery technology is much slower at gaining energy _density_ than Intel is at gaining energy _efficiency_ with their newer CPUs.


actually, they have made several breakthroughs recently regarding batteries.

http://www.pocket-lint.com/news/130380-future-batteries-coming-soon-charge-in-seconds-last-months-and-power-over-the-air

Battery-life worries are slowly becoming a thing of the past, between advancements in the efficiency of devices and the batteries which power them.


----------



## SpeedyVT

Quote:


> Originally Posted by *The Stilt*
> 
> They are different to Carrizo in same way as Richland was different to Trinity or Godavari was different to Kaveri.
> 
> == Badge tuning & fuse config.


It just proves that 28nm had more life in it than the ol' tick-tock approach allowed. I think it's far better to refine a design than to shrink. Give it a few years on one node, as it'll net more from the R&D.


----------



## Ghoxt

Quote:


> Originally Posted by *ebduncan*
> 
> actually they have made several breakthroughs here recently regarding batteries.
> 
> http://www.pocket-lint.com/news/130380-future-batteries-coming-soon-charge-in-seconds-last-months-and-power-over-the-air
> 
> Battery life is slowly becoming a thing of the past, between advancements in efficiency of devices and the batteries which power them.


I think the other poster was referring to what's in the marketplace now. We've seen so much pie-in-the-sky "soon this new tech will revolutionize xxxxx" that rarely delivers in our lifetime.

There are as many forces (in my opinion) working against battery efficiency as there are wishing for better efficiency, i.e. corporate profits. And I choose to believe in massive corporate greed as the more aggressive factor in play. Seems to be the winner time and time again.

Cough. On topic:

It's almost like an overclocker's "dream" to think AMD will actually produce what they promise, so much so that many of us won't believe it at all until it drops and is tested by us.

If my motherboard had not given out, I think I'd still be using my i7 920. Other than power... oh never mind, that's another topic way off topic... IPC blah blah.


----------



## ebduncan

Quote:


> Originally Posted by *Ghoxt*
> 
> I think the other poster was referring to whats in the marketplace now. We've seen so many pie in the sky "Soon this new tech will revolutionize xxxxx" but rarely delivers in our lifetime.
> 
> There's so many forces (in my opinion) against battery efficiency as there are those wishing for better efficiency. IE corporate profits. And I choose to believe in massive corporate greed as the more agressive factor in play. Seems to be the winner time and time again.
> 
> Cough On Topic:
> 
> It's almost like an overclockers "dream" to think AMD will actually produce what they promise, so much so that many of us won't believe it at all until it drops and is tested by us.
> 
> If my motherboard had not gave out I think I'd still be using my i7 920. Other than power...oh nevermind that's another topic way off topic...ipc blah blah.


did you know that battery capacity has doubled in the past 2 years? it's a rapidly advancing tech. Tons of money is being thrown at this problem, mainly to increase the driving range of electric vehicles.

as for the original topic, I'm with you: AMD has a lot to prove with Zen. I mean, they called the Fury X an overclocker's dream, and well, we know how that turned out. So hopefully Zen is like an overclocker's paradise

(stronger word?)


----------



## epic1337

Quote:


> Originally Posted by *SpeedyVT*
> 
> It just proves that 28nm had some more life to it than the ol'tick tock approach. I think it's far better to refine a design then shrink. Give it a few years on one NM package as it'll net more R&D.


actually it's fine to shrink, but shrinking non-stop without stabilizing first is what puts things out of order; it's like walking forward without looking where you step.
the fact that intel had so many setbacks during their die-shrinks is an indication of unpreparedness, or in other words, the node shrink wasn't ready yet.

though someone has to take the risk of venturing into the unknown, so intel mindlessly shrinking nodes is actually a good thing.


----------



## rbarrett96

I'm still as confused as ever. What is a good "For Dummies" book to get started with? LoL


----------



## EniGma1987

Quote:


> Originally Posted by *CrazyElf*
> 
> [*] It costs more and more per new node, plus to design a new architecture.


it costs more and more because the fabs had to move to double patterning, given the limits of the current light source and optics. Very soon (if they haven't already) the fabs will move to quad patterning as they near 10nm. This is the main reason costs rise dramatically. It had to be done because a new light source and optics were not ready to enable smaller nodes on a single-patterning system. That system is called Extreme Ultraviolet Lithography, or EUV. It has been delayed for years and years already, but is finally almost ready to go. Once it is ready we will see a huge cost reduction as the fabs move back to single patterning for a couple more nodes; then we get a cost jump again below 7nm as the fabs once more move to double-patterning EUV.
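To put rough numbers on that reasoning, here's a toy cost model. All dollar figures and layer counts are invented for illustration, not real fab economics; the point is just that each extra patterning pass on a critical layer adds exposure-plus-etch cost, so double and quad patterning inflate wafer cost, and single-pass EUV brings it back down.

```python
# Toy model: each critical layer needs one lithography pass per patterning
# level (single = 1, double = 2, quad = 4), and every pass adds cost.
# All numbers below are invented for illustration only.

COST_PER_PASS = 100        # invented $ per litho pass on one critical layer
BASE_WAFER_COST = 3000     # invented $ for everything that isn't critical litho

def wafer_cost(critical_layers: int, passes_per_layer: int) -> int:
    """Very rough wafer cost: base plus one pass-cost per layer per pass."""
    return BASE_WAFER_COST + critical_layers * passes_per_layer * COST_PER_PASS

single = wafer_cost(10, 1)  # mature node, single patterning
double = wafer_cost(10, 2)  # 193i double patterning
quad   = wafer_cost(10, 4)  # 193i quad patterning near 10nm
euv    = wafer_cost(10, 1)  # EUV returns critical layers to one pass

print(single, double, quad, euv)
```

With these made-up inputs the quad-patterned wafer costs nearly twice the single-patterned one, and the EUV wafer drops back to the single-pass figure, which is the shape of the cost curve described above.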


----------



## epic1337

Quote:


> Originally Posted by *EniGma1987*
> 
> it costs more and more because the fabs had to move to double patterning because of how the light source and optics are. Very soon (if we havent already) the fabs will move to quad patterning as they near 10nm. This is the main reason costs rise dramatically. The reason this had to be done is a new light source and optics are not ready to enable smaller nodes on a single patterning system. That system is called Extreme Ultraviolet Lithography, or EUV. It has been delayed for years and years already but is finally almost ready to go. Once it is ready we will see a huge cost reduction as the fabs can move back to single patterning for a couple more nodes, then we get a cost jump again below 7nm as the fabs once more move to double patterning EUV.


they really should change how they make their chips, though.

i mean, low-cost chips should use low-cost fabs; it isn't really necessary to rush node shrinks on the cheapest of chips.
take atom for example: while smaller nodes could get it to consume notably less power, the drastic increase in cost negates such benefits.

from what i can see, they're primarily relying on node shrinks to increase performance somewhat and decrease power consumption.
which in itself is reasonable, until it became expensive to achieve, but on the other hand they've strayed away from actual uarch tweaks for such improvements.

it makes me wonder when we'll see a new uarch that's a ground-up new design.
much like how the Core i series first came about, what was it again... Nehalem?
i guess Zen is the only "new" thing that we can see anytime soon.


----------



## warpuck

Eighteen years ago, I had a conversation with the BBMFIC of the MD-Boeing Satellite group about the challenges of building a battery pack that will last in space. That would be about the same challenge as selling a battery-powered car to my wife: your typical two-pedal driver with a cell phone stuck to her ear.
The way they were doing it then was buying, say, 5000 batteries, running them through a charge/discharge cycle a few times, and then matching the ones with the closest amp/voltage characteristics into a pack, then selling or scrapping the rest. The problem with battery packs is that the internal resistance changes with time, and eventually one cell does not charge properly and the pack fails. They felt that research into better manufacturing tech was the optimal goal. Selling the rejects to the auto companies was a good idea, except the stockholders would not pay for the research. The goals are the same: pack as much power into as small a space as possible and keep the weight down. However, I don't think exploding batteries in an automobile would be a good thing. The other thing was he did not think autos would be running on batteries any time soon.
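The matching step described above can be sketched in a few lines. A minimal illustration, assuming each cell is summarized by a single measured capacity figure after cycling (all numbers and the tolerance are invented; real matching would also weigh internal resistance and voltage curves):

```python
# Toy sketch of the pack-matching idea described above: cycle the cells,
# measure them, build packs from cells with the closest characteristics,
# and sell or scrap the outliers. All numbers are invented for illustration.

def build_packs(cells, pack_size, tolerance):
    """Group cells into packs of `pack_size` whose measured capacity stays
    within `tolerance` of the pack's lowest cell; return (packs, rejects)."""
    ordered = sorted(cells)          # sort by measured capacity (Ah)
    packs, rejects = [], []
    i = 0
    while i + pack_size <= len(ordered):
        candidate = ordered[i:i + pack_size]
        # A sorted window is well matched if its extremes are within tolerance.
        if candidate[-1] - candidate[0] <= tolerance:
            packs.append(candidate)
            i += pack_size
        else:
            rejects.append(ordered[i])  # low-end outlier: sell or scrap
            i += 1
    rejects.extend(ordered[i:])          # leftovers that never fit a pack
    return packs, rejects

# Invented capacities in amp-hours for a handful of cells.
measured = [2.01, 1.99, 2.00, 2.40, 1.98, 2.02, 1.55, 2.03]
packs, rejects = build_packs(measured, pack_size=3, tolerance=0.05)
print(packs)    # the well-matched triples
print(rejects)  # the outliers
```

The same logic also shows why packs fail later: the matching is only as good as the measurement, and cells drift apart as internal resistance changes with age.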


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> actually its fine to shrink, but just shrinking non-stop without stabilizing first is what puts things out of order, its like walking forward without looking where you step on.
> the fact that intel had so many setbacks during their die-shrinks is an indication of unpreparedness, or in other words the node shrink isn't ready yet.
> 
> though someone has to take the risk in venturing into the unknown, so intel mindlessly shrinking nodes is actually a good thing.


Their fabs were ahead of their design. I agree with you.

Because AMD took the time to refine the design, it could be a huge change!


----------



## epic1337

Quote:


> Originally Posted by *SpeedyVT*
> 
> Their fabs were ahead of their design. I agree with you.
> 
> Because AMD took the time to refine the design it could be a huge change!


i'd be feeling "why did i bother believing in this crap" if AMD were to flop this one as well.
it took them many years to design Zen; the same went for Bulldozer, and everyone knows what happened to that.


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> i'd be feeling "why did i bother believing in this crap" if AMD were to flop this one as well.
> it took them many years to design Zen, same goes with bulldozer and everyone knows what happened to that.


I don't expect Zen to be faster than Intel's latest and greatest. As long as it's an alternative to just Intel on the x86 front I'm entirely comfortable.


----------



## epic1337

Quote:


> Originally Posted by *SpeedyVT*
> 
> I don't expect Zen to be faster than Intel's latest and greatest. As long as it's an alternative to just Intel on the x86 front I'm entirely comfortable.


i'd appreciate a budget hexa-core that fairly matches the 6-core SB-E, which is to say, SB IPC with excellent core scaling.
if they get such a chip priced at under $250, i'd get one just for the heck of it.

but now that i think about it, buying an inferior chip sounds kinda stupid.
i mean, sure, $250 for 6-core SB-E tier performance, but compared to a $350 6-core Skylake-E? that extra $100 might just be worth it.
i hope AMD beefs up their motherboard platform to make it worth more than going over to intel's camp.


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> i'd appreciate a budget hexa-core that matches SB-E quite fairly, or that means to say, SB-IPC with excellent core-scaling.
> if they get such a chip priced at under $250, i'd get one just for the heck of it.
> 
> but now that i think about it, buying an inferior chip sounds kinda stupid, i hope AMD beefs up their motherboard platform to make it worth it.


IPC alone isn't what's getting the FPS in games. I have a feeling it's also correlated with Intel's better memory and bus links. You're correct, but sometimes, even in a multi-threaded application where an i3 is completely flooded and the 8350 is hardly touched yet still getting fully utilized in a game, the FPS on the i3 still holds strong. You and I both know it should be the other way around, with the 8350 stomping the i3, but it's not. So we have to look for the real culprit: it's a mix of IPC and bus links.

Even if AMD makes this processor kill in every benchmark, we still won't know how well it handles the latest and greatest GPU.


----------



## epic1337

Quote:


> Originally Posted by *SpeedyVT*
> 
> IPC isn't what's getting the FPS in games. I have more of a feeling it's correlated with Intel's better memory and bus links. You're correct, but sometimes even in multi-threaded application where an i3 is completely flooded and the 8350 is hardly touch but still getting fully utilized in a game the FPS on the i3 still maintains strong. You know and I know that it should be the other way around and that the 8350 is stomping out the i3, but it's not. So we have to look for the real culprit, it's a mix of IPC and bus links.
> 
> Even if AMD makes this processor kill in every benchmark, we still won't know how well it handles with the latest and greatest GPU.


there's hardly any review about this, so we can't really draw a conclusion.
but from what i've observed, i think the probable culprits are a combination of the following:
* cache latency and front-end performance
* memory latency, memory bandwidth and integrated memory controller performance
* bus communication latency and bandwidth (PCI-E latency, chipset bus latency and such)
* CPU resource and pipeline overhead (thinking bulldozer's shared resources cause a lot of overhead)

on a side note, if we compare thuban to intel's chips, we get the picture of thuban having a very poor front-end, cache, IMC and overall IPC that causes it to perform slower.
and that's considering that even bulldozer sometimes out-performs the 6-core thuban.
on another note, SSD performance on AMD's platform is inferior to Intel's platform, which indicates something is wrong.


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> theres hardly any review about this so we can't really have a conclusion.
> but from what i've observed, i think the probable culprits are a combination of such:
> * cache latency and front-end performance
> * memory latency, memory bandwidth and integrated memory controller performance
> * bus speed communication latency and bandwidth (PCI-E latency, chipset bus latency and such)
> * CPU resource and pipeline overhead (thinking bulldozer's shared resource is causing a lot of overhead)
> 
> on a side note, if we compare thuban to intel's, we get the painted image of thuban having a very poor front-end, cache, IMC and overall IPC thats causing it to perform slower.
> and thats considering that even bulldozer sometimes out-perform 6core thuban.
> on another note, SSD performance on AMD's platform is inferior to Intel's platform, this indicates something is wrong.


Exactly what I mean! Did you see the 5350's performance with an SSD? The AM1 processors are on occasion better than some of the APUs, PDs and BDs. Its fatal issue is its IPC and clock rate.


----------



## Themisseble

Quote:


> Originally Posted by *epic1337*
> 
> theres hardly any review about this so we can't really have a conclusion.
> but from what i've observed, i think the probable culprits are a combination of such:
> * cache latency and front-end performance
> * memory latency, memory bandwidth and integrated memory controller performance
> * bus speed communication latency and bandwidth (PCI-E latency, chipset bus latency and such)
> * CPU resource and pipeline overhead (thinking bulldozer's shared resource is causing a lot of overhead)
> 
> on a side note, if we compare thuban to intel's, we get the painted image of thuban having a very poor front-end, cache, IMC and overall IPC thats causing it to perform slower.
> and thats considering that even bulldozer sometimes out-perform 6core thuban.
> on another note, SSD performance on AMD's platform is inferior to Intel's platform, this indicates something is wrong.


Here is some good research:

http://www.extremetech.com/computing/177099-secrets-of-steamroller-digging-deep-into-amds-next-gen-core


----------



## KarathKasun

Quote:


> Originally Posted by *SpeedyVT*
> 
> Exactly what I mean! Did you see the 5350 performance with an SSD? The AM1 processors are on occasion better than some of the APUs, PDs and BDs. It's fatal issue is it's IPC and clock rate.


No, it's a fatal flaw with the current AMD desktop platform. The SATA controller in the AM1 platform is actually AMD's newest design, AND it's in the same silicon as the CPU.


----------



## Cyro999

Quote:


> You're correct, but sometimes even in multi-threaded application where an i3 is completely flooded and the 8350 is hardly touch but still getting fully utilized in a game the FPS on the i3 still maintains strong. You know and I know that it should be the other way around and that the 8350 is stomping out the i3, but it's not. So we have to look for the real culprit


Look up Amdahl's law; it easily explains it. Memory and cache performance is just one part of effective performance-per-clock, and I believe you are overcomplicating it here.

If you don't understand it well and have half an hour, check out this video: https://www.youtube.com/watch?v=HwitUDD0B1A

Two cores at 4GHz outperform 6 cores at 2GHz in some of the best-threaded games that we have, even on the exact same CPU (just disabling cores and adjusting clocks), due to the parallelization % not being very high. Very, very few games pass 70% or so, with the vast majority being lower.

"Amdahl's law" is the best two words to describe the Bulldozer failure. This simple inconvenience is the Achilles' heel of an architecture which otherwise performed pretty decently in highly parallel workloads.


----------



## epic1337

Quote:


> Originally Posted by *Themisseble*
> 
> Here is good research
> 
> http://www.extremetech.com/computing/177099-secrets-of-steamroller-digging-deep-into-amds-next-gen-core


i see... i think i've seen this before, though i must've forgotten.


Quote:


> Steamroller made some significant changes to the cache structure. The L1 instruction cache is 50% larger and is three-way associative, rather than two-way. This increases the chance that data locations will be located within the L1. There are still significant differences in L1 structure between AMD and Intel. *Each Intel core has a 32K instruction cache that's eight-way associative, whereas Steamroller has a 96KB shared cache that's just three-way associative.* Cache conflicts remain a significant problem - when two different threads are running in the same module, they can overwrite each others' code.


Quote:


> Steamroller's L2 cache remains very slow compared to Intel, but it's marginally faster than Piledriver. The big news, however, is the improved L2 write throughput. AIDA64 4.2 underscores just how significantly the new chip boosts performance.


Quote:


> The improved L1 caches come with an oddity, however. According to Agner's test, L1 cache throughput goes down when two threads run on two different modules at the same time. Note that this should never be the case - the L1 caches don't share any data when running one thread per module - but throughput still takes a whack. AMD is still investigating these findings.


these alone tell a lot about AMD's work, though there's more to quote on the front-end issue.
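For anyone wondering what "three-way vs eight-way associative" buys you, here is an illustrative sketch of the standard set-indexing arithmetic (not AMD's or Intel's actual hardware; 64-byte lines are assumed):

```python
def cache_sets(size_bytes, ways, line_bytes=64):
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

def set_index(addr, sets, line_bytes=64):
    """Which set a byte address maps to."""
    return (addr // line_bytes) % sets

intel_l1i = cache_sets(32 * 1024, ways=8)   # 64 sets, 8 lines each
amd_l1i   = cache_sets(96 * 1024, ways=3)   # 512 sets, only 3 lines each
print(intel_l1i, amd_l1i)                   # 64 512

# With only 3 ways, any 4 hot code lines that land in the same set evict
# one another -- and on Steamroller two threads share this one L1I.
```

The shared cache has more sets overall, but the low way count is what makes two threads overwriting each other's code plausible, as the quote describes.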


----------



## Themisseble

Quote:


> Originally Posted by *epic1337*
> 
> i see... i think i've seen this before, though i must've forgotten.
> 
> 
> 
> these alone tells a lot in AMD's work, though theres more to quote on the front-end issue.


Yeah, but now we need excavator vs steamroller research.


----------



## Hueristic

Quote:


> Originally Posted by *ebduncan*
> 
> actually they have made several breakthroughs here recently regarding batteries.
> 
> http://www.pocket-lint.com/news/130380-future-batteries-coming-soon-charge-in-seconds-last-months-and-power-over-the-air
> 
> Battery life is slowly becoming a thing of the past, between advancements in efficiency of devices and the batteries which power them.


Wasn't aware of so many breakthroughs lately. thanks for this.


----------



## burticus

Here's to hoping Zen is awesome, even though I finally broke down and upgraded the old Phenom II to a Skylake i5. My first Intel chip since.... Prescott P4? So like 2003 or 4? (laptops don't count). AMD fanboy at heart, and I hope they can at least be quasi-competitive. Intel needs the competition.


----------



## TranquilTempest

I hope Zen is either amazingly good, good enough to be real competition for Intel, or amazingly bad, such that AMD is so screwed they get bought out by someone with the means to compete with Intel. I'd rather not see AMD constantly limping around, perpetually failing to catch up.


----------



## christoph

when Intel gets a really, and I mean completely, new architecture out, they're gonna start out where AMD ended their work...

If you guys know what I mean


----------



## 45nm

Quote:


> Originally Posted by *TranquilTempest*
> 
> I hope Zen is either amazingly good, good enough to be real competition for Intel, or amazingly bad, such that AMD is so screwed they get bought out by someone with the means to compete with Intel. I'd rather not see AMD constantly limping around, perpetually failing to catch up.


It's bad for consumers as it is currently. Intel is leading in price/performance with Skylake and in performance with the X99 platform. Then you have the FX series from AMD, which is at best equivalent to its Sandy Bridge and Ivy Bridge LGA1155 counterparts.


----------



## Themisseble

http://wccftech.com/ashes-singularity-alpha-dx12-benchmark/

here is a very good benchmark for CPU performance. AotS instructions are not AVX, but SSE.
Yet the Athlon X4 860K scores only 31 FPS while the i5 3570K scores 55 FPS. AMD will definitely have to improve compilers and IPC.

I have posted this before: an AIDA benchmark, which is also based on SSE instructions,

and on the FPU.

We can all see a decent boost... but benchmarks vs. the real world: I have noticed that Excavator's FPU might be at least 10-20% faster than Steamroller's or Piledriver's.


----------



## Schmuckley

Something just popped into my head:
And it's "Integer divided by 0" in the middle of a very blue screen and nothing else.


----------



## PiOfPie

Quote:


> Originally Posted by *Themisseble*
> 
> Yeah, but now we need excavator vs steamroller research.




According to this TR article and Anand article:

-Branch predictor is now 768 entries, up from 512.
-FPU has a new instruction that lets it perform a flush if branch misprediction occurs.
-L1 data is now 32KB and 8-way associative. Latency is still 3-4 clocks.
-L2 has been halved to 1MB, but will be brought back to 2MB for Bristol Ridge/desktop XV if it drops.


----------



## epic1337

they should concentrate on getting L1 and L2 latencies down as low as possible; that's what sustains high-throughput compute on the pipelines.
and speaking of which, if i remember right the L2 cache is also shared by the IGP, so improving L2 latency would help tremendously.

and on another note, when will they start doing something about their UMI interconnect?
even intel has gone to DMI 3.0, with nearly 4 times more bandwidth than AMD's.


----------



## Kuivamaa

Quote:


> Originally Posted by *epic1337*
> 
> they should concentrate on getting L1 and L2 latencies down to the lowest possible, its what sustains high-throughput compute on the pipelines.
> and speaking of which, if i remember right L2 cache is also shared by IGP, so improving L2 latency would tremendously help.
> 
> and on another note, when will they start doing something about their UMI interconnect?
> even intel had gone DMI 3.0 with nearly 4times more bandwidth than AMD's.


XV was largely deprecated in 2012 already. Those are valid points, but ultimately XV core performance won't make any difference to AMD's finances. Just two modules, way too small a market footprint, and at best a sidegrade/trade-off versus the FX octocores (better ST performance, worse MT).


----------



## epic1337

Quote:


> Originally Posted by *Kuivamaa*
> 
> XV was largely depreciated in 2012 already. Those are valid points but ultimately XV core performance won't make any difference to AMD financies. Just two modules, way too small market footprint and a sidegrade/trade-off at best to FX octocores (better ST performance, worse MT).


no no, i meant AMD's goal as a whole, it applies to Zen too.
if Zen has crap cache implementation, then we can expect memory intensive tasks to be crap as well.


----------



## looncraz

Quote:


> Originally Posted by *Kuivamaa*
> 
> XV was largely depreciated in 2012 already. Those are valid points but ultimately XV core performance won't make any difference to AMD financies. Just two modules, way too small market footprint and a sidegrade/trade-off at best to FX octocores (better ST performance, worse MT).


XV isn't intended as an upgrade path for anyone running an FX-8xxx or FX-6xxx, that's Zen's job.

XV is all about having known-good technology (APU footprint) to offer APU upgrades. Depending on the GPU performance, I may well jump to an XV APU with DDR4 for my HTPC so I can dump my old 7870XT onto eBay as well as my Kaveri setup completely (minus RAM, which I can always use somewhere ;-)). That may well pay for most of the upgrade. The 7870XT is major overkill, but the Kaveri graphics are an easy 50% too slow, so there's a great deal of ground to cover for that to be viable.

On a side note:

It looks like my most desired foundry scenario seems to be confirmed: AMD will be using Samsung AND GloFo to produce their chips on 14nm LPP, giving them an edge over nVidia process-wise and enabling them to have much more reliable production.


----------



## EniGma1987

Quote:


> Originally Posted by *epic1337*
> 
> and on another note, when will they start doing something about their UMI interconnect?
> even intel had gone DMI 3.0 with nearly 4times more bandwidth than AMD's.


AMD uses a proprietary link to the southbridge? I thought it just ran over HyperTransport on the 990FX platform and was PCI-E on the FM sockets?

Honestly, I would love for AMD to go full HyperTransport on Zen and use four 32-bit HT 3.1 links (HT 3.1 bandwidth is 25.6GB/s each way; PCI-E 3.0 x16 is 15.75GB/s each way). Since HyperTransport 3.1 can natively encapsulate PCI-E and has more bandwidth than PCI-E 3.0, GPUs would be able to communicate natively with no performance hit, but it would also allow AMD to make a special version of their GPUs that could communicate at the full, higher HT bandwidth when used on an AMD system, giving an advantage. This would also be a way to go up against Nvidia's NVLink interface. Four links would mean the southbridge and other misc. external IO could run on one link with huge bandwidth available, never bottlenecking the fastest SSDs; two links could be given to card expansion slots (the equivalent of 32 PCI-E 3.0 lanes); and the last HT link could just be null on the desktop platform but used for processor communication on server MP systems.
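The bandwidth figures in the post can be sanity-checked with simple arithmetic. A sketch, assuming a 32-bit HT 3.1 link at 3.2GHz (double-pumped) and PCI-E 3.0's 8 GT/s with 128b/130b encoding:

```python
def ht_bandwidth_gbs(width_bits, clock_ghz):
    """HyperTransport payload bandwidth, one direction: DDR, so 2 transfers/clock."""
    return (width_bits / 8) * clock_ghz * 2

def pcie3_bandwidth_gbs(lanes):
    """PCI-E 3.0 payload bandwidth, one direction: 8 GT/s per lane, 128b/130b coding."""
    return lanes * 8 * (128 / 130) / 8

print(ht_bandwidth_gbs(32, 3.2))   # 25.6 GB/s each way
print(pcie3_bandwidth_gbs(16))     # ~15.75 GB/s each way (an x16 slot)
```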


----------



## KyadCK

Quote:


> Originally Posted by *EniGma1987*
> 
> Quote:
> 
> 
> 
> Originally Posted by *epic1337*
> 
> and on another note, when will they start doing something about their UMI interconnect?
> even intel had gone DMI 3.0 with nearly 4times more bandwidth than AMD's.
> 
> 
> 
> AMD uses a proprietary link to the southbridge? I thought it just ran over Hyper Transport on the 990FX platform and was PCI-E on the FM sockets?
> Honestly I would love for AMD to go full Hyper Transport on Zen and use four, 32-bit HT 3.1 links (HT 3.1 bandwidth is 25.6GB each way, PCI-E 3.0 is 15.75GB each way). Since Hyper Transport 3.1 is able to natively communicate PCI-E as well and has more bandwidth than PCI-E 3.0 the GPUs would be able to natively communicate fine with no performance hit, but it would allow AMD to make a special version of their GPUs where they could communicate at the full, higher HT bandwidth when used on an AMD system giving an advantage. This would also be a way to go up against Nvidia's NVLink interface. Four links would mean the southbridge and other misc. external IO could run on one and have HUGE bandwidth available to never bottleneck the fastest SSDs, and then two links could be given to card expansion slots (would be the equivalent of 32 PCI-E 3.0 lanes to expansion slots) and the last HT link could just be null on the desktop platform but used for processor communication on server MP systems.

HT is used to talk to the north bridge, which on 990FX then splits it into 42 PCI-e lanes (x4 for the SB, 32 for "GPUs" (x16/x8/x4), and 6 for onboard and x1/x4 lanes and such). They run an 8-bit bidirectional or 16-bit unidirectional 2.6GHz HT bus on 990FX.

Beyond AM3+, HT is not used, since there is no north bridge; they rely on a new XBar and PCI-e directly.


----------



## KarathKasun

HyperTransport is a 16-bit uplink and a 16-bit downlink. Two links on consumer boards, 3-4 on dual-socket boards.

8/8, 16/8 and 8/16 are compatibility modes that some BIOSes give you the option of selecting.

HT 3.1 can do encapsulated PCIe 3.0 AFAIK and uses a similar protocol.


----------



## KyadCK

Quote:


> Originally Posted by *KarathKasun*
> 
> HyperTransport is 16 bit uplink and 16 bit downlink. Two links on consumer boards 3-4 on dual socket boards.
> 
> 8/8, 16/8 and 8/16 are compatibility modes that some BIOS's give you the option of selecting
> 
> HT 3.1 can do encapsulated PCIe 3.0 AFAIK and uses a similar protocol.


Such a shame they use HT 3.0 on AM3+ then. 3.1 is irrelevant; HT is dead for AMD once the NB is on the die.

And no, consumer AMD CPUs get one active HT link, not two. You need one HT link per thing you are connecting to, and in AM3's case that means one to the NB. You cannot run dual-CPU with an 8350. Sockets C32 and G34 are LGA; they aren't even compatible.

AM3+; one link
CPU1 -> Chipset

C32; two links
CPU1 -> Chipset
CPU1 -> CPU2

G34; "four" links, dual-die CPUs connect with another HT link
CPU1-1 -> CPU1-2
CPU1-1 -> Chipset
CPU1-1 -> CPU2-1
CPU1-1 -> CPU3-1
CPU1-1 -> CPU4-1

And in extreme situations, you can find 8P G34 boards that allow access to the final HT links, but not in a full mesh. The dies have more physical connections (4, actually), but you do not get them all.


----------



## superstition222

Heard a rumor that Intel is going to dump the current Skylake Z socket for Kaby Lake. That's a short socket life, eh?

Also, since Broadwell is embarrassing Skylake in gaming benchmarks perhaps AMD should be targeting Broadwell instead of Skylake.

I am assuming that Kaby Lake is going to basically be Skylake + Broadwell's EDRAM. So, if we're talking about Zen we might want to talk about L4 cache.


----------



## Kuivamaa

Quote:


> Originally Posted by *superstition222*
> 
> Heard a rumor that Intel is going to dump the current Skylake Z socket for Kaby Lake. That's a short socket life, eh?
> 
> Also, since Broadwell is embarrassing Skylake in gaming benchmarks perhaps AMD should be targeting Broadwell instead of Skylake.
> 
> I am assuming that Kaby Lake is going to basically be Skylake + Broadwell's EDRAM. So, if we're talking about Zen we might want to talk about L4 cache.


Broadwell has no market footprint whatsoever. EDRAM is also pointless on desktop, as it skyrockets costs. The i7 5775C always rivaled the i7 5820K in price, and I guess most people would rather get 50% extra CPU over some EDRAM.


----------



## guttheslayer

Quote:


> Originally Posted by *superstition222*
> 
> Heard a rumor that Intel is going to dump the current Skylake Z socket for Kaby Lake. That's a short socket life, eh?
> 
> Also, since Broadwell is embarrassing Skylake in gaming benchmarks perhaps AMD should be targeting Broadwell instead of Skylake.
> 
> I am assuming that Kaby Lake is going to basically be Skylake + Broadwell's EDRAM. So, if we're talking about Zen we might want to talk about L4 cache.


According to the leaked roadmap, there will be a Skylake eDRAM-cache variant released separately from Kaby Lake, so I assume Kaby Lake is something quite different from that.


----------



## superstition222

Quote:


> Originally Posted by *Kuivamaa*
> 
> Broadwell has no market footprint whatsoever.


Because Intel has artificially constrained supply to prevent it from competing with Skylake.
Quote:


> Originally Posted by *Kuivamaa*
> 
> EDRAM is also pointless on desktop


nope

Broadwell embarrasses Skylake in more than one site's gaming benches. And, I'm not talking about integrated graphics.


----------



## Kuivamaa

Quote:


> Originally Posted by *superstition222*
> 
> Because Intel has artificially constrained supply to prevent it from competing with Skylake.
> nope
> 
> Broadwell embarrasses Skylake in more than one site's gaming benches. And, I'm not talking about integrated graphics.


Intel hasn't artificially constrained supply; their 14nm yields were horrid initially. Broadwell is more expensive mostly because EDRAM is expensive. If Intel were to start offering Kaby Lake with a similar configuration, expect a quad core to rival BW-E hexacores in price and not be far from the octocores. Who would opt for such a thing?


----------



## superstition222

Quote:


> Originally Posted by *Kuivamaa*
> 
> Intel hasn't artificially constrained supply, their 14nm yields were horrid initially.


So, you're saying it made sense to offer all the Broadwell parts initially when yields were horrid and then not offer them when the yields got better.
Quote:


> Originally Posted by *Kuivamaa*
> 
> Broadwell is more expensive mostly because EDRAM is expensive.


Which changes my point about them artificially constraining the supply so as not to have Skylake compete with them how? Why not sell a more inefficient part when you can sell EDRAM parts as an upgrade down the road to the same customers?
Quote:


> Originally Posted by *Kuivamaa*
> 
> If intel is to start offering Kaby Lake with a similar configuration ,expect a quad core to rival in price BW-E hexacores and not be far away from the octocores. Who would opt for such a thing?


Broadwell MSRP was vastly lower than the current gouging price due to Intel's refusal to make enough of them to meet demand.


----------



## Kuivamaa

Quote:


> Originally Posted by *superstition222*
> 
> So, you're saying it made sense to offer all the Broadwell parts initially when yields were horrid and then not offer them when the yields got better.
> Which changes my point about them artificially constraining the supply so as not to have Skylake compete with them how? Why not sell a more inefficient part when you can sell EDRAM parts as an upgrade down the road to the same customers?
> Broadwell MSRP was vastly lower than the current gouging price due to Intel's refusal to make enough of them to meet demand.


I am not sure I can follow your train of thought. The only reason we got eDRAM on desktop is that intel's laptop-range chips share dies with the mainstream range of desktop chips. This solution was targeted at Iris Pro integrated-graphics-enabled chips; the CPU-related workload benefits are a pleasant side effect. It makes much more sense to buy a 5820K over a 5775C if you want a better CPU. Yes, the L4 chip will beat the hexacore in ST workloads that are cache sensitive, but it will otherwise either be very close (normal ST) or get handily beaten (MT workloads).

By the time 14nm yields had improved, intel was ready to move to Skylake; there was absolutely no reason whatsoever to have two generations of processors side by side in the market. It would saturate the supply lines, create unnecessary inventory burden for OEMs, and confuse consumers. You see, L4 isn't a core characteristic of Broadwell, something without which this core could not perform. It is more of a curiosity, an experiment from intel, or just the outcome of surplus mobile cores that intel brought to desktop as a stunt to appease shareholders for totally cancelling desktop BW. Because let us not kid ourselves, this is exactly what happened; the market footprint of desktop Broadwell is practically nonexistent.

If a full desktop Broadwell range of products had been fielded, it would be just like every other desktop intel chip since Nehalem that came before or after it. With L3 only. If Kaby Lake is to have yet another pair of Iris Pro-style SKUs, they too will be sold in low numbers. There is no market big enough for those, plain and simple. Zen has nothing to gain by specifically targeting SKUs that have no market presence.


----------



## superstition222

Quote:


> Originally Posted by *Kuivamaa*
> I am not sure I can follow your train of thought.


Broadwell was promised to 1150 owners. It was barely produced and then sold at prices far above the MSRP.

Broadwell beats Skylake in many gaming benchmarks, despite a clock deficit. Intel didn't want people to see that the immense Skylake hype (and it _was_ immense back when it first started to hit) was vastly overblown and that Broadwell could have substituted for desktop enthusiast Skylake while waiting for Kaby Lake - especially a soldered part - obviating the need to buy a new board and RAM.

Broadwell outperforms Skylake in gaming benches which shouldn't happen so it was buried. AMD's Zen may want to target Broadwell, not Skylake, to reach the gaming market. As a result, the company may want to consider some type of fast L4 cache.

The main thrust of my point is that numerous posts in this topic have acted like Skylake is the unattainable goal for Zen while Broadwell beats it in gaming, a fact that I consider rather droll.


----------



## Kuivamaa

Quote:


> Originally Posted by *superstition222*
> 
> Broadwell was promised to 1150 owners. It was barely produced and then sold at prices far above the MSRP.
> 
> Broadwell beats Skylake in many gaming benchmarks, despite a clock deficit. Intel didn't want people to see that the immense Skylake hype (and it _was_ immense back when it first started to hit) was vastly overblown and that Broadwell could have substituted for desktop enthusiast Skylake while waiting for Kaby Lake - especially a soldered part - obviating the need to buy a new board.
> 
> Broadwell outperforms Skylake in gaming benches which shouldn't happen so it was buried. AMD's Zen may want to target Broadwell, not Skylake, to reach the gaming market. As a result, the company may want to consider some type of fast L4 cache.
> 
> The main thrust of my point is that numerous posts in this topic have acted like Skylake is the unattainable goal for Zen while Broadwell beats it in gaming, a fact that I consider rather droll.


The problem here is that if Broadwell manages that (I remember AnandTech claiming some games did benefit from L4), it is not for free. eDRAM costs. I seriously doubt AMD can muster the necessary economies of scale to make this solution feasible from a market standpoint, especially when it doesn't benefit the iGPU all that much once resolution is increased. So it is once again the 5775C vs. 5820K dilemma. For a given price point, I would rather be offered more Zen cores than just some extra L4.


----------



## superstition222

An Intel rep said, in an interview, that 32MB would have probably been sufficient. I doubt that that costs a lot of money.

And, if the EDRAM in those Broadwell chips would have been as costly as you think why would Intel have set the MSRP where it did, especially given the initial yield problem?
Quote:


> Originally Posted by *Kuivamaa*
> So it is once again the 5775C vs 5820k dilemma.


Do you have any data on the cost to back this up? The cores of Broadwell are tiny. If the iGPU is disabled (for enthusiast setups with a dedicated GPU) the cost of producing a Broadwell chip is even lower.

Using random eBay and Amazon sellers' gouged Broadwell prices is really not useful. If Intel were producing sufficient quantities, the price would be much lower.

Intel is out to maximize its profits. If that means holding back a better part and not using solder so be it.


----------



## Kuivamaa

Quote:


> Originally Posted by *superstition222*
> 
> An Intel rep said, in an interview, that 32MB would have probably been sufficient. I doubt that that costs a lot of money.
> 
> And, if the EDRAM in those Broadwell chips would have been as costly as you think why would Intel have set the MSRP where it did, especially given the initial yield problem?
> Do you have any data on the cost to back this up? The cores of Broadwell are tiny. If the iGPU is disabled (for enthusiast setups with a dedicated GPU) the cost of producing a Broadwell chip is even lower.


It has nothing to do with the size of the cores, since we are comparing the same chip with or without L4, and of course disabling an iGPU that is an integral part of the die, and therefore already produced, does not reduce production costs in any way. Adding something where nothing exists, even without accounting for design/implementation costs, always affects the price, unless the entity selling the product is willing to accept lower margins, but that is a cost as well.

We do not disagree on the performance benefits of eDRAM, of course. It is a good thing to have, no questions about it. My main point is the viability of this solution from a financial point of view. If the cost were negligible, we would see more of it, I am certain.

Although I will give you this: if AMD were to prepare a special edition, an Extreme version of the Zen CPU with some sort of DRAM cache targeted at boosting ST and gaming workloads, one that would top benchmark tables, it would be a major publicity win. Think of articles about the big comeback after 11 years, etc. A halo product able to glorify its lesser siblings, and a bragging point. But I am afraid it either can't be done or it is just too expensive for AMD.


----------



## epic1337

14nm yields were horrid in the broadwell days; by the time 14nm became stable, skylake was out.
there's no longer a point in mass-producing broadwell when skylake is already in mass production.
broadwell-C will still be available as a laptop part though, since there's no skylake alternative to it.

as for AMD... they should fix their cache latencies first; adding an L4 DRAM cache would have no benefit due to their sad cache implementation.

on the other hand, am i the only one who thinks HBM will replace the niche purpose of eDRAM?
HBM should be equally fast, yet cheaper to implement even in large capacities; a single HBM stack could be as much as 1GB with existing HBM chips.


----------



## Kuivamaa

Quote:


> Originally Posted by *epic1337*
> 
> 14nm yields were horrid in the broadwell days, by the time 14nm became stable, skylake was out.
> theres no longer a point in mass producing broadwell when skylake is already out in mass production.
> broadwell-C will still be available as a laptop part though, since theres no skylake alternative to it.
> 
> as for AMD... they should fix their cache latencies first, adding an L4 DRAM cache would have no benefit due to their sad cache implementation.
> 
> on the other hand, am i the only one who thinks that HBM will replace the niche purpose of an eDRAM?
> HBM should be equally fast, yet cheaper to implement even in large capacities, a single HBM stack could be as much as 1GB with existing HBM chips.


I am not qualified to answer that but I don't think HBM is fast enough to act as CPU cache, latency should be pretty high. It definitely makes DRAM redundant for iGPU purposes, though.


----------



## Particle

Quote:


> Originally Posted by *Kuivamaa*
> 
> I am not qualified to answer that but I don't think HBM is fast enough to act as CPU cache, latency should be pretty high. It definitely makes DRAM redundant for iGPU purposes, though.


It would be fast enough to act as a last-level cache akin to Intel's use of eDRAM in some of their processors featuring Iris Pro graphics. We can see from their use that it does help performance when used as CPU cache.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> 14nm yields were horrid in the broadwell days, by the time 14nm became stable, skylake was out.
> theres no longer a point in mass producing broadwell when skylake is already out in mass production.


Did you miss the part where Broadwell is outperforming Skylake in gaming benchmarks even at a 500 MHz lower clock?

Or the fact that Broadwell buyers could have kept their 1150 motherboards and RAM?


----------



## EniGma1987

Quote:


> Originally Posted by *epic1337*
> 
> 14nm yields were horrid in the broadwell days, by the time 14nm became stable, skylake was out.


Broadwell and Skylake use different 14nm process nodes, though. They both had yield problems, but the Skylake node is tuned for somewhat higher-frequency designs.


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> Did you miss the part where Broadwell is outperforming Skylake in gaming benchmarks even at a 500 MHz lower clock?
> 
> Or the fact that Broadwell buyers could have kept their 1150 motherboards and RAM.


no, i did not.

strictly speaking, broadwell-C was only good because it had eDRAM in it; remove the eDRAM and it'd be hardly better than Haswell.
in which case, manufacturing a broadwell SKU without the eDRAM is no longer an option now that skylake is out.

or the fact that people bought AM3+ because it was supposed to get steamroller and, eventually, excavator?


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> no, i did not.
> 
> strictly speaking, broadwell-C was only good because it had an eDRAM in it, remove the eDRAM and it'd be no better than Haswell.
> in which case, manufacturing a broadwell SKU without the eDRAM with skylake out isn't an option anymore.


Of course there's no point in removing the EDRAM. The point was that Broadwell would have been just fine for gamers while they waited for Kaby Lake.

Instead of giving 1150 owners the option of a competitive CPU they are forcing people to replace their boards and RAM.

Broadwell was buried because it outperformed Skylake in games.


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> Of course there's no point in removing the EDRAM. The point was that Broadwell would have been just fine for gamers while they waited for Kaby Lake.
> 
> Instead of giving 1150 owners the option of a competitive CPU they are forcing people to replace their boards and RAM.
> 
> Broadwell was buried because it outperformed Skylake in games.


think of it in terms of how they're handling the manufacturing side of these SKUs.

having broadwell in mass production means they'd cut into skylake's mass production; they only have so many fabrication plants that can do 14nm in large quantities.
the only option for broadwell to exist in the fabs is to shift it into low-volume production. it's not a question of whether broadwell outperforms skylake or not; it'd be wasting skylake's fabrication time.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> you think of it by manufacturing standards, having broadwell out in mass production means they'd cut into skylake's mass production, they only have so many fabrication plants that can do 14nm in large quantities.


I don't see the appeal in releasing a part with inferior performance when it requires a new board and RAM. At least one review site put out an article saying Intel should have released a Skylake part with EDRAM because of Broadwell's performance, so I am not the only one with the opinion that this is a case of what's good for Intel's wallet and not what's good for consumers.

This is the sort of thing that happens when there isn't enough competition. It also undercuts Intel's massive hyping of Skylake, for an 1150 part at a 500 MHz lower clockrate to beat it.


----------



## epic1337

i give up.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> i give up.


How would Broadwell be cutting into Skylake's production if Broadwell, not Skylake, was what was being produced while we wait for Kaby Lake?

Skylake should have had better gaming performance than Broadwell with discrete GPUs to justify a new board and RAM.


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> How would Broadwell be cutting into Skylake's production if Broadwell, not Skylake, was what was being produced while we wait for Kaby Lake?
> 
> Skylake should have had better gaming performance than Broadwell with discrete GPUs to justify a new board and RAM.


then we wouldn't be having skylake at all, and skylake-E would be delayed as a result.

that's why you either ditch broadwell for skylake, or ditch skylake for broadwell. that's technological progress, regardless of whether it's better than the predecessor or not.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> then we won't be having skylake at all, skylake-E will also be delayed as a result.


You'd have Skylake, with eDRAM, like it should have been - instead of Skylake that gets beaten by an 1150 part at a 500 MHz lower clock.

No point in pushing people into a new board and RAM for less gain.


----------



## epic1337

thats why i said i give up, theres no point in talking to you, biased as you are.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> thats why i said i give up, theres no point in talking to you, biased as you are.


Ad hominem is the refuge of those lacking a rebuttal.

There is nothing biased about pointing out reality. In reality, Broadwell spanks Skylake in gaming benchmarks despite having a 500 MHz lower clock and not requiring buyers to replace their 1150 boards and RAM.


----------



## epic1337

for the record, there is no eDRAM skylake nor kaby lake for desktop, you're looking at BGA chips.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> for the record, there is no eDRAM skylake for desktop.


For the record, Zen isn't out yet.


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> For the record, Zen isn't out yet.


ah right, this was a Zen thread.

what was i saying back then... ah yes, AMD should fix their cache implementation first, before opting for an L4 eDRAM cache.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> AMD should fix their cache implementation first.


I'm sure AMD has done that by now, with Keller having been there and all.

So, what is more pressing is something like L4 to help Zen's performance compete with Intel's current stuff.


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> I'm sure AMD has done that by now, with Keller having been there and all.
> 
> So, what is more pressing is something like L4 to help Zen's performance compete with Intel's current stuff.


i doubt it, keller came in when Zen was already in the finalization phase, there's hardly anything he could do to change the architecture.

and on that note, so far, even considering excavator's improved cache implementation, it's still far inferior to intel's.
if they can't even get their cache implementation right, i'd expect them to fail spectacularly in implementing an eDRAM cache as well.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> i doubt it, keller came in when Zen was already in the finalization phase, theres hardly anything he can do to change the architecture.
> 
> and on that note, so far, even considering excavator's improved cache implementation, its still far inferior compared to intel's.
> if they can't even get their cache implementation right, i'd expect them to fail spectacularly in implementing an eDRAM cache as well.


Zen isn't going to be much like Excavator.


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> Zen isn't going to be much like Excavator.


i'm talking about AMD's experiences in how they're implementing the cache.

even if the road is different, how they walk is still the same.


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> i'm talking about AMD's experiences in how they're implementing the cache.
> 
> even if the road is different, how they walk is still the same.


I think AMD has taken enough criticism over Bulldozer and its children. Zen is clearly going to be very different. The road is going to be making a chip that's a lot like Intel's - although some APU designs might be quite different (HBM 2 and such).


----------



## F3ERS 2 ASH3S

Quote:


> Originally Posted by *superstition222*
> 
> Quote:
> 
> 
> 
> Originally Posted by *epic1337*
> 
> AMD should fix their cache implementation first.
> 
> 
> 
> I'm sure AMD has done that by now, with Keller having been there and all.
> 
> So, what is more pressing is something like L4 to help Zen's performance compete with Intel's current stuff.
Click to expand...

@epic1337

You are correct, and it is one thing that AMD was aware of and focused on in the design process for Zen. I am too lazy to pull up the articles, but I believe Keller also discussed this: basically the L1 and L2 cache design, as well as the flow/process, was changed to produce lower latency and better memory management.

@superstition222

An L4 cache system is a possibility and has been talked about. However, depending on the design, it may or may not actually be a benefit.

To the both of you:

HBM may be a good alternative as long as the other latencies are low, as it is high bandwidth but relatively low frequency; however, that also depends on what functions are incorporated into it. I think AMD's R&D is better spent on the on-die caches and FPUs to improve performance than anything else.

I think there will be a day when we see an L4 cache on the APU side of things to benefit the iGPU, and this is where HBM would shine, as it would alleviate the need to wait on actual RAM. Even DDR4 is not fast enough to feed a GPU, and that has ALWAYS been the shortcoming of AMD's APUs: the latency and slowness of system RAM for the integrated GPU.
Quote:


> Originally Posted by *epic1337*
> 
> Quote:
> 
> 
> 
> Originally Posted by *superstition222*
> 
> I'm sure AMD has done that by now, with Keller having been there and all.
> 
> So, what is more pressing is something like L4 to help Zen's performance compete with Intel's current stuff.
> 
> 
> 
> i doubt it, keller came in when Zen was already in the finalization phase, theres hardly anything he can do to change the architecture.
> 
> and on that note, so far, even considering excavator's improved cache implementation, its still far inferior compared to intel's.
> if they can't even get their cache implementation right, i'd expect them to fail spectacularly in implementing an eDRAM cache as well.
Click to expand...

Actually Keller was brought in to design Zen. Keller left right after the design process was completed.
Quote:


> Originally Posted by *superstition222*
> 
> Quote:
> 
> 
> 
> Originally Posted by *epic1337*
> 
> i'm talking about AMD's experiences in how they're implementing the cache.
> 
> even if the road is different, how they walk is still the same.
> 
> 
> 
> I think AMD has taken enough criticism over Bulldozer and its children. Zen is clearly going to be very different. The road is going to be making a chip that's a lot like Intel's - although some APU designs might be quite different (HBM 2 and such).
Click to expand...

I don't think we will see an HBM APU until Zen+ or later comes out.


----------



## epic1337

Quote:


> Originally Posted by *F3ERS 2 ASH3S*
> 
> Actually Keller was brought in to design Zen.. Keller left right after the design process was completed.


i must've remembered it wrong, though speaking of keller, i heard he was kicked out because Zen was delayed?


----------



## F3ERS 2 ASH3S

Quote:


> Originally Posted by *epic1337*
> 
> Quote:
> 
> 
> 
> Originally Posted by *F3ERS 2 ASH3S*
> 
> Actually Keller was brought in to design Zen.. Keller left right after the design process was completed.
> 
> 
> 
> i must've remembered it wrong, though speaking of keller, i heard he was kicked out because Zen was delayed?
Click to expand...

Keller started at AMD back in 2012:
http://www.amd.com/en-us/press-releases/Pages/JimKellerJoinsAMD-2012aug01.aspx

Everything that I have read positioned him toward being on time, and traditionally, like every other project he does, he completed it and moved over to another company to start on the next one. In fact, ironically, Keller went over to Samsung as chief architect back in October, which was right after AMD (3 weeks earlier, I think) publicly announced that Zen's design was completed.

Here is something to think about:

AMD is now confirmed to use Samsung 14nm. Keller designed Zen; I wonder if that had anything to do with Samsung's decision to bring him over, and what that means for Samsung's mobile division.


----------



## epic1337

good point, theres a fine connection between the two.
it may be possible that AMD "loaned" keller for samsung's fabs.


----------



## F3ERS 2 ASH3S

Quote:


> Originally Posted by *epic1337*
> 
> good point, theres a fine connection between the two.
> it may be possible that AMD "loaned" keller for samsung's fabs.


IF that is the case and there is actually a connection, then that would be nothing but great news for AMD. However, looking at Keller's past record, you see him jumping ship once a project is completed. The only other thing that could possibly come out of this is a partnership with Samsung building toward future projects, and maybe at some point a share of IP to boost either company's presence.

In any event his departure from AMD was not a bad one, and everything is on track as has been stated.

I have been following Zen very closely, and so far I have only heard good news. The only thing that may disappoint me is if Zen's frequency can't reach the level needed to perform on par with Intel, but that will only be determined once we see leaks of ES chips.


----------



## Kuivamaa

Quote:


> Originally Posted by *epic1337*
> 
> i doubt it, keller came in when Zen was already in the finalization phase, theres hardly anything he can do to change the architecture.


Actually Keller rejoined AMD in the summer of 2012, when the SR and XV Opterons were cancelled in favor of what became Zen. He is the guy that made the blueprint; Zen is his baby.


----------



## Tojara

I'm going to go out on a limb and say that an L4 cache is most likely not worth it for CPU cores alone if you're not doing a monolithic chip for cache-sensitive workloads. When you have that many cache levels the latency already starts to add up, and an L4 is hardly more space efficient than increasing the lower-level cache sizes, simply because the latency gap to main system memory is much larger. DRAM latency going up might change that in the long run.

HBM isn't really what you want to run as an L4 cache, but it might be what you want to run as system memory, with a smart shader array and a relatively lower-latency L4 for the CPU. For that to happen it would need a more flexible solution than 2.5D stacking; it loses its competitive edge when you are stuck with a certain stack size attached to the processor itself.

Just for the heck of it, some rough Zen (8C/16T) estimations, based on this slide:

Area scaling from 32nm (PD) to 14nm (Zen) would roughly be ~1/2.67. An eight-core PD chip would fit in somewhere along the lines of a 125-140mm^2 die on 14nm, as not all parts get the full benefit of scaling to a smaller process node. By the same notion that die would currently cost around 70% of a Vishera die, assuming the costs haven't changed. A large part of that saving would be negated by increased design costs regardless.

Zen is reasonably likely to cut the large L2 of earlier Bulldozer generations in half (512KB), or possibly even further to reduce latency more. The latter would be an area benefit, but a detriment in cache-sensitive workloads unless the L3 is large and fast enough to compensate. What route they choose depends on the APUs and whether or not they continue to unify the memory pools via L2/L3 or use some other implementation.
I'm going to guess 512KB/core with slightly worse than average scaling.

Since 8MB of L2 takes about 56mm^2 on 32nm, 4MB on 14nm should take somewhere around 56 × (140/315) / 2 ≈ 13mm^2. If the L3 is increased to 2MB/core it should take around 88 × (140/315) × 2 ≈ 75mm^2. The cores themselves, roughly doubled in size (~1.8x for a wider design, plus 10% extra for SMT), would be around 96 × (140/315) × 1.8 × 1.1 ≈ 85mm^2. The I/O would be a fair bit larger than on Vishera: 72 × (140/315) × 1.5 ≈ 48mm^2.

That would put the wanted 8C/16T die at around 220mm^2, costing about 70% more than a Vishera chip without taking frequency scaling into account. Certainly not too bad for something that should hold its own against Intel's newer six-cores. Ivy-E being 257mm^2 for six cores and HW-E being 355mm^2 for eight on 22nm makes it seem pretty reasonable for a roughly similarly performing CPU on a smaller node, though the Intel parts would likely have beefier I/O and wider memory controllers. I've also gone with a relatively conservative estimate on most numbers, so 200-210mm^2 could be more likely depending on the configuration they choose.
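The arithmetic above can be reproduced with a short script. Every input is an assumption taken from the post itself (Piledriver block sizes on 32nm and the conservative 140/315 area-scaling ratio), not a confirmed figure:

```python
# Reproduces the back-of-the-envelope Zen 8C/16T die-area estimate above.
# All figures are the post's own assumptions, not confirmed numbers.

scale = 140 / 315                 # conservative 32nm -> 14nm area scaling ratio

l2    = 56 * scale / 2            # 8MB L2 on 32nm halved to 4MB   (~12.4 mm^2)
l3    = 88 * scale * 2            # L3 doubled to 2MB/core         (~78 mm^2)
cores = 96 * scale * 1.8 * 1.1    # ~1.8x wider core, +10% for SMT (~85 mm^2)
io    = 72 * scale * 1.5          # ~1.5x beefier uncore/IO        (~48 mm^2)

total = l2 + l3 + cores + io
print(f"estimated 8C/16T die: {total:.0f} mm^2")  # estimated 8C/16T die: 223 mm^2
```

The ~223 mm² sum lands in the ~220 mm² ballpark the post arrives at after rounding the individual blocks.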


----------



## epic1337

512KB of L2 per core sounds about right; if they can get it down to a very low latency, then the small capacity will be mitigated by its responsiveness.
if you look at intel's implementation, their L2 is always 256KB per core, while the L3 is normally 1MB per thread (factoring in hyperthreading), or up to 2MB per core.

that means to say, the L2 capacity being small doesn't matter, it's the latency that counts.
or to be more precise, if the L2 latency is 10 clock cycles for example, a cache miss would only add a 20-clock-cycle penalty.
as opposed to a 30-clock-cycle latency, where the penalty is a whopping 60 clock cycles.
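The tradeoff being described can be made concrete with the classic average-memory-access-time formula, AMAT = hit time + miss rate × miss penalty. The miss rates and the 100-cycle fallback penalty below are illustrative assumptions, not figures from the post:

```python
# AMAT = hit_time + miss_rate * miss_penalty: the standard way to weigh
# cache latency against cache size. All numbers here are illustrative.

def amat(hit_cycles, miss_rate, miss_penalty_cycles=100):
    return hit_cycles + miss_rate * miss_penalty_cycles

# small, fast L2: 10-cycle hit, but misses a bit more often
fast_small = amat(hit_cycles=10, miss_rate=0.10)   # -> 20.0 cycles
# large, slow L2: misses half as often, but every access pays 30 cycles
big_slow   = amat(hit_cycles=30, miss_rate=0.05)   # -> 35.0 cycles

print(fast_small, big_slow)
```

With these hypothetical numbers the low-latency cache wins even though it misses twice as often, which is the point being made about latency counting for more than capacity.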


----------



## EniGma1987

Quote:


> Originally Posted by *epic1337*
> 
> or the fact that people bought AM3+ because its supposed to have steamroller and up to excavator?


That is a good point. AMD advertised it as going to have 2 more CPU series launches, so some people bought into the platform for longevity reasons. Instead AMD shafted everyone by calling the FX-9xxx line a new series to make "good on their word" without having to release what was really promised.







Ya, Steamroller would have sucked anyway because of the lower clocks, but it still would have been nice, and I know I was disappointed not to be able to upgrade from my 8350.

Quote:


> Originally Posted by *Tojara*
> 
> Zen is reasonably likely to cut the large L2 of earlier Bulldozer generations to half (512KB), or possibly even less to reduce latency even further. Latter would be an area benefit, but would be a detriment in cache sensitive workloads, unless the L3 is large and fast enough to compensate. What route they choose depends on the APUs and whether or not they continue to have the memory pools unified via L2/L3 or some other implementation.


it has been said that Zen cores will have their own L1 and L2, and a group of 4 Zen cores together will share a section of L3. Each group of 4 cores has its own L3.
In my mind, that would mean a server chip will probably have some sort of L4 cache design for sharing cache data between core groups in heavy MT workloads, since each group has its own L3. This could be an on-die cache, or off-die but on-package. The latter makes more sense if the consumer chips have no L4 but the server chips do; otherwise AMD would need separate dies, and we know that won't happen.


----------



## KarathKasun

Quote:


> Originally Posted by *KyadCK*
> 
> Such a shame they use HT 3.0 on AM3+ then. 3.1 is irrelevant. HT is dead for AMD after the NB is on the die.
> 
> And no. Consumer AMD CPUs get one active HT link, not two. You need one HT link per thing you are connecting to, and in AM3's case that means one to the NB. You can not run dual-CPU with an 8350. Socket C32 and G34 are LGA, they aren't even compatible.
> 
> AM3+; one link
> CPU1 -> Chipset
> 
> C32; two links
> CPU1 -> Chipset
> CPU1 -> CPU2
> 
> G34; "four" links, dual-die CPUs connect with another HT link
> CPU1-1 -> CPU1-2
> CPU1-1 -> Chipset
> CPU1-1 -> CPU2-1
> CPU1-1 -> CPU3-1
> CPU1-1 -> CPU4-1
> 
> And in extreme situations, you can find 8P boards with G34 that allows access to the final HT links, but not in a full mesh. They have more physical connections on the die (4, actually), but you do not get them all.


Each HT link is a pair consisting of a 16-bit upstream and a 16-bit downstream connection, which you could configure as a ring in theory.
Dual-die CPUs technically have 4 HT link pairs exposed, but all 4 links on one die are never active at the same time AFAIK.
There is a difference between cache-coherent and non-coherent links.

And because HT is switchable you can get it in a full mesh; several high-end supercomputers actually used it in this fashion, as well as for rack-to-rack networking.


----------



## superstition222

Quote:


> Originally Posted by *Tojara*
> 
> I'm going to go out and say that an L4 cache is most likely not worth it for only CPU cores if you're not doing a monolithic chip for cache sensitive workloads. When you have that many caches the latency already starts to get up there and having it is hardly more space efficient than increasing the lower level cache size simply because the latency difference between the main system memory is much larger. DRAM latency going up might make that false in the long run.


How is it that Broadwell is beating Skylake in gaming despite a 500 MHz clock deficit?


----------



## Themisseble

What is happening

http://finance.yahoo.com/q?s=AMD

?


----------



## STEvil

they're approaching their year end high? or are you referring to something else?


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> How is it that Broadwell is beating Skylake in gaming despite a 500 MHz clock deficit?


that's because intel's implementation of the cache is superb, which means even with the large cache size it barely impacts the latency.
on the other hand, it also points out that the IMC has too much latency to sustain high loads on the pipelines.
or in a practical sense: with an increase in sustained output you pull up the dips in the framerate, so of course you end up with a higher average framerate.
and as a side note, this is why higher clocks on DDR3 or DDR4 increase framerate by a noticeable degree, though not that dramatically, since the IMC still has that huge latency.

if you compare intel's and AMD's implementations of both the cache and the IMC, you can see AMD falling behind in both latency and bandwidth.
that means AMD doesn't have the experience or the expertise to implement a cache equal or superior to intel's; even Fury's HBM leaves a lot to be desired.
in other words, AMD opting for an L4 cache may not be as effective as anyone thinks, as opposed to making the chip cost dramatically higher.
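The framerate point above (pulling up the dips raises the average) can be illustrated with made-up frame times; the numbers below are purely hypothetical, not benchmark data:

```python
# Toy illustration: damping the worst frame-time spikes (what a big cache
# can help with) raises average FPS even when typical frames are unchanged.
# Frame times in milliseconds are made-up numbers.

frame_times_ms = [16, 16, 16, 40, 16, 16, 45, 16]   # with stalls/spikes
smoothed_ms    = [16, 16, 16, 20, 16, 16, 20, 16]   # spikes damped

def avg_fps(times_ms):
    # average FPS over the window = frames rendered / total time
    return 1000 * len(times_ms) / sum(times_ms)

print(f"{avg_fps(frame_times_ms):.1f} fps vs {avg_fps(smoothed_ms):.1f} fps")
# -> 44.2 fps vs 58.8 fps
```

Only two of the eight frames changed, yet the average jumps by a third, which is why cache-induced consistency shows up so strongly in average-FPS benchmarks.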


----------



## Tojara

Quote:


> Originally Posted by *EniGma1987*
> 
> it has been said that Zen cores will have their own L1 and L2, and a group of 4 Zen cores together will share a section of L3. Each group of 4 cores has it's own L3.
> 
> In my mind, that would mean a server chip will probably have some sort of L4 cache design for the sharing of cache data between cores in heavy MT workloads since each group has their own l3. This could be an on die cache or an off die, on package. The latter makes more sense if the consumer chips have no l4 but server does. Otherwise AMD would need separate dies and we know that wont happen.


I'm still not sold on the L3 part, as it came from a fabricated slide that hasn't been confirmed. The L1 and L2 hierarchy is indeed likely to be similar to Intel's; there are tricks you can do, but for the most part you don't want massive differences if you're aiming for consistently high IPC.

L4 will be workload dependent. I'm guessing Zen might not have it, as the general port setup and FMACs seem to aim more for power and area efficiency than for the highest possible IPC. The workloads that really want it already have high-end Xeons and Power8 as competition; for the vast majority of tasks, designing the core around an L4 would be a detriment, although only a slight one.
Quote:


> Originally Posted by *superstition222*
> 
> How is it that Broadwell is beating Skylake in gaming despite a 500 MHz clock deficit?


Because the L4 absolutely makes it faster at certain workloads; I have not once said it doesn't help. I'm saying *L4 is not space efficient* if you don't have an iGPU. The extra ~5% IPC it brings on average is not good enough to justify doubling the core+L1/2/3 area; if you spent that area on the cores+L1+L2+L3 instead, you'd get at least double the gain. It's as pointless as arguing that Skylake (and Kaby Lake) has higher IPC than Zen; power efficiency and die area are at least equally important.

I also love how you use a single benchmark where the L4 is really useful and ignore everything else, where the gain is in the low single percentages. Not to say you couldn't be one of the hundred people who'd buy a chip solely for a compression benchmark, but on average it's hardly anything worth noting.
http://www.anandtech.com/show/9482/intel-broadwell-pt2-overclocking-ipc/3
Quote:


> Originally Posted by *Anandtech*
> 
> Sandy Bridge to Ivy Bridge: Average ~5.0% Up
> Ivy Bridge to Haswell: Average ~11.2% Up
> Haswell to Broadwell: Average ~3.3% Up
> 
> Thus in a like for like environment, when eDRAM is not explicitly a driver for performance, Broadwell gives a 3.3% gain over Haswell. That's a take home message worth considering, but it also affords the difference in performance between an architecture update and a node change.
> 
> Cycling back to our WinRAR test, things look a little different. Ivy Bridge to Haswell gives only a 3.2% difference, but the eDRAM in Broadwell slaps on another 23.8% performance increase, dropping the benchmark from 76.65 seconds to 63.91 seconds. When eDRAM counts, it counts a lot.


----------



## superstition222

Quote:


> Originally Posted by *Tojara*
> 
> I'm still not set on the L3 part as it was from a fabricated slide that hasn't been confirmed. L1 and L2 hierarchy is indeed likely to be similar to Intel, there are tricks that you can do but for the most part you don't want massive differences if you're aiming for high IPC consistently.
> 
> L4 will be workload dependent, I'm guessing Zen might not have it as the general port setup and FMACs seem to aim more for power and area efficiency rather than highest possible IPC. The workloads that really want that already have high-end Xeons as Power8 as competition, for the vast majority of tasks designing the core with L4 would be a detriment, although only a slight one.
> Because the L4 absolutely makes it faster at certain workloads, I have not once said it doesn't help. I'm saying *L4 is not space efficient* if you don't have an iGPU. The extra ~5% IPC it brings on average is not good enough for a CPU for doubling the core+L1/2/3 area. If you were to use that for the cores+L1+L2+L3 you'd get at least double that. It's as pointless as arguing that Skylake (and Kaby Lake) has higher IPC than Zen, power efficiency and die area are at least equally important.
> 
> I also love how you use a single benchmark where the L4 is really useful, and ignore everything else where the gain is low single percentages. Not to ignore that you could be one of the hundred people who'd buy a chip solely due to a compression benchmark, but on average it's hardly anything worth of note.
> http://www.anandtech.com/show/9482/intel-broadwell-pt2-overclocking-ipc/3


Gaming is driving higher-end PC sales, and that trend is only picking up. Broadwell beating Skylake in gaming shouldn't happen.

Because it's happening, it means Intel could have done better with the first generation of Skylake. Space efficiency isn't such an issue, since the four tiny cores of Broadwell function just fine for gaming purposes; the chip is, after all, beating Skylake.

It's too much to ask people to buy new boards and RAM just to get a CPU that has worse gaming performance than an 1150 part.

Sure, roll out Skylake-E or whatever, but Broadwell should have been the mainstream desktop CPU, not this inferior version of Skylake.


----------



## warpuck

Quote:


> Originally Posted by *epic1337*
> 
> thats because intel's implementation of the cache is superb, that means to say even with the large cache size it barely impacts the latency.
> and on the other hand, it also points out that the IMC has too much latency to sustain high-loads on the pipelines.
> or in the practical of sense, with the increase in sustained output, you end up pulling up the fluctuations in the framerate, of course you'd end up with a higher average framerate.
> and as a side note, this is why higher clocks of DDR3 or DDR4 increases framerate by a noticeable degree, not that dramatic though since the IMC still has that huge latency.
> 
> if you compare intel's and AMD's implementation of both the cache and IMC, you could see AMD falling behind in both latency and bandwidth.
> that means, AMD doesn't have the experience nor the expertise in implementing an equal to or superior cache than intel, even Fury's HBM leaves little to be desired.
> or in other words, AMD opting for an L4 cache may not be as effective as anyone thinks, as opposed to making the chip cost dramatically higher.


Since I am an analogue kind of guy, would that be similar to narrow-band high-gain circuits? I think I remember some design stuff about it: using LC circuits to trap unwanted artifacts and feed them back as B+ to the power supply rail, or forward the energy to the next phase in the pipeline? That would work, but would only be good for a specific set of frequencies.
I was good at figuring out failures in the analogue parts when computers had millions of discrete components, and only fair with the digital part. You know, back when a computer was the size of a refrigerator.
It would seem logical to design with frequency optimization in mind.
Multiplexing circuits always gave me a fit when they went bad. Things were so much simpler then.


----------



## Redwoodz

Quote:


> Originally Posted by *superstition222*
> 
> Did you miss the part where Broadwell is outperforming Skylake in gaming benchmarks even at a 500 MHz lower clock?
> 
> Or the fact that Broadwell buyers could have kept their 1150 motherboards and RAM.


What are you talking about? I have not scoured the internet for Broadwell reviews, mainly because it is irrelevant at this point, but every Skylake review I have seen states otherwise. Maybe you should add some proof.









Here's mine. http://www.guru3d.com/articles_pages/msi_z170a_xpower_gaming_titanium_review,15.html


That's Broadwell losing to Haswell (same stock clocks as Broadwell) and getting stomped by Skylake.


----------



## epic1337

Quote:


> Originally Posted by *warpuck*
> 
> Since I am a analogue kinda guy, would that be similar to narrow band high gain circuits ? I think I remember some design stuff about it. Using LC circuits to store trapping unwanted artifacts and feeding it back as B+ to the power supply rail ? or forward energy to the next phase in the pipe line? That would work, but would only be good for specific set of frequencies.
> I was good at figuring out fails in the analogue when computers had millions of discrete components and only fair with digital part. You know back when a computer was the size of a refrigerator.
> It would seem to be logical to design with a frequency optimization in mind.
> Multiplexing circuits always gave me a fit when they went bad. Things were so much simpler then.


i'm trying to wrap my head around what you just said... well, in a way, yes i think.

in a sense, a very low latency cache would act as a pass-through buffer that supplies the most-used data to the pipelines at a very, very high rate.
which means the lower the latency, the bigger the effect it has on the pipelines, since there won't be any stalling on access and such.
and as a bonus, if the cache can be made with such a low latency, then any cache misses would have next to irrelevant side effects, since there's hardly any latency penalty to begin with.

anyway, it's a concept of faster vs wider.
more cache (in terms of size) would decrease the chances of a cache miss, somewhat, since 256KB~1MB of cache can only hold so much...
on the other hand, a faster cache (less latency, more bandwidth) at the cost of cache size would increase cache misses but decrease the latency penalty.
and a faster cache even increases the maximum throughput of the pipeline, since the pipeline waits a shorter time to access the cached data.

intel had the correct idea in this: a small but extremely fast cache is the way to go.
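The faster-vs-wider tradeoff above can be sketched numerically: using the same AMAT model (hit time + miss rate × miss penalty), you can ask how much extra miss rate the small fast cache can absorb before the big slow one wins. All latencies and rates below are hypothetical:

```python
# Break-even miss rate for the faster-vs-wider tradeoff, under
# AMAT = hit + miss_rate * penalty. All numbers are hypothetical.

FAST_HIT, SLOW_HIT = 10, 30   # cycles: small fast cache vs big slow cache
PENALTY = 100                 # assumed cycles to fall through to the next level
SLOW_MISS_RATE = 0.05         # the big cache's assumed miss rate

# the fast cache wins while:
#   FAST_HIT + m * PENALTY < SLOW_HIT + SLOW_MISS_RATE * PENALTY
break_even = (SLOW_HIT - FAST_HIT) / PENALTY + SLOW_MISS_RATE
print(f"fast cache wins up to a {break_even:.0%} miss rate")
# -> fast cache wins up to a 25% miss rate
```

Under these assumptions the small cache would have to miss five times as often as the big one before its latency advantage is eaten up, which is the "small but extremely fast" argument in numbers.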

Quote:


> Originally Posted by *Redwoodz*
> 
> What are you talking about? I have not scoured the internet for Broadwell reviews,mainly because it is irrelevant at this point,but every Skylake review I have seen states otherwise. Maybe you should add some proof.


http://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed/6

apparently, games find the eDRAM in broadwell-C to be a surprisingly effective way of increasing performance.
though on the other hand, skylake does non-game applications a tad better than broadwell, as expected.


----------



## Redwoodz

Quote:


> Originally Posted by *epic1337*
> 
> http://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed/6
> 
> apparently, games find the eDRAM in broadwell-C as a surprisingly effective way in increasing performance.
> though on the other hand, skylake does non-game applications a tad better than broadwell as expected.


Eh, 1 or 2 games show a frame or 2 increase, easily within the margin of error. Also, it is a terrible overclocker, which is why Intel went with off-die VRMs with Skylake, and it's way too expensive.


----------



## epic1337

Quote:


> Originally Posted by *Redwoodz*
> 
> Eh,1 or 2 games show a frame or 2 increase,easily within the margin of error. Also,it is a terrible overclocker,which is why Intel went with off die VRM with Skylake,and it's way too expensive.


that's comparing it to a 500MHz slower broadwell-C, since the i7-5775C is 3.7GHz while the i7-6700K is 4.2GHz.

and yes, broadwell-C is expensive to manufacture, combined with low yields, not to mention most of the good chips go to the mobile platform.


----------



## Redwoodz

Quote:


> Originally Posted by *epic1337*
> 
> thats comparing it to a 500Mhz slower broadwell-C, since i7-5775C is 3.7Ghz, while i7-6700K is 4.2Ghz.
> 
> and yes, broadwell-C is expensive to manufacture, combined with low yields, not to mention most of the good chips goes to the mobile platform.


Haswell is 3.9GHz, 22nm instead of 14nm, and performs neck and neck with Broadwell too.


----------



## superstition222

Quote:


> Originally Posted by *Redwoodz*
> What are you talking about? I have not scoured the internet for Broadwell reviews,mainly because it is irrelevant at this point,but every Skylake review I have seen states otherwise. Maybe you should add some proof


http://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed/8
http://techreport.com/review/28751/intel-core-i7-6700k-skylake-processor-reviewed/6



Quote:


> Originally Posted by *Tech Report*
> The 6700K improves on the 4790K by a tad, but the 5775C upstages it with a freakish string of gaming performance wins, even though its prevailing clock speed is ~500MHz lower.
> 
> Heck, if you're a gamer sporting a Haswell-compatible motherboard and looking for an upgrade, this little desktop Broadwell may be a better choice than the 6700K. So long as your motherboard is Broadwell-compatible via a BIOS upgrade, the 5775C could deliver gaming performance that's superior to Skylake, provided your games of choice benefit as much from that L4 cache as the ones we tested did.


Too bad for them Intel won't let that happen at a reasonable price.






*Intel's Skylake lineup is robbing us of the performance king we deserve*
http://arstechnica.com/gadgets/2015/09/intels-skylake-lineup-is-robbing-us-of-the-performance-king-we-deserve/
Quote:


> Originally Posted by *Peter Bright*
> Intel's range of sixth-generation Core processors, codenamed Skylake, is now public. And boy, am I disappointed.
> 
> ...
> 
> amid all that Broadwell mess was a truly monstrous chip: an almost mythical beast, the Core i7 5775C. What made this part so special? It paired two features. One is mundane: the processor is socketed rather than soldered, meaning that enthusiasts can use it in self-built systems, pairing it with the precise range of components that they want.
> 
> The other is rather more exotic: the chip has Iris Pro graphics, and with it, 128MB of eDRAM. Crack open the processor and it has not one big chip in its package but two, with the eDRAM nestled alongside the processor itself. The RAM is primarily there to accelerate graphics operations, but Intel's design means that it is not dedicated to this task. For Broadwell, it functions as a large, high bandwidth level 4 cache (the other 3 levels being part of the processor chip itself). Skylake shakes up the design somewhat, changing the topology and allowing the eDRAM to cache even more stuff, but the effect is still the same: a monstrously large cache for a mainstream commodity processor.
> 
> And if Broadwell is anything to go by, that cache does real work. Tech Report's review of the first Skylake processors included scores from the 5775C, and in games the performance was remarkable. The 65W 3.3-3.7GHz i7-5775C beat the 91W 4-4.2GHz Skylake i7-6700K. The Skylake processor has a higher clock speed, it has a higher power budget, and its improved core means that it executes more instructions per cycle, but that enormous L4 cache meant that the Broadwell could offset its disadvantages and then some. In CPU-bound games such as Project Cars and Civilization: Beyond Earth, the older chip managed to pull ahead of its newer successor.
> 
> ...
> 
> in memory-intensive workloads, such as some games and scientific applications, the cache is better than 21 percent more clock speed and 40 percent more power. That's the kind of gain that doesn't come along very often in our dismal post-Moore's law world.
> 
> Those 5775C results tantalized us with the prospect of a comparable Skylake part. Pair that ginormous cache with Intel's latest-and-greatest core and raise the speed limit on the clock speed by giving it a 90-odd W power envelope, and one can't help but imagine that the result would be a fine processor for gaming and workstations alike.
> 
> But imagine is all we can do because Intel isn't releasing such a chip.
> 
> Intel has made many processors aimed at the performance-hungry enthusiast segment; the entire purpose of the overclockable K parts is to appeal to those customers.
> 
> a chip that gives me a bunch of extra performance just by including a cache that's bigger than the first hard disk I ever owned? I'd be glad to throw some money Intel's way for that.
> 
> Intel could have had a Skylake processor that was exciting to gamers and anyone else with performance-critical workloads. For the right task, that extra memory can do the work of a 20 percent overclock, without running anything out of spec. It would have been the must-have part for enthusiasts everywhere. And I'm tremendously disappointed that the company isn't going to make it.


----------



## epic1337

you're forgetting the most important image.



not everyone plays games all the time, mind you; rather, 24/7 gamers are the absolute minority.
though i wonder why TR didn't add in the legendary i7-5820K?


----------



## looncraz

I thought this was a Zen thread?







Why all the Broadwell vs Skylake chatter?


----------



## Redwoodz

Quote:


> Originally Posted by *looncraz*
> 
> I thought this was a Zen thread?
> 
> 
> 
> 
> 
> 
> 
> Why all the Broadwell vs Skylake chatter?


It is about the performance benefits of the eDRAM cache.


----------



## caenlen

Quote:


> Originally Posted by *epic1337*
> 
> you're forgetting the most important image.
> 
> 
> 
> not everyone plays games all the time, mind you; rather, 24/7 gamers are the absolute minority.
> though i wonder why TR didn't add in the legendary i7-5820K?


a very good point. i might sell my 2500k for 80 bucks when zen comes out, if the price is right on zen anyway. i honestly love gaming, but i am beginning to cut back on the amount of time i spend on it, trying to get a more varied lifestyle.


----------



## epic1337

Quote:


> Originally Posted by *caenlen*
> 
> a very good point. i might sell my 2500k for 80 bucks when zen comes out, if the price is right on zen anyway. i honestly love gaming, but i am beginning to cut back on the amount of time i spend on it, trying to get a more varied lifestyle.


quite true, after i've mellowed out, i'm barely playing any AAA titles if at all; i used to go 12 hours straight, even forgetting lunch.
nowadays i mostly transcode absurdly encoded videos, read LN in PDF format, and watch the forums for any interesting posts, so you can see why my i3-2100 is still alive and kicking.

if the speculation here is accurate (ivy~haswell IPC), then an 8C/16T Zen "should" be priced at $400 at most, with performance slightly (e.g. 10%~20%) superior to the i7-5820K.
this means it's in a good spot; even if the IPC were to fall short of expectations, the performance-to-cost estimates shouldn't be any worse than intel's i7 in a general sense.
as for the 10C/20T Zen, i'm pretty sure they'll put it at $800~$1000 to pit against intel's broadwell-E or skylake-E.


----------



## TranquilTempest

Quote:


> Originally Posted by *epic1337*
> 
> you're forgetting the most important image.
> 
> 
> 
> not everyone plays games all the time, mind you; rather, 24/7 gamers are the absolute minority.
> though i wonder why TR didn't add in the legendary i7-5820K?


Most have games as their most resource-demanding program though.


----------



## superstition222

Quote:


> Originally Posted by *TranquilTempest*
> 
> Most have games as their most resource-demanding program though.


He is also forgetting that people had to buy *a new board and RAM* to get that tiny increase.

And Skylake still loses in games.

And, he is posting as if it's an either-or - instead of looking at the logic presented by Peter Bright:
Quote:


> Originally Posted by *Peter Bright*
> in memory-intensive workloads, such as some games and scientific applications, the cache is better than 21 percent more clock speed and 40 percent more power.
> 
> Those 5775C results tantalized us with the prospect of a comparable Skylake part. Pair that ginormous cache with Intel's latest-and-greatest core and raise the speed limit on the clock speed by giving it a 90-odd W power envelope, and one can't help but imagine that the result would be a fine processor for gaming and workstations alike. But imagine is all we can do because Intel isn't releasing such a chip.
> 
> More interestingly, perhaps, Intel appears to be willing to make changes to its processor designs that are at least partially motivated by the demands of overclockers. In previous Intel processors, the chips used the same base clock speed for both the processor's integrated PCIe controller and everything else. This meant that overclocking by increasing the base clock speed was undesirable, as it also overclocked the PCIe bus. In Skylake, those clocks are separated from one another, enabling base clock to be increased without pushing the PCIe bus out of spec. This makes overclocking more flexible and reliable-it opens the doors to overclocking both by increasing the multiplier and by increasing the base clock, instead of just the multiplier-but doesn't immediately appear to have any other big upside. In comparison to this change, shipping socketed eDRAM parts surely wouldn't be much extra work at all.
> 
> Intel could have had a Skylake processor that was exciting to gamers and anyone else with performance-critical workloads. For the right task, that extra memory can do the work of a 20 percent overclock, without running anything out of spec. It would have been the must-have part for enthusiasts everywhere. And I'm tremendously disappointed that the company isn't going to make it.


He says he has no idea why. I'll tell him. The EDRAM is being withheld for future chips so Intel can ask people to upgrade again.


----------



## epic1337

you don't even need to get DDR4, reusing your DDR3 kit is fine with a D3 board.


----------



## looncraz

Quote:


> Originally Posted by *epic1337*
> 
> you don't even need to get DDR4, reusing your DDR3 kit is fine with a D3 board.


DDR3L or DDR4 is all I thought Skylake could support. DDR3L isn't commonly used in existing systems.


----------



## epic1337

Quote:


> Originally Posted by *looncraz*
> 
> DDR3L or DDR4 is all I thought Skylake could support. DDR3L isn't commonly used in existing systems.


not quite; conforming to intel's standard, you can use regular DDR3 but you have to undervolt it to at least 1.4V, which is easily done with a 1.5V kit.
the only issues you'd encounter are with those kits that are 1.65V or more; they'd probably need to get downclocked to reach 1.4V.

and for the record, skylake's IMC isn't that weak, it can tolerate an overvolt of as much as 1.65V.
i mean, even haswell wasn't supposed to exceed 1.5V, yet we see people overclocking their DRAM kits with voltages over 1.7V.
and despite such high voltages, we rarely see issues with this; those people who'd been asking "is a 1.65V dimm safe on haswell" always get the reply "it's safe".

now considering this forum is supposed to be an OVERCLOCKING forum, i doubt people would pass up on overvolting their kits, despite the risk of damage.


----------



## escksu

This is AMD's absolute last chance to come out with something competitive. If they fail, they can close down........


----------



## christoph

Quote:


> Originally Posted by *escksu*
> 
> This is AMD's absolute last chance to come out with something competitive. If they fail, they can close down........


just because you say so???

or what's your point??

America my friend is NOT THE WORLD...


----------



## TranquilTempest

Quote:


> Originally Posted by *christoph*
> 
> just because you say so???
> 
> or what's your point??
> 
> America my friend is NOT THE WORLD...


It has nothing to do with America; it has everything to do with AMD running out of money.


----------



## Olivon

Quote:


> Originally Posted by *escksu*
> 
> This is AMD's absolute last chance to come out with something competitive. If they fail, they can close down........


Yeah. It's do or die now.


----------



## epic1337

what they're saying makes sense; can AMD keep their sales negative like this? wouldn't they eventually run out of money the way they're going right now?
if investors actually realize that they're losing money with AMD's current status, they'd pull out and end up bankrupting AMD.

nobody wants to "bet" on a losing horse; the risks of losing outweigh the benefits.
actually, can AMD even double or triple their current sales with just a successful Zen?


----------



## ku4eto

Quote:


> Originally Posted by *epic1337*
> 
> what they're saying makes sense; can AMD keep their sales negative like this? wouldn't they eventually run out of money the way they're going right now?
> if investors actually realize that they're losing money with AMD's current status, they'd pull out and end up bankrupting AMD.
> 
> nobody wants to "bet" on a losing horse; the risks of losing outweigh the benefits. actually, can AMD even double or triple their current sales with just a successful Zen?


If Zen performs as they are saying, yes, AMD will double/triple the sales when compared to the FX chips.


----------



## epic1337

Quote:


> Originally Posted by *ku4eto*
> 
> If Zen performs as they are saying, yes, AMD will double/triple the sales when compared to the FX chips.


No!

i meant as an entire company: can they double or triple their yearly sales as a whole, meaning their revenue would double or triple in value?
otherwise, increasing just 10%~20% for the entire company because one segment is doing 3 times better is negligible, too small to even offset their losses.
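the point about one segment versus the whole company can be put in numbers; here's a quick sketch with made-up segment weights (a hypothetical mix, not AMD's actual revenue split):

```python
# Hypothetical illustration: tripling one small segment barely moves the total.
# Segment weights are made up for the example, not AMD's actual revenue mix.
def total_after_growth(shares, segment, factor):
    """Relative total revenue (old total = 1.0) after `segment` grows by `factor`."""
    return sum(s * (factor if name == segment else 1.0) for name, s in shares.items())

segments = {"cpu": 0.15, "gpu": 0.35, "embedded": 0.50}  # fractions of total revenue

# If the CPU segment is 15% of revenue and triples, the company only grows 30%:
print(f"{total_after_growth(segments, 'cpu', 3.0):.2f}x")  # prints 1.30x
```

so even a 3x win in one segment reads as a modest bump on the company-wide top line, which is the argument being made here.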


----------



## ku4eto

Quote:


> Originally Posted by *epic1337*
> 
> No!
> 
> i meant as an entire company: can they double or triple their yearly sales as a whole, meaning their revenue would double or triple in value?
> otherwise, increasing just 10%~20% for the entire company because one segment is doing 3 times better is negligible, too small to even offset their losses.


Hmm, this means their market share would need to be at least 25% in the CPU division and 40% in GPU. Not impossible, but also not very probable.


----------



## epic1337

exactly, which means what they say makes sense; Zen probably wouldn't save them from sinking, but it'll slow the process down.
what AMD actually needs is for all their segments and all their products to simultaneously become the most popular choice, which is entirely impossible for numerous other reasons.


----------



## EniGma1987

Quote:


> Originally Posted by *epic1337*
> 
> No!
> 
> i meant as an entire company: can they double or triple their yearly sales as a whole, meaning their revenue would double or triple in value?
> otherwise, increasing just 10%~20% for the entire company because one segment is doing 3 times better is negligible, too small to even offset their losses.


Remember that the XBox and PS4 chips will get shrinks too, and that means more profit for AMD. They are also making the new Nintendo chip, which gets them sales. That area of the company should be doing just fine. If Zen performs it will do just fine as well, and if it performs quite well and is energy efficient, AMD could gain a bit in server share as well. Then all they have to worry about is whether the Arctic Islands GPUs will outperform Nvidia on the new nodes.


----------



## F3ERS 2 ASH3S

Quote:


> Originally Posted by *EniGma1987*
> 
> Quote:
> 
> 
> 
> Originally Posted by *epic1337*
> 
> No!
> 
> i meant as an entire company: can they double or triple their yearly sales as a whole, meaning their revenue would double or triple in value?
> otherwise, increasing just 10%~20% for the entire company because one segment is doing 3 times better is negligible, too small to even offset their losses.
> 
> 
> 
> Remember that the XBox and PS4 chips will get shrinks too, and that means more profit for AMD. They are also making the new Nintendo chip, which gets them sales. That area of the company should be doing just fine. If Zen performs it will do just fine as well, and if it performs quite well and is energy efficient, AMD could gain a bit in server share as well. Then all they have to worry about is whether the Arctic Islands GPUs will outperform Nvidia on the new nodes.

I want to put a "but" in here... however, it does look like their long-term plan is actually a good one. That, and going with both GF and Samsung fabs helps them IMMENSELY.


----------



## KarathKasun

AFAIK, AMD sold/licensed the designs for the XB1/PS4. They get the equivalent of royalties from each chip, so a die shrink won't change that income.


----------



## F3ERS 2 ASH3S

Quote:


> Originally Posted by *KarathKasun*
> 
> AFAIK, AMD sold/licensed the designs for the XB1/PS4. They get the equivalent of royalties from each chip, so a die shrink won't change that income.


No, however the partnership with Nintendo will add to that income. Also, with 3 major consoles supporting their architecture, you are going to see coders use their faster instruction sets for the chips with better multithreading. It's already happening, which is a fantastic thing, and something that people complained AMD was whining about.

Instead of whining, AMD is now actually forcing the adoption of what they want to see in the market to aid their sales.


----------



## epic1337

guys, you have to realize that AMD is a very BIG business with two fronts (CPU & GPU); it isn't gonna be saved by a few hundred million USD of additional sales...

they've been losing billions of revenue over the years they've been running the business; you think Zen can get that back within a few short years?
not to mention that they're still running in the negative, despite the numerous layoffs.


----------



## Tojara

When they've been running at a few hundred million loss per quarter, the swing doesn't have to be massive. Even needing to double their CPU sales seems rather pessimistic, though it wouldn't be the worst thing for the long run. To achieve ~200M positive revenue/qtr, which would very much keep them in the running, they would need maybe 60-70% more market share at most, if Zen has profit margins closer to what they had earlier instead of the embarrassingly low amount right now. If what they have wasn't being sold for peanuts, market share would be far less of an issue; I imagine they have a few percent of servers at best. Impossible to say what will happen, but I've seen companies make comebacks from far worse situations. Either that or they'll go bankrupt.
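the swing being described works out to simple arithmetic; a rough sketch, where the 40% contribution margin is a hypothetical placeholder rather than AMD's actual figure:

```python
# Back-of-the-envelope: extra quarterly revenue needed to turn a loss into a
# profit, assuming each extra unit of sales contributes `margin` to the bottom
# line. The 40% margin used below is a hypothetical placeholder.
def extra_revenue_needed(current_profit, target_profit, margin):
    """Extra revenue (same units as profit) required for the given profit swing."""
    return (target_profit - current_profit) / margin

# Swinging from -200M/qtr to +200M/qtr at a 40% contribution margin:
print(f"{extra_revenue_needed(-200.0, 200.0, 0.40):.0f}M extra revenue per quarter")
```

the better the margin on each Zen chip sold, the less extra market share that swing requires, which is the whole argument for not selling for peanuts.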


----------



## escksu

Quote:


> Originally Posted by *Tojara*
> 
> When they've been running at a few hundred million loss per quarter, the swing doesn't have to be massive. Even needing to double their CPU sales seems rather pessimistic, though it wouldn't be the worst thing for the long run. To achieve ~200M positive revenue/qtr, which would very much keep them in the running, they would need maybe 60-70% more market share at most, if Zen has profit margins closer to what they had earlier instead of the embarrassingly low amount right now. If what they have wasn't being sold for peanuts, market share would be far less of an issue; I imagine they have a few percent of servers at best. Impossible to say what will happen, but I've seen companies make comebacks from far worse situations. Either that or they'll go bankrupt.


Everything rides on Zen. If it's crap, AMD will surely go bust and we will only have Intel as the sole supplier. Well, I don't know if that's good or bad, but right now the world is using Intel anyway. AMD is literally non-existent.


----------



## escksu

Quote:


> Originally Posted by *epic1337*
> 
> guys, you have to realize that AMD is a very BIG business with two fronts (CPU & GPU); it isn't gonna be saved by a few hundred million USD of additional sales...
> 
> they've been losing billions of revenue over the years they've been running the business; you think Zen can get that back within a few short years?
> not to mention that they're still running in the negative, despite the numerous layoffs.


Well, their custom/embedded solution is making money. Actually I would rather they just close down their CPU/GPU dept and concentrate on embedded.


----------



## Tojara

Quote:


> Originally Posted by *escksu*
> 
> Everything rides on Zen. If it's crap, AMD will surely go bust and we will only have Intel as the sole supplier. Well, I don't know if that's good or bad, but right now the world is using Intel anyway. AMD is literally non-existent.


Pretty much, but I think a lot of people are overestimating how good Zen needs to be. The specs in the OP already heavily point to it being a power/area-focused core, not one aiming for the highest pure performance. The FPU points in that direction as well.
Quote:


> Originally Posted by *escksu*
> 
> Well, their custom/embedded solution is making money. Actually I would rather they just close down their CPU/GPU dept and concentrate on embedded.


Doesn't sound like a smart move when they have a fighting chance in the bigger market and it's far more efficient to have the same core designs used multiple times.


----------



## epic1337

Quote:


> Originally Posted by *Tojara*
> 
> Doesn't sound like a smart move when they have a fighting chance in the bigger market and it's far more efficient to have the same core designs used multiple times.


no, rather the embedded market is far larger than the CPU/GPU market.

just to note, the embedded market includes phones, tablets, watches, barebones, and even those consoles.
literally, if the chip comes pre-installed, soldered and fully configured (SoC chips to be exact), then it's an embedded chip.

as a matter of fact, AMD had the chance to shine in the embedded market; the bigcats were somewhat popular, and even consoles chose them over something else.
the only reason they didn't get through is that they lacked the budget and manpower to further develop the bigcats.


----------



## drufause

Quote:


> Originally Posted by *escksu*
> 
> Everything rides on Zen. If it's crap, AMD will surely go bust and we will only have Intel as the sole supplier. Well, I don't know if that's good or bad, but right now the world is using Intel anyway. AMD is literally non-existent.


If that were to happen, the US Government would once again use monopoly laws to force Intel to diversify its sourcing to another company, or split Intel so that there is not one sole source of x86/x64-compatible processors.

That being said, not having to pay to play and compete with NVIDIA at TSMC for production time, as well as dealing with Samsung, a bigger company than NVIDIA, lends itself to profit in the future.

Furthermore, AMD's operations appear at this time to be less profitable than NVIDIA's, despite the fact that the two companies' sales differ by less than 25%.

According to the GAAP reports from both companies:

AMD in Q3 made 1.06 billion in sales and NVIDIA made 1.23 billion in sales. (Yeah, NVIDIA is murdering them in sales.)
This points to profit on those sales as the place where NVIDIA is making the difference. AMD lost 158 million on its 1.06 billion in sales, whereas NVIDIA made 173 million on its 1.23 billion in sales. If AMD can manufacture Zen at a better price and better yield rate than they got off their 32nm fab, then they might make up the difference in losses just through cost cutting. Enhance that with the possibility of closer-to-Intel processor performance increasing CPU sales, and they might see even higher profit.
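the margins implied by those quoted figures work out like this (inputs taken from the post above, in millions of USD):

```python
# Net margins from the GAAP figures quoted above (Q3, millions of USD).
def net_margin(net_income, sales):
    """Net income as a fraction of sales."""
    return net_income / sales

amd_margin = net_margin(-158.0, 1060.0)   # AMD: loss on 1.06B in sales
nvda_margin = net_margin(173.0, 1230.0)   # NVIDIA: profit on 1.23B in sales

print(f"AMD: {amd_margin:+.1%}, NVIDIA: {nvda_margin:+.1%}")
```

roughly a 29-point margin gap on similar revenue, which is why cost cutting alone could close a lot of the difference.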


----------



## epic1337

Quote:


> Originally Posted by *drufause*
> 
> According to the GAAP reports from both companies:
> 
> AMD in Q3 made 1.06 billion in sales and NVIDIA made 1.23 billion in sales. (Yeah, NVIDIA is murdering them in sales.)
> This points to profit on those sales as the place where NVIDIA is making the difference. AMD lost 158 million on its 1.06 billion in sales, whereas NVIDIA made 173 million on its 1.23 billion in sales. If AMD can manufacture Zen at a better price and better yield rate than they got off their 32nm fab, then they might make up the difference in losses just through cost cutting. Enhance that with the possibility of closer-to-Intel processor performance increasing CPU sales, and they might see even higher profit.


you're forgetting the part where Zen isn't a GPU; it won't fix the GPU segment losing money.
that is to note, if Zen were to "tank" the GPU segment's losses, then Zen would have to make ~150 million USD more on top of getting the CPU segment back on track.
we're probably looking at requiring around ~400 million USD in additional sales just relying on the Zen uarch, which by itself would be quite a miraculous feat to achieve.

AMD is, for the moment, literally relying on investors' "donations"; yes, donations, because so far they're only losing money.
if those investors get fed up with "donating" their money, AMD would run out of funds to keep their R&D up, bringing their "future plans" to a screeching halt.
in the worst case, they'd have to cut even more of their workforce, losing valuable development staff, and probably even sell one or two of their segments.


----------



## superstition222

Nvidia may get to HBM2 first with Pascal and make quite a bit of money on it but AMD has more experience with HBM so it may be able to capitalize on that with its next iteration of GPUs. Once AMD can get off of 28nm things may improve quite a bit for them on the GPU sales front.

AMD optimized its Hawaii parts for high-resolution performance (4K) and DX12 before the market was mature for either of them, just as its Bulldozer FX design was optimized for lots of threads before games were even coming close to using many. AMD is forward-thinking and Nvidia has, lately, been better at serving up "just enough" - with parts that aren't designed to excel in DX12 but which are quite efficient for DX11. The 3.5 GB VRAM in the 970 hasn't been a big liability in most games so far. Only Mordor is an issue as far as I know below 4K resolution. This is because most games use 1-3 GB of VRAM, according to one site's testing, even maxed, at 1440. Nvidia probably got Witcher 3's VRAM requirements reduced, too, since the demo had more detailed terrain.

It's unfortunate for AMD that DX12 development has been so slow. There really should have been significant games released on DX12's launch. It's also unfortunate that developers aren't likely to make many games that need more than 3.5 GB of VRAM at 1440 because they don't want to have problems due to the 970. It's like the way the continued sales of i3s hamper multithreading in gaming, because developers want their games to be able to be purchased and played by i3 owners. That hurts the FX line. However, VR and 4K are going to become, along with DX12, things that improve AMD's competitiveness in GPUs.

AMD was the performance per watt leader as well as the performance per dollar leader back when the 5870/5850 were around, as I recall. Nvidia had to recover from the 480/470/465 fiasco. It doesn't seem inconceivable that AMD will be able to reposition itself again. Despite the complaints about Fiji, the performance per watt numbers did improve quite a bit without giving up a lot of performance in the process. A smaller node should allow AMD to better leverage those design changes (versus Hawaii) and extend that trend of returning to being both the performance per watt leader and the performance per dollar leader. If it can do that again it will become a sales winner.

It's amusing to see how short some people's memories are. The 480 couldn't even be fully enabled and the 465 was a hot mess.


----------



## Redwoodz

Quote:


> Originally Posted by *drufause*
> 
> If that were to happen, the US Government would once again use monopoly laws to force Intel to diversify its sourcing to another company, or split Intel so that there is not one sole source of x86/x64-compatible processors.
> 
> That being said, not having to pay to play and compete with NVIDIA at TSMC for production time, as well as dealing with Samsung, a bigger company than NVIDIA, lends itself to profit in the future.
> 
> Furthermore, AMD's operations appear at this time to be less profitable than NVIDIA's, despite the fact that the two companies' sales differ by less than 25%.
> 
> According to the GAAP reports from both companies:
> 
> AMD in Q3 made 1.06 billion in sales and NVIDIA made 1.23 billion in sales. (Yeah, NVIDIA is murdering them in sales.)
> This points to profit on those sales as the place where NVIDIA is making the difference. AMD lost 158 million on its 1.06 billion in sales, whereas NVIDIA made 173 million on its 1.23 billion in sales. If AMD can manufacture Zen at a better price and better yield rate than they got off their 32nm fab, then they might make up the difference in losses just through cost cutting. Enhance that with the possibility of closer-to-Intel processor performance increasing CPU sales, and they might see even higher profit.


AMD didn't lose 158 million on 1.06 billion in sales; they had 158 million in R&D expense for new products over what they sold. It's called an investment.


----------



## christoph

Quote:


> Originally Posted by *superstition222*
> 
> Nvidia may get to HBM2 first with Pascal and make quite a bit of money on it but AMD has more experience with HBM so it may be able to capitalize on that with its next iteration of GPUs. Once AMD can get off of 28nm things may improve quite a bit for them on the GPU sales front.
> 
> AMD optimized its Hawaii parts for high-resolution performance (4K) and DX12 before the market was mature for either of them, just as its Bulldozer FX design was optimized for lots of threads before games were even coming close to using many. AMD is forward-thinking and Nvidia has, lately, been better at serving up "just enough" - with parts that aren't designed to excel in DX12 but which are quite efficient for DX11. The 3.5 GB VRAM in the 970 hasn't been a big liability in most games so far. Only Mordor is an issue as far as I know below 4K resolution. This is because most games use 1-3 GB of VRAM, according to one site's testing, even maxed, at 1440. Nvidia probably got Witcher 3's VRAM requirements reduced, too, since the demo had more detailed terrain.
> 
> It's unfortunate for AMD that DX12 development has been so slow. There really should have been significant games released on DX12's launch. It's also unfortunate that developers aren't likely to make many games that need more than 3.5 GB of VRAM at 1440 because they don't want to have problems due to the 970. It's like the way the continued sales of i3s hamper multithreading in gaming, because developers want their games to be able to be purchased and played by i3 owners. That hurts the FX line. However, VR and 4K are going to become, along with DX12, things that improve AMD's competitiveness in GPUs.
> 
> AMD was the performance per watt leader as well as the performance per dollar leader back when the 5870/5850 were around, as I recall. Nvidia had to recover from the 480/470/465 fiasco. It doesn't seem inconceivable that AMD will be able to reposition itself again. Despite the complaints about Fiji, the performance per watt numbers did improve quite a bit without giving up a lot of performance in the process. A smaller node should allow AMD to better leverage those design changes (versus Hawaii) and extend that trend of returning to being both the performance per watt leader and the performance per dollar leader. If it can do that again it will become a sales winner.
> 
> It's amusing to see how short some people's memories are. The 480 couldn't even be fully enabled and the 465 was a hot mess.


anddddd it is a shame that people choose Nvidia over AMD when they can get almost the same performance at a much better price...

AMD has always been pushing the tech to the next level where others have fallen short

and you guys know it;

2 real cores;
x64 architecture;
4 real cores;
Mantle;
Freesync;
HBM memory;

and always at a better price


----------



## Shogon

Quote:


> Originally Posted by *christoph*
> 
> and you guys know it;
> 
> 2 real cores; *couldn't capitalize for long after Intel released Core 2*
> x64 architecture; *I guess that means a lot when it's jointly shared with Intel*
> 4 real cores; *again, didn't capitalize on this for long, and hasn't even begun to match Sandy Bridge performance (a 5+ year old CPU)*
> Mantle; *used in a handful of games and scrapped for Vulkan. Vulkan is nothing at this time*
> Freesync; *the expression "a day late and a dollar short" rings a bell. One year later, though*
> HBM memory; *doesn't do as much as it was hyped up to be, apart from 4k usage (which as we know is an extreme minority)*
> 
> and always at a better price; *apart from the 300 series. It's better we all forget those though; they are just rehashed cards with more memory and tweaked speeds that could nearly be obtained with overclocking. Then again, it's not like they could continue selling cards at a loss for long, partly due to the mining craze.*


----------



## Olivon

Quote:


> Originally Posted by *superstition222*
> 
> Nvidia may get to HBM2 first with Pascal and make quite a bit of money on it but AMD has more experience with HBM so it may be able to capitalize on that with its next iteration of GPUs. Once AMD can get off of 28nm things may improve quite a bit for them on the GPU sales front.
> 
> AMD optimized its Hawaii parts for high-resolution performance (4K) and DX12 before the market was mature for either of them, just as its Bulldozer FX design was optimized for lots of threads before games were even coming close to using many. AMD is forward-thinking and Nvidia has, lately, been better at serving up "just enough" - with parts that aren't designed to excel in DX12 but which are quite efficient for DX11. The 3.5 GB VRAM in the 970 hasn't been a big liability in most games so far. Only Mordor is an issue as far as I know below 4K resolution. This is because most games use 1-3 GB of VRAM, according to one site's testing, even maxed, at 1440. Nvidia probably got Witcher 3's VRAM requirements reduced, too, since the demo had more detailed terrain.
> 
> It's unfortunate for AMD that DX12 development has been so slow. There really should have been significant games released on DX12's launch. It's also unfortunate that developers aren't likely to make many games that need more than 3.5 GB of VRAM at 1440 because they don't want to have problems due to the 970. It's like the way the continued sales of i3s hamper multithreading in gaming, because developers want their games to be able to be purchased and played by i3 owners. That hurts the FX line. However, VR and 4K are going to become, along with DX12, things that improve AMD's competitiveness in GPUs.
> 
> AMD was the performance per watt leader as well as the performance per dollar leader back when the 5870/5850 were around, as I recall. Nvidia had to recover from the 480/470/465 fiasco. It doesn't seem inconceivable that AMD will be able to reposition itself again. Despite the complaints about Fiji, the performance per watt numbers did improve quite a bit without giving up a lot of performance in the process. A smaller node should allow AMD to better leverage those design changes (versus Hawaii) and extend that trend of returning to being both the performance per watt leader and the performance per dollar leader. If it can do that again it will become a sales winner.
> 
> It's amusing to see how short some people's memories are. The 480 couldn't even be fully enabled and the 465 was a hot mess.


Your comment is totally biased and shows little reasoning.
Quote:


> AMD optimized its Hawaii parts for high-resolution performance (4K) and DX12 before the market was mature for either of them


You're telling us that AMD doesn't listen to the market and chooses the wrong priorities.
Before talking about 4K (lol) and DX12 (lol²), what about the current market of 1080p, 1440p and DX11? What about correct DX11 drivers first?
nVidia is pragmatic and knows how to embrace the actual market's demand. AMD is telling you to look at Fiji, but the real cards are from 2013.
4K barely concerns anybody, and DX12 isn't implemented in anything more than a technical demo??? It's just bla bla bla, and during this time nVidia is having success with cards satisfying the demand.
With AMD it's always "this round is not good but next time will be good". Next time, people finally choose Intel and nVidia.


----------



## MaCk-AtTaCk

Quote:


> Originally Posted by *Olivon*
> 
> Your comment is totally biased and shows little reasoning.
> You're telling us that AMD doesn't listen to the market and chooses the wrong priorities.
> Before talking about 4K (lol) and DX12 (lol²), what about the current market of 1080p, 1440p and DX11? What about correct DX11 drivers first?
> nVidia is pragmatic and knows how to embrace the actual market's demand. AMD is telling you to look at Fiji, but the real cards are from 2013.
> 4K barely concerns anybody, and DX12 isn't implemented in anything more than a technical demo??? It's just bla bla bla, and during this time nVidia is having success with cards satisfying the demand.
> With AMD it's always "this round is not good but next time will be good". Next time, people finally choose Intel and nVidia.


+1


----------



## Themisseble

Quote:


> Originally Posted by *MaCk-AtTaCk*
> 
> +1


Of course NVIDIA cares about their DX11 cards... and yet they are advertising freaking Vulkan and DX12 and async shaders everywhere...

http://www.computerbase.de/2015-11/call-of-duty-black-ops-iii-test/3/#diagramm-cod-black-ops-iii-1920-1080

GTX 770? 53% behind 7970 GHz?


----------



## superstition222

Quote:


> Originally Posted by *Olivon*
> 
> Your comment is totally biaised and shows little reasonning.


That response is about as useful as your "bla bla bla" response. Ignore list.


----------



## Serios

Quote:


> Originally Posted by *Shogon*
> 
> doesn't do much as it was hyped up to be, apart from 4k usage (which as we know is an extreme minority)


Doesn't do much? You must be a very big expert; what exactly qualifies you?
HBM is hands down superior to GDDR5 in every way and will be a great match for 14-16nm GPUs. You don't even have to be an engineer with experience in this field; just look at the Nano and see what AMD was able to accomplish with such a big GPU. A Fury Nano with GDDR5 would clearly be impossible.
HBM is limited to 4GB; this is the only problem it has right now, but it won't have it for much longer. The fact that Nvidia will be using HBM for their next GPUs, a tech in which AMD played a key role in developing, is a testament to how great HBM really is.

When is the last time Nvidia did something similar to HBM for the GPU world?


----------



## epic1337

Quote:


> Originally Posted by *Serios*
> 
> Doesn't do much? You must be a very big expert; what exactly qualifies you?
> HBM is hands down superior to GDDR5 in every way and will be a great match for 14-16nm GPUs. You don't even have to be an engineer with experience in this field; just look at the Nano and see what AMD was able to accomplish with such a big GPU. A Fury Nano with GDDR5 would clearly be impossible.
> HBM is limited to 4GB; this is the only problem it has right now, but it won't have it for much longer. The fact that Nvidia will be using HBM for their next GPUs, a tech in which AMD played a key role in developing, is a testament to how great HBM really is.
> 
> When is the last time Nvidia did something similar to HBM for the GPU world?


"superior to GDDR5 in *every way*" is an overstatement, HBM has a notably higher latency compared to GDDR5.
despite it's higher bandwidth it's too-high latency made hardly better than GDDR5.
and thats without considering the capacity limitations of current HBM.

and on that note, the difference of HBM to GDDR5 is like DDR4 is to DDR3.
despite DDR4's nearly 2x more bandwidth, it's high latency made it hardly better than DDR3 in real-world applications.

on a further note, no i'm not saying HBM is worthless, what i'm saying is that HBM has it's own pros and cons


----------



## superstition222

HBM is superior, overall, to GDDR5. However, competition will heat up with GDDR5X. One analyst predicts that HBM2 will mainly be reserved for the high-end GPUs with GDDR5X for the remainder.


----------



## EniGma1987

GDDR5 is at its limit though. Even the "X" version is really just a small trick to improve bandwidth without really improving the spec. Power requirements and clock speeds are pretty much at their limits, so we won't get anything better. HBM, on the other hand, will dominate everything once it is clocked between 1.0-1.5GHz. You are looking at 2-3TB/s of bandwidth at that point, and latency comparable to GDDR5 while using less power.

(Bandwidth calculation is 512GB/s of current HBM1 multiplied by 2x for having twice as many stacks when using HBM2, and then multiplied by 2-3 again for doubling/tripling the clock speed)
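The bandwidth projection above can be written out explicitly. A minimal sketch; the stack and clock multipliers are the post's own assumptions, not official HBM2 specifications:

```python
# HBM1 as shipped on Fiji: four 1024-bit stacks at an effective 1 Gbps/pin = 512 GB/s.
HBM1_BANDWIDTH_GBS = 512

def projected_bandwidth(stack_factor=2, clock_factor=2):
    """Scale HBM1's bandwidth by a stack-count multiplier and a clock multiplier."""
    return HBM1_BANDWIDTH_GBS * stack_factor * clock_factor

low_estimate = projected_bandwidth(clock_factor=2)   # 2048 GB/s, roughly 2 TB/s
high_estimate = projected_bandwidth(clock_factor=3)  # 3072 GB/s, roughly 3 TB/s
```

That reproduces the 2-3TB/s range claimed above from the stated 512GB/s starting point.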


----------



## epic1337

Quote:


> Originally Posted by *EniGma1987*
> 
> GDDR5 is at its limit though. Even the "X" version is really just a small trick to improve bandwidth without really improving the spec. Power requirements and clock speeds are pretty much at their limits, so we won't get anything better. *HBM, on the other hand, will dominate everything once it is clocked between 1.0-1.5GHz.* You are looking at 2-3TB/s of bandwidth at that point, and latency comparable to GDDR5 while using less power.


the day this comes would be the day GDDR5 becomes what GDDR3 is today, but we're probably looking 3~5 years into the future.

on another note, there have been other attempts at making a non-volatile substitute for RAM; yes, it's a crossover between an SSD and RAM.
or rather, my point is: by the time HBM becomes relevant, we'd probably be having this same argument about HBM being obsolete.


----------



## Tojara

Quote:


> Originally Posted by *epic1337*
> 
> "superior to GDDR5 in *every way*" is an overstatement, HBM has a notably higher latency compared to GDDR5.
> despite it's higher bandwidth it's too-high latency made hardly better than GDDR5.
> and thats without considering the capacity limitations of current HBM.
> 
> and on that note, the difference of HBM to GDDR5 is like DDR4 is to DDR3.
> despite DDR4's nearly 2x more bandwidth, it's high latency made it hardly better than DDR3 in real-world applications.
> 
> on a further note, no i'm not saying HBM is worthless, what i'm saying is that HBM has it's own pros and cons


Is there an actual source for that? I've yet to see any proper latency comparisons between the two. If anything, I'd imagine it to be lower for HBM simply due to being a smaller distance away from the chip.


----------



## Imouto

You're not getting AMD's strategy at all. They want to make SoCs for PC. HBM is the key for this, as neither DDR4 nor GDDR5 is enough for a SoC with such characteristics.

AMD can't compete with Intel or Nvidia and Zen and Arctic Islands will tell the same story. What AMD wants is to make PCs the size of a graphics card with a good enough CPU/GPU performance. I bet they will be able to deliver that performance for 1080p at < 300mm². Then they will have an entire new market for themselves with Intel lagging behind and Nvidia unable to even join the fight.

If we consider a GTX 970 the gold standard for 1080p at ~400mm²@28nm, you could fit that in less than 200mm²@14nm, considering lower clocks and all. Plus, fitting the CPU in less than 100mm² (Intel already fits an entire i7 in 122mm²) is indeed promising. Then AMD partners can pack the whole thing (SoC+HBM, MB, SSD and PSU) in a box the size of a graphics card for whatever price they consider fair, but it is highly unlikely a traditional DIY PC could beat it.
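The die-area arithmetic above can be sketched as follows. The 0.5 area-shrink factor is an assumption (marketing node names rarely scale ideally), and the mm² figures are the post's, not measured dies:

```python
def shrunk_area(area_mm2, area_scale=0.5):
    """Estimate die area after a process shrink, given a linear area-scaling factor."""
    return area_mm2 * area_scale

gpu_area = shrunk_area(400)     # GTX 970-class GPU at ~400 mm²@28nm -> ~200 mm²@14nm
cpu_area = 100                  # the post's CPU budget (under Intel's 122 mm² i7)
soc_area = gpu_area + cpu_area  # ~300 mm² total, matching the < 300 mm² claim
```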

AMD doing this right is another matter.


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> "superior to GDDR5 in *every way*" is an overstatement, HBM has a notably higher latency compared to GDDR5.
> despite it's higher bandwidth it's too-high latency made hardly better than GDDR5.
> and thats without considering the capacity limitations of current HBM.
> 
> and on that note, the difference of HBM to GDDR5 is like DDR4 is to DDR3.
> despite DDR4's nearly 2x more bandwidth, it's high latency made it hardly better than DDR3 in real-world applications.
> 
> on a further note, no i'm not saying HBM is worthless, what i'm saying is that HBM has it's own pros and cons


It depends on the utilization of the HBM and how its access on the interposer is split. The way it's utilized on a GPU is not the best for a CPU, but if they split the HBM stacks up it would reduce the latency while also reducing the bandwidth; all in all, better performance than GDDR5 and DDR4.


----------



## Rookie1337

Quote:


> Originally Posted by *Imouto*
> 
> You're not getting AMD's strategy at all. They want to make SoCs for PC. HBM is the key for this, as neither DDR4 nor GDDR5 is enough for a SoC with such characteristics.
> 
> AMD can't compete with Intel or Nvidia and Zen and Arctic Islands will tell the same story. What AMD wants is to make PCs the size of a graphics card with a good enough CPU/GPU performance. I bet they will be able to deliver that performance for 1080p at < 300mm². Then they will have an entire new market for themselves with Intel lagging behind and Nvidia unable to even join the fight.
> 
> If we consider a GTX 970 the golden standard for 1080p at ~400mm²@28nm you could fit that in less than 200mm²@14nm considering lower clocks and all. Plus fitting the CPU in less than 100mm² (Intel already fits an entire i7 in 122mm²) is indeed promising. Then AMD partners can pack the whole thing (SoC+HBM, MB, SSD and PSU) in a box the size of a graphics card for whatever they consider but it is highly unlikely a traditional DIY PC can beat it.
> 
> AMD doing this right is another matter.


Just what kind of market would that even be for? In the developing world you have smartphones with ARM, and here you have ARM/NUC-style systems. None of these target gamers of the PC-master-race MSAAx16 type.

Really, this seems as dumb as Nvidia with its Tegra line. All that power and R&D wasted because they tried to snag the Android pie and got burned. The Tegra X1 would be great for a Linux system, but they don't make it an appealing choice.


----------



## SpeedyVT

Quote:


> Originally Posted by *Rookie1337*
> 
> Just what kind of market would that even be for? In the developing world you have smartphones with ARM, and here you have ARM/NUC-style systems. None of these target gamers of the PC-master-race MSAAx16 type.
> 
> Really, this seems as dumb as Nvidia with its Tegra line. All that power and R&D wasted because they tried to snag the Android pie and got burned. The Tegra X1 would be great for a Linux system, but they don't make it an appealing choice.


HBM is perfect for phones too! Samsung has been interested in HBM.


----------



## superstition222

Quote:


> Originally Posted by *Imouto*
> 
> You're not getting AMD's strategy at all. They want to make SoCs for PC. HBM is the key for this, as neither DDR4 nor GDDR5 is enough for a SoC with such characteristics.
> 
> AMD can't compete with Intel or Nvidia and Zen and Arctic Islands will tell the same story. What AMD wants is to make PCs the size of a graphics card with a good enough CPU/GPU performance. I bet they will be able to deliver that performance for 1080p at < 300mm². Then they will have an entire new market for themselves with Intel lagging behind and Nvidia unable to even join the fight.
> 
> If we consider a GTX 970 the golden standard for 1080p at ~400mm²@28nm you could fit that in less than 200mm²@14nm considering lower clocks and all. Plus fitting the CPU in less than 100mm² (Intel already fits an entire i7 in 122mm²) is indeed promising. Then AMD partners can pack the whole thing (SoC, MB, SSD and PSU) in a box the size of a graphics card for whatever they consider but it is highly unlikely a traditional DIY PC can beat it.
> 
> AMD doing this right is another matter.


If they're aiming at 1080p, that will be a mistake, because 4K is going to dominate mindshare going forward in the enthusiast community, even though it's overkill in terms of pixel count (1440p is plenty with a large enough panel at a reasonable distance). It would be better for developers to focus on improving graphics by adding more detail/quality and, especially, on making environments less static. But that last bit requires more work than just bumping the pixel count.

4K won because of 1080p panel manufacturing techniques, not because human vision needs it, especially at HDTV viewing distances.


----------



## Imouto

Quote:


> Originally Posted by *Rookie1337*
> 
> Just what kind of market would that even be for? In the developing world you have smartphones with ARM, and here you have ARM/NUC-style systems. None of these target gamers of the PC-master-race MSAAx16 type.


The budget gamer. Those who buy midrange cards and pick up fire sales on Steam. We are legion.

In fact, there's a thread around with the best sellers at Newegg stating that the cards with the most sales were the GTX 960 and the GTX 970.

I'll tell you again: AMD can't fight on the high end with Intel and/or Nvidia. It's better if they look for new lands.
Quote:


> Originally Posted by *Rookie1337*
> 
> Really this seems as dumb as Nvidia with it's Tegra line. All that power and R&D wasted because they tried to snag the Android pie and got burned. Tegra X1 would be great for a Linux system but they don't make it an appealing choice.


It would be a full fledged PC. What does Android or ARM have to do with it?
Quote:


> Originally Posted by *superstition222*
> 
> 4K won because of 1080 panel manufacturing techniques, not because human vision needs it, especially at HDTV viewing distances.


AMD has a track record of implementing new technologies far too soon. Nvidia leads at 1080p, and that is killing AMD.

Just to put it in context: 4K displays are 0.14% of the total in the latest Steam HW survey, with near-zero growth. 1440p does a bit better, with a stunning 1.32% and 0.05% growth.

Let's see 1080p: 35.21% and 0.31% growth.

I'd say 1080p is going to reign for quite a while.


----------



## epic1337

Quote:


> Originally Posted by *Tojara*
> 
> Is there an actual source for that? I've yet to see any proper latency comparisons between the two. If anything, I'd imagine it to be lower for HBM simply due to being a smaller distance away from the chip.


as far as i'm aware there's no credible source; most of it is pure conjecture.

if i remember right, their reasoning is that the clock speed is far too low, so even with a low cycle-count latency the overall latency would be higher.
e.g. 1500MHz @ 10CL vs 500MHz @ 5CL.
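That cycles-versus-clock trade-off is easy to check with arithmetic. A minimal sketch using the hypothetical timings from the example above (not measured HBM or GDDR5 figures):

```python
def cas_latency_ns(clock_mhz, cas_cycles):
    """Absolute CAS latency in nanoseconds: cycles divided by cycles-per-nanosecond."""
    return cas_cycles / (clock_mhz / 1000.0)  # clock_mhz / 1000 = cycles per ns

fast_clock = cas_latency_ns(1500, 10)  # ~6.7 ns despite the higher CL number
slow_clock = cas_latency_ns(500, 5)    # 10.0 ns despite the lower CL number
```

So a lower CL number does not mean lower absolute latency if the clock is slow enough, which is the crux of the argument.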
Quote:


> Originally Posted by *SpeedyVT*
> 
> HBM is perfect for phones too! Samsung has been interested in HBM.


this indeed; HBM would make SoC power consumption quite a bit lower, and the bandwidth would be far greater than DDR3L's.


----------



## superstition222

Quote:


> Originally Posted by *Imouto*
> 
> Just to put it in context: 4K displays are 0.14% of the total in the latest Steam HW survey, with near-zero growth. 1440p does a bit better, with a stunning 1.32% and 0.05% growth.
> 
> Let's see 1080p: 35.21% and 0.31% growth.
> 
> I'd say 1080p is going to reign for quite a while.


I know that 1080p is very dominant at the moment, but enthusiast mindshare makes a big difference in terms of sales momentum. People will inevitably say 1080p is for console gaming next year. They'll also ask, "Why should I bother to upgrade from my 970?"

The media, fronting for manufacturers, are already beginning to try to hype 5K and even 8K. No one needs 8K but it will come anyway.


----------



## superstition222

Quote:


> Originally Posted by *Rookie1337*
> 
> Just what kind of market would that even be for?


People like cute. Portability is also a bonus. Cute + powerful is a sales recipe, provided it is reasonably priced.

Apple also proved that customers will pay a premium for skinny, nice-looking laptops over laptops with thick profiles that could house larger, quieter cooling for when the GPU is under load. I would personally prefer a thick laptop for gaming over one like my Macbook Pro, which I never use for that, or for anything that loads the GPU much, because the fan noise is beyond what I can take (even with just the Iris CPU graphics).

But consumers have generally shown that they're willing to be deafened if it makes for a "sexy" laptop. It's not just Apple either. No laptop maker of note has specifically bucked the skinny trend to provide large, quiet cooling and light weight at the same time. If any maker decides to make a thick laptop, it will be heavy and/or have very hot parts crammed in, with resultant high decibels.

So, because gaming laptops are a joke, there is a market for a portable gaming box that can be connected to an HDTV or whatnot, one that will deliver better quality than a console. Unfortunately, though, input lag on HDTVs is generally horrendous.

Or, a company could get a clue and make a proper gaming laptop/portable (very thick profile, large quiet fans, vapor chamber, reasonably light weight, low input lag with Freesync/G-Sync, wide-gamut high-contrast screen, AC-operated only). Even in that case AMD's design might be useful: HBM runs at lower clocks, so it should run cooler than GDDR5.


----------



## Imouto

Quote:


> Originally Posted by *superstition222*
> 
> I know that 1080p is very dominant at the moment, but enthusiast mindshare makes a big difference in terms of sales momentum. People will inevitably say 1080p is for console gaming next year. They'll also ask, "Why should I bother to upgrade from my 970?"


The enthusiast mindshare is limited by how deep their pockets are. They may salivate just thinking of owning a GTX 980Ti, but the hard and cold truth is that they can barely afford a GTX 970. And you don't buy a 4K monitor for that kind of card to play at 15~20 FPS.

Also, the card that is good enough for 1080p now won't do in the future, even more so with DX12 and Vulkan around the corner. Let's remember that a GTX 560Ti was plenty for 1080p not that long ago.
Quote:


> Originally Posted by *superstition222*
> 
> The media, fronting for manufacturers, are already beginning to try to hype 5K and even 8K. No one needs 8K but it will come anyway.


It doesn't matter what they want; they've been shoving this 4K nonsense down our throats for quite a while now and it isn't taking off at all. I prefer 1080p@120/144 over 4K@60 any day of the week.

There's no 4K media, and the gear needed to play at that resolution is stupidly expensive. Unless the 14/16nm GPUs can deal with 4K like a breeze, I don't see it going anywhere for at least two GPU generations.


----------



## drufause

Quote:


> Originally Posted by *Rookie1337*
> 
> Just what kind of market would that even be for? In the developing world you have smartphones with ARM, and here you have ARM/NUC-style systems. None of these target gamers of the PC-master-race MSAAx16 type.
> 
> Really, this seems as dumb as Nvidia with its Tegra line. All that power and R&D wasted because they tried to snag the Android pie and got burned. The Tegra X1 would be great for a Linux system, but they don't make it an appealing choice.


This is for the office and laptop world, where corporations buy thousands of connected PCs at a time and companies like HP, Dell and Acer deliver functional, secure Linux or Windows PCs for business purposes. AMD, Intel, HP, Dell and Acer make more money on these deals than they do on all of their performance PC (i.e. FX and i7) sales combined.


----------



## epic1337

OEMs is what they're called; they order in the hundred thousands per batch and deliver them worldwide.

NUCs are getting more popular than typical desktop OEM counterparts though, but i wonder if you can build an AMD NUC, as Intel has the rights to that.


----------



## christoph

ok, so you guys are saying that HBM is a piece of crap in general...??? hmmmmm

so it's as if they said;

"hey, we developed an automobile with wings that can fly, and it can take you anywhere in the country within a few hours"

and you guys are like;

"it's a piece of crap, it can't take you between continents; why didn't they develop a teleportation device that can take you anywhere within seconds? but noooooo, they developed a car that can fly; it's crap, because you have to turn it on and wait a few minutes for it to warm up"...

really, you guys expect something to be developed and working at full potential in the first release?? like windows did that, or linux, or any hardware...

you guys always have to be complaining about everything, because it is or because it is not

and I said complaining, not arguing


----------



## Shogon

Quote:


> Originally Posted by *Serios*
> 
> Doesn't do much? You must be a very big expert; what exactly qualifies you?
> HBM is hands down superior to GDDR5 in every way and will be a great match for 14-16nm GPUs. You don't even have to be an engineer with experience in this field; just look at the Nano and see what AMD was able to accomplish with such a big GPU. A Fury Nano with GDDR5 would clearly be impossible.
> HBM is limited to 4GB; this is the only problem it has right now, but it won't have it for much longer. The fact that Nvidia will be using HBM for their next GPUs, a tech in which AMD played a key role in developing, is a testament to how great HBM really is.
> 
> When is the last time Nvidia did something similar to HBM for the GPU world?


Yeah, it doesn't do much except give you a big number in GPU-Z where it states your bandwidth. I can forgive the 4GB limitation since this is HBM's first iteration and capacity will increase with HBM2, but AMD's implementation of HBM hasn't been as groundbreaking as I wish it had been. Ah yes, the Nano, a successful card that has proven its place in the market. It is a testament of engineering to have that much power in so small a space, I'll give you that, but it hasn't exactly been a _Joan of Arc_ for AMD in revitalizing market share or profits.

HBM barely gives an advantage over 7 GHz GDDR5 at anything under 4K gaming, and being first to utilize HBM hasn't exactly helped AMD out financially or with sales. The price of pioneering, I suppose.

It really only helps in terms of efficiency since it uses less power in general compared to GDDR5, which is a good thing, albeit at the cost of being more expensive to produce (for now). Eventually that will change as production ramps up and volume reaches nominal capacity, along with HBM2, to rival GDDR5/X.

Also, to retort:
Quote:


> When is the last time Nvidia did something similar to HBM for the GPU world?


Nvidia has been selling midrange video cards at flagship prices while AMD can barely make ends meet. They let AMD do the hard work (the transition from GDDR3 to GDDR5, and from GDDR5 to HBM1/2) and swoop in for the profit when the time is right. Then again, I forget some people on this site consider businesses to be charities, where everyone gets a trophy for simply trying even when they don't succeed. Then again, I guess AMD has to be the company that innovates, because what incentive is there for Intel, or even Nvidia, when you're a behemoth in your respective market fighting against a David without a slingshot?


----------



## Rookie1337

Quote:


> Originally Posted by *Imouto*
> 
> The budget gamer. Those who buy midrange cards and pick up fire sales on Steam. We are legion.
> 
> In fact, there's a thread around with the best sellers at Newegg stating that the cards with the most sales were the GTX 960 and the GTX 970.
> 
> I'll tell you again: AMD can't fight on the high end with Intel and/or Nvidia. It's better if they look for new lands.
> It would be a full-fledged PC. What does Android or ARM have to do with it?
> AMD has a track record of implementing new technologies far too soon. Nvidia leads at 1080p, and that is killing AMD.
> 
> Just to put it in context: 4K displays are 0.14% of the total in the latest Steam HW survey, with near-zero growth. 1440p does a bit better, with a stunning 1.32% and 0.05% growth.
> 
> Let's see 1080p: 35.21% and 0.31% growth.
> 
> I'd say 1080p is going to reign for quite a while.


I think you're overestimating how powerful and how small these things can be. Right now the best iGPU is less than half as powerful as I think anyone would want at 1080p. The Iris Pro in the i7-5775C can only scrape the low-to-mid 30s at medium quality settings at 1080p in many games. AMD's offering is about the same or slightly worse.

Both need a minimum size of a Gigabyte Brix to be viable, and even then they're super hot and loud. A Brix is larger than any GPU shy of maybe a Titan or a Mars (if they make those anymore).

So no, I don't see anything in the near term (2 years or less) making a gaming system the size of a GPU capable of desirable 1080p performance. Hence, I am puzzled as to what your idea's target audience could possibly be. ARM and Android are replacing the non-work, non-technical user's desire for a laptop/media-consumption device, which means your AMD idea can't target that market. So again, what is the market for the thing?

Also, since it wouldn't be a viable gaming platform, it would have to contend with AMD's (and Intel's) offerings in systems like those built by CompuLab, such as the IntensePC 2, fitlet, and FitPC, which are all fanless and super tiny.

Quote:


> Originally Posted by *superstition222*
> 
> People like cute. Portability is also a bonus. Cute + powerful is a sales recipe, provided it is reasonably priced.
> 
> Apple also proved that customers will pay a premium for skinny, nice-looking laptops over laptops with thick profiles that could house larger, quieter cooling for when the GPU is under load. I would personally prefer a thick laptop for gaming over one like my Macbook Pro, which I never use for that, or for anything that loads the GPU much, because the fan noise is beyond what I can take (even with just the Iris CPU graphics).
> 
> But consumers have generally shown that they're willing to be deafened if it makes for a "sexy" laptop. It's not just Apple either. No laptop maker of note has specifically bucked the skinny trend to provide large, quiet cooling and light weight at the same time. If any maker decides to make a thick laptop, it will be heavy and/or have very hot parts crammed in, with resultant high decibels.
> 
> So, because gaming laptops are a joke, there is a market for a portable gaming box that can be connected to an HDTV or whatnot, one that will deliver better quality than a console. Unfortunately, though, input lag on HDTVs is generally horrendous.
> 
> Or, a company could get a clue and make a proper gaming laptop/portable (very thick profile, large quiet fans, vapor chamber, reasonably light weight, low input lag with Freesync/G-Sync, wide-gamut high-contrast screen, AC-operated only). Even in that case AMD's design might be useful: HBM runs at lower clocks, so it should run cooler than GDDR5.


See the above reply; it mostly fits here as well. Laptops with big cooling are around, but they're generally called mobile workstations, and most people can't afford them and don't want a laptop that approaches 10+ pounds.

Also, why would you make a thicker laptop if not to put something more powerful in it? You don't get any less noise using weaker parts.

Quote:


> Originally Posted by *drufause*
> 
> This is for the office and laptop world, where corporations buy thousands of connected PCs at a time and companies like HP, Dell and Acer deliver functional, secure Linux or Windows PCs for business purposes. AMD, Intel, HP, Dell and Acer make more money on these deals than they do on all of their performance PC (i.e. FX and i7) sales combined.


As I stated above, if small computers are needed for offices they'll go NUC or similar, because it will be a long time before a workstation can be that size.
Quote:


> Originally Posted by *epic1337*
> 
> OEMs is what they're called, they order in the hundred thousands per batch, and deliver them worldwide.
> 
> NUCs are getting more popular than typical desktop OEM counterparts though, but i wonder if you can build an AMD NUC, as Intel has the rights on that.


And again, no chance of legit gaming on anything of that size. Unless of course you consider low quality settings; then you might scratch 60FPS at 1080p in some games. And as I posted above, there's CompuLab's fit-PC4 or fitlet if you want an AMD-type microPC.
http://www.fit-pc.com/web/products/fit-pc4/
http://www.fit-pc.com/web/products/fitlet/


----------



## drufause

And I am even more excited to see the performance of this system if it supports M.2 SSD


----------



## superstition222

Quote:


> Originally Posted by *Rookie1337*
> 
> Laptops with big cooling are around. But they're generally called mobile workstations and most people can't afford them and don't want a laptop that approaches 10+pounds.


They don't need to be super expensive or super heavy. Getting rid of batteries is a way to save weight and cost. It also provides more space for other things, like cooling.
Quote:


> Originally Posted by *Rookie1337*
> 
> You don't get any less noise using weaker parts.


Huh? More powerful parts generally take more power and generate more heat. In general, when people make a laptop they put performance ahead of noise so they cram too much into a form factor/cooling setup. The exception is low-performance "casual" laptops.

My expensive Macbook Pro is a good example. My old one from 2008 could be put onto a large cast iron skillet and that would act as a heatsink. This enabled me to run stuff that loaded the GPU without a lot of noise. But, my newer thinner model generates more noise than that one did and the skillet trick no longer works. I just can't use it for anything that loads the integrated GPU or the Nvidia 750 because of the intensity of the noise. The speakers in this newer thinner laptop also have substantially worse sound quality. These things don't matter to most people. They're ready to skimp on audio quality and quietness in order to have that "sexy" slim profile.







What did Apple do when it got 14nm chips? It made the Macbook Pro even thinner than mine.









Or, the laptop makers make heavy machines that are noisy because they put big batteries into the space instead of quieter cooling. And with "mobile workstations" you get overpriced workstation-class GPUs and such.


----------



## superstition222

Quote:


> Originally Posted by *Imouto*
> 
> It doesn't matter what they want, they've been shoving this 4K nonsense down our throats for quite a while now and it isn't taking off at all. I prefer 1440p at 120/144 Hz over 4K at 60 Hz any day of the week.
> 
> There's no 4K media and the gear needed to play at that resolution is stupidly expensive. Unless the 14/16nm GPUs can deal with 4K like a breeze I don't see it going anywhere for at least two GPU generations.


4K is going to be a success. It does matter what they want. Big manufacturers create consumer demand, not the other way around.

Consumers are sheep, which is why advertising works so well; many ads today don't even show the product being sold. No one needs the absurd resolutions being put into phones, but there is demand anyway because manufacturers find that pixel count is easier to deliver than a more meaningful improvement.


----------



## drufause

Quote:


> Originally Posted by *superstition222*
> 
> 4K is going to be a success. It does matter what they want. Big manufacturers create consumer demand, not the other way around.
> 
> Consumers are sheep, which is why advertising works so well; many ads today don't even show the product being sold. No one needs the absurd resolutions being put into phones, but there is demand anyway because manufacturers find that pixel count is easier to deliver than a more meaningful improvement.


I know I love the 3K screen on my Lenovo Yoga 2 Pro. It provides great resolution for use in Excel and photo edits.


----------



## epic1337

Quote:


> Originally Posted by *Rookie1337*
> 
> And again, no chance for legit gaming on anything of that size. Unless of course you consider low quality settings; then you might scratch 60FPS at 1080p on some games. And as I posted above there's CompuLab's fit-PC4 or fitlet if you want an AMD type microPC.
> http://www.fit-pc.com/web/products/fit-pc4/
> http://www.fit-pc.com/web/products/fitlet/


why the hell does it have to be games? are games the only thing you can think of doing on a PC? sheesh, people these days....


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> why the hell does it have to be games? is games the only thing that you can think of doing on a PC? sheesh people these days....


Gaming is driving the sales of higher-end PC hardware. That trend is only going to increase. In fact, gaming-related hardware sales are the bright point in a slumping PC market. Maybe someday the games will become good enough to deserve the attention...


----------



## epic1337

Quote:


> Originally Posted by *superstition222*
> 
> Gaming is driving the sales of higher-end PCs. That trend is only going to increase.


no it isn't.

heavy compute is what's driving higher-end PC sales; look at the rigs they use as VM machines, F@H farms, and even rendering machines.
and what gaming machine? an i7-5960X with quad Titan X? pfft, there's a more massive dual X5650 (that's 12C/24T right there) or farming rigs with 4x R9-295X2 in them (yes, that's 8 GPUs in total).

gamers that can afford rigs over $3000 are literally the minority of minorities; even this forum has few that would build rigs costing over $3000.
and that's not even considering the rig's purpose itself: $3000+ for an F@H farm is justifiable, and even as a VM machine it's still justifiable.
but just for satisfying your gaming needs? you must be hella loaded, or have too much time on your hands, maybe tripping too much... gimme some of that weed bro.


----------



## Serios

Quote:


> Originally Posted by *epic1337*
> 
> "superior to GDDR5 in *every way*" is an overstatement, HBM has a notably higher latency compared to GDDR5.
> despite its higher bandwidth, its higher latency makes it hardly better than GDDR5.
> and that's without considering the capacity limitations of current HBM.
> 
> and on that note, the difference of HBM to GDDR5 is like DDR4 is to DDR3.
> despite DDR4's nearly 2x greater bandwidth, its higher latency makes it hardly better than DDR3 in real-world applications.
> 
> on a further note, no i'm not saying HBM is worthless, what i'm saying is that HBM has its own pros and cons


Overstatement? Aren't you exaggerating a bit too much yourself? Basically you are saying HBM overall (without taking HBM2 into consideration) has a huge latency problem. OK, you must have sources for that.
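The bandwidth-versus-latency tradeoff being argued here can at least be framed with Little's law: to sustain a given bandwidth at a given access latency, the memory system must keep bandwidth × latency bytes in flight. A quick sketch in Python; the numbers below are purely illustrative, not official GDDR5 or HBM specs:

```python
# Little's law for memory systems: sustaining bandwidth B (bytes/s) at
# access latency L (s) requires B * L bytes outstanding at any moment.
def outstanding_bytes(bandwidth_gb_s, latency_ns):
    """Bytes that must be in flight to hide latency (GB/s * ns = bytes)."""
    return bandwidth_gb_s * latency_ns

# Illustrative figures only (not vendor specs): a GDDR5-class card vs an
# HBM-class card with higher bandwidth and somewhat higher latency.
gddr5 = outstanding_bytes(bandwidth_gb_s=336, latency_ns=60)
hbm = outstanding_bytes(bandwidth_gb_s=512, latency_ns=80)
print(f"GDDR5-class: {gddr5:,.0f} bytes in flight")
print(f"HBM-class:   {hbm:,.0f} bytes in flight")
```

The point is that GPUs hide latency with massive parallelism, so higher latency costs far less there than it would on a CPU; whether it "makes it hardly better" depends entirely on the workload.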


----------



## Serios

Quote:


> Originally Posted by *Shogon*
> 
> Yeah, it doesn't do much except give you a big number in GPU-z where it states your bandwidth speeds. I can forgo the 4GB limitation due to HBM's first iteration as that will increase with HBM2, but AMD's implementation of HBM hasn't been as groundbreaking as I wish it would have been. Ahh yes, the Nano. A successful card that proves its place in the market. It is a testament of engineering to have that much power in a small space I'll give you that, but it hasn't exactly been a _Joan of Arc_ for AMD in revitalizing market share or profits.
> 
> HBM barely gives an advantage over 7 GHz GDDR5 at anything under 4k gaming, and hasn't exactly helped AMD out financially or with sales being the first to utilize HBM. The price to pay for pioneering I suppose.
> 
> It really only helps in terms of efficiency since it uses less power in general compared to GDDR5, which is a good thing, albeit at the cost of being more expensive to produce (for now). Eventually that will change as production ramps up and volume reaches nominal capacity, along with HBM2, to rival GDDR5/X.


So the problem for you, an Nvidia fan, is that HBM did not impress you enough. *Big deal*, that doesn't mean anything.
I'm sure when Nvidia releases their first HBM card you will be *super impressed*, because why not? Technical details have zero importance for you anyway. It seems the problem is that AMD launched HBM.

Quote:


> Also to retort
> Nvidia has been selling mid-range video cards for flagship prices while AMD can barely make ends meet. They let AMD do the hard work (the transitions from GDDR3 to GDDR5 and GDDR5 to HBM1/2) and Nvidia just swoops in for the profiteering when the time is right. Then again, I forget some people consider businesses as charities on this site, and everyone gets a trophy for simply trying even when they don't succeed. Then again, I guess AMD has to be the company to innovate, because what incentive is there for Intel, or even Nvidia, when you're a behemoth in your respective market fighting against a David without a slingshot.


You are clueless. Nvidia did not let AMD do anything; AMD also helped develop GDDR3 and GDDR5, and that is why they were first to use them, not because Nvidia let them.
Incredible how AMD, with their much lower budget and worse economic situation, can develop and support new standards when Nvidia can't. How is this possible?
Anyway, what you wrote was definitely not an answer to my question; it doesn't have anything to do with my question.


----------



## Imouto

Quote:


> Originally Posted by *Rookie1337*
> 
> I think you're overestimating how powerful and how small these things can be with that power. Right now the best iGPU is less than half as powerful as I think anyone would want at 1080p. The Iris in the i7-5775C is only able to scrape the low-to-mid 30s at medium quality settings at 1080p in many games. AMD's offering is about the same or slightly worse.


The *Iris is a 100mm²@14nm die* and the architecture isn't even close to anything from AMD or Nvidia.

To put it in perspective (again): Being very, very generous it is said to pack almost the power of a *GTX 750 which is 150mm²@28nm.*

Iris so good.
Quote:


> Originally Posted by *Rookie1337*
> 
> Both need a minimum size of a Gigabyte Brix to be viable and even then they're super hot and loud. A Brix is larger than any GPU shy of maybe a Titan or Mars(if they make those anymore).


Ehm, no... The current Brix is dead quiet and most reviews praise it in this regard. Aside from that, the cooling solution is a frigging laptop heatsink; of course AIBs can do better.

And as for the size, the thing is the size of a 2.5" SSD, a bit thicker on the sides to make the base a square. Dunno if you have heard about M.2 drives, which could drive the size down further.

I don't even know how you could be so wrong about these aspects of the Brix.
Quote:


> Originally Posted by *Rookie1337*
> 
> So no, I don't see anything in the near term (2 or less years) making a gaming system the size of a GPU that is capable of desirable 1080p performance happening. Hence, I am puzzled as to what could possibly be your ideas' target audience. ARM and Android are replacing the non-work non-technical users' desire for a laptop/media consumption device which means your AMD idea can't target that market. So again, what is the market for the thing.
> 
> Also, since it wouldn't be a viable gaming platform it would have to contend with AMD's (and Intels) offerings in systems like those built by CompuLab such as the IntensePC 2, fitlet, and FitPC which are all fanless and super tiny.


Why are you so fixated on Android and ARM? This would be a full-fledged office/gaming PC, and for any serious task you still need x86 because of its performance and software compatibility.

AMD is after the gaming share of the pie. They already booked all the consoles and want to do the same with PC. Those underpowered things are the ones that have no market as ARM solutions surpassed them long ago (a $25 RPi for example).


----------



## superstition222

Quote:


> Originally Posted by *epic1337*
> 
> no it isn't.


It is according to a professional analysis I recently read. It said that PC sales overall are in a downward trend, the exception being higher-end PC component sales thanks to the enthusiast gaming market.
Quote:


> Originally Posted by *epic1337*
> 
> heavy compute is what's driving higher-end PC sales; look at the rigs they use as VM machines, F@H farms, and even rendering machines.


The analysis was talking about consumers not enterprise or corporations.


----------



## F3ERS 2 ASH3S

Quote:


> Originally Posted by *superstition222*
> 
> Quote:
> 
> 
> 
> Originally Posted by *epic1337*
> 
> no it isn't.
> 
> 
> 
> It is according to a professional analysis I recently read. It said that PC sales overall are in a downward trend, the exception being higher-end PC component sales thanks to the enthusiast gaming market.
> Quote:
> 
> 
> 
> Originally Posted by *epic1337*
> 
> heavy compute is what's driving higher-end PC sales; look at the rigs they use as VM machines, F@H farms, and even rendering machines.
> 
> 
> The analysis was talking about consumers not enterprise or corporations.

To be honest, I still say the PC sales decline is due to the performance gained per upgrade dollar. How many people here on this site alone are running a 5+ year old chip because a 2500K performs almost as well as a 6700K in games? Why spend the extra money unless you are only buying for the features? That is why everyone is waiting for Zen, hoping it brings a big enough punch to bolster the market for better competition...

I hate the excuse that PCs are going away... they simply are not. The market is now saturated, and the cost of upgrading far outweighs the performance gained.


----------



## Shogon

Quote:


> Originally Posted by *Serios*
> 
> So the problem for you, an Nvidia Fan is that HBM did not impress you enough. *Big deal*, that doesn't mean anything.
> I'm sure when Nvidia will release their first HBM you will be *super impressed* because why not? technical details have 0 importance for you anyway. It seems that the problem is AMD launched HBM.
> You are clueless. Nvidia did not let AMD do anything, AMD also helped develop GDDR3 and GDDR5, that is why the were first to use these them not because Nvidia let them.
> Incredible how AMD whit their much lower budget and worse economical situation can develop and support new standards when Nvidia can't. How is this possible?
> Anyway what you wrote was definitely not an answer to my question, it doesn't have anything to do whit my question.


If history is a marker for anything, HBM isn't going to do much for Nvidia either. We're not exactly limited by memory speeds at this time, and if history repeats, Nvidia may have issues with the transition like they did with Fermi (GTX 480). All the bandwidth in the world helps, but not if the core is the hindrance.

What will be interesting is what the node shrink and new architectures from both sides will present. HBM is just a side dish to the main course. But hey, call me an Nvidia fan all you want, as it doesn't mean much of anything to me on a site filled with armchair engineers and undercover shills. I simply buy stuff, and harbor no emotional attachment to the company that gives me the ability to purchase that product.

Except for SIG though. I'd surely shill out for a 716 Patrol without a 2nd thought.

You say Nvidia did not let AMD do anything, but then state AMD helped develop GDDR3/5 and used them first. If Nvidia was so innovative as a company it would have done what AMD did, but it didn't. Why would they?

Nvidia isn't exactly hurting for cash, while AMD has to take advantage of anything Nvidia doesn't have. AMD utilized the first GDDR5 cards to their advantage, but HBM didn't repeat that surge.

What's incredible is Nvidia's marketing tactics convincing legions of people they are the right solution when AMD does just fine, if not better, compared to Nvidia. So many people get emotional over what a company produces nowadays; it's both remarkable and hilarious. It's just like sports teams and political parties, but for plastic and circuitry produced overseas.


----------



## EniGma1987

Quote:


> Originally Posted by *Shogon*
> 
> We're not exactly limited by memory speeds at this time


That is because render pipelines take a decent bit of die space and require a certain amount of bandwidth. Since we have been stuck on 28nm for so long, the designers have not had the space to add more in. Now that we are moving to 14 and 16nm there is a lot more space, and with 4K moving into more affordable territory for gamers we should see an increase in ROPs in next-gen cards for 2-3 generations again. We have been stuck around 64 for a while; now we should see increases up to about 128 on the highest-end cards. Who knows, maybe even as high as 192 pipelines on the highest-end AMD and Nvidia cards after a couple of generations on the new nodes. And with large increases in render pipelines we will have large increases in bandwidth requirements, which will make HBM a necessity, as GDDR5 won't really have the bandwidth needed to drive the card.
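The ROP-count-to-bandwidth relationship can be put in rough numbers. A back-of-envelope sketch (all figures hypothetical, ignoring texture traffic and color compression):

```python
def rop_bandwidth_gb_s(rops, clock_ghz, bytes_per_pixel=4, blend_factor=2.0):
    """Rough fill-rate-driven bandwidth demand in GB/s.

    rops * clock = pixels/s; a blended pixel both reads and writes the
    framebuffer, hence the blend_factor of 2 on bytes moved."""
    return rops * clock_ghz * bytes_per_pixel * blend_factor

# 64 ROPs at ~1.05 GHz (roughly today's high end) vs a doubled 128-ROP part.
print(f"64 ROPs:  {rop_bandwidth_gb_s(64, 1.05):.0f} GB/s")
print(f"128 ROPs: {rop_bandwidth_gb_s(128, 1.05):.0f} GB/s")
```

Doubling ROPs at the same clock doubles worst-case framebuffer traffic, which is where a GDDR5 bus tops out and stacked memory starts to look necessary.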


----------



## Serios

Quote:


> Originally Posted by *Shogon*
> 
> If history is a marker for anything


So you have nothing to add.


----------



## warpuck

Quote:


> Originally Posted by *superstition222*
> 
> HBM is superior, overall, to GDDR5. However, competition will heat up with GDDR5X. One analyst predicts that HBM2 will mainly be reserved for the high-end GPUs with GDDR5X for the remainder.


I don't know if this is a neutral source; my conclusion is from page 12:

http://www.cs.unc.edu/~lin/COMP089H/LEC/zach.pptx


----------



## warpuck

Quote:


> Originally Posted by *Shogon*
> 
> If history is a marker for anything, HBM isn't going to do much for Nvidia either. We're not exactly limited by memory speeds at this time, and if history repeats, Nvidia may have issues with the transition like they did with Fermi (GTX 480). All the bandwidth in the world helps, but not if the core is the hindrance.
> 
> What will be interesting is what the node shrink and new architectures from both sides will present. HBM is just a side dish to the main course. But hey, call me an Nvidia fan all you want, as it doesn't mean much of anything to me on a site filled with armchair engineers and undercover shills. I simply buy stuff, and harbor no emotional attachment to the company that gives me the ability to purchase that product.
> 
> Except for SIG though. I'd surely shill out for a 716 Patrol without a 2nd thought.
> 
> You say Nvidia did not let AMD do anything, but then state AMD helped develop GDDR3/5 and used them first. If Nvidia was so innovative as a company it would have done what AMD did, but it didn't. Why would they?
> 
> Nvidia isn't exactly hurting for cash, while AMD has to take advantage of anything Nvidia doesn't have. AMD utilized the first GDDR5 cards to their advantage, but HBM didn't repeat that surge.
> 
> What's incredible is Nvidia's marketing tactics convincing legions of people they are the right solution when AMD does just fine, if not better, compared to Nvidia. So many people get emotional over what a company produces nowadays; it's both remarkable and hilarious. It's just like sports teams and political parties, but for plastic and circuitry produced overseas.


I really have yet to come across something that makes me need more CPU and/or GPU. Wanting is a different story.
I think maybe this is going to be for a 4K gamebox first. Maybe with time it will come to the desktop as well.


----------



## StevenT

My hope is that AMD's Zen catches up to Intel's Haswell, but the gap is still large.


----------



## superstition222

Quote:


> Originally Posted by *warpuck*
> 
> I really have yet to come across something that makes me need more CPU and/or GPU. Wanting is a different story.


VR

Also, regular 3D game development could use CPUs much more. But, it's easier to just use pretty graphics and increase the pixel count than it is to create deep interactive/immersive virtual worlds with lots of AI, physics, and such.

If I had the money I'd make a game that would use all 8 threads of the FX chips. But that would be unusual, because most developers want money from dual-core (i3) owners and owners of plain quads (i5s). In order to develop a game engine that really pushes a chip like FX to its fullest, though, it needs to be designed from the ground up to need that many threads.
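To make the "designed from the ground up for many threads" point concrete, here is a toy sketch of a per-frame job system in Python: independent subsystems (AI agents, physics islands) are split into small tasks and farmed out to a fixed worker pool. All the names and workloads are made up for illustration; a real engine would use native threads and real task payloads:

```python
from concurrent.futures import ThreadPoolExecutor

def ai_tick(agent_id):
    # Placeholder AI workload: a real engine would run pathfinding,
    # decision trees, etc. for one agent here.
    return agent_id * 2

def physics_tick(island_id):
    # Placeholder physics workload for one independent "island" of bodies.
    return island_id + 1

def run_frame(pool, agents, islands):
    # Submit every independent task up front, then gather results; with
    # enough tasks, all workers stay busy for the whole frame.
    ai_jobs = [pool.submit(ai_tick, a) for a in agents]
    phys_jobs = [pool.submit(physics_tick, i) for i in islands]
    return [j.result() for j in ai_jobs], [j.result() for j in phys_jobs]

with ThreadPoolExecutor(max_workers=8) as pool:  # one worker per FX thread
    ai, phys = run_frame(pool, agents=range(4), islands=range(3))
    print(ai, phys)
```

(Python's GIL means this only truly runs in parallel if the tick functions release it, as native physics/AI code would; the structure, not the language, is the point.)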


----------



## delboy67

Quote:


> Originally Posted by *superstition222*
> 
> If I had the money I'd make a game that would use all 8 threads of the FX chips. But that would be unusual, because most developers want money from dual-core (i3) owners and owners of plain quads (i5s). In order to develop a game engine that really pushes a chip like FX to its fullest, though, it needs to be designed from the ground up to need that many threads.


Along with the game engine you'd need your own OS/UI and a 'to the metal' API specific to your exact hardware, effectively creating your own console for one game. With all the open standards I suppose it could be doable, but it's not realistic. I do hear what you're saying though; up until DX12, the fact that only one core can actually talk to the GPU is a disgrace. DX12 should unleash some of the idle power in modern CPUs right when the new 16/14nm GPUs arrive, so good times ahead. Arma 3 online with a solid 60fps


----------



## superstition222

Quote:


> Originally Posted by *delboy67*
> 
> Along with the game engine you'd need your own os/ui and 'to the metal' api specific to your exact hardware effectively creating your own console for one game, with all the open standards I suppose it could be doable but not realistic.


There is a specific gaming market that could be successfully targeted with such a game, one that has been badly exploited and burned by a certain large company.


----------



## Olivon

Some others findings by dresdenboy :

AMD Zeppelin CPU codename confirmed by patch and perhaps 32 cores per socket for Zen based MPUs, too

http://dresdenboy.blogspot.de/2016/02/amd-zeppelin-cpu-codename-confirmed-by.html


----------



## ku4eto

Quote:


> Originally Posted by *Olivon*
> 
> Some others findings by dresdenboy :
> 
> AMD Zeppelin CPU codename confirmed by patch and perhaps 32 cores per socket for Zen based MPUs, too
> 
> http://dresdenboy.blogspot.de/2016/02/amd-zeppelin-cpu-codename-confirmed-by.html


If I understood right, it means 4 cores per module?


----------



## Cyrious

Quote:


> Originally Posted by *ku4eto*
> 
> If i understood right, it means 4 cores per module?


Appears so. 1 cluster of cores ("Core complex") apparently contains 4 cores, all joined together by a common L3 cache.

The question now is: How many core clusters can AMD shove on a single die?


----------



## ku4eto

I think this raises another question. How will the 4 cores split the L3 cache, and how will this affect latency and FPU's.


----------



## STEvil

I thought it was obvious since a long time ago that each module was 4 cores?


----------



## ku4eto

Quote:


> Originally Posted by *STEvil*
> 
> I thought it was obvious since a long time ago each module was 4 cores?


I am not sure; I think I may have missed that. At least this is now a confirmation; before, it could have been only rumor.


----------



## Cyrious

Quote:


> Originally Posted by *ku4eto*
> 
> I think this raises another question. How will the 4 cores split the L3 cache, and how will this affect latency and FPU's.


Well (someone correct me if I'm wrong), AMD did develop dynamic resource sharing of the L2 cache for their post-Piledriver cores, so it would be logical for them to scale that to the common L3 cache: the core(s) that need it would get a larger slice of L3, while cores that don't need it get a smaller slice, with the allocation shifting around as needed.

Latency could be interesting. My guess on how it would work is that the latency to the local L3 cache is relatively low, while a snoop of the other L3 caches would incur a higher relative latency penalty, as they'd essentially be "L3.5" caches for cores not directly attached to them. As a result, the memory hierarchy would become: L1 (D/I)->L2->Local L3-> Distal L3(.5)->L4 (if any)->Main memory. The FPUs would end up seeing maximum performance only from the local L3 cache, with a performance hit if checking the other L3 caches is done.
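That hierarchy guess can be made quantitative with the standard average-memory-access-time (AMAT) formula. The hit rates and latencies below are invented purely to show the shape of the math, not leaked Zen figures:

```python
def amat(levels, dram_ns):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_rate, latency_ns) tuples from L1 outward, where
    each hit_rate is conditional on having missed all earlier levels."""
    total, reach = 0.0, 1.0  # reach = probability the access gets this far
    for hit_rate, latency_ns in levels:
        total += reach * hit_rate * latency_ns
        reach *= 1.0 - hit_rate
    return total + reach * dram_ns

# Hypothetical numbers: L1 -> L2 -> local L3 -> distal "L3.5" -> DRAM.
local_only = amat([(0.90, 1.0), (0.60, 4.0), (0.50, 12.0)], dram_ns=80.0)
with_distal = amat([(0.90, 1.0), (0.60, 4.0), (0.50, 12.0), (0.50, 30.0)],
                   dram_ns=80.0)
print(f"no distal L3:   {local_only:.2f} ns")
print(f"with distal L3: {with_distal:.2f} ns")
```

Even though a snoop of a distal L3 costs more than a local hit, it still beats going to DRAM, so an "L3.5" level lowers the average despite its penalty.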


----------



## LuckyStarV

I expect it would be like the large Intel chips, with a ring bus (or other interconnect) linking separate core clusters/L3 cache.

Intel right now has every 2 cores with their own L3 slice, which is then shared among all the cores via 2-3 ring buses. Depending on how good their bus is and how they implement it, the performance hit from accessing another L3 cache might not be so bad.

The biggest question I see is whether Zen will have half clusters of 2 cores, to allow 6- or 10-core parts.


----------



## escksu

AMD's last chance; it had better be something good, or else they can throw it away.


----------



## Faithh

Quote:


> Originally Posted by *ku4eto*
> 
> I think this raises another question. How will the 4 cores split the L3 cache, and how will this affect latency and FPU's.


Just as usual? The L3 cache has always been shared across all the cores. You mean the L2 cache, which was shared between the two ALU clusters in a single module. The only difference then is that a single cluster gets the full L2 cache to itself.


----------



## prjindigo

Quote:


> Originally Posted by *2010rig*
> 
> What is their answer for Skylake?


Being 80% as fast at half the cost is...you'll admit... a fairly good answer.


----------



## amd-dude

Come on red team, give me something decent and I'm back with y'all. Definitely won't be the first to buy it again this time though like Bulldozer. I'll wait about 2 weeks to get some real numbers then I'll buy in again.


----------



## Themisseble

AMD is pretty quiet about Zen core info... they are only saying 40%+ IPC over Excavator.


----------



## looncraz

Quote:


> Originally Posted by *Faithh*
> 
> Just as usual? The L3 cache has always been shared across all the cores. You meant the L2 cache, which was shared between the two ALU clusters in a single module. The only difference then is a single cluster gets the full l2 cache for herself.


We know from the Linux patch that Zen has a non-standard LLC layout, so it probably has multiple L3s in many configurations (probably octo-core+)...


----------



## looncraz

Quote:


> Originally Posted by *Themisseble*
> 
> AMD is pretty quiet about ZEN core info... they are only saying 40%+ IPC over excavator.


We know quite a bit from the public patches, though. We know the general capabilities of its ten pipelines, for example.

We have some of their papers and patents that suggest seriously low cache latencies as well (as good as Haswell, even).

We know that they are targeting "closer to 4Ghz" and 95W as well. We just don't know for which SKUs...

In fact, I think we have more information now about Zen's internals than we had for Bulldozer at this point. It wasn't until they released the optimization guide that anyone started to put the math together and realize that Bulldozer, by design, would have trouble with single-threaded applications.

The great thing here is that we have this much information... and those of us who do the math actually have a hard time figuring out how Zen will only be 40% faster than Excavator... the only explanation being that AMD went the fastest, crudest path possible in the rest of the core in order to get a quality product to market... which would explain why they already know the next iteration will be 15% faster (itself quite a nice jump in performance).


----------



## Robenger

Quote:


> Originally Posted by *looncraz*
> 
> The great thing here is that we have this much information... and those of us who do the math actually have a hard time figuring out how Zen will only be 40% faster than Excavator


Meaning you think it will be faster?


----------



## Themisseble

Quote:


> Originally Posted by *looncraz*
> 
> We know quite a bit from the public patches, though. We know the general capabilities of its ten pipelines, for example.
> 
> We have some of their papers and patents that suggest seriously low cache latencies as well (as good as Haswell, even).
> 
> We know that they are targeting "closer to 4Ghz" and 95W as well. We just don't know for which SKUs...
> 
> In fact, I think we have more information now about Zen's internals than we had for Bulldozer at this point. It wasn't until they released the optimization guide that anyone started to put the math together and realize that Bulldozer, by design, would have trouble with single-threaded applications.
> 
> The great thing here is that we have this much information... and those of us who do the math actually have a hard time figuring out how Zen will only be 40% faster than Excavator... the only explanation being that AMD went the fastest, crudest path possible in the rest of the core in order to get a quality product to market... which would explain why they already know the next iteration will be 15% faster (itself quite a nice jump in performance).


+40% is not much for the FPU... this shouldn't be a problem.
+40% for integer... well, in some cases an overclocked A10-7850K does quite well against the older i5-3570K.

Also, Excavator proves to be very good with AVX2 instructions.
Any Athlon X4 845 reviews?


----------



## looncraz

Quote:


> Originally Posted by *Robenger*
> 
> Meaning you think it will be faster?



No, but I think AMD's claim of 40% actually sounds reasonable, whereas many others think it is an "up to" claim... meaning that 40% may be limited to one or two specific cases.

40% seems more like a baseline of improvement, with a few specific exceptions. Faster caches, four dedicated wide decoders, ten independently addressable pipelines, several generations' worth of process improvements, inclusion of all of Excavator's power-saving dynamics... yeah, it's gonna be interesting. I would not be surprised to see more than 40%, but I'd really not be all that surprised to see 35%, either...


----------



## looncraz

Quote:


> Originally Posted by *Themisseble*
> 
> +40% is not much for the FPU... this shouldn't be a problem.
> +40% for integer... well, in some cases an overclocked A10-7850K does quite well against the older i5-3570K.
> 
> Also, Excavator proves to be very good with AVX2 instructions.
> Any Athlon X4 845 reviews?


Absolutely, Zen needs to improve more in floating point than in integer. 40% greater integer performance will pit Zen against Skylake; a 40% faster FPU will be acceptable but not enough to reach Ivy Bridge... which averages out to roughly Haswell IPC overall.


----------



## SCollins

Quote:


> Originally Posted by *looncraz*
> 
> Absolutely, Zen needs to improve more with floating point than with integer. 40% greater integer will pit Zen against Skylake, 40% faster FPU will be acceptable, but not enough to reach Ivy Bridge... which averages to Haswell IPC


depends on which instruction sets are being compared on the FPU side, the excavator FPu isn't not a slouch, when more modern instructions are utilized.


----------



## epic1337

Quote:


> Originally Posted by *SCollins*
> 
> depends on which instruction sets are being compared on the FPU side, *the excavator FPu isn't not a slouch*, when more modern instructions are utilized.


wait what, "isn't not a slouch" meaning it is a slouch?
not not means yes.

sadly modern instructions aren't being utilized by old apps that hardly get updated.
even most newer apps aren't coded to use modern instructions.

and the reason is pretty simple, it's a conflict of interest.
most veteran programmers aren't interested in learning how to use modern instructions.


----------



## flopper

Quote:


> Originally Posted by *looncraz*
> 
> The great thing here, is that we have this much information... and those of us who do that math actually have a hard time figuring out how Zen will only be 40% faster than Excavator... the only explanations being that AMD went the fastest, crudest, path possible in the rest of the core in order to get a quality product to market... which would explain why they already know the next iteration will be 15% faster (itself quite a nice jump in performance).


for PR reasons those are some cool-looking numbers:
first a 40% IPC gain, then the next generation offers another 15%. that's a nice baseline for advertising.


----------



## 2010rig

Quote:


> Originally Posted by *flopper*
> 
> for PR reasons that are some cool looking numbers,
> first a 40% ipc then next generation offers another 15% that is some nice baseline for advertising.


They've made similar promises in the past


----------



## ku4eto

Quote:


> Originally Posted by *2010rig*
> 
> They've made similar promises in the past


Yes, and they gained about 10% each time...


----------



## Fyrwulf

Quote:


> Originally Posted by *2010rig*
> 
> They've made similar promises in the past


So, I gotta ask because I don't know, is that demonstrably untrue?


----------



## 2010rig

Quote:


> Originally Posted by *Fyrwulf*
> 
> So, I gotta ask because I don't know, is that demonstrably untrue?


Here, see for yourself
http://www.anandtech.com/bench/product/434?vs=697

Clock for clock, the 8350 is not 10% faster than the 8150 in single-threaded performance. In Cinebench, for example, it's 9% faster with a 400 MHz advantage.

I guess if you go stock vs stock they came close, but that's misleading when one is clocked 400 MHz higher.

My post was merely pointing out that AMD has promised similar gains in the past. We didn't get anything else that replaced the 8350, so who knows.


----------



## looncraz

Quote:


> Originally Posted by *Fyrwulf*
> 
> So, I gotta ask because I don't know, is that demonstrably untrue?


No, actually they delivered on that quite closely.

Piledriver was ~10% faster than Bulldozer.
Steamroller was ~10% faster than Piledriver, but the lack of L3 resulted in only a 6.7% average boost.
Excavator is ~10% faster on average as well.

So, pretty much, they delivered as they claimed all those years ago. Albeit just barely. They ran into a lot of difficulty scaling the IPC, which makes sense with only two ALUs...

Also, I'm talking IPC only; if you include full performance, meaning multi-threading as well, they easily met those targets. Multi-core efficiency was vastly improved.
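To put rough numbers on it, here's what compounding those per-generation figures gives (my estimates above, not official AMD numbers):

```python
# Compounding the rough per-generation IPC gains quoted above (estimates,
# not official AMD figures).
gains = [
    ("Piledriver over Bulldozer", 0.10),
    ("Steamroller over Piledriver", 0.067),  # ~10% design gain, eroded by no L3
    ("Excavator over Steamroller", 0.10),
]

total = 1.0
for step, g in gains:
    total *= 1.0 + g  # gains multiply, they don't add

print(f"Excavator over Bulldozer, compounded: +{total - 1:.1%}")  # roughly +29%
```

So even at roughly 10% per step, the construction cores only compounded to about +29% over Bulldozer, which puts the 40%-in-one-step Zen claim in perspective.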


----------



## epic1337

indeed, so far their recent slides aren't as exaggerated as their old ones from back before they launched steamroller. Ops (operations) per cycle is another term for instructions per cycle.



but hey, if we use this as a reference for AMD's habits, then Zen's promised 40% IPC increase might well fall short, to maybe 20%.
i'd appreciate it if they'd be more modest and reserved with their estimates.


----------



## PiOfPie

Quote:


> Originally Posted by *epic1337*
> 
> indeed, so far their recent slides aren't as exaggerated as their old ones back before they launched steamroller, Ops(operations) per cycle is another term for instructions per cycle.
> 
> 
> 
> but hey, if we use this as reference to AMD's habits, then Zen's promised 40% IPC increase might as well fall short of 20%.
> i would rather appreciate it if they'd be more modest and reserved with their estimates.


All in all, this wasn't a terribly far-off estimate, either.

SR was 20% over Piledriver and 30% over Bulldozer; it's just that the IPC gains were all eroded by the clock hit.

Plus this slide dates to November 2012, which IIRC was when Steamroller A was still on the table.


----------



## Kuivamaa

Quote:


> Originally Posted by *2010rig*
> 
> Here, see for yourself
> http://www.anandtech.com/bench/product/434?vs=697
> 
> Clock for clock, the 8350 is not 10% faster than the 8150 in Single Thread performance. In cinebench for example, it's 9% faster with a 400 MHz advantage.


This is false. For single-threaded workloads both the 8150 and the 8350 turbo to 4.2 GHz. They both jump around a bit (I recall Anand finding that the 8150 was jumping from a 3.9 GHz ST baseline, from which it never dropped, up to 4.2), but effectively those two run this test at the same clock.


----------



## epic1337

Quote:


> Originally Posted by *PiOfPie*
> 
> All in all, this wasn't a terribly far-off estimate, either.
> 
> SR was 20% over Piledriver and 30% over Bulldozer, just the IPC gains were all eroded by the clock hit.
> 
> Plus this slide dates to November 2012, which IIRC was when Steamroller A was still on the table.


i think you're contradicting what the others have said about the IPC gains.

exactly, they had exaggerated slides during the design of steamroller; they were overly optimistic.


----------



## 2010rig

Quote:


> Originally Posted by *Kuivamaa*
> 
> This is false. For single thread workloads both 8150 and 8350 turbo at 4.2 . They both jump around a bit (I recall Anand finding that 8150 was jumping from 3.9 ST baseline from which it never dropped, to 4.2),but effectively those two run this test at the same clock.


I forgot the 8150 turboed to 4.2 GHz. If they're both truly running at 4.2 during this test, then yeah, the improvement is ~9%.

Piledriver fixed a lot of Bulldozer's mistakes. Was this AMD's plan all along with the 10% - 15% improvement claims?


----------



## looncraz

Quote:


> Originally Posted by *epic1337*
> 
> indeed, so far their recent slides aren't as exaggerated as their old ones back before they launched steamroller, Ops(operations) per cycle is another term for instructions per cycle.
> 
> 
> but hey, if we use this as reference to AMD's habits, then Zen's promised 40% IPC increase might as well fall short of 20%.
> i would rather appreciate it if they'd be more modest and reserved with their estimates.


That was referring to the module penalty improvement - and they DEFINITELY fixed that... they just had other bottlenecks they could never solve, so we never saw the full benefit.

I still don't think AMD knows where the bottleneck is in the module, which seems pretty obvious from their throw-everything-at-it-and-see-what-works approach. Of course, the back-of-the-envelope calculations I made when Bulldozer's cache architecture was revealed indicated that they would not be able to get much performance out of it, which tells me that the cache policy and latencies were mostly to blame. No idea who thought a write-through cache with high latencies was a good idea, but they should be lynched... we've known for DECADES that would not work well.


----------



## Kuivamaa

Quote:


> Originally Posted by *2010rig*
> 
> I forgot the 8150 turboed to 4.2. If they're both truly running at 4.2 during this test, then yeah the improvements are ~9%.
> 
> Piledriver fixed a lot of the mistakes with Bulldozer, was this AMD's plan all along with their 10% - 15% improvement all along?


The Bulldozer saga is full of redesigns, so only AMD people know. E.g. both SR and (especially) XV were different beasts from their initial designs. CMT was a mistake anyway, and it was Nehalem that killed it before it was even born, not even Sandy Bridge. Personally I am happy to see AMD going SMT.


----------



## superstition222

Quote:


> Originally Posted by *Kuivamaa*
> 
> CMT was a mistake anyway ... Personally I am happy to see AMD going SMT.


Is there actual proof of this? From what I've read it has advantages over SMT, but also drawbacks. I would like to see definitive proof that it is a clearly inferior design, and not just that AMD's chips have flaws that prevent it from reaching its full potential.
Quote:


> Originally Posted by *looncraz*
> 
> No idea who thought a write-through cache with high latencies was a good idea, but they should be lynched... we've known for DECADES that would not work well.


Did AMD make the cynical decision to use clockspeed as a marketing tool since it wasn't in the position to compete with Intel on IPC? High cache latency is associated with being able to have high clockspeeds, right?


----------



## Faithh

Quote:


> Originally Posted by *superstition222*
> 
> Is there actual proof of this? From what I've read it has advantages over SMT, but also drawbacks. I would like to see definitive proof that it is a clearly inferior design, not just because AMD's chips may have flaws that prevent its full optimality.


We've only seen CMT from AMD, so based on their CMT implementation, the only advantage you get over SMT is more throughput at the cost of more space/power etc.


----------



## looncraz

Quote:


> Originally Posted by *superstition222*
> 
> Is there actual proof of this? From what I've read it has advantages over SMT, but also drawbacks. I would like to see definitive proof that it is a clearly inferior design, not just because AMD's chips may have flaws that prevent its full optimality.
> ...
> Did AMD make the cynical decision to use clockspeed as a marketing tool since it wasn't in the position to compete with Intel on IPC? High cache latency is associated with being able to have high clockspeeds, right?


I'll address both of these.

CMT is inherently more difficult to implement than SMT - and much more difficult to extract all potential performance from. The reason is that shared resources require locking out one core while another uses them - or segmenting the resource... in which case you may as well have had dedicated resources from the beginning. The ideal scenario for CMT is using two cores to execute one thread. AMD's design looked to be going that way originally, and there was talk about that back in the day, but it appears they couldn't make it happen or people just misunderstood what was being done.

As far as Zen trying to sell based on clockspeed alone - doubtful. It seemed that AMD genuinely believed that they would have higher IPC with Bulldozer than they saw. They could have fixed IPC by adding another ALU per core or making the FPU a bit wider, but they never really did that (beyond adding some ALU capabilities to the AGUs and a little FPU fiddling).

What we do know, though, is that Bulldozer's front-end, which is probably the closest thing to what Zen will use, can handle ~180% of Bulldozer's single core performance without modification. That's an interesting number as it is effectively how much more IPC we should expect from Zen over Bulldozer... and Zen's disclosed front-end details look awfully similar to Bulldozer's... which already handles two threads with four decoders...


----------



## Pro3ootector

ZEN High End 'Exascale' CPU, 1-4 Socket (1P-4P) - Specs As Per CERN

32 ZEN x86 Core, 6-wide
128 KB L0 Cache (4KB per core)
2 MB L1 D-Cache (64KB per core)
2 MB L1 I-Cache (64 KB per core)
16 MB L2 Cache (512 KB per core)
64? MB L3 Cache (8MB cluster per quad unit)
576-bit Memory Controller (8×72-bit, 64-bit + 8-bit ECC)
204.8 GB/s via DDR4-3200 (ECC Off)
170.6 GB/s via DDR4-2666 (ECC On)

ZEN High End Exascale APU, 1-2 Socket (1P-2P) - Rumored Specs From Fast Forward

16 ZEN x86 Core, 6-wide
64 KB L0 Cache (4KB per core)
1 MB L1 D-Cache (64KB per core)
1 MB L1 I-Cache (64 KB per core)
8 MB L2 Cache (512 KB per core)
No L3 Cache
288-bit CPU Memory Controller (4×72-bit, 64-bit + 8-bit ECC)
102.4 GB/s via DDR4-3200 (ECC Off)
85.3 GB/s via DDR4-2666 (ECC On)
102.4 GB/s between CPU and GPU via GMI
~2000-core Polaris GPU
2048-bit GPU Memory Controller
8 GB HBM2 SGRAM Memory (2 chips at 4GB)
512 GB/s GPU Bandwidth

http://vrworld.com/2016/02/12/cern-confirms-amd-zen-high-end-specifications/
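For anyone checking the math, those bandwidth figures fall straight out of bus width × transfer rate (the helper function name here is just for illustration):

```python
# Peak bandwidth = (data bus width in bits / 8 bits-per-byte) * transfer rate.
def peak_bandwidth_gbs(bus_bits, mt_per_s):
    """Peak transfer rate in GB/s from data-bus width (bits) and MT/s."""
    return bus_bits / 8 * mt_per_s / 1000  # MB/s -> GB/s

# CPU: 8 channels x 64 data bits (the 8 ECC bits carry no payload) at DDR4-3200
print(peak_bandwidth_gbs(8 * 64, 3200))  # 204.8 GB/s, matching the leak
# GPU: 2048-bit HBM2 interface at 2000 MT/s
print(peak_bandwidth_gbs(2048, 2000))    # 512.0 GB/s, matching the leak
```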


----------



## lutjens

Those specs look promising if they prove to be accurate. I hope that AMD cattle-prods Intel straight in their lazy and complacent posteriors and forces Intel to actually reach for their top shelf and release a truly exceptional product that isn't constrained by a ridiculously low, archaic, 10-year-old TDP for a change...


----------



## superstition222

Quote:


> Originally Posted by *looncraz*
> 
> The ideal scenario for CMT is using two cores to execute one thread. AMD's design looked to be going that way originally, and there was talk about that back in the day, but *it appears they couldn't make it happen or people just misunderstood what was being done.*


Any more info about this?


----------



## AmericanLoco

Quote:


> Originally Posted by *superstition222*
> 
> Any more info about this?


Way back on the XtremeSystems forum (probably elsewhere too), there was a huge amount of chatter about AMD working on "reverse hyperthreading". Maybe that was the original goal of the construction cores? I think it's probably what looncraz said though: A person close to someone inside AMD heard some information regarding Bulldozer, misinterpreted it, and started a false rumor.


----------



## looncraz

Quote:


> Originally Posted by *superstition222*
> 
> Any more info about this?


AmericanLoco nailed it.

AMD actually teased us about it at one point, even before Phenom II, IIRC, but I can't remember for the life of me what they called it back then.

Today, we actually have working designs, called VISC. And, some claim, Intel's Skylake (very doubtful).

I have a high-level 'design' for one:


----------



## STEvil

"Reverse HT" will be a thing eventually, when CPU designers learn to start looking at a CPU as a pool of resources rather than lanes of a highway.

I expect to see it in mobile first, as an evolution of "big.little"


----------



## lolerk52

I have my doubts about reverse hyperthreading happening.

Didn't Intel research it? With their resources, you would think they would find a way.


----------



## Pro3ootector

Quote:


> Originally Posted by *lolerk52*
> 
> I have my doubts about reverse hyperthreading happening.
> 
> Didn't Intel research it? With their resources, you would think they would find a way.


There was something like this; it was called "Mitosis" / "Core Multiplexing" and was rumored in the days before K10 arrived. Both Intel and AMD were trying to push this tech.

This article explains in short how it supposedly could work:

http://pclab.pl/art21258-2.html


----------



## PiOfPie

Quote:


> Originally Posted by *Pro3ootector*
> 
> There was something like this; it was called "Mitosis" / "Core Multiplexing" and was rumored in the days before K10 arrived. Both Intel and AMD were trying to push this tech.
>
> This article explains in short how it supposedly could work:
> 
> http://pclab.pl/art21258-2.html


Google Translate's kinda bad, so I'm still kind of iffy on how this would work. My understanding is:

1) Single-thread code enters the CPU
2) Something in the CPU analyzes the code, breaks it down into smaller components like a mathematical factoring problem, and then sends each component to a core for independent processing

Speaking from intuition as a guy without a lick of computer science background, sounds insanely difficult to do.

Speaking as a biology guy, I don't get why it's called mitosis. Mitosis would imply doubling something and then splitting into two identical copies.


----------



## ku4eto

Quote:


> Originally Posted by *PiOfPie*
> 
> Google Translate's kinda bad, so I'm still kind of iffy on how this would work. My understanding is:
> 
> 1) Single-thread code enters the CPU
> 2) Something in the CPU analyzes the code, breaks it down into smaller components like a mathematical factoring problem, and then sends each component to a core for independent processing
> 
> Speaking from intuition as a guy without a lick of computer science background, sounds insanely difficult to do.
> 
> Speaking as a biology guy, I don't get why it's called mitosis. Mitosis would imply doubling something and then splitting into two identical copies.


It doubles the code, sending the same code to two different cores, but Core 0 analyzes the first half and Core 1 the second half. How it works in reality, though... probably involves some kind of dark magic.
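At the software level the idea looks something like this toy sketch. The catch is that it only works here because the two halves have no data dependency on each other - proving that automatically, on arbitrary code, is the hard part the hardware would have to do:

```python
# Toy software analogy of "split one job across two cores". Works only
# because the two halves of this sum are independent of each other.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(xs):
    return sum(xs)

data = list(range(1_000_000))
mid = len(data) // 2

with ThreadPoolExecutor(max_workers=2) as pool:    # "Core 0" and "Core 1"
    first = pool.submit(partial_sum, data[:mid])   # first half
    second = pool.submit(partial_sum, data[mid:])  # second half
    total = first.result() + second.result()       # recombine the results

print(total == sum(data))  # True - same answer as the serial version
```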


----------



## SpeedyVT

Quote:


> Originally Posted by *ku4eto*
> 
> It doubles the code, sending the same code to 2 different cores, but Core 0 analyzes first half, and Core 1 2nd half. But how it works in reality... probably involves some kind of dark magic.


Only some integer-based instructions can be done this way. Floating point can't be done this way, so an FPU always has to be beefy. Floating point is always based on approximations, and it's never accurate to split its load. The only possibility for FPUs to be utilized together is one that can divide into smaller ones to handle more simultaneous data. While it's smaller and takes longer because it divided itself, it could be enough for lighter FPU instructions like FP16 and basic FP32.

This is still a win in both respects, because IPC isn't something as simple as the whole processor; IPC can refer to performance on a specific instruction set. This could easily make up a huge difference in performance and ease the stress of more difficult loads.
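The core of the FP objection fits in a couple of lines: floating-point addition is not associative, so regrouping a workload across units can change the rounded result:

```python
# FP addition is not associative: splitting a sum and recombining the
# partial results can give a (slightly) different answer than serial code.
x, y, z = 0.1, 0.2, 0.3

serial = (x + y) + z   # left to right, as one unit would compute it
split  = x + (y + z)   # regrouped, as a split workload might compute it

print(serial)           # 0.6000000000000001
print(split)            # 0.6
print(serial == split)  # False - same inputs, different rounding path
```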


----------



## ku4eto

Quote:


> Originally Posted by *SpeedyVT*
> 
> Only some integer based instructions can be done this way. Floating points can't be done this way. So an FPU always has to be beefy. A floating point is always based on approximates and it's never accurate splitting it's load. The only possibility of a FPU that can be utilized together is one that can divide into smaller ones to handle more simultaneous data. While it's smaller and taking longer because it divided itself it could be enough in lighter FPU instructions like FP16 and basic FP32.


I am aware of the FP issue; I ran into this problem during my C# programming courses. You can't always have the exact value, it needs to be rounded for this stuff. Also, "lighter" instructions is a relative term: if it's enough to run the application, why not? Games would run into issues, same with server-based work that relies on FP. If this is used for database servers or web servers it should actually be good enough, but the problem is that DBs are already multithreaded enough...


----------



## SpeedyVT

Quote:


> Originally Posted by *ku4eto*
> 
> I am aware of the FP issue, ran into this problem during my C# programming courses. You can't always have the exact aproximiate value, it needs to be rounded for this stuff. Also, "lighter" instructions is relative, if its enough to run the application, why not? Games would run into issues, same with server based work that relies on FP. If this is used for Database servers or Web servers, it should be good enough actually, but the problem is the DB's are multithreaded enough...


You can offset this by forcing multiple floating points to be handled simultaneously, even if it's slow at handling them. You'd have one big FPU and many tiny FPUs, but knowing and differentiating between the two as an instruction set is a problem. However, I do believe you could use a cache to save the large instruction set, with various outputs of the same code from one original instruction, to quickly process any further calls of that instruction. You'd need at least 1-2 MB of cache for the large FPU and 256 KB for the small. Someone mentioned that AMD may be using some caching for power saving in Zen; a similar approach could apply to the FPU as well. Just a thought.


----------



## epic1337

Quote:


> Originally Posted by *SpeedyVT*
> 
> Only some integer based instructions can be done this way. Floating points can't be done this way. So an FPU always has to be beefy. A floating point is always based on approximates and it's never accurate splitting it's load. The only possibility of a FPU that can be utilized together is one that can divide into smaller ones to handle more simultaneous data. While it's smaller and taking longer because it divided itself it could be enough in lighter FPU instructions like FP16 and basic FP32.


this is true, there are a lot of different kinds of FP pipelines.

a few examples:
32+32 fused FP64 (can handle two FP32, also most common)
full FP64 (normally cannot handle FP32)
full FP32 (normally cannot handle FP64)
16+16 fused FP32 (can handle two FP16)
etc.
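the "16+16 fused FP32" case is easy to visualize: two IEEE half-precision values occupy exactly the storage of one single-precision value, which is why a splittable pipe can do one FP32 op or two FP16 ops per cycle. a quick demonstration with Python's struct module:

```python
# Two IEEE-754 half-precision (FP16) values pack into exactly the same
# 32 bits of storage as one single-precision (FP32) value.
import struct

pair = struct.pack("<2e", 1.5, -2.25)  # two FP16 values ('e' = binary16)
single = struct.pack("<f", 3.14159)    # one FP32 value

print(len(pair), len(single))          # 4 4 - both are 32 bits wide

a, b = struct.unpack("<2e", pair)
print(a, b)                            # 1.5 -2.25 (exactly representable in FP16)
```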


----------



## looncraz

Quote:


> Originally Posted by *ku4eto*
> 
> It doubles the code, sending the same code to 2 different cores, but Core 0 analyzes first half, and Core 1 2nd half. But how it works in reality... probably involves some kind of dark magic.


It's actually not as complicated as some like to think; the biggest hurdle is sharing register data between cores fast enough that using other cores' pipelines is basically just like having those extra pipelines in the same core.

The only way I see this working is with an ultra-wide core and a cascading scheduler.


----------



## SpeedyVT

Quote:


> Originally Posted by *looncraz*
> 
> It's actually not as complicated as some like to think, the biggest hurdle is trying to share register data between cores fast enough that using other cores' pipelines is basically just like having those extra pipelines in the same core.
> 
> The only way I see this working is with an ultra-wide core and a cascading scheduler.


Isn't there a potential issue with this causing corruption? Higher error rate.


----------



## looncraz

Quote:


> Originally Posted by *SpeedyVT*
> 
> Isn't there a potential issue with this causing corruption? Higher error rate.


That's always a potential issue with any design. Path-finding to the proper execution unit is more of an issue, though, as well as properly clock & power gating dormant units.

Preserving execution order is a matter of tagging dependent instructions and locking registers to enforce serialization. The schedulers check whether the lock is free, assert it, then schedule the instruction. It's actually a fairly simple thing to accomplish in hardware (the schedulers can implement the locks locally). This would probably not carry much of a performance cost, if any, outside of line delays. Power usage, again, is a major issue, though.

SIMD instructions would be a MAJOR beneficiary in this scenario, as you might imagine. Some programs will simply not scale well; they will achieve nearly their idealized performance based purely on the execution unit capabilities. Other programs will continue to scale each time you make the core wider.

This is along the lines I thought Bulldozer was headed. And, they may well have tried it, but couldn't make it happen. It's obvious that this core would need shared caches and extremely good cross-core communications and very low latencies to work if you shrink the number of execution units. By necessity, you have to allow stalls to happen right in the scheduler, so you will need some way to detect that and push incoming instructions to other EU groups.

The sweet thing is that you can pretty much support however many OS threads you want with this design; you just need enough bits to tag the thread ID for reorder (and lock-holder tags, if needed). Not so obviously, I intend for this core to have four threads and one decoder complex.
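A toy model of that register-locking idea (grossly simplified and purely illustrative - a real scheduler tracks far more state than this):

```python
# Toy scoreboard: an instruction issues only when none of the registers it
# touches are locked by an in-flight instruction it depends on.
locks = set()  # registers currently owned by in-flight instructions

def try_issue(instr):
    """instr = (name, reads, writes); returns True if it issued."""
    name, reads, writes = instr
    if (reads | writes) & locks:  # a needed register is still locked
        return False              # stall; the scheduler retries later
    locks.update(writes)          # take ownership of the destinations
    return True

def retire(instr):
    _, _, writes = instr
    locks.difference_update(writes)  # free the destinations on writeback

i1 = ("add r1, r2, r3", {"r2", "r3"}, {"r1"})
i2 = ("mul r4, r1, r5", {"r1", "r5"}, {"r4"})  # depends on i1's result

print(try_issue(i1))  # True  - no conflicts, r1 becomes locked
print(try_issue(i2))  # False - r1 is still in flight, must wait
retire(i1)
print(try_issue(i2))  # True  - dependency resolved, i2 can go
```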


----------



## ku4eto

The latency would probably be horrible with normal cache memory; if it's eDRAM or HBM, it will probably be far better.


----------



## looncraz

Quote:


> Originally Posted by *ku4eto*
> 
> The latency would probably be horrible with normal cache memory, if its the eDram or HBM, it will probably b far better.


Both of those memory technologies are far inferior to what is used for CPU caches.

HBM has 48ns latencies, whereas decent L2 cache latencies are below 10ns, and can be below 1ns when performance is extremely critical.

HBM might make sense as a L4 cache, but not L3, L2, and especially not anywhere inside the core complex.
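Converting those latencies into core cycles makes the gap obvious; assuming a 4 GHz core for the sake of the arithmetic:

```python
# Convert an access latency in nanoseconds into core clock cycles.
def latency_cycles(latency_ns, clock_ghz):
    return latency_ns * clock_ghz  # ns * cycles-per-ns

for name, ns in [("HBM (48ns, as above)", 48), ("decent L2 (10ns)", 10), ("fast L2 (1ns)", 1)]:
    print(f"{name}: {latency_cycles(ns, 4.0):.0f} cycles at 4 GHz")
# HBM's 48ns is ~192 core cycles - hopeless for an L2, plausible for an L4.
```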


----------



## ku4eto

HBM2 on the SoC should work for L3; the bigger the cache tier, the lower the speed it needs.


----------



## Particle

Quote:


> Originally Posted by *looncraz*
> 
> Both of those memory technologies are far inferior to what is used for CPU caches.
> 
> HBM has 48ns latencies, whereas decent L2 cache latencies are below 10ns, and can be below 1ns when performance is extremely critical.
> 
> HBM might make sense as a L4 cache, but not L3, L2, and especially not anywhere inside the core complex.


So long as HBM is placed before system DRAM, it makes sense as a cache layer regardless of how many layers are built into the CPU itself. Some processors today only have L1 and L2 and still perform just fine. It could certainly take over for L3 in many circumstances, though a high performance CPU these days would almost certainly have L3 on-die. I don't think it would be quite as critical as you might imagine.


----------



## SpeedyVT

Quote:


> Originally Posted by *ku4eto*
> 
> HBM2 on the SOC should work for L3, the bigger the cache tier, the lower the speeds it needs.


It's more complicated than that. Access latency is huge in comparison to on-chip cache. An L3 cache with a supporting cache of HBM, I would imagine, is possible.


----------



## looncraz

Quote:


> Originally Posted by *Particle*
> 
> So long as HBM is placed before system DRAM, it makes sense as a cache layer regardless of how many layers are built into the CPU itself. Some processors today only have L1 and L2 and still perform just fine. It could certainly take over for L3 in many circumstances, though a high performance CPU these days would almost certainly have L3 on-die. I don't think it would be quite as critical as you might imagine.


HBM makes sense only as a system buffer or even as system RAM itself. As an L3, it would force less-than-desirable cache policies due to its access latency - you wouldn't be able to use an inclusive policy for the L2, so you get minimal inter-core benefits (L2 snooping).

Now, if you make the L2s large and fast enough, they can make up for that deficit; then the large size enabled by HBM would begin to mostly confer benefits. But we're talking the L2 being nearly as fast as today's L1 caches and as large as the L3s.


----------



## STEvil

HBM would only be L3 if there were no dedicated L3; otherwise you are more likely to find it as L4.


----------



## Particle

I think we're all agreeing?


----------



## epic1337

Quote:


> Originally Posted by *looncraz*
> 
> HBM makes sense only as system buffer of even as system RAM itself. As an L3, it would force less than desirable cache policies due to its access latency - you wouldn't be able to use an inclusive policy for the L2, so you get minimal inter-core benefits (L2 snooping).
> 
> Now, if you make the L2s large and fast enough, they can make up for that deficit, then the large size enabled by HBM would begin to only/mostly infer benefits. But we're talking the L2 being nearly as fast as today's L1 caches and as large as the L3s.


large and fast L2 is never really a good thing; intel's previous L2 designs showed that a larger L2 just wastefully increases power consumption and heat output.

yorkfield - 6/3/2/1MB L2
wolfdale-3M - 3/2/1MB L2
*
conroe - 512KB L2
clarkdale - 512KB L2
*
Lynnfield - 256KB L2
bloomfield - 256KB L2
gulftown - 256KB L2
sandy bridge - 256KB L2
ivy bridge - 256KB L2
haswell - 256KB L2
broadwell - 256KB L2
skylake 256KB L2

what intel has been doing now is the opposite:
making the cache more efficient (less power consumption, less latency, fewer cache misses, faster throughput, etc.) as opposed to larger capacities.

so in my opinion, HBM "can" be used as an L3 effectively, and without changing the L2.
but it would be even better if HBM were used as an L4; the advantages vary from "reducing L3 capacity to save die space" to "minimizing DRAM access instances".


----------



## looncraz

Quote:


> Originally Posted by *epic1337*
> 
> large and fast L2 is never really a good thing, the previous attempts of intel's L2 had shown that larger L2 just wastefully increase power consumption and heat output.
> 
> 
> 
> yorkfield - 6/3/2/1MB L2
> wolfdale-3M - 3/2/1MB L2
> *
> conroe - 512KB L2
> clarkdale - 512KB L2
> *
> Lynnfield - 256KB L2
> bloomfield - 256KB L2
> gulftown - 256KB L2
> sandy bridge - 256KB L2
> ivy bridge - 256KB L2
> haswell - 256KB L2
> broadwell - 256KB L2
> skylake 256KB L2
> 
> what intel have been doing now is the opposite.
> making the cache more efficient (less power consumption, less latency, less cache miss, faster throughput, etc.) as opposed to larger capacities.
> 
> so in my opinion, HBM "can" be used as an L3 effectively, and without changing the L2.
> but it would be even better if HBM were to be used as an L4, the advantages varies from "reducing L3 capacity to save die space" to "minimizing DRAM access instances".


My statement was made with that knowledge specifically in mind. We've gone smaller because it's easier to manage and saves power. That makes it less advantageous to have a larger, slow, L3. There's a reason why L3s haven't grown much per core for many years - we found the point of diminishing returns extremely quickly. The draw for HBM is that it has high capacity and is reasonably fast with just one stack. The problem comes when you consider what modern CPUs use an L3 to do...

L3, which only adds a few percent of performance for single threaded programs, is very helpful when multiple cores work on the same data. It's how data is shared between cores. Having to go out to a high-latency L3 would completely erase the majority of benefit from the L3 for this scenario, even if it were 1GB in size. It would help when the core would otherwise be stalled for data coming from system memory, but that's about it. Of course, allowing it to be used as system RAM (or for graphics or fast database storage) would still be very useful, and there are undoubtedly cases where speedups would be immense.


----------



## epic1337

that's why i said it's still fine to use HBM as an L3; the purpose of the L3 is to be a buffer for the system RAM to begin with, not the usual "keep the pipelines full" sort of purpose - that's what the L1 and L2 are for.

that is to say, even if the L3 were slower than it is now but dramatically larger, large enough to fit the entire working set in it, then the system RAM's performance penalty would become negligible.
the L1 and L2 are already sufficient to keep the pipelines fed throughout the cycles; the combined hit rate of L1+L2 in particular is above 90%, which means only 10% of the time will it end up looking in the L3 or system RAM.
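that >90% hit-rate point can be put into an average-memory-access-time estimate (the latency numbers here are assumed, purely for illustration):

```python
# Average memory access time: hit time plus miss rate times miss penalty.
# All latency figures below are assumptions for illustration only.
def amat(hit_ns, miss_rate, miss_penalty_ns):
    return hit_ns + miss_rate * miss_penalty_ns

l1l2_hit_ns = 3.0  # assumed combined L1/L2 service time
dram_ns = 70.0     # assumed DRAM access latency
hbm_ns = 48.0      # HBM used as a big L3/L4 buffer

# 10% of accesses miss L1+L2 and go straight to DRAM:
print(amat(l1l2_hit_ns, 0.10, dram_ns))  # 10.0 ns average
# the same misses served from an HBM buffer instead:
print(amat(l1l2_hit_ns, 0.10, hbm_ns))   # 7.8 ns average
```

so even with a slow HBM tier, shaving the miss penalty moves the average noticeably, because the 90%+ of hits never see it.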

but of course to maximize performance, they could keep the current L3 at a smaller package, that is to say "save more power and die space" since an L4 HBM is more than enough to substitute for the rest.
in both ways, they can make two SKUs, less cost & more power efficient = L3 HBM, more performance = L4 HBM, e.g. the mainstream (LGA11xx) gets L3 HBM, the enthusiast (LGA20xx) gets L4 HBM.

now i'd like to bring up a thing to discuss, will HBM enable them to offload the IMC off the die?
of course an external IMC will have a dramatically higher access latency, making it even far less efficient, but an off-die IMC will enable us to choose how "wide" will it be.

since HBM can be supplied at various capacities:
HBM1 = 4Hi-Stack 1GB (4x256MB)
HBM2 = 2Hi-Stack 2GB (2x1024MB)
HBM2 = 4Hi-Stack 4GB (4x1024MB)
HBM2 = 8Hi-Stack 8GB (8x1024MB)

it can be said that the HBM itself can substitute as the main system RAM, that is to say, a huge cache for the entire DRAM, like how Intel SRT's SSDs work with HDDs.
if they were to offload the IMC to the chipset, for example, the CPUs themselves could be made cheaper, while CPU binning would be straightforward (by HBM capacity, e.g. i3 = 2GB, i5 & i7 = 4GB, etc.)
furthermore, it could allow us to have a PCI-E based IMC card, or rather an infinitely scaling DRAM expansion card.
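A toy model of that SRT-style tiering, with HBM holding recently used DRAM pages: the class name, capacity, and latencies here are all made up for illustration, not anything a vendor has described:

```python
from collections import OrderedDict

HBM_LAT, DRAM_LAT = 90, 200  # assumed access latencies, cycles

class HbmPageCache:
    """HBM as an LRU page cache in front of DRAM (hypothetical sketch)."""
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()

    def access(self, page):
        """Return the cost of touching `page`, caching it on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)    # LRU: mark most recently used
            return HBM_LAT
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        self.pages[page] = True
        return DRAM_LAT

cache = HbmPageCache(capacity_pages=2)
costs = [cache.access(p) for p in (1, 2, 1, 3, 2)]
```

With a real multi-GB HBM stack the capacity would dwarf typical working sets, so most accesses would land at the cheaper HBM cost, which is the whole appeal of the idea.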


----------



## SpeedyVT

Quote:


> Originally Posted by *epic1337*
> 
> large and fast L2 is never really a good thing, the previous attempts of intel's L2 had shown that larger L2 just wastefully increase power consumption and heat output.
> 
> yorkfield - 6/3/2/1MB L2
> wolfdale-3M - 3/2/1MB L2
> *
> conroe - 512KB L2
> clarkdale - 512KB L2
> *
> Lynnfield - 256KB L2
> bloomfield - 256KB L2
> gulftown - 256KB L2
> sandy bridge - 256KB L2
> ivy bridge - 256KB L2
> haswell - 256KB L2
> broadwell - 256KB L2
> skylake 256KB L2
> 
> what intel have been doing now is the opposite.
> making the cache more efficient (less power consumption, less latency, less cache miss, faster throughput, etc.) as opposed to larger capacities.
> 
> so in my opinion, HBM "can" be used as an L3 effectively, and without changing the L2.
> but it would be even better if HBM were to be used as an L4, the advantages varies from "reducing L3 capacity to save die space" to "minimizing DRAM access instances".


There are different ways to implement an L2 cache; AMD has its L2 down, if you've noticed the APUs equaling the performance of the FX-series processors without even trying. You're altogether better off skipping the L3 and just using a cache for commonly used instructions, one not tied up like the L3 but shared across the cores. You could use HBM as an FPU cache, but not the same way as an integer-core cache. Any really large, wide data that can get away with high-latency responses is a good fit for HBM.
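That last point, large transfers hiding latency behind bandwidth, can be shown with a back-of-envelope model; the latency and bandwidth figures are assumptions for illustration, not HBM specs:

```python
def transfer_us(size_bytes, latency_us=0.1, gb_per_s=256):
    """Total transfer time: fixed latency plus size over bandwidth.
    1 GB/s = 1e3 bytes per microsecond."""
    return latency_us + size_bytes / (gb_per_s * 1e3)

small = transfer_us(64)      # a single cache line: latency-dominated
big = transfer_us(1 << 20)   # a 1 MiB streaming block: bandwidth-dominated
```

For the 64-byte access nearly all the time is the fixed latency, while for the 1 MiB block the latency is a rounding error, so wide streaming data (FPU/SIMD workloads) tolerates a high-latency memory far better than pointer-chasing integer code.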


----------



## superstition222

The eDRAM L4 in Broadwell-C has shown significant efficiency gains over Haswell in some areas.

The Broadwell chips even seem to have greater efficiency for gaming than Skylake, overall.


----------

