# Gaming and Networking



## mouacyk

Were you able to measure any decrease in latency?


----------



## Offler

Only with Wireshark, which is a packet analyzer.

Ping to the router went from 1 ms to 0.12-0.24 ms when comparing the onboard NIC (before tuning) and the expensive Intel NIC (after tuning).

As I added a bit later, it helps much more at the CPU/RAM level than with networking itself. The threads of a 3D engine dedicated to networking show taller (higher CPU utilization) and shorter (less time required) spikes in the CPU utilization graph. This indicates (just indicates, does not prove) that the game engine was previously waiting for data to process, or that the whole processing simply took longer.

Therefore, the benefits are measurable, but I would not say perceivable.
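For anyone repeating the measurement, the post-processing is simple once Wireshark has captured the ICMP traffic. A minimal sketch, with made-up timestamps standing in for an exported capture:

```python
# Compute ping RTT from packet timestamps, the way you would after exporting
# an ICMP capture from Wireshark (File > Export Packet Dissections > CSV).
# The sample rows below are invented for illustration.

# (timestamp_seconds, icmp_type) pairs: type 8 = echo request, type 0 = echo reply
packets = [
    (0.000000, 8), (0.000131, 0),   # first request/reply pair
    (1.000020, 8), (1.000254, 0),   # second pair
]

def rtts_ms(packets):
    """Pair each echo request with the following reply; return RTTs in ms."""
    rtts = []
    pending = None
    for ts, icmp_type in packets:
        if icmp_type == 8:                        # echo request: remember send time
            pending = ts
        elif icmp_type == 0 and pending is not None:
            rtts.append((ts - pending) * 1000.0)  # delta in milliseconds
            pending = None
    return rtts

print(rtts_ms(packets))   # sub-millisecond RTTs: ~0.131 ms and ~0.234 ms
```

Wireshark's capture timestamps have microsecond resolution, which is what makes sub-millisecond comparisons like the one above possible at all; the Windows `ping` tool only reports whole milliseconds.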


----------



## mouacyk

I suppose this could mean the game engine can do less interpolation as more granular network data is available. Overall, this will output more server-accurate rendering.


----------



## EniGma1987

Jumbo packets help with larger data transfers and utilizing high throughput connections, but does having it enabled actually hurt small data transfer performance? I was under the impression that the network card+switch+whatever actually processed a jumbo sized packet at once and did not require breaking it down to smaller sizes to process. So what does it matter what size the packets are if they are all processed in the same time frame? Sure it is less "efficient" when you are doing smaller transfers, but if it is processed the same then what does it really hurt?


----------



## Offler

mouacyk said:


> I suppose this could mean the game engine can do less interpolation as more granular network data is available. Overall, this will output more server-accurate rendering.


Saying yes would be misleading.

Actually it means that the CPU spends less time waiting for and processing packets after they arrive at the local network, counting the modem, router, NIC, CPU, RAM, and finally the game engine. The total server-to-client time would be 39,060 microseconds, while the unoptimized time was 40,000 microseconds. What actually changed is that after a packet arrives, it no longer takes 1,000 microseconds for the engine and CPU to process it, but just 60.

In the end, it mainly helps FPS and CPU load, because the engine spends less time waiting for packets to be processed. But this is at a level well beyond human perception.
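The arithmetic above, spelled out with the same illustrative numbers as in the post:

```python
# The network path dominates total server-to-client time, so shaving local
# processing from 1000 us to 60 us barely moves the total. Numbers are the
# illustrative ones from the post, not measurements.
network_path_us = 39_000   # server -> local network (unchanged by NIC tuning)
local_before_us = 1_000    # NIC + driver + engine processing, untuned
local_after_us  = 60       # after tuning

total_before = network_path_us + local_before_us   # 40_000 us
total_after  = network_path_us + local_after_us    # 39_060 us

saved = total_before - total_after
print(f"saved {saved} us = {saved / total_before:.2%} of the total")  # 940 us = 2.35%
```

A 2-3% shave off end-to-end latency is exactly the "measurable but not perceivable" situation described above.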




EniGma1987 said:


> Jumbo packets help with larger data transfers and utilizing high throughput connections, but does having it enabled actually hurt small data transfer performance? I was under the impression that the network card+switch+whatever actually processed a jumbo sized packet at once and did not require breaking it down to smaller sizes to process. So what does it matter what size the packets are if they are all processed in the same time frame? Sure it is less "efficient" when you are doing smaller transfers, but if it is processed the same then what does it really hurt?


That is specific to different types of internet connection. I can enable jumbo packets, and since the maximum transmission unit (MTU) increases from 1500 bytes to 9000 bytes, network devices spend roughly one sixth of the computing power on packet headers (overhead), because six times fewer packets carry the same data.

But because I have an ADSL2+ internet connection (quite ancient, but with decent ping), each 1500-byte packet is broken down into much smaller ATM cells (48 bytes of payload each), which are transferred over the DSL line and then reassembled into a normal packet that continues on its way. Actually, the connection I use does not allow a 1500-byte MTU, but only 1492 (officially; in practice it is even less).

Attempting to transfer bigger packets may result in packet fragmentation: the original packet has to be broken into smaller fragments that arrive at the destination separately. If even one fragment is lost, the packet as a whole is lost too.

I am not sure whether jumbo packets can get through an internet connection whole or get fragmented. The fact is that most games use small communication packets exactly to avoid fragmentation and packet loss.
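A quick back-of-the-envelope sketch of both effects described above; the 1 MB payload and the 40-byte IP+TCP header figure are illustrative assumptions (options ignored):

```python
# Headers are paid per frame, so moving the same payload in 9000-byte frames
# instead of 1500-byte frames needs roughly a sixth as many frames, and thus
# roughly a sixth of the per-packet header work.

def frames_needed(payload_bytes, mtu):
    """Frames required to move a payload, with 40 bytes of IP+TCP header per frame."""
    usable = mtu - 40                      # payload space left in each frame
    return -(-payload_bytes // usable)     # ceiling division

payload = 1_000_000                        # move 1 MB
std = frames_needed(payload, 1500)         # 685 frames
jumbo = frames_needed(payload, 9000)       # 112 frames
print(std, jumbo, round(std / jumbo, 1))   # ratio close to 6

# The ADSL side: ATM carries 48 bytes of payload per 53-byte cell, so even a
# single 1492-byte packet becomes 32 cells on the wire.
cells = -(-1492 // 48)
print(cells)
```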


----------



## EniGma1987

Offler said:


> That is specific to different types of internet connection. I can enable jumbo packets, and since the maximum transmission unit (MTU) increases from 1500 bytes to 9000 bytes, network devices spend roughly one sixth of the computing power on packet headers (overhead), because six times fewer packets carry the same data.
> 
> But because I have an ADSL2+ internet connection (quite ancient, but with decent ping), each 1500-byte packet is broken down into much smaller ATM cells (48 bytes of payload each), which are transferred over the DSL line and then reassembled into a normal packet that continues on its way. Actually, the connection I use does not allow a 1500-byte MTU, but only 1492 (officially; in practice it is even less).
> 
> Attempting to transfer bigger packets may result in packet fragmentation: the original packet has to be broken into smaller fragments that arrive at the destination separately. If even one fragment is lost, the packet as a whole is lost too.
> 
> I am not sure whether jumbo packets can get through an internet connection whole or get fragmented. The fact is that most games use small communication packets exactly to avoid fragmentation and packet loss.


Would it be beneficial to enable the jumbo frames for LAN traffic and then have the router's WAN MTU set to 1500 (or whatever for a specific internet connection) so that local traffic can make use of them but internet traffic gets sized correctly when it hits the router?


----------



## Offler

For local networking, yes.

For anything going to or from the Internet? Depends on your ISP, but if we are speaking strictly about gaming, there will be no benefit.


----------



## Offler

*Interrupt Moderation / DMA Coalescing*
https://docs.microsoft.com/sk-sk/windows-hardware/drivers/network/interrupt-moderation
https://www.intel.com/content/www/u...007456/network-and-i-o/ethernet-products.html

These are power-saving features in Windows drivers since NDIS 6.0 (Windows Vista and later, if I am not mistaken).

Both work in such a way that the network card driver waits a pre-defined time before it sends data to the CPU(s). The expensive network cards I mentioned allow you to define this time (250 microseconds, 500 microseconds, 1-5 milliseconds) and to specify exceptions by port. Good news: even the cheap Realtek 8111 allows disabling it in its driver.

Benefit when disabled: packet data is sent from the NIC to the CPU immediately, and fewer receive buffers are needed.
Disadvantage when disabled: slightly higher power consumption for the whole system, which might cost notebooks some battery life.

For gaming and high-performance scenarios I would recommend disabling it.
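As a rough illustration of the trade-off (a toy model, not a measurement of any real driver): with moderation the NIC holds packets for up to a fixed window and fires one interrupt per window, cutting interrupt rate at the cost of up to one window of added delay per packet.

```python
# Toy model of interrupt moderation: fewer interrupts vs. added latency.
# Packet rate and window values are illustrative.

def moderation_tradeoff(packets_per_sec, window_us):
    """Return (interrupts_per_sec, worst_case_added_latency_us)."""
    if window_us == 0:                    # moderation disabled:
        return packets_per_sec, 0         # one interrupt per packet, no delay
    windows_per_sec = 1_000_000 / window_us
    return min(packets_per_sec, windows_per_sec), window_us

pps = 5_000   # a busy game pushes on the order of thousands of packets/s
for window in (0, 250, 1000):             # disabled, 250 us, 1 ms
    print(window, moderation_tradeoff(pps, window))
```

At gaming packet rates, disabling moderation costs only a few thousand extra interrupts per second, which is why the latency-first recommendation above makes sense; at 10 Gb line rate the interrupt count is what explodes instead, as discussed later in the thread.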


----------



## Nawafwabs

Where can I buy an expensive PCIe network card?


----------



## Offler

Nawafwabs said:


> Where can I buy an expensive PCIe network card?


First check what you have onboard. Even the cheap Realtek 8111 is not as bad as I expected in the beginning.

The I210-T1 I am testing is PCI-E 2.0 x1 and costs between 30 and 50 dollars. That's quite expensive.

But I have to mention that purchasing the card was more about getting better educated on network drivers and their settings. My experience is that correct driver settings/tweaking and TCP stack tuning matter more than the hardware itself.


----------



## vf-

Offler said:


> First check what you have onboard. Even the cheap Realtek 8111 is not as bad as I expected in the beginning.
> 
> *The I210-T1 I am testing is PCI-E 2.0 x1 and costs between 30 and 50 dollars. That's quite expensive.*
> 
> But I have to mention that purchasing the card was more about getting better educated on network drivers and their settings. My experience is that correct driver settings/tweaking and TCP stack tuning matter more than the hardware itself.


Oh... I thought it was cheap? The expensive ones, I thought, started from £100; then you had the ones with dual 1Gb ports and heatsinks at PCIe x8 from £175 and up. Those were the ones I thought were quite expensive.

I do have the same card as you. I had been thinking of getting one of the Intel Pro cards with the heatsinks but wasn't sure yet... I had to purchase the I210-T1 because the onboard Broadcom Ethernet drivers started playing up with the Windows 10 Fall update, while there were no newer drivers; the newest were from 2013.


----------



## Offler

Well, it's expensive for gaming purposes. NICs with 2-4+ ports are usually entry-level server grade; the i210-T1 is a sort of taste of how such hardware works.

Unless you plan to have 2 LAN cables connected to different switches and use NIC teaming in case one of the lines fails... And some routers allow a secondary or even tertiary WAN connection.


----------



## DzillaXx

vf- said:


> Oh... I thought it was cheap? The expensive ones, I thought, started from £100; then you had the ones with dual 1Gb ports and heatsinks at PCIe x8 from £175 and up. Those were the ones I thought were quite expensive.
> 
> I do have the same card as you. I had been thinking of getting one of the Intel Pro cards with the heatsinks but wasn't sure yet... I had to purchase the I210-T1 because the onboard Broadcom Ethernet drivers started playing up with the Windows 10 Fall update, while there were no newer drivers; the newest were from 2013.


You can still get those used on ebay for pretty cheap...


I'm about to make the jump to 10Gb with SFP+ rather than Ethernet, as 10Gb Ethernet is still crazy expensive, while you can get a pair of 10Gb SFP+ cards for under $50. You can get a smart managed 24-port switch with two 10Gb SFP+ ports for $125. If you are just going a short distance you can get SFP+ patch cords for about $15-20 each, or SFP+ transceivers with LC connections, and then use fiber with LC ends at pretty much any length you want (multimode @ 10Gb can go more than 1500 ft; single mode @ 10Gb can go over 6 miles). But I'm not going for latency reduction; I care more about throughput to the data server.



Still, I don't think any of this has a real, noticeable effect on in-game latency. The real latency problems are on the ISP side. That is not to say a poorly configured home network can't make things worse: someone hogging bandwidth or a crappy router can easily cause problems.


----------



## vf-

Yeah, I saw some of those 10GB Intel cards for say £200 - 400 ish... I really need to get a new router though.


----------



## Offler

DzillaXx said:


> You can still get those used on ebay for pretty cheap...
> 
> 
> I'm about to make the jump to 10Gb with SFP+ rather than Ethernet, as 10Gb Ethernet is still crazy expensive, while you can get a pair of 10Gb SFP+ cards for under $50. You can get a smart managed 24-port switch with two 10Gb SFP+ ports for $125. If you are just going a short distance you can get SFP+ patch cords for about $15-20 each, or SFP+ transceivers with LC connections, and then use fiber with LC ends at pretty much any length you want (multimode @ 10Gb can go more than 1500 ft; single mode @ 10Gb can go over 6 miles). But I'm not going for latency reduction; I care more about throughput to the data server.
> 
> 
> 
> Still, I don't think any of this has a real, noticeable effect on in-game latency. The real latency problems are on the ISP side. That is not to say a poorly configured home network can't make things worse: someone hogging bandwidth or a crappy router can easily cause problems.





vf- said:


> Yeah, I saw some of those 10GB Intel cards for say £200 - 400 ish... I really need to get a new router though.


That's more a solution for external storage: low latency, and transfer rates of up to 1250 MB/s. With such a setup you can run a SAN device as if it were directly in your PC.


----------



## DzillaXx

Offler said:


> That's more a solution for external storage: low latency, and transfer rates of up to 1250 MB/s. With such a setup you can run a SAN device as if it were directly in your PC.


I do my own share of fiber for work, as we install it for customers as the backbone of their networks in new facilities. Cat6 just doesn't have the range, and for what we do you can easily be 500 ft+ away from the main office and the different nodes.


You can still use SFP+ 10G network cards just like standard Ethernet cards. There isn't really any difference; it is pretty much plug and play, just a different style of connector. Honestly, I think we should forget about Ethernet and just use LC connectors for everything. 10G Ethernet is an expensive joke at this point; there is no reason not to switch to fiber, which is cheaper and more stable. But instead of SFP ports they should just give you an LC port, so there is no need for a transceiver. An SFP port does let you use more than just fiber and more than one style of connector, but LC is by far the most common type.

112MB/s just isn't enough for me, but it isn't slow either.


----------



## Offler

If you use the command "netsh int tcp show global", you will find the "RFC 1323 timestamps" option at the bottom of the report.

Long explanation here https://tools.ietf.org/html/rfc1323#section-4

Simplified explanation:
It's a mechanism that should improve TCP reliability by adding timestamps. The existing 20-byte TCP header grows by an additional 12 bytes. TCP can then disregard packets that arrive with old timestamps, which might otherwise break the TCP connection.

Many online games use UDP connections, not TCP.

In most cases this option will have little to no effect on online gaming. If the server-client connection suffers (a high retransmission rate) and uses TCP, you might consider enabling it, with the following recommendations:

1. Check whether retransmitted segments are high.
Use the command "netsh interface ipv4 show tcpstats". Make sure that since the last reboot you have not opened a browser, only the online game you are examining.

2. If "Retransmitted Segments" and/or "In Errors" show high numbers, try enabling it.
(These and more data can be obtained with the command "netstat -s".)

In theory it might reduce the number of errors/retransmissions when the connection to the game server is troublesome for some reason. However, this mechanism only lowers the impact of an existing connection issue.

If there are 0 retransmissions and 0 errors, the connection to the game server is working fine and the option can remain disabled.

Also, it's worth noting that TCP timestamps can be used for information gathering:
https://www.scip.ch/en/?labs.20150305
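To turn the raw counters into something comparable across reboots, a small sketch; the counter values here are invented, so read yours off the netsh report:

```python
# Convert the counters from "netsh interface ipv4 show tcpstats" (or
# "netstat -s") into a retransmission rate you can compare session to session.

def retrans_rate(segments_sent, segments_retransmitted):
    """Fraction of sent segments that had to be retransmitted."""
    if segments_sent == 0:
        return 0.0
    return segments_retransmitted / segments_sent

sent, retrans = 250_000, 125          # example values; read yours off the report
rate = retrans_rate(sent, retrans)
print(f"{rate:.3%}")                  # 0.050% -- effectively a clean link
```

A fraction of a percent is normal background noise; rates in the whole percents are the "troublesome connection" case where enabling timestamps might be worth trying.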


----------



## EniGma1987

I checked a couple of my downstairs computers that have X540-T1 NICs in them; it looks like the driver supports up to 16 Receive Side Scaling queues. I'd attach a screenshot, but this new forum seems to have broken that ability for me. So if the driver supports that many queues in its dropdown list, would it be best to use that amount? I have an 8-thread CPU, so maybe 8 RSS queues would be better?


----------



## Offler

EniGma1987 said:


> I checked a couple of my downstairs computers that have X540-T1 NICs in them; it looks like the driver supports up to 16 Receive Side Scaling queues. I'd attach a screenshot, but this new forum seems to have broken that ability for me. So if the driver supports that many queues in its dropdown list, would it be best to use that amount? I have an 8-thread CPU, so maybe 8 RSS queues would be better?


Depending on CPU threads, or rather cores, I would set RSS to 2 fewer queues than the total number of cores.

Anyway, the X540 comes in a T1 variant and a 2-port T2 variant. These are overkill for normal gaming setups.

These cards obviously have different chips than the i210-T1, and considering the passive cooler I would expect they are either designed for high, constant data transfers, or the chip has a lot of processing power of its own.

It would be interesting to see whether it even needs RSS, and whether a ping test to the nearest network device drops below 120-240 microseconds.
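The rule of thumb above can be written out as a sketch. The list of driver options is an assumption based on the 16-queue dropdown mentioned earlier (these dropdowns typically offer powers of two):

```python
# "Leave two cores for the game, give the rest to RSS", then clamp to a value
# the driver actually offers. The driver_options default is an assumption.

def rss_queues(physical_cores, driver_options=(1, 2, 4, 8, 16)):
    """Pick the largest driver-supported queue count <= cores - 2."""
    target = max(1, physical_cores - 2)
    return max(q for q in driver_options if q <= target)

print(rss_queues(4))    # 2
print(rss_queues(8))    # 4  (cores - 2 = 6; nearest offered value below is 4)
print(rss_queues(16))   # 8  (cores - 2 = 14; nearest offered value below is 8)
```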


----------



## EniGma1987

Offler said:


> Depending on CPU threads, or rather cores, I would set RSS to 2 fewer queues than the total number of cores.
> 
> Anyway, the X540 comes in a T1 variant and a 2-port T2 variant. These are overkill for normal gaming setups.
> 
> These cards obviously have different chips than the i210-T1, and considering the passive cooler I would expect they are either designed for high, constant data transfers, or the chip has a lot of processing power of its own.
> 
> It would be interesting to see whether it even needs RSS, and whether a ping test to the nearest network device drops below 120-240 microseconds.



Ya, my server has the X540-T2 variant for one of its NICs and the downstairs computers use the T1 variants. In normal ping tests I always see "<1ms" as the result; do you know of a program I should use that gives the real result in microseconds or nanoseconds? If so, I'll also show some results from the InfiniBand setup my gaming computer uses.


----------



## Offler

EniGma1987 said:


> Ya, my server has the X540-T2 variant for one of its NICs and the downstairs computers use the T1 variants. In normal ping tests I always see "<1ms" as the result; do you know of a program I should use that gives the real result in microseconds or nanoseconds? If so, I'll also show some results from the InfiniBand setup my gaming computer uses.


I use Wireshark. Just ping the switch while it captures the packets and adds timestamps. That was actually the reason I decided to go with an Intel card over a Killer card.


----------



## vf-

What do people set the receive and transmit buffers to, since the default is 256/512?


----------



## EniGma1987

vf- said:


> What do people set the receive and transmit buffers to, since the default is 256/512?


The default for my card was 512/1024. Seeing as this only selects how much RAM to commit to buffers to help performance, I maxed mine out as high as the driver would allow, which was 4096/8192. I saw a jump of about 50 MB of RAM for network tasks, which these days isn't that much.


----------



## vf-

Thing is, I've read mixed advice about that. Some say max it out for local network performance, while keeping it low for gaming. Nothing confirms it, though.


----------



## Offler

vf- said:


> What do people set the receive and transmit buffers to, since the default is 256/512?


a) Higher LAN speeds while transferring a lot of data utilize more buffers.
This matters more when using Jumbo Packets, or Tx Bursting over WiFi, but both features are recommended to be disabled.

b) If DMA Coalescing and Interrupt Moderation are disabled, packets should be processed by the system almost immediately.
I recommended disabling those features for the latency benefit. If they are enabled, the waiting in the pipeline increases buffer utilization.

c) Gaming does not utilize the buffers much, as its data transfers are a few kilobytes per second.


A higher number of buffers definitely helps every transfer, and helps when multiple networking tasks run at the same time. Unless you want to spare every single megabyte of RAM, you don't need to decrease them; you might also hit buffer overflow if you play and download at the same time, and that would come at the cost of latency.

Therefore there is no need to decrease their size if you have already disabled Jumbo Packets, TX Bursting, DMA Coalescing, Interrupt Moderation, and similar features.
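For a sense of scale, a rough sketch of the RAM cost of the buffer counts discussed in this thread. The 2 KB per-buffer size is an assumption (a full 1500-byte frame rounded up to a power of two; drivers differ, and jumbo frames need bigger buffers):

```python
# Estimate driver RAM committed to receive/transmit buffers.
# buffer_bytes=2048 is an illustrative assumption, not a driver spec.

def buffer_ram_mb(rx_buffers, tx_buffers, buffer_bytes=2048):
    """Approximate MB of RAM consumed by the configured buffer counts."""
    return (rx_buffers + tx_buffers) * buffer_bytes / (1024 * 1024)

print(buffer_ram_mb(512, 1024))     # the defaults above: 3.0 MB
print(buffer_ram_mb(4096, 8192))    # maxed out: 24.0 MB
```

Even maxed out, the cost is tens of megabytes, which fits the "about 50 MB" observation earlier in the thread once per-queue and bookkeeping overhead is added.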


----------



## EniGma1987

Offler said:


> Depending on CPU threads, or rather cores, I would use RSS to be 2 threads less than is total amount of cores.
> 
> Anyway X540 has T1 and T2 variant which is 2 port. These are overkill for normal gaming setups.
> 
> These cards have obviously different chips compared to i210-T1, and considering the passive cooler I would expect they are designed either for high and constant data transfers, or the chip has a lot of processing power of its own.
> 
> Would be interesting to see if it even needs RSS, and if the ping test to nearest network device will decrease ping time below 120-240 microseconds.



I finally got around to checking the ping time between devices with Wireshark. Looks like the ping is 38 microseconds on the X540 NICs on my 10-gigabit network. That is a ping from a computer downstairs up to the switch in an upstairs closet, to the router, and back.










I tried checking the ping time on my much more powerful network that uses InfiniBand "NICs", as they advertise latency in the nanoseconds. Unfortunately, Wireshark will not see those NICs, so it can't capture traffic on them. My switch claims a port-to-port latency of less than 100 ns, though.

One cool thing about these ConnectX-3 cards is the further configuration they offer for receive side scaling. I have the option of choosing the closest processor, a "conservative scaling", or a NUMA node. You can also tell it which processor to default to in multi-CPU systems (these are meant for servers), and it supports up to 64 processors for RSS. That is actual processors, not queues: it supports up to 512 thread queues on the ConnectX-3 NICs, haha.

For interrupt moderation these also have further options. While I bet turning off moderation provides the best latency, on very fast networks (like the 10 gig and higher I have) you get a whole lot of interrupts when traffic is pegging the network limits, so interrupt moderation/DMA coalescing is a bit useful for limiting how often an interrupt occurs. These cards have a driver option for independent receive and transmit interrupt settings, with an aggressive moderation mode (which I am betting packs as many packets into a single interrupt as possible), a medium setting that behaves like normal Windows, and a low-latency mode that only holds packets up briefly and keeps the delay under a maximum.


----------



## Offler

Getting anything SFP-based like https://www.alza.sk/hp-ethernet-10-gb-2-port-530sfp-adapter-d4136325.htm would be extreme overkill, but there would be a latency benefit for sure. Having ping to the closest network device in nanoseconds helps, yet as the time fractions get smaller and smaller, we are getting into the same territory as certain myths about high FPS and high-refresh-rate displays.

The client PC > ISP hop will always cause the largest share of the latency.


----------

