Go back to previous topic
Forum Name Gameplay
Topic subjectmud performance
Topic URLhttps://forums.carrionfields.com/dcboard.php?az=show_topic&forum=6&topic_id=60868
60868, mud performance
Posted by Isildur on Wed 31-Dec-69 07:00 PM
So I noticed the game feeling kind of laggy today despite my ping times looking normal. This made me think maybe the delay is internal to the cf server and not actually a networking issue.

I ran that little "CF ping" script I cooked up a couple weeks ago, which just issues a "nod" command and then times how long it takes before my client sees the "You nod" response. The average was around ~250ms and most of the times were right around there. However, some were considerably shorter (~100ms) and some considerably longer (~500ms).

My round-trip network latency is 35ms, so at a bare minimum my "cf ping" will be at least 35ms.

Internally the game employs some system of pulses so things don't happen "immediately" once a command is received. I get that. My "nod" could arrive just prior to a checkpoint or it could arrive just after a checkpoint, in which case I gather it would have to wait until the next checkpoint before being processed. There's probably also some sort of queuing system, so it's possible it could miss that one as well if there are too many commands queued up.

But when a command like "nod" sometimes takes only ~100ms to get a response but occasionally takes ~500ms...that makes me think something wonky is going on.

Any thoughts? Am I just way off?

btw, please don't take this post as a gripe. That is absolutely not my intent. I'm just trying to help out. If the above is inaccurate to the point of being unhelpful than I apologize in advance for trying to "armchair diagnose".
60895, Inconsistent processing times were present in the past
Posted by -flso on Wed 31-Dec-69 07:00 PM
when Zulg was around, right before he left really, I did extensive tests at different times of day with different number of people logged in.

When there were few/no people logged in, CF was *flying*. Movement was ultra-smooth and everything felt very fluid and *consistent*, timing wise.

When 30+ people were logged in, there was obvious jitter in everything but one could immediately see it in movement lag (if was used to the previous condition).

My ping time to CF during those periods was also 100% stable (below 50ms).

I think back then, CF was running on shared hosting (so other #### running on the same machine by user accounts not related to CF may have affected performance), maybe that is no longer the case today so if performance issues remain, they must be caused by CF itself.


The default pulse time for stock ROM is 250ms AFAIK (4 pulses per second).
Assuming CF pulse time is 250ms, if everything is running within spec,
one should never see a command take *more* than 250+network roundtrip to CF
from the time he sends the packet to the time he gets the response.

That number can jitter as long as it stays below 250+roundtrip. If it jitters above,
it indicates that the CF process can't schedule 4 pulses in one second. The higher it jitters (say 1second or more that I've seen in some cases) the more loaded the CF process is and the higher the actual pulse time becomes.

Another way (the smart way really) to time this sort of issue in bulk, without doing the timing from inside a mud client is to simply use tcpdump. You don't want to do the timing inside a MUD client cause you're introducing unwanted timing overhead from the MUD client process, the scripts it's running and whatever scripting language it exposes for you to do the timing in. This overhead could be in the order of tens of milliseconds to more than a couple of hundred which would make the results useless.

First, capture the packets (my example ROM server is running at port 4000 and the data packets that my mud client/server send have the PUSH flag set, so i'm only capturing those):

tcpdump -vvni lo0 -X -s 0 -w rom.pcap "port 4000 and tcp\(tcpflags\) & tcp-push != 0"

\( and \) are square brackets.

Next, type "nod" 10 times say in the mud client.
Finally, stop the packet capture and calculate time deltas between packet pairs (request/response).

tcpdump -r rom.pcap |cut -f 3 -d ':' |cut -f 1 -d ' '| tail -r |gpaste -s -d '-\n' | bc

For my example:

.243956
.227105
.230149
.242648
.217974
.153603
.104957
.120465
.161000
.184953

So we can see here that we never go above 250 ms which is right since i'm running this ROM server locally so network lag is negligible.
60896, RE: Inconsistent processing times were present in the past
Posted by Umiron on Wed 31-Dec-69 07:00 PM
We already know that the main "game loop" occasionally takes longer than it should because we time it, but we haven't identified why.
60897, RE: Inconsistent processing times were present in the past
Posted by Isildur on Wed 31-Dec-69 07:00 PM
I would be shocked if the overhead from mud client processing were significant. It doesn't take on the order of 100ms to match a regex.

Re: the delays Umiron mentioned in his reply, I would strongly lean towards something I/O related. Either filesystem or database. But, again, that's just a guess.
60871, RE: mud performance
Posted by Umiron on Wed 31-Dec-69 07:00 PM
It's certainly possible the MUD Is responsible for some additional lag in addition to any possible network-related latency. It's also possible that other applications on the box are too, either because they're consuming resources CF needs (e.g., disk I/O, etc.) or because CF directly depends on them (e.g,. the database).

Some time ago (before my time), Zulg made a list of major functions within the codebase and added some rudimentary profiling code to them that logs how log they took to execute. Looking at that, it would appear that a handful of them are occasionally over the goals that he originally set for them, though I have no idea what criteria went into setting those original goals. Because of the way CF (and most Diku MUDs) work, that can ultimately lead to a variance in the time (in milliseconds) that a "tick" or other regularly schedule event takes. Having said all that, none of these numbers look seriously out of hand.

Obviously when exactly a "nod" goes through will impact how long it takes for the MUD to respond in that if you happen to send it right before an expensive function call (e.g., a tick, but there are others a mortal wouldn't necessarily notice) then it may take longer, though ideally not so much so that it impacts your game play.

If your script is something I can use in TinTin++ or another CLI client, and you want to share it, I would happily run it on a character connected to CF on localhost and thus effectively ruling out inconsistent network speed.

Unfortunately, I don't really have the ability to profile CF under load or live conditions, but I'll try to do some digging on a test port and see what I see.


60877, RE: mud performance
Posted by Isildur on Wed 31-Dec-69 07:00 PM
My guess on the slow ones is some blocking I/O that everything else has a dependency on. Probably db related. Total guess, so take with a huge grain of salt.

Here's the script if you want to try it on the server:

http://forums.carrionfields.com/dc/dcboard.php?az=show_topic&forum=6&topic_id=60652&mesg_id=60652&page=2#60657

I wrote it for tintin++ so it should work. Enter "cfping" to start it running and "off" to stop. Right now it just echos each round-trip time but doesn't compute any stats.
60880, RE: mud performance
Posted by Umiron on Wed 31-Dec-69 07:00 PM
The difference between my laptop and the CF server was negligible. I only ran it for a few minutes on each session, but in that time I wasn't able to see the bigger discrepancies you were describing.

I'll dig a little more as I have time.
60881, RE: mud performance
Posted by Isildur on Wed 31-Dec-69 07:00 PM
In the mean time, I'll try to modify the script to compute min/max/mean/median/stdev. Probably just store all the times in a list and compute the stats when it's switched off. With the understanding that if you let it run too long it'll blow up tintin++.
60890, RE: mud performance
Posted by Isildur on Wed 31-Dec-69 07:00 PM
Here's a version that doesn't output each time but does output
the min, max, average, median and count when you switch it
off.

#ali {cfping} {#act {You nod.} {#format {endtime} {%U};#math
{duration} {$endtime - $starttime};#math {duration} {$duration
/ 1000};#list {times} {add} {$duration};#format {starttime}
{%U};nod};#ali {off} {#unact {You nod.};#unali {off};#unvar
{starttime};#unvar {endtime};#unvar {duration};#var {min}
{9999};#var {max} {0};#var {avg} {0};#list {times} {size}
{size};#math {halfsize} {$size / 2};#forall {$times[%*]} {#if
{&0 < $min} {#var {min} {&0}};#if {&0 >
$max} {#var {max} {&0}};#math {avg} {$avg + &0}};#list
{times} {sort};#math {avg} {$avg / $size};#math {med}
{$times[$halfsize]};#showme {min: $min ms,  max $max ms,  avg:
$avg ms,  med: $med ms,  count: $size}#unvar {min};#unvar
{max};#unvar {avg};#unvar {med};#unvar {size};#unvar
{halfsize};#unvar {times}};#unvar {times};#format {starttime}
{%U};nod}


Here's my output from just now:

min: 101 ms,  max 1373 ms,  avg: 259 ms,  med: 255 ms,  count:
594

I may go back and add standard deviation but doing math in
tintin++ is kind of a pain.
60892, RE: mud performance
Posted by KaguMaru on Wed 31-Dec-69 07:00 PM
I'm guessing there's no square root function, but variance is
just E(x)^2 - E(x^2) and if you could get E(x) there's no
reason you can't get those two quantities. Standard deviation
is then just the square root of the variance.
60884, i've been getting disconnected a lot
Posted by Dallevian on Wed 31-Dec-69 07:00 PM
about 2 to 3 times an hour it will totally disconnect

it doesn't matter if i'm mudding from my mac at home or from wifi on my work computer during lunch, same thing is happening

it isn't impacting other people, just some, from what i've noticed on the 'who' list

could it be from what Isildur is experiencing, high latency at periodic times that makes my session think it lost connection?
60888, RE: i've been getting disconnected a lot
Posted by Isildur on Wed 31-Dec-69 07:00 PM
If you're getting completely disconnected then it's almost surely a network thing. And if other people *aren't* getting disconnected then it's probably something happening "out there on the internet". That said, it's possible the hosting company is at fault.