Jump to content

  •  

Bug Tracker Migration

June 3rd
Good news everyone! The staff has decided that it is time to slowly kill off this Bug Tracker. We will begin the process of slowly migrating from this Bug Tracker over to our Github Issues which can be found here: https://github.com/HerculesWS/Hercules/issues

Over the next couple of days, I will be closing off any opportunity to create new reports. However, I still will keep the opportunity to reply to existing Bug Reports. Doing this will allow us to slowly fix any bug reports we have listed here so that we can easily migrate over to our Issue Tracker.

Update - June 7th 2015: Creating new bug posts has been disabled. Please use our https://github.com/HerculesWS/Hercules/issues tracker to post bugs. Users are still able to reply to existing bug posts.

- Administration

Issue Information

  • #000287

  • 3 - Medium

  • Started

Issue Confirmations

  • Yes (4)No (0)
Photo

Packet Queueing And Castle Data Loading

Posted by Hercules Bot on 24 October 2007 - 01:47 AM

Originally posted by theultramage
http://www.eathena.w...ker&showbug=287

I added a safeguard against send buffer overload (too many packets queued at once causing increased memory usage, etc), and this uncovered a very bad bug...

According to network traffic traces, this is what happens during startup...

1. The agit init script calls 'getcastledata(..., 0, ...).
getcastledata normally just returns a piece of information that's already loaded, but for some mysterious reason (probably stupidity), case 0 performs a charserver data load request instead - ONE F*CKING ATTRIBUTE AT A TIME
for(i=1;i<26;i++) // Initialize[AgitInit]
guild_castledataload(gc->castle_id,i);
As to why castle data loading isn't done automatically at initial charserver connect but instead the script engine has to request it, I'll leave that for future examination. The point is this packet spam.

2. After loading is done, the event provided with 'getcastledata' is executed for each callee (=> castle), which is...
OnRecvCastleN01:
RequestGuildInfo GetCastleData("nguild_alde",1);
which ends up with a 6-byte packet 0x3031 request (guild data lookup).

3. Charserver processes packet, and builds packet 0x3831, which contains a complete dump (!) of a guild object = 13kB, and shoves it into the send queue.
Now, this part is really dumb, because when a guild owns more than 1 castle, the data gets sent multiple times...

The result is that nCastles x 13K bytes gets shoved into the char->map pipeline at once (w/ redundancy). This, coupled with extra unsent stuff and increased castle count to 50 (or so I assume) causes a buildup of more than 1 MB of data, which the socket code will flag as overload and the connection will get killed.


In function WFIFOSET, there was once
if (s->wdata_size > (TCP_FRAME_LEN))
send_from_fifo(fd);
that actually forced the server to not buffer anything past 1 kilobyte. I nuked that part some time ago, and this is the result... I guess rethinking the strategy is in order...

Originally posted by Gepard
This is actually far worse than described.

There is a huge redundancy when receiving castle info and guild info:
  • When map-server logs into char-server, two things happen:
    • char-server sends all castles to map-server in 0x3842 packet
    • map-server executes all OnInterIfInit/OnInterIfInitOnce script events
  • 0x3842 packet is parsed and all castle data is updated, and then map-servers requests guild info for each castle (for the last occupied, it does it twice, and also enqueues ::OnAgitInit and ::OnAgitInit2 events in guildinfoevent_db) with 0x3031 packets
    • char-server responds with 0x3831 packets (len=11324); as soon as map-server receives info for the guild of the last occupied castle, it triggers the enqueued events ::OnAgitInit and ::OnAgitInit2.
  • Meanwhile OnInterIfInitOnce events in WoE scripts do similar stuff again:
    • first, GetCastleData causes spam of 0x3040 packets (asking one attribute at time, totaling 17 packets for each of 34 castles = 578 packets with info that has already been sent in 0x3842)
    • for each castle an event ::OnRecvCastle is enqueued in castleinfoevent_db); as soon as map-server receives info for the last requested (17th) castle attribute, it triggers the event
    • the event causes RequestGuildInfo, which in turn causes 0x3031 packet to be sent again to char-server
It turns out that both script and source code do the similar job loading castle info and guild info from char-server.

I suppose it would be sufficient if source code did all the loading by itself, while script did only event enqueueing part, so futher parts of WoE script (like guardian spawning) would be executed when map-server is done with loading data from char-server.

Originally posted by Gepard
#1 and #2 from original report has been fixed in [rev=15657].

Related issues that has not been mentioned yet:
  • some pieces of guild castle data are updated via char-server (ie. map -> (request) -> char -> (ack) -> map), and some are updated on map-server immediately and only then sent to char-server
    • it may cause inconsistent state when char-server is offline
    • note: denying guild castle changes when char-server is offline looks like a bad idea, as it could interfere with WoE scripts, effectively breaking WoE gameplay in case of map-char disconnection
    • because of that, guild castles need some mechanics of autosaving or saving on reconnect
  • since [rev=15657] map-servers are not aware of guild castles located on other map-servers, which may cause issues with:
    • number of castles owned displayed in client's guild window
    • max_guild_castles setting
    • I suggest putting this piece of information into guild structure so it is available on all map-servers


Originally posted by Gepard
Issues with incosistent saving and saving after reconnect to char-server fixed in [rev=15658].

Originally posted by Vali
Hi,

The problem now with "map-server now requests data for all guild castles from char-server on initial connect (bugreport:287)" is that the scripts that use GetCastleData(strnpcinfo(2),1); with OnInterIfInitOnce: or OnInterIfInit: will not get that information correctly because the map server do not has it.

The possible solution is request the information before the scripts are loaded.

Vali~

Originally posted by Gepard
Scripts that use castle data should start on OnAgitInit or OnAgitInit2 labels, not OnInterIfInit.

See [rev=15657/trunk/npc/guild/agit_template.txt] for reference.

Originally posted by Vali
I was talking more about custom scripts. If there is not option to get it at the start then the only way is put a sleep and wait until the mapserver finished to load the castle data.

Vali~

Originally posted by Gepard
All castle data is loaded automatically at the initial connection between map-server and char-server (so there is no need to do manual requests on OnInterIfInitOnce).
Once castle data (and guild data for the castle owners) is loaded, map-server triggers OnAgitInit labels in all scripts.

There is no way (and never was) to have castle data available at map-server startup. You need to wait until it's loaded from char-server. Now it is done automatically, and all you have to do is to make sure that custom scripts that use castle data start with OnAgitInit label.

Originally posted by Vali
I understand. Thank you for the explanation Gepard.

Originally posted by karazu
so how to fix it vali? hehehe!