Jump to content

  •  

Bug Tracker Migration

June 3rd
Good news everyone! The staff has decided that it is time to slowly kill off this Bug Tracker. We will begin the process of slowly migrating from this Bug Tracker over to our Github Issues which can be found here: https://github.com/HerculesWS/Hercules/issues

Over the next couple of days, I will be closing off any opportunity to create new reports. However, I still will keep the opportunity to reply to existing Bug Reports. Doing this will allow us to slowly fix any bug reports we have listed here so that we can easily migrate over to our Issue Tracker.

Update - June 7th 2015: Creating new bug posts has been disabled. Please use our https://github.com/HerculesWS/Hercules/issues tracker to post bugs. Users are still able to reply to existing bug posts.

- Administration

Issue Information

  • #006123

  • 4 - High

  • Fixed

Issue Confirmations

  • Yes (1)No (0)
Photo

battle_delay_damage_sub do_timer Segmentation fault

Posted by Hercules Bot on 27 June 2012 - 01:55 PM

Originally posted by Cookie
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
    tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
206                     if( id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
    tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
        dat = 0x7f3639e59ac8
        target = 0x7f363d886cec
#1  0x0000000000586daf in do_timer (tick=1672996460) at timer.c:370
        tid = 4528
        diff = 0
        __FUNCTION__ = "do_timer"
#2  0x000000000058432c in main (argc=1, argv=0x7fff65b18cb8) at core.c:300
        next = <value optimized out>
I've dug through the source (I've got decent C experience), and I'm completely stumped on why this is happening. Here's the core dump and I'll provide anything else necessary.

Originally posted by Ind

he has pointers broken in different areas and many map_freeblock_invalid messages within the console. as for r15314 xazax and i have tested several scenarios prior to getting there, it'd not be enough to crash. when the pointer gets freed screwed or whatever the dat->src->id would give some negative/random value, never null.

not sure about the freeblock_lock stuff, but a dangling pointer should not be dereferenced, period. it may crash at any given time, and not crash at others, which makes it unpredictable. the segfault clearly resulted from it. it's also different between OSes/compilers/whatnot, and  generally unpredictable.

thats what the whole lock stuff is for, when it breaks (or say it wasn't available) these pointer checks would be required all over the place.

Originally posted by Wildcard

thats what the whole lock stuff is for, when it breaks (or say it wasn't available) these pointer checks would be required all over the place.


said lock is not active when that first check happens, though (obviously, since it's "asyncronous", evoked by a timer) dat->src does get free'd whenever the disconnect code is done cleaning up. if it has not yet been free'd, it would not contain garbage data, either. maybe I'm still missing something but the more I look at it, the more I think not :D

Originally posted by Ind
oh, you're absolutely right. whops /me shame

Originally posted by Ind
Should be fully fixed in [rev=16359]

Originally posted by Wildcard
..I'm not sure. My point was we need more of these mapid lookups, not less :D
From my understanding, for any bl somehow passed to a timer, we need to perform a lookup, or run the risk of dereferencing a dangling pointer, and subsequently the possibility of an access violation/segfault.
For that matter, we might as well never pass a bl* directly to a timer, but instead always just save the ID and perform said lookup, just for clarity's sake. As of r16359, I can make both dat->src and dat->target point to garbage memory, which is then dereferenced and should mean it can potentially crash at any time. Pending confirmation from users here. Seems to be easier to reproduce on *nixes. Another curious thing I noticed regarding this, the old code (pre-15314) also checked against the case that the map ID had been re-assigned. Not entirely sure of the implications here, though.

Bottom line, lets see if there are any more crashes :D

Originally posted by Cookie
Wildcard, you're right. Without the map_id2bl(id) == dat->src in the if statement, it still crashes. XD Didn't take long for my server to BOOM!

Originally posted by Cookie
/debug/lib64/libnss_files-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files-2.12.so
Core was generated by `./map-server_sql'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000542b74 in battle_delay_damage_sub (tid=<value optimized out>,
	tick=2654204114, id=110012417, data=139743418977048) at battle.c:204
204		if ( dat && dat->target && map_id2bl(id) == dat->src && dat->target->prev != NULL && !status_isdead(dat->target) ) {
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  0x0000000000542b74 in battle_delay_damage_sub (tid=<value optimized out>,
	tick=2654204114, id=110012417, data=139743418977048) at battle.c:204
		dat = 0x7f188cd90b18
#1  0x00000000005870cf in do_timer (tick=2654204123) at timer.c:370
		tid = 2608
		diff = -9
		__FUNCTION__ = "do_timer"
#2  0x000000000058464c in main (argc=1, argv=0x7fff96141868) at core.c:300
		next = <value optimized out>


Any ideas? I applied the revision above too.

Edited by Cookie, 08 July 2012 - 10:24 PM.


Originally posted by Ind
that is modified. use the latest code available in battle.c

Originally posted by Cookie
Reproduced the issue again on Divinity. With around 400 online, someone reproduced the issue. I've added the check that prevents this in [rev=16476] thanks to WildCard! :)

Originally posted by Ind
pointers are never deferenced in rathena (unless you manually var = null), they end up with random/corrupted memory (never null after being afreed) -- I'm not saying it couldn't have fixed your crashing, however something else should be crashing it (within the bounds of the damage delay system). also r16476 broke the official behavior on reflect shield.