Issue Information
-
#006123
-
4 - High
-
Fixed
Issue Confirmations
-
Yes (1)No (0)
0
battle_delay_damage_sub do_timer Segmentation fault
Posted by Hercules Bot on 27 June 2012 - 01:55 PM
Originally posted by Cookie
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
206 if( id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0 0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
dat = 0x7f3639e59ac8
target = 0x7f363d886cec
#1 0x0000000000586daf in do_timer (tick=1672996460) at timer.c:370
tid = 4528
diff = 0
__FUNCTION__ = "do_timer"
#2 0x000000000058432c in main (argc=1, argv=0x7fff65b18cb8) at core.c:300
next = <value optimized out>
I've dug through the source (I've got decent C experience), and I'm completely stumped on why this is happening. Here's the core dump and I'll provide anything else necessary.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
206 if( id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0 0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
dat = 0x7f3639e59ac8
target = 0x7f363d886cec
#1 0x0000000000586daf in do_timer (tick=1672996460) at timer.c:370
tid = 4528
diff = 0
__FUNCTION__ = "do_timer"
#2 0x000000000058432c in main (argc=1, argv=0x7fff65b18cb8) at core.c:300
next = <value optimized out>
I've dug through the source (I've got decent C experience), and I'm completely stumped on why this is happening. Here's the core dump and I'll provide anything else necessary.
Originally posted by Ind
thats what the whole lock stuff is for, when it breaks (or say it wasn't available) these pointer checks would be required all over the place.not sure about the freeblock_lock stuff, but a dangling pointer should not be dereferenced, period. it may crash at any given time, and not crash at others, which makes it unpredictable. the segfault clearly resulted from it. it's also different between OSes/compilers/whatnot, and generally unpredictable.he has pointers broken in different areas and many map_freeblock_invalid messages within the console. as for r15314 xazax and i have tested several scenarios prior to getting there, it'd not be enough to crash. when the pointer gets freed screwed or whatever the dat->src->id would give some negative/random value, never null.
Originally posted by Wildcard
said lock is not active when that first check happens, though (obviously, since it's "asyncronous", evoked by a timer) dat->src does get free'd whenever the disconnect code is done cleaning up. if it has not yet been free'd, it would not contain garbage data, either. maybe I'm still missing something but the more I look at it, the more I think not
thats what the whole lock stuff is for, when it breaks (or say it wasn't available) these pointer checks would be required all over the place.
said lock is not active when that first check happens, though (obviously, since it's "asyncronous", evoked by a timer) dat->src does get free'd whenever the disconnect code is done cleaning up. if it has not yet been free'd, it would not contain garbage data, either. maybe I'm still missing something but the more I look at it, the more I think not
Originally posted by Ind
oh, you're absolutely right. whops /me shame
oh, you're absolutely right. whops /me shame
Originally posted by Ind
Should be fully fixed in [rev=16359]
Should be fully fixed in [rev=16359]
Originally posted by Wildcard
..I'm not sure. My point was we need more of these mapid lookups, not less
From my understanding, for any bl somehow passed to a timer, we need to perform a lookup, or run the risk of dereferencing a dangling pointer, and subsequently the possibility of an access violation/segfault.
For that matter, we might as well never pass a bl* directly to a timer, but instead always just save the ID and perform said lookup, just for clarity's sake. As of r16359, I can make both dat->src and dat->target point to garbage memory, which is then dereferenced and should mean it can potentially crash at any time. Pending confirmation from users here. Seems to be easier to reproduce on *nixes. Another curious thing I noticed regarding this, the old code (pre-15314) also checked against the case that the map ID had been re-assigned. Not entirely sure of the implications here, though.
Bottom line, lets see if there are any more crashes
..I'm not sure. My point was we need more of these mapid lookups, not less
From my understanding, for any bl somehow passed to a timer, we need to perform a lookup, or run the risk of dereferencing a dangling pointer, and subsequently the possibility of an access violation/segfault.
For that matter, we might as well never pass a bl* directly to a timer, but instead always just save the ID and perform said lookup, just for clarity's sake. As of r16359, I can make both dat->src and dat->target point to garbage memory, which is then dereferenced and should mean it can potentially crash at any time. Pending confirmation from users here. Seems to be easier to reproduce on *nixes. Another curious thing I noticed regarding this, the old code (pre-15314) also checked against the case that the map ID had been re-assigned. Not entirely sure of the implications here, though.
Bottom line, lets see if there are any more crashes
Originally posted by Cookie
Wildcard, you're right. Without the map_id2bl(id) == dat->src in the if statement, it still crashes. XD Didn't take long for my server to BOOM!
Wildcard, you're right. Without the map_id2bl(id) == dat->src in the if statement, it still crashes. XD Didn't take long for my server to BOOM!
Originally posted by Cookie
Any ideas? I applied the revision above too.
/debug/lib64/libnss_files-2.12.so.debug...done. done. Loaded symbols for /lib64/libnss_files-2.12.so Core was generated by `./map-server_sql'. Program terminated with signal 11, Segmentation fault. #0 0x0000000000542b74 in battle_delay_damage_sub (tid=<value optimized out>, tick=2654204114, id=110012417, data=139743418977048) at battle.c:204 204 if ( dat && dat->target && map_id2bl(id) == dat->src && dat->target->prev != NULL && !status_isdead(dat->target) ) { Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64 (gdb) bt full #0 0x0000000000542b74 in battle_delay_damage_sub (tid=<value optimized out>, tick=2654204114, id=110012417, data=139743418977048) at battle.c:204 dat = 0x7f188cd90b18 #1 0x00000000005870cf in do_timer (tick=2654204123) at timer.c:370 tid = 2608 diff = -9 __FUNCTION__ = "do_timer" #2 0x000000000058464c in main (argc=1, argv=0x7fff96141868) at core.c:300 next = <value optimized out>
Any ideas? I applied the revision above too.
Edited by Cookie, 08 July 2012 - 10:24 PM.
Originally posted by Ind
that is modified. use the latest code available in battle.c
that is modified. use the latest code available in battle.c
Originally posted by Cookie
Reproduced the issue again on Divinity. With around 400 online, someone reproduced the issue. I've added the check that prevents this in [rev=16476] thanks to WildCard!
Reproduced the issue again on Divinity. With around 400 online, someone reproduced the issue. I've added the check that prevents this in [rev=16476] thanks to WildCard!
Originally posted by Ind
pointers are never deferenced in rathena (unless you manually var = null), they end up with random/corrupted memory (never null after being afreed) -- I'm not saying it couldn't have fixed your crashing, however something else should be crashing it (within the bounds of the damage delay system). also r16476 broke the official behavior on reflect shield.
pointers are never deferenced in rathena (unless you manually var = null), they end up with random/corrupted memory (never null after being afreed) -- I'm not saying it couldn't have fixed your crashing, however something else should be crashing it (within the bounds of the damage delay system). also r16476 broke the official behavior on reflect shield.