Issue Information
-
#006123
-
4 - High
-
Fixed
Issue Confirmations
-
Yes (1)No (0)
battle_delay_damage_sub do_timer Segmentation fault
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
206 if( id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0 0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
dat = 0x7f3639e59ac8
target = 0x7f363d886cec
#1 0x0000000000586daf in do_timer (tick=1672996460) at timer.c:370
tid = 4528
diff = 0
__FUNCTION__ = "do_timer"
#2 0x000000000058432c in main (argc=1, argv=0x7fff65b18cb8) at core.c:300
next = <value optimized out>
I've dug through the source (I've got decent C experience), and I'm completely stumped on why this is happening. Here's the core dump and I'll provide anything else necessary.
dat->src most likely null for some super weird reason o_O the following probably fixes
if( id == dat->src->id &&to
if( dat->src && id == dat->src->id &&i wouldn't call this a final fix though since dat->src being broken means something else is breaking it
Ind, my friend, you're awesome. Thanks for the quick fix. I'll apply it and keep digging around on my end. I'm curious as to what else is breaking it. I just moved to rAthena... I figure I'd break something. lol.
EDIT:
Not sure if this is relate-able, however, I'm receiving these errors in the console:
[Error]: map_freeblock_timer: block_free_lock(2) is invalid. [Error]: map_freeblock_timer: block_free_lock(1) is invalid. [Error]: map_freeblock_timer: block_free_lock(3) is invalid.I'm going to look dig into the source more. The above map_free error isn't crashing me, fyi.
Thanks a lot.
Edited by Cookie, 27 June 2012 - 03:23 PM.
these errors means the block lock reference counter is mismatching at some point; its not related to the snippet i provided you above however since it doesn't keep any locks unchanged
Thanks for the info.
Program terminated with signal 11, Segmentation fault. #0 0x000000000054285c in battle_delay_damage_sub (tid=<value optimized out>, tick=1713031938, id=2014330, data=140004230941104) at battle.c:206 206 if( dat->src && id == dat->src->id && Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64 (gdb) bt full #0 0x000000000054285c in battle_delay_damage_sub (tid=<value optimized out>, tick=1713031938, id=2014330, data=140004230941104) at battle.c:206 dat = 0x7f554673c9b0 target = 0x7f554650ecfc #1 0x0000000000586dbf in do_timer (tick=1713031943) at timer.c:370 tid = 1060 diff = -5 __FUNCTION__ = "do_timer" #2 0x000000000058433c in main (argc=1, argv=0x7ffffa0448d8) at core.c:300 next = <value optimized out> (gdb)
Any ideas?
With installed debug infos, here's a better error dump.
Program terminated with signal 11, Segmentation fault. #0 0x000000000054285c in battle_delay_damage (tick=0, amotion=2014330, src=0x7f554650ecfc, target=0x7f554673c9b0, attack_type=1713032137, skill_id=1713031938, skill_lv=-100374869, damage=2376, dmg_lv=4194584792, ddelay=1) at battle.c:263 263 if (src->type != BL_PC && amotion > 1000) Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64 (gdb) bt full #0 0x000000000054285c in battle_delay_damage (tick=0, amotion=2014330, src=0x7f554650ecfc, target=0x7f554673c9b0, attack_type=1713032137, skill_id=1713031938, skill_lv=-100374869, damage=2376, dmg_lv=4194584792, ddelay=1) at battle.c:263 dat = <value optimized out> sc = <value optimized out>I've altered 263 to reflect -
if (dat->src && src->type != BL_PC && amotion > 1000)I will see what happens. Let me know if you have any ideas of what it is.
Before I added the above change, I received this crash:
Program terminated with signal 11, Segmentation fault. #0 mob_log_damage (md=0x7f0898dfaafc, src=0x7f089726ce34, damage=153) at mob.c:2157 2157 char_id = hd->master->status.char_id; Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64 (gdb) bt full #0 mob_log_damage (md=0x7f0898dfaafc, src=0x7f089726ce34, damage=153) at mob.c:2157 hd = 0x7f089726ce34 char_id = 0 flag = 1 #1 0x00000000004b1910 in mob_damage (md=0x7f0898dfaafc, src=0x7f089726ce34, damage=<value optimized out>) at mob.c:2268 No locals. #2 0x000000000048f055 in status_damage_ (src=0x7f089726ce34, target=0x7f0898dfaafc, hp=<value optimized out>, sp=0, walkdelay=576, flag=0, skill=0) at status.c:1190 status = 0x7f0898dfad9c sc = 0x7f0898dfadf4 __FUNCTION__ = "status_damage_" #3 0x00000000005429ad in battle_delay_damage_sub (tid=<value optimized out>, tick=1718001439, id=110027904, data=139674899129880) at battle.c:212 dat = 0x7f0898bf2618 target = 0x7f0898dfaafc #4 0x0000000000586e6f in do_timer (tick=1718001446) at timer.c:370 tid = 4634 diff = -7 __FUNCTION__ = "do_timer" #5 0x00000000005843ec in main (argc=1, argv=0x7fff480916e8) at core.c:300 ---Type <return> to continue, or q <return> to quit--- next = <value optimized out>
Side note, I'm running a server with 200+ players. So, it's hard to reproduce some of this unless it happens live. :x
Edited by Cookie, 28 June 2012 - 02:07 AM.
I think the lock mismatch issue is what is giving your problems; you should check anywhere in your custom code where it uses map_freeblock_lock/unlock -- the issue being the mismatch causes the system to delete units before they're done being used by some functions, which ends up causing these weird crashes, in my opinion.
and by the way this what you did here
if (dat->src && src->type != BL_PC && amotion > 1000)should at be
if (src && src->type != BL_PC && amotion > 1000)
you should go after the lock problem though, because it'll break code over many other places oo not only these (as you see mob_log_damage also joined the crew)
I haven't really touched the source, to be honest, since I moved to rAthena. :/ Nothing major, besides re-adding bindatcmd to the atcommand.c. What should I chase in my NPC scripts? I've got a helluva of modifications npc-wise.
ah i thought you had a highly customized source or something, as soon as I read your last post I jumped to the code editor and have gone through all of the map_freeblock_lock/unlock instances and found 2 mismatches. one related to a status change of a elemental (sorc summon) skill and other to RETURN_TO_ELDICASTES and ALL_GUARDIAN_RECALL skills. The fixes are available in [rev=16349] please let me know if the crashes stop or continue.
#0 0x000000000054290c in battle_delay_damage_sub (tid=<value optimized out>, tick=1736408851, id=2009430, data=140350851725744) at battle.c:206 206 if( dat->src && id == dat->src->id && Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64 (gdb) bt full #0 0x000000000054290c in battle_delay_damage_sub (tid=<value optimized out>, tick=1736408851, id=2009430, data=140350851725744) at battle.c:206 dat = 0x7fa5faa939b0 target = 0x7fa5e3e57714 #1 0x0000000000586e6f in do_timer (tick=1736408851) at timer.c:370 tid = 1782 diff = 0 __FUNCTION__ = "do_timer" #2 0x00000000005843ec in main (argc=1, argv=0x7fff04e0f9f8) at core.c:300 next = <value optimized out>
Same thing, again. Any ideas?
Edited by Cookie, 28 June 2012 - 07:06 AM.
I've used my editor to go through every instance of map_freeblock_stuff within the source as it can be seen in the screen below, I've gone through everyone of them more than once and didn't find any possible mismatching case. scary stuff.
[Error]: map_freeblock_timer: block_free_lock(1) is invalid.
On the console again.
Sorry, it double posted.
Just a FYI, I also did a search like you did on my source (while I do have a few edits they don't affect these areas) and all checked out... it seems. Didn't find anything returning without being unlocked, et cetera. :/ Still getting that console error... scary stuff (as Ind says, XD)
Edited by Cookie, 28 June 2012 - 11:17 AM.
probably r15314:
the check removed there
map_id2bl(id) == dat->srcused to ensure that the source pointer is not dangling. delayed reflect, source logging out, boom. null checks won't help there.
Pretty sure that's it, but better look again to make sure
Makes sense. I'll test this out.
[Error]: map_freeblock_timer: block_free_lock(1) is invalid.
We're getting more of them in the console all showing (1).
if ( target && dat && target->prev != NULL && !status_isdead(target) ) { if( dat->src && map_id2bl(id) == dat->src && id == dat->src->id &&Edit I made.
Edited by Cookie, 28 June 2012 - 01:59 PM.
he has pointers broken in different areas and many map_freeblock_invalid messages within the console. as for r15314 xazax and i have tested several scenarios prior to getting there, it'd not be enough to crash. when the pointer gets freed screwed or whatever the dat->src->id would give some negative/random value, never null.probably r15314:the check removed there
map_id2bl(id) == dat->srcused to ensure that the source pointer is not dangling. delayed reflect, source logging out, boom. null checks won't help there.Pretty sure that's it, but better look again to make sure
he has pointers broken in different areas and many map_freeblock_invalid messages within the console. as for r15314 xazax and i have tested several scenarios prior to getting there, it'd not be enough to crash. when the pointer gets freed screwed or whatever the dat->src->id would give some negative/random value, never null.
not sure about the freeblock_lock stuff, but a dangling pointer should not be dereferenced, period. it may crash at any given time, and not crash at others, which makes it unpredictable. the segfault clearly resulted from it. it's also different between OSes/compilers/whatnot, and generally unpredictable.