Jump to content

  •  

Bug Tracker Migration

June 3rd
Good news everyone! The staff has decided that it is time to slowly kill off this Bug Tracker. We will begin the process of slowly migrating from this Bug Tracker over to our Github Issues which can be found here: https://github.com/HerculesWS/Hercules/issues

Over the next couple of days, I will be closing off any opportunity to create new reports. However, I still will keep the opportunity to reply to existing Bug Reports. Doing this will allow us to slowly fix any bug reports we have listed here so that we can easily migrate over to our Issue Tracker.

Update - June 7th 2015: Creating new bug posts has been disabled. Please use our https://github.com/HerculesWS/Hercules/issues tracker to post bugs. Users are still able to reply to existing bug posts.

- Administration

Issue Information

  • #006123

  • 4 - High

  • Fixed

Issue Confirmations

  • Yes (1)No (0)
Photo

battle_delay_damage_sub do_timer Segmentation fault

Posted by Hercules Bot on 27 June 2012 - 01:55 PM

Originally posted by Cookie
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
    tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
206                     if( id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  0x0000000000542857 in battle_delay_damage_sub (tid=<value optimized out>,
    tick=1672996460, id=2015079, data=139870876310216) at battle.c:206
        dat = 0x7f3639e59ac8
        target = 0x7f363d886cec
#1  0x0000000000586daf in do_timer (tick=1672996460) at timer.c:370
        tid = 4528
        diff = 0
        __FUNCTION__ = "do_timer"
#2  0x000000000058432c in main (argc=1, argv=0x7fff65b18cb8) at core.c:300
        next = <value optimized out>
I've dug through the source (I've got decent C experience), and I'm completely stumped on why this is happening. Here's the core dump and I'll provide anything else necessary.

Originally posted by Ind
dat->src most likely null for some super weird reason o_O the following probably fixes
if( id == dat->src->id &&
to
if( dat->src && id == dat->src->id &&
i wouldn't call this a final fix though since dat->src being broken means something else is breaking it

Originally posted by Cookie
Ind, my friend, you're awesome. Thanks for the quick fix. I'll apply it and keep digging around on my end. I'm curious as to what else is breaking it. /swt I just moved to rAthena... I figure I'd break something. lol.

EDIT:
Not sure if this is relate-able, however, I'm receiving these errors in the console:
[Error]: map_freeblock_timer: block_free_lock(2) is invalid.
[Error]: map_freeblock_timer: block_free_lock(1) is invalid.
[Error]: map_freeblock_timer: block_free_lock(3) is invalid.
I'm going to look dig into the source more. The above map_free error isn't crashing me, fyi.

Thanks a lot. :)

Edited by Cookie, 27 June 2012 - 03:23 PM.


Originally posted by Ind
these errors means the block lock reference counter is mismatching at some point; its not related to the snippet i provided you above however since it doesn't keep any locks unchanged

Originally posted by Cookie
Thanks for the info. :)

Originally posted by Cookie
Program terminated with signal 11, Segmentation fault.
#0  0x000000000054285c in battle_delay_damage_sub (tid=<value optimized out>,
	tick=1713031938, id=2014330, data=140004230941104) at battle.c:206
206			if( dat->src && id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  0x000000000054285c in battle_delay_damage_sub (tid=<value optimized out>,
	tick=1713031938, id=2014330, data=140004230941104) at battle.c:206
		dat = 0x7f554673c9b0
		target = 0x7f554650ecfc
#1  0x0000000000586dbf in do_timer (tick=1713031943) at timer.c:370
		tid = 1060
		diff = -5
		__FUNCTION__ = "do_timer"
#2  0x000000000058433c in main (argc=1, argv=0x7ffffa0448d8) at core.c:300
		next = <value optimized out>
(gdb)

Any ideas? :)

With installed debug infos, here's a better error dump.

Program terminated with signal 11, Segmentation fault.
#0  0x000000000054285c in battle_delay_damage (tick=0, amotion=2014330,
	src=0x7f554650ecfc, target=0x7f554673c9b0, attack_type=1713032137,
	skill_id=1713031938, skill_lv=-100374869, damage=2376, dmg_lv=4194584792,
	ddelay=1) at battle.c:263
263			 if (src->type != BL_PC && amotion > 1000)
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  0x000000000054285c in battle_delay_damage (tick=0, amotion=2014330,
	src=0x7f554650ecfc, target=0x7f554673c9b0, attack_type=1713032137,
	skill_id=1713031938, skill_lv=-100374869, damage=2376, dmg_lv=4194584792,
	ddelay=1) at battle.c:263
		dat = <value optimized out>
		sc = <value optimized out>

I've altered 263 to reflect -
if (dat->src && src->type != BL_PC && amotion > 1000)
I will see what happens. Let me know if you have any ideas of what it is.

Before I added the above change, I received this crash:
Program terminated with signal 11, Segmentation fault.
#0  mob_log_damage (md=0x7f0898dfaafc, src=0x7f089726ce34, damage=153)
    at mob.c:2157
2157								    char_id = hd->master->status.char_id;
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  mob_log_damage (md=0x7f0898dfaafc, src=0x7f089726ce34, damage=153)
    at mob.c:2157
	    hd = 0x7f089726ce34
	    char_id = 0
	    flag = 1
#1  0x00000000004b1910 in mob_damage (md=0x7f0898dfaafc, src=0x7f089726ce34,
    damage=<value optimized out>) at mob.c:2268
No locals.
#2  0x000000000048f055 in status_damage_ (src=0x7f089726ce34,
    target=0x7f0898dfaafc, hp=<value optimized out>, sp=0, walkdelay=576,
    flag=0, skill=0) at status.c:1190
	    status = 0x7f0898dfad9c
	    sc = 0x7f0898dfadf4
	    __FUNCTION__ = "status_damage_"
#3  0x00000000005429ad in battle_delay_damage_sub (tid=<value optimized out>,
    tick=1718001439, id=110027904, data=139674899129880) at battle.c:212
	    dat = 0x7f0898bf2618
	    target = 0x7f0898dfaafc
#4  0x0000000000586e6f in do_timer (tick=1718001446) at timer.c:370
	    tid = 4634
	    diff = -7
	    __FUNCTION__ = "do_timer"
#5  0x00000000005843ec in main (argc=1, argv=0x7fff480916e8) at core.c:300
---Type <return> to continue, or q <return> to quit---
	    next = <value optimized out>


Side note, I'm running a server with 200+ players. So, it's hard to reproduce some of this unless it happens live. :x

Edited by Cookie, 28 June 2012 - 02:07 AM.


Originally posted by Ind
I think the lock mismatch issue is what is giving your problems; you should check anywhere in your custom code where it uses map_freeblock_lock/unlock -- the issue being the mismatch causes the system to delete units before they're done being used by some functions, which ends up causing these weird crashes, in my opinion.

Originally posted by Ind
and by the way this what you did here
if (dat->src && src->type != BL_PC && amotion > 1000)
should at be
if (src && src->type != BL_PC && amotion > 1000)


Originally posted by Ind
you should go after the lock problem though, because it'll break code over many other places oo not only these (as you see mob_log_damage also joined the crew)

Originally posted by Cookie
I haven't really touched the source, to be honest, since I moved to rAthena. :/ Nothing major, besides re-adding bindatcmd to the atcommand.c. What should I chase in my NPC scripts? I've got a helluva of modifications npc-wise.

Originally posted by Ind
ah i thought you had a highly customized source or something, as soon as I read your last post I jumped to the code editor and have gone through all of the map_freeblock_lock/unlock instances and found 2 mismatches. one related to a status change of a elemental (sorc summon) skill and other to RETURN_TO_ELDICASTES and ALL_GUARDIAN_RECALL skills. The fixes are available in [rev=16349] please let me know if the crashes stop or continue.

Originally posted by Cookie
#0  0x000000000054290c in battle_delay_damage_sub (tid=<value optimized out>, 
    tick=1736408851, id=2009430, data=140350851725744) at battle.c:206
206            if( dat->src && id == dat->src->id &&
Missing separate debuginfos, use: debuginfo-install openssl-1.0.0-20.el6_2.5.x86_64 pcre-7.8-3.1.el6.x86_64
(gdb) bt full
#0  0x000000000054290c in battle_delay_damage_sub (tid=<value optimized out>, 
    tick=1736408851, id=2009430, data=140350851725744) at battle.c:206
        dat = 0x7fa5faa939b0
        target = 0x7fa5e3e57714
#1  0x0000000000586e6f in do_timer (tick=1736408851) at timer.c:370
        tid = 1782
        diff = 0
        __FUNCTION__ = "do_timer"
#2  0x00000000005843ec in main (argc=1, argv=0x7fff04e0f9f8) at core.c:300
        next = <value optimized out>

Same thing, again. Any ideas?

Edited by Cookie, 28 June 2012 - 07:06 AM.


Originally posted by Ind
that last gdb report, was it updated to 16349? i was told over irc it was.

Originally posted by Ind
I've used my editor to go through every instance of map_freeblock_stuff within the source as it can be seen in the screen below, I've gone through everyone of them more than once and didn't find any possible mismatching case. scary stuff.


Originally posted by Cookie
[Error]: map_freeblock_timer: block_free_lock(1) is invalid.

On the console again.

Originally posted by Cookie
Sorry, it double posted.

Just a FYI, I also did a search like you did on my source (while I do have a few edits they don't affect these areas) and all checked out... it seems. Didn't find anything returning without being unlocked, et cetera. :/ Still getting that console error... scary stuff (as Ind says, XD)

Edited by Cookie, 28 June 2012 - 11:17 AM.


Originally posted by Wildcard
probably r15314:
the check removed there
map_id2bl(id) == dat->src
used to ensure that the source pointer is not dangling. delayed reflect, source logging out, boom. null checks won't help there.

Pretty sure that's it, but better look again to make sure :D

Originally posted by Cookie
Makes sense. I'll test this out. :)

Originally posted by Cookie
[Error]: map_freeblock_timer: block_free_lock(1) is invalid.

We're getting more of them in the console all showing (1).

if ( target && dat && target->prev != NULL && !status_isdead(target) ) {
        if( dat->src && map_id2bl(id) == dat->src && id == dat->src->id &&
Edit I made.

Edited by Cookie, 28 June 2012 - 01:59 PM.


Originally posted by Ind

probably r15314:the check removed there

map_id2bl(id) == dat->src
used to ensure that the source pointer is not dangling. delayed reflect, source logging out, boom. null checks won't help there.Pretty sure that's it, but better look again to make sure :D

he has pointers broken in different areas and many map_freeblock_invalid messages within the console. as for r15314 xazax and i have tested several scenarios prior to getting there, it'd not be enough to crash. when the pointer gets freed screwed or whatever the dat->src->id would give some negative/random value, never null.

Originally posted by Wildcard

he has pointers broken in different areas and many map_freeblock_invalid messages within the console. as for r15314 xazax and i have tested several scenarios prior to getting there, it'd not be enough to crash. when the pointer gets freed screwed or whatever the dat->src->id would give some negative/random value, never null.


not sure about the freeblock_lock stuff, but a dangling pointer should not be dereferenced, period. it may crash at any given time, and not crash at others, which makes it unpredictable. the segfault clearly resulted from it. it's also different between OSes/compilers/whatnot, and  generally unpredictable.