Discussion:
Event timeout after it has been freed
Robin
2014-06-26 12:36:52 UTC
Hey,
I'm still investigating the issues from my other email and came across
another thing.
This might, again, be due to memory corruption on my side - I'm hoping
it isn't though.

I implemented a simple timer-event class which basically creates an
event with a timeout and calls an std::function once that timeout is hit.
Every now and then I get crashes in my ping event.
The ping event is a lambda which captures the "this" pointer of my
socket and just pings the client every 30 seconds.
The crashes are always related to the ping event having a nullptr as the
socket pointer.
So I did some recording in gdb and came to the conclusion it gets set to
0 in the destructor of the event class, which gets called when a socket
disconnects.
Backtrace can be seen here:
http://puu.sh/9KSs8/927aa257e2.txt
The destructor just calls event_del (if the event is running), followed by
event_free.
It happens rarely: I had to wait about 5 hours for that crash, which is at
least 10k connections.

Could it be that the event doesn't get canceled when it is pending?
For example, the event "queue" could look something like this:
1. Disconnect socket x
2. Socket y receive
3. Socket x send
4. Event z timeout

"1." would cancel Event z though
(I am just guessing here since I have no idea why it's happening)

I really hope someone is able to help me out; I'm slowly going insane...

imer

***********************************************************************
To unsubscribe, send an e-mail to ***@freehaven.net with
unsubscribe libevent-users in the body.
Nick Mathewson
2014-06-26 14:13:31 UTC
Post by Robin
[...]
This looks to me more like a symptom of whatever other event_base
corruption issue is going on than it does like an independent bug,
though I can't know for sure...

This is the kind of thing that could totally benefit from some example
code to show the problem. I'm afraid that I can't be too sure of
what's going on in between the layers of C++ glue in your backtrace.
Post by Robin
Could it be the event doesnt get canceled when it is pending?
[...]
event_del() really is supposed to cancel an event. Generally
speaking, you need to event_del() or event_free() any event that uses
X as its data pointer before you free X.
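Nick's rule can be illustrated with a toy, single-threaded model that needs no libevent. `ToyLoop`, `Timer` and `Socket` below are hypothetical names, not libevent's API or Robin's classes. Each `Socket` owns a one-shot `Timer` whose callback captures `this`, and the timer's destructor removes its pending entry from the loop; in a libevent wrapper that is roughly where event_del() and event_free() would sit. The loop re-checks each pending callback just before running it, so the model also covers the queue Robin sketched: if an earlier callback in the same pass disconnects socket x, x's timer is cancelled before its turn comes.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop: pending one-shot callbacks by id.
struct ToyLoop {
    std::map<int, std::function<void()>> pending;
    int next_id = 0;

    int add(std::function<void()> cb) {
        pending[next_id] = std::move(cb);
        return next_id++;
    }
    void del(int id) { pending.erase(id); }  // plays the role of event_del()

    // Fire everything pending, skipping entries cancelled by earlier callbacks.
    void run() {
        std::vector<int> due;
        for (const auto& entry : pending) due.push_back(entry.first);
        for (int id : due) {
            auto it = pending.find(id);
            if (it == pending.end()) continue;  // cancelled during this pass
            auto cb = std::move(it->second);
            pending.erase(it);
            cb();
        }
    }
};

// Hypothetical timer wrapper: cancels its pending callback when destroyed.
class Timer {
public:
    Timer(ToyLoop& loop, std::function<void()> cb)
        : loop_(loop), id_(loop.add(std::move(cb))) {}
    ~Timer() { loop_.del(id_); }  // cancel before the owner's memory goes away
    Timer(const Timer&) = delete;
    Timer& operator=(const Timer&) = delete;

private:
    ToyLoop& loop_;
    int id_;
};

// Hypothetical socket whose ping callback captures `this`.
struct Socket {
    int pings_sent = 0;
    std::unique_ptr<Timer> ping_timer;

    Socket(ToyLoop& loop, int& fired) {
        ping_timer = std::make_unique<Timer>(loop, [this, &fired] {
            ++pings_sent;
            ++fired;
        });
    }
    // `this` is captured, so the object must never be copied or moved.
    Socket(const Socket&) = delete;
    Socket& operator=(const Socket&) = delete;
};
```

The model only shows that cancelling inside the owner's destructor is sufficient when everything runs on one thread. It says nothing about whether the real crash is a cancellation problem or the event_base corruption Nick suspects.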

(Things get a _little_ hairy if that event is running in one thread
and you event_del() it in another: Libevent 2.0 handles that
differently from the latest libevent 2.1 alphas. Could that be going
on here?)

One possible temporary workaround, to help verify whether this is the
bug or whether it's something else, would be to use a weak reference
to the socket rather than a pointer to the socket itself.
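A minimal sketch of the weak-reference workaround, assuming the sockets are owned by `std::shared_ptr` (the thread does not say how they are owned). The ping callback captures a `std::weak_ptr` instead of the raw `this` pointer and calls `lock()` before use, so a ping that fires after its socket has been destroyed is skipped rather than dereferencing a dangling pointer. `Socket`, `Ping()` and `MakePingCallback()` are hypothetical names, not the ones in Robin's code.

```cpp
#include <functional>
#include <memory>

// Hypothetical socket type; assumes sockets are owned by std::shared_ptr.
class Socket : public std::enable_shared_from_this<Socket> {
public:
    int pings = 0;

    // Would send a ping packet in real code; here it only counts calls.
    void Ping() { ++pings; }

    // Build the timer callback around a weak reference instead of `this`.
    // Returns true if the ping was sent, false if the socket is already gone.
    std::function<bool()> MakePingCallback() {
        std::weak_ptr<Socket> weak = shared_from_this();
        return [weak]() {
            if (auto self = weak.lock()) {  // socket still alive
                self->Ping();
                return true;
            }
            return false;  // socket destroyed: skip instead of crashing
        };
    }
};
```

As Nick frames it, this is a diagnostic rather than a fix: if the crashes stop, the timer was outliving its socket; if they continue, the cause probably lies elsewhere.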

hope this helps,
--
Nick
Robin
2014-06-26 15:18:16 UTC
Post by Nick Mathewson
[...]
(Things get a _little_ hairy if that event if running in one thread
and you event_del() it in another: Libevent 2.0 handles that
differently from the latest libevent 2.1 alphas. Could that be going
on here?)
[...]
The application is single-threaded, thankfully, so that's out of the way.

Code:
You can find the Event timer thing here:
https://github.com/imermcmaps/m2t/tree/master/Server/shared/util
Basic Socket class:
https://github.com/imermcmaps/m2t/blob/master/Server/shared/net/Socket.cpp
The ping event is created here:
https://github.com/imermcmaps/m2t/blob/master/Server/game/net/socket/Client.cpp#L141
The codebase is still a bit messy, you should find everything related to
the network in either shared/net/ or game/net/

Well, if it is a memory corruption issue, any ideas how I would debug that?
I've reviewed the code myself at least once, and multiple times for the
"core" files. I've debugged it with valgrind and gdb, run it through several
static code analysers, and am now using gdb's record feature, since I have a
consistent crash that happens 50% of the time, although I'm beginning to
think that is just a coincidence...
I'm kind of out of ideas now.
Nick Mathewson
2014-06-27 14:25:57 UTC
On Thu, Jun 26, 2014 at 11:18 AM, Robin <***@imer.cc> wrote:
[...]
Post by Robin
Code..
https://github.com/imermcmaps/m2t/tree/master/Server/shared/util
https://github.com/imermcmaps/m2t/blob/master/Server/shared/net/Socket.cpp
https://github.com/imermcmaps/m2t/blob/master/Server/game/net/socket/Client.cpp#L141
The codebase is still a bit messy, you should find everything related to the
network in either shared/net/ or game/net/
This is an odd thought -- but I believe it's possible for
Socket::OnEvent to get called with both BEV_EVENT_EOF and
BEV_EVENT_ERROR. If that happens, your code will call
Socket::OnDisconnect twice, creating _two_ timer events for a
reconnect. Could that be going on here?
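The `what` argument of a bufferevent event callback is a bitmask, so if Nick's hunch is right and EOF and ERROR can arrive in the same call, a handler that reacts to each bit separately runs its disconnect path twice. A minimal sketch of the hazard and one way to guard against it; `kEof` and `kError` are local stand-ins for `BEV_EVENT_EOF` and `BEV_EVENT_ERROR`, and `Conn` is a hypothetical class, not Robin's `Socket`.

```cpp
// Local stand-ins for libevent's BEV_EVENT_EOF / BEV_EVENT_ERROR bits;
// real code should use the constants from <event2/bufferevent.h>.
constexpr short kEof = 0x10;
constexpr short kError = 0x20;

// Hypothetical connection class; it only counts how often it "disconnects".
struct Conn {
    int disconnects = 0;
    bool disconnected = false;

    void OnDisconnect() { ++disconnects; }

    // Fragile: each bit is handled on its own, so EOF|ERROR disconnects twice.
    void OnEventFragile(short what) {
        if (what & kEof) OnDisconnect();
        if (what & kError) OnDisconnect();
    }

    // Safer: test the combined mask once and remember that we already left.
    void OnEvent(short what) {
        if ((what & (kEof | kError)) && !disconnected) {
            disconnected = true;
            OnDisconnect();
        }
    }
};
```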
--
Nick
Robin
2014-06-27 15:24:06 UTC
Post by Nick Mathewson
[...]
This is an odd thought -- but I believe it's possible for
Socket::OnEvent to get called with both BEV_EVENT_EOF and
BEV_EVENT_ERROR . If that happens, your code will call
Socket::OnDisconnect twice, creating _two_ timer events for a
reconnect. Could that be going on here?
Well, if it were called twice in a row, m_reconnect_event would already
be set the second time:
    if (m_reconnect_event) { // 2nd time
        if (m_reconnect_event->IsStarted()) { // will be started
            sys_err("Reconnect event already running");
            return; // and return
        } else {
            m_reconnect_event->Start();
        }
    } else { // first run: create a new event ...
Not to mention the reconnect is only used for the connection to the
cache, and I don't have any logs of that failing.
Still, thanks for mentioning that; it made me look over the code and
improve some things. For one, the event would never get freed.
Thanks :)


