Discussion:
Multithreading to handle HTTP keep-alive connections
Nick Giles
2014-05-18 16:43:29 UTC
Hello.

A while ago I asked this question on Stackoverflow (
http://stackoverflow.com/questions/21677154/libevent-multithreading-to-handle-http-keep-alive-connections
) but didn't have much luck with getting any replies, so I thought I'd
ask on the mailing list. Here's the question:

I am writing an HTTP reverse-proxy in C using Libevent and I would like
to implement multithreading to make use of all available CPU cores. I
had a look at this example:
http://roncemer.com/software-development/multi-threaded-libevent-server-example/

In this example it appears that one thread is used for the full duration
of a connection, but for HTTP/1.1 I don't think this will be the most
effective solution, as connections are kept alive by default after each
request so that they can be reused later. I have noticed that even one
browser tab can open several connections to one server and keep them
open until the tab is closed, which would quickly exhaust the thread
pool. For an HTTP/1.1 proxy there will be many open connections, but
only very few of them will be actively transferring data at any given
moment.

So I was thinking of an alternative: have one event base for all
incoming connections and have the event callback functions delegate to
worker threads. This way we could have many open connections and only
occupy a thread when data arrives on a connection, returning the thread
to the pool once the data has been dealt with.
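
Roughly, this is the kind of hand-off I have in mind (a minimal sketch;
pool_submit(), worker_pool and process_job() are placeholders for
whatever thread-pool API I end up using):

    #include <stdlib.h>
    #include <event2/buffer.h>
    #include <event2/bufferevent.h>

    /* One job per burst of data; the thread-pool names below are
     * hypothetical placeholders. */
    typedef struct job {
        struct bufferevent *bev;   /* connection the data belongs to */
        size_t len;
        unsigned char data[];      /* bytes copied out of the input buffer */
    } job_t;

    /* Read callback, run by the single event-loop thread: drain the
     * input buffer into a private job and hand it to a worker, so the
     * loop thread never blocks on request processing. */
    static void on_read(struct bufferevent *bev, void *ctx)
    {
        struct evbuffer *in = bufferevent_get_input(bev);
        size_t len = evbuffer_get_length(in);
        job_t *job = malloc(sizeof(*job) + len);

        if (!job)
            return;  /* drop on allocation failure; real code would do better */
        job->bev = bev;
        job->len = len;
        evbuffer_remove(in, job->data, len);   /* copy out and drain */

        pool_submit(worker_pool, process_job, job);
    }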

My question is: is this a suitable implementation of threads with Libevent?

Specifically -- is there any need to have one event base per connection
as in the example or is one for all connections sufficient?

Also -- are there any other issues I should be aware of?

Currently the only problem I can see is with burstiness: when data is
received in many small chunks, many read events are triggered per HTTP
response, which would lead to a lot of handing off to worker threads.
Would this be a problem? If so, it could be somewhat mitigated using
Libevent's watermarking, although I'm not sure how that works if a
request arrives in two chunks and the second chunk is sufficiently
small to leave the buffer size below the watermark. Would it then stay
there until more data arrives?
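
For reference, by watermarking I mean something like this (a sketch; my
reading of the docs is that the read callback then only fires once at
least LOW_WM bytes are buffered, and Libevent stops reading from the
socket once HIGH_WM bytes are pending):

    #include <event2/event.h>
    #include <event2/bufferevent.h>

    #define LOW_WM   512      /* don't invoke on_read for less than this */
    #define HIGH_WM  65536    /* stop draining the socket above this */

    static void setup_conn(struct bufferevent *bev)
    {
        /* on_read/on_event as in the earlier sketch */
        bufferevent_setcb(bev, on_read, NULL, on_event, NULL);
        bufferevent_setwatermark(bev, EV_READ, LOW_WM, HIGH_WM);
        bufferevent_enable(bev, EV_READ | EV_WRITE);
    }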

Also, I would need to implement scheduling so that a chunk is only sent
once the previous chunk has been fully sent.
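
(For the sending side I suspect Libevent's write callback already gives
me the hook I need: with the default write low-water mark of 0 it fires
once the output buffer has fully drained. A sketch; next_chunk() and
struct conn/struct chunk are hypothetical pieces of my scheduler:)

    /* Fires when the output buffer drains below the write low-water
     * mark (0 by default, i.e. fully flushed): send the next chunk. */
    static void on_write(struct bufferevent *bev, void *ctx)
    {
        struct conn *c = ctx;
        struct chunk *next = next_chunk(c);   /* hypothetical scheduler */
        if (next)
            bufferevent_write(bev, next->data, next->len);
    }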

The second problem I thought of is when the thread pool is exhausted,
i.e. all threads are currently doing something, and another read event
occurs -- this would lead to the read event callback blocking. Does
that matter? I thought of putting these events into another queue, but
surely that's exactly what happens internally in the event base. On the
other hand, a second queue might be a good way to organise scheduling
of the chunks without blocking worker threads.

Thanks!

Nick.
Azat Khuzhin
2014-05-19 08:23:32 UTC
Post by Nick Giles
Hello.
A while ago I asked this question on Stackoverflow ( http://stackoverflow.com/questions/21677154/libevent-multithreading-to-handle-http-keep-alive-connections
) but didn't have much luck with getting any replies, so I thought I'd ask
I am writing an HTTP reverse-proxy in C using Libevent and I would like to
implement multithreading to make use of all available CPU cores. I had a
look at this example: http://roncemer.com/software-development/multi-threaded-libevent-server-example/
In this example it appears that one thread is used for the full duration
of a connection, but for HTTP/1.1 I don't think this will be the most
effective solution, as connections are kept alive by default after each
request so that they can be reused later. I have noticed that even one
browser tab can open several connections to one server and keep them
open until the tab is closed, which would quickly exhaust the thread
pool. For an HTTP/1.1 proxy there will be many open connections, but
only very few of them will be actively transferring data at any given
moment.
Yes, a connection pool is a standard technique for browsers, to avoid
timeouts on connects.
And personally I think there is not much difference between keep-alive
and non-keep-alive connections for an event-based mechanism, since a
connection only does any work when there is data to transfer
(read/write).
Post by Nick Giles
So I was thinking of an alternative: have one event base for all
incoming connections and have the event callback functions delegate to
worker threads. This way we could have many open connections and only
occupy a thread when data arrives on a connection, returning the thread
to the pool once the data has been dealt with.
Sounds better, I think you can give it a try.
Post by Nick Giles
My question is: is this a suitable implementation of threads with Libevent?
You could look into libevhtp; in one of the recent emails libevhtp2 was
announced: https://github.com/threatstack/libevhtp/tree/libevhtp2.
Post by Nick Giles
Specifically -- is there any need to have one event base per connection as
in the example or is one for all connections sufficient?
I don't think that is so _optimal_:
with 100K connections you would have 100K event bases, which is a waste
of memory and file descriptors, and moreover this is not the use case
that fd-monitoring mechanisms (epoll/kqueue/select/...) were designed
for.
One event base per thread is optimal, I think.
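As a sketch (worker_main() and NWORKERS are just illustrative names;
link with -levent_pthreads):

    #include <pthread.h>
    #include <event2/event.h>
    #include <event2/thread.h>

    #define NWORKERS 4   /* e.g. one per CPU core */

    /* Each worker owns one event_base and runs its own loop. Note that
     * event_base_dispatch() returns when the base has no pending
     * events, so in practice you add the first event (or, on 2.1, run
     * the loop with EVLOOP_NO_EXIT_ON_EMPTY) to keep the thread alive. */
    static void *worker_main(void *arg)
    {
        event_base_dispatch((struct event_base *)arg);
        return NULL;
    }

    int start_workers(struct event_base *bases[NWORKERS],
                      pthread_t tids[NWORKERS])
    {
        evthread_use_pthreads();   /* enable locking before creating bases */
        for (int i = 0; i < NWORKERS; i++) {
            bases[i] = event_base_new();
            if (!bases[i] ||
                pthread_create(&tids[i], NULL, worker_main, bases[i]) != 0)
                return -1;
        }
        return 0;
    }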
Post by Nick Giles
Also -- are there any other issues I should be aware of?
Currently the only problem I can see is with burstiness: when data is
received in many small chunks, many read events are triggered per HTTP
response, which would lead to a lot of handing off to worker threads.
Would this be a problem? If so, it could be somewhat mitigated using
Libevent's watermarking, although I'm not sure how that works if a
request arrives in two chunks and the second chunk is sufficiently
small to leave the buffer size below the watermark. Would it then stay
there until more data arrives?
For Libevent and watermarks, see the section "Callbacks and
watermarks" here:
http://www.wangafu.net/~nickm/libevent-book/Ref6_bufferevent.html
--
Respectfully
Azat Khuzhin
n***@4pmp.com
2014-05-19 10:40:03 UTC
Hello Azat,

Many thanks for your comments. Whilst reading, another idea occurred
to me: have one event base, but call event_base_dispatch() from
several threads, so that each callback is executed in whichever
thread dispatched it.

This sounds better to me than having one thread handle _all_ events
and simply farm the callbacks out to other threads; I can imagine
that if there are many events, the single thread running the event
loop might get overloaded.
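
In other words, something like this (assuming for the moment that
Libevent allows several threads in one base's loop at all):

    #include <pthread.h>
    #include <event2/event.h>

    /* The idea: N threads all running the loop of the *same* base,
     * each executing whatever callbacks it happens to dispatch. */
    static void *loop_thread(void *arg)
    {
        event_base_dispatch((struct event_base *)arg);
        return NULL;
    }

    static void start_loop_threads(struct event_base *shared_base,
                                   pthread_t *tids, int nthreads)
    {
        for (int i = 0; i < nthreads; i++)
            pthread_create(&tids[i], NULL, loop_thread, shared_base);
    }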

However, if all the threads were busy executing callbacks, what would
happen to the incoming queue of events: where would they be stored
until a callback finishes and a thread becomes available to handle
them? Also, how would events be distributed amongst the threads?

I presume this is a problem that must have been solved already many
times, so I was really just wondering what the canonical solution is.

In the meantime I'll take a look at libevhtp2 as you suggest and see
how they've solved the issue.

Cheers,

Nick.
Azat Khuzhin
2014-05-19 12:05:52 UTC
Post by n***@4pmp.com
Hello Azat,
Many thanks for your comments. Whilst reading, another idea occurred to
me: have one event base, but call event_base_dispatch() from several
threads, so that each callback is executed in whichever thread
dispatched it.
Hi Nick,

According to
http://www.wangafu.net/~nickm/libevent-book/Ref2_eventbase.html:
... Its loop can only be run in a single thread, however. ...
(You can look at event_base_loop() for the details; basically it is
because of locks and shared structures.)
Besides, there is no problem with per-thread event_bases at all.

Of course it will work if you break the loop first, so that only one
thread runs it at a time, but I don't think that is the desired
behaviour for proxying.
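
If it helps, per-thread bases also answer your question about
distributing events: accept in one thread, then create the bufferevent
on the next worker's base, round-robin. A sketch, reusing the worker
bases from my earlier example (on_read/on_event are your callbacks, and
evthread_use_pthreads() must have been called at startup):

    #include <sys/socket.h>
    #include <event2/event.h>
    #include <event2/bufferevent.h>
    #include <event2/listener.h>

    extern struct event_base *worker_bases[];  /* filled in by the workers */
    extern int nworkers;
    static int next_worker;

    /* evconnlistener callback, run in the accept thread: hand each new
     * fd to a worker's base. With evthread_use_pthreads() enabled this
     * is safe; the worker's loop is woken up to see the new event. */
    static void accept_cb(struct evconnlistener *lst, evutil_socket_t fd,
                          struct sockaddr *addr, int socklen, void *ctx)
    {
        struct event_base *base = worker_bases[next_worker++ % nworkers];
        struct bufferevent *bev = bufferevent_socket_new(
            base, fd, BEV_OPT_CLOSE_ON_FREE | BEV_OPT_THREADSAFE);

        bufferevent_setcb(bev, on_read, NULL, on_event, NULL);
        bufferevent_enable(bev, EV_READ | EV_WRITE);
    }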

Cheers,
Azat
