Performance analysis of our own full blown HTTP server


In the previous post, Let's do our own full blown HTTP server with Netty 4, you and I were excited by creating our own web server. So far so good. But how good? Given an ordinary notebook
cat /proc/cpuinfo | grep model\ name
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
cat /proc/meminfo | grep MemTotal
MemTotal:        3956836 kB
(OK, there are only 4 cores with hyperthreading.) We get the following numbers
build/default/weighttp -n 1000000 -k -c 100 http://localhost:9999/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
progress:  10% done
progress:  20% done
progress:  30% done
progress:  40% done
progress:  50% done
progress:  60% done
progress:  70% done
progress:  80% done
progress:  90% done
progress: 100% done

finished in 20 sec, 195 millisec and 236 microsec, 49516 req/s, 7251 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 149967800 bytes total, 145967800 bytes http, 4000000 bytes data
49516 req/s. Is this good? I'd say very good. We need some baseline to compare to. As such a baseline we'll use nginx serving a small static file. 'Why is serving a file faster than serving a piece of memory?' you ask. With sendfile turned on, nginx, after sending the response header, just instructs the kernel to send the file to the socket; after the first hit the file sits in the Linux buffer cache, so the kernel sends a page straight from memory to the Ethernet card. Zero-copy. If you are about to send a dynamically generated response from memory, the kernel needs to copy the data from userspace to kernelspace. (A Java sketch of the zero-copy path follows the config below.) The example nginx config:
worker_processes  4;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       6666;
        server_name  localhost;

        location / {
            root   html;
            index  index.html index.htm;
        }
    }
}
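As an aside, the same zero-copy path is reachable from plain Java: FileChannel.transferTo maps to sendfile(2) on Linux, and Netty exposes it via DefaultFileRegion. A minimal sketch, with illustrative names, assuming an already connected SocketChannel:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Zero-copy transfer: FileChannel.transferTo maps to sendfile(2) on
// Linux, so pages go from the buffer cache to the socket without
// passing through userspace.
public class ZeroCopyExample {

    static void sendFile(SocketChannel socket, String path) throws IOException {
        try (FileChannel file = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            long pos = 0;
            long size = file.size();
            while (pos < size) {
                // transferTo may send fewer bytes than requested, so loop
                pos += file.transferTo(pos, size - pos, socket);
            }
        }
    }
}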
The same test 
build/default/weighttp -n 1000000 -k -c 100 http://localhost:6666/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 11 sec, 402 millisec and 913 microsec, 87696 req/s, 72705 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 848950295 bytes total, 236950295 bytes http, 612000000 bytes data
87696 req/s. Very well, so we are not that far behind. Let's draw another baseline. Python is considered slow, but it is not that slow. Add to your nginx config
http {
    upstream test {
        server unix:///home/adolgarev/uwsgi.sock;
    }
    ...
    server {
        location /test {
            uwsgi_pass  test;
            include     uwsgi_params;
        }
        ...
    }
}
The uwsgi protocol is far more efficient than HTTP (request variables go over the socket as compact length-prefixed binary key-value pairs rather than text headers), thus we are using uwsgi_pass instead of proxy_pass. Start uwsgi with a small WSGI test program
cat test.py
def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    return ["Hello World"]
uwsgi --socket uwsgi.sock --wsgi-file test.py --master --processes 4 --threads 2
The same test 
build/default/weighttp -n 100000 -k -c 100 http://localhost:6666/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 100000 total requests
...
progress: 100% done

finished in 9 sec, 264 millisec and 26 microsec, 10794 req/s, 1844 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 17495280 bytes total, 15395280 bytes http, 2100000 bytes data
10794 req/s, I'd say good enough for Python. Another thing to compare to is GlassFish Server Open Source Edition 4.0 with Servlets 3.1, powered by Grizzly, a competitor of Netty. A small Servlet
package test;

import java.io.IOException;
import java.io.PrintWriter;
import javax.json.Json;
import javax.json.stream.JsonGenerator;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/test")
public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        resp.setContentType("text/plain");
        PrintWriter writer = resp.getWriter();
        try (JsonGenerator gen = Json.createGenerator(writer)) {
            gen.writeStartObject().write("res", "Ok").writeEnd();
        }
    }

}
And the same test 
build/default/weighttp -n 1000000 -k -c 100 http://localhost:8080/WebApplication1/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 41 sec, 745 millisec and 917 microsec, 23954 req/s, 6855 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 293074157 bytes total, 281074157 bytes http, 12000000 bytes data
23954 req/s, somewhere in between. But surprisingly, if we remove the JSON dependency it is hellishly fast (same servlet as above, imports unchanged).
@WebServlet("/test")
public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        resp.setContentType("text/plain");
        try (PrintWriter writer = resp.getWriter()) {
            writer.print("Ok");
        }
    }

}
build/default/weighttp -n 1000000 -k -c 100 http://localhost:8080/WebApplication1/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 10 sec, 944 millisec and 231 microsec, 91372 req/s, 25169 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 282074100 bytes total, 280074100 bytes http, 2000000 bytes data
An amazing 91372 req/s, faster than nginx (because the whole response is small enough to fit into a single memory page, and even into a single packet with MTU 1500). I'm still not satisfied. Let's profile our server.
Eliminate the most obvious hot spots.
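One typical hot spot in a handler like ours is re-encoding the same headers and body on every request, and flushing per response rather than per read batch. The sketch below shows that kind of fix (class name and body are illustrative, not the exact changes made):

import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaders;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpVersion;
import io.netty.util.CharsetUtil;

public class FastHttpHandler extends SimpleChannelInboundHandler<FullHttpRequest> {

    // Encode the body once instead of on every request
    private static final byte[] CONTENT = "Ok".getBytes(CharsetUtil.US_ASCII);

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
        FullHttpResponse resp = new DefaultFullHttpResponse(
                HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
                Unpooled.wrappedBuffer(CONTENT));
        resp.headers().set(HttpHeaders.Names.CONTENT_TYPE, "text/plain");
        resp.headers().set(HttpHeaders.Names.CONTENT_LENGTH, CONTENT.length);
        ctx.write(resp); // write only; flush below, once per read batch
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        ctx.flush();
    }
}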
build/default/weighttp -n 1000000 -k -c 100 http://localhost:9999/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 9 sec, 298 millisec and 269 microsec, 107546 req/s, 10292 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 98000000 bytes total, 94000000 bytes http, 4000000 bytes data
107546 req/s, and we can do even better. Conclusion? Our server is fast. But Servlets are fast too; taking into account asynchronous support (Servlets 3.0) and nonblocking I/O (Servlets 3.1), they are a good choice for HTTP. Still, our server has its advantages.
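For reference, the Servlet 3.0 asynchronous style mentioned above looks roughly like this (the annotation and AsyncContext are standard API; the handler body is an illustrative sketch):

import java.io.IOException;
import javax.servlet.AsyncContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/async", asyncSupported = true)
public class AsyncTestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Detach from the container thread; it is freed to serve other requests
        final AsyncContext ctx = req.startAsync();
        ctx.start(new Runnable() {
            @Override
            public void run() {
                try {
                    ctx.getResponse().setContentType("text/plain");
                    ctx.getResponse().getWriter().print("Ok");
                } catch (IOException e) {
                    // ignored in this sketch
                }
                ctx.complete(); // finish the response
            }
        });
    }
}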