
Vincent's Blog

Pleasure in the job puts perfection in the work (Aristotle)

FAPWS is still in the game ;)

Posted on 2024-07-08 18:52:00 from Vincent in fapws

It's been about 10 years now that I've been using FAPWS as my WSGI server. Despite its very simple code base, it remains the fastest one I can test. This surprised me.


Introduction

I'm using a small virtual server on the internet where I run several websites.
All of them run their own Fapws instance.
They are combined thanks to pound.

After several years of "use and forget", I was interested in moving away and using something a bit more professional and maintained.

This post is the journey I took, and the conclusions I drew for myself.

How to compare webservers?

Which tool

As you can see in my various posts on this blog, I mainly use the ApacheBench (ab) tool to test a webserver.

On an OpenBSD machine, you have to install the whole Apache HTTP package just to get ab:

$ doas pkg_add apache-httpd

At the time of writing this page, the version is 2.4.59.

With AB, I mainly use 2 parameters:

-n requests     Number of requests to perform
-c concurrency  Number of multiple requests to make at a time

I usually run with 10,000 requests and 100 or 200 concurrent requests.
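For instance, a typical invocation (against a hypothetical local server on port 8080) looks like this:

```shell
# 10,000 requests total, 100 in flight at a time
ab -n 10000 -c 100 http://127.0.0.1:8080/
```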

How to measure it?

It's not a good idea to run different AB benchmarks without taking some care.
Indeed, a massive run like this leaves lots of open sockets, connections, ...
That's why I've built a small script that waits until the network status goes back to a normal state.
In the script below, this is what "wait_netstat" does.

On the other side, performing 3 runs with the same parameters allows us to see whether the results are consistent. If not, maybe the context of the test is not optimal. This is what "bench" does.

It depends, but in my case, having the number of open network connections go back below 1000 is good enough.
Usually a calculation like the one I've put in comments works well, but for the following runs I've artificially set the limit to 1000.

So, here is my bench.sh script.
I adapt the 1st line to match the port used by the webserver (SERVER) and then trigger the whole test.

On my Lenovo T460 it takes about 5 minutes.

#!/bin/sh

SERVER="http://127.0.0.1:8080"

#NET_INIT=$(netstat -a | wc -l)
#NET_INIT=$(($NET_INIT + 100))
NET_INIT=1000

echo "init netstat:$NET_INIT"
wait_netstat()
{
    NETSTAT=$(netstat -a | wc -l)
    while [ "$NETSTAT" -gt "$NET_INIT" ]
    do
        echo -ne "netstat: $NETSTAT\r"
        sleep 1
        NETSTAT=$(netstat -a | wc -l)
    done
}

bench ()
{
    for i in 1 2 3
    do
        wait_netstat
        curr_date=$(date)
        echo "$curr_date: $1 $2 $SERVER$3" >> bench.errorlog
        res=$(ab -n$1 -c$2 $SERVER$3 2>>bench.errorlog | grep Requests)
        val=$(echo $res | cut -d' ' -f4)
        echo "$1 $2 $3: $val"
    done
}

bench 1000 10 /
bench 10000 10 /
bench 50000 10 /
bench 10000 100 /
bench 50000 100 /
bench 10000 500 /
bench 50000 500 /
bench 10000 1000 /
bench 50000 1000 /

Now that we have our instrument of torture, let's look at the webservers.

Which WSGI servers are available?

Flask: installation

Since I see this name (Flask) quite often, I immediately switched to it.

The installation was quite easy:

$ doas pkg_add py3-flask

So, on my OpenBSD 7.5 we have this:

 $ flask --version
Python 3.10.14
Flask 2.1.3
Werkzeug 2.1.2

Then I built the simple "Hello World!" page:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello_world():
    return "Hello, World!"

Let's call this file: flask-hello.py

Execution is done by defining FLASK_APP and then running flask:

$ export FLASK_APP=flask-hello.py
$ flask run
 * Serving Flask app 'flask-hello.py' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000 (Press CTRL+C to quit)

Flask: benchmarks

To avoid performance impact due to the display, I'll do:

$ flask run >> /dev/null 2>&1

By adapting the SERVER variable of our bench.sh script, we have those results:

init netstat:1000
1000 10 /: 1070.43
1000 10 /: 1160.09
1000 10 /: 1109.75
10000 10 /: 1123.70
10000 10 /: 1128.35
10000 10 /: 1124.01
50000 10 /: 1127.14
50000 10 /: 1124.42
50000 10 /: 1125.11
10000 100 /: 1124.54
10000 100 /: 1121.72
10000 100 /: 1123.78
50000 100 /: 1123.33
50000 100 /: 1125.00
50000 100 /: 1125.06
10000 500 /: 1129.33
10000 500 /: 1123.75
10000 500 /: 1104.29
50000 500 /: 1120.51
50000 500 /: 1118.79
50000 500 /: 1116.33
10000 1000 /: 910
10000 1000 /:
10000 1000 /:
50000 1000 /: 059
50000 1000 /: 200
50000 1000 /: 047

Flask: observations

We can see that Flask is quite consistent. It keeps its performance of roughly 1,100 requests per second whatever load we apply to it.

With 1000 concurrent requests, I get lots of errors. We will see that this is the case for all webservers, so we should not take those figures into account. I've not investigated the root cause, but this is probably a limit of my machine.
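If you want to dig into it, one plausible cause (an assumption on my side, not something I verified) is the per-process file-descriptor limit, since each concurrent ab connection consumes a descriptor. You can inspect it like this:

```shell
# Per-process open-file limit of the current shell
ulimit -n
# On OpenBSD, the system-wide limits can also be checked with:
#   sysctl kern.maxfiles kern.somaxconn
```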

FastAPI : installation

Our next candidate is FastAPI.

FastAPI is not present in OpenBSD's repository. So, let's follow the documentation and install it via the recommended command:

$ pip3 install --break-system-packages fastapi

The "--break-system-packages" parameter is mandatory: without it, pip complains that you would break your machine. But as you may have noticed, I've not used "doas", so I'm installing it in my "local" environment. Whatever happens, a simple rm in ~/.local/ will let me remove any traces ;)
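An alternative that avoids the flag entirely (not what I did here, and the directory name is my own choice) is a virtual environment:

```shell
# Create an isolated environment; nothing here touches the system packages
python3 -m venv "$HOME/fastapi-venv"
# Then install and run inside it:
#   . "$HOME/fastapi-venv/bin/activate"
#   pip install fastapi
```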

This installation process took more than 20 minutes because it compiles all the elements not yet present on your machine. And it compiled a lot of Rust programs.

What we have is this:

$ pip list | grep -i fast
fastapi            0.111.0
fastapi-cli        0.0.4

I've taken the example as described in the documentation. So our script will be this:

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return "Hello World"

Let's save it as fastapi-hello.py

To run the server, we can do:

$ fastapi run fastapi-hello.py

FastAPI: benchmark

Like for Flask, we will avoid the performance impact due to display output, so we will execute this:

$ fastapi run fastapi-hello.py >> /dev/null 2>&1

By adapting the SERVER variable of our bench.sh script, we have the following results:

init netstat:1000
1000 10 /: 1408.50
1000 10 /: 1519.04
1000 10 /: 1512.39
10000 10 /: 1479.77
10000 10 /: 1491.21
10000 10 /: 1487.46
50000 10 /: 1481.20
50000 10 /: 1480.50
50000 10 /: 1479.50
10000 100 /: 1440.75
10000 100 /: 1438.47
10000 100 /: 1470.74
50000 100 /: 1469.71
50000 100 /: 1457.66
50000 100 /: 1473.27
10000 500 /: 1474.88
10000 500 /: 1463.48
10000 500 /: 1437.19
50000 500 /: 1465.47
50000 500 /: 1461.15
50000 500 /: 1466.50
10000 1000 /: 630
10000 1000 /:
10000 1000 /:
50000 1000 /: 250
50000 1000 /: 290
50000 1000 /:

FastAPI: observations

Here too the figures are quite consistent. The performance reaches roughly 1,500 requests/second. And the high load reports lots of errors here too.

So, we are a bit quicker than with Flask, but this is not a big step.

Gunicorn: installation

Our next candidate is gunicorn, reported on several sites as a strong and stable candidate for WSGI production systems.

The installation process is quite simple:

$ doas pkg_add py3-gunicorn

And I've received version 20.0.4p4.

The script we will use is the following:

def app(environ, start_response):
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    return iter([data])

Let's call it gunicorn-hello.py

Note: Do not be worried by the iter and start_response. Although not written out in the previous scripts, all WSGI webservers do the same thing, either transparently or explicitly as in this script.
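To illustrate the point: a WSGI app is just a callable taking the request environ and a start_response callback, and returning an iterable of bytes. You can even drive it by hand without any server; a minimal sketch:

```python
def app(environ, start_response):
    data = b"Hello, World!\n"
    # Hand the status line and headers back through the callback,
    # then return the body as an iterable of bytes chunks.
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data))),
    ])
    return iter([data])

# Driving the app by hand, the way any WSGI server would:
captured = []
body = b"".join(app({}, lambda status, headers: captured.append(status)))
```

Any WSGI server (gunicorn, Fapws4, the stdlib wsgiref reference server, ...) drives the same contract.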

Gunicorn: benchmark

Like for previous cases, we redirect output messages to /dev/null:

$ gunicorn -w 4 gunicorn-hello:app >> /dev/null 2>&1

After small modification of SERVER in our bench.sh script, we have the following results:

init netstat:1000
1000 10 /: 5159.19
1000 10 /: 5290.59
1000 10 /: 5230.02
10000 10 /: 5188.24
10000 10 /: 5322.52
10000 10 /: 5430.72
50000 10 /: 5171.86
50000 10 /: 5166.13
50000 10 /: 5051.11
10000 100 /: 5214.67
10000 100 /: 5395.82
10000 100 /: 5473.83
50000 100 /: 5306.10
50000 100 /: 5254.61
50000 100 /: 5292.09
10000 500 /: 5194.47
10000 500 /: 5296.52
10000 500 /: 5148.82
50000 500 /: 5213.70
50000 500 /: 5248.13
50000 500 /: 5259.97
10000 1000 /: 269
10000 1000 /:
10000 1000 /:
50000 1000 /:
50000 1000 /:
50000 1000 /: 130

Gunicorn: observations

We have quite consistent performance results. Here we are much higher than the 2 previous tools, since we reach roughly 5,200 requests per second.

I must emphasize that we have used 4 pre-forked workers. This is not really fair compared to the previous tests, but it's available out of the box, so why not use it? ;)

I let you discover the ApacheBench results if you pre-fork with only 1 instance ;).
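For reference, the single-instance run is just the same command with -w 1:

```shell
# One worker instead of four; output redirected as before
gunicorn -w 1 gunicorn-hello:app >> /dev/null 2>&1
```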

Fapws4: installation

Finally, let's look at what our good old Fapws4 can do.

After having downloaded the archive from the Git repository on SourceForge, we can uncompress and build it:

$ unzip fapws4-code*
$ cd fapws4-code*
$ sh makes.sh build

Everything is included ;) No need for extra downloads, compilations, ...

If you have compilation errors, please check that you have python, python3, or python3.xx on your machine. Normally it's not necessary, but you can adapt the PYTHON_EXE variable of the script.

With all recent versions of the Python library, you can keep LIBS as it is. If not, remove the "-embed" strings.

The script we will use is the one present in samples/hello:

from fapws4.base import Start_response, Environ

def hello(env, resp):
    return "hello world!"

PATHS = [('/', hello)]

Let's save it as fapws-hello.py

To run this script, you just have to perform:

fapws4-3.10 fapws-hello.py

Fapws4: benchmark

Like for the others, we will redirect outputs to /dev/null:

$ fapws4-3.10 fapws-hello.py >> /dev/null 2>&1

As usual we have to adapt our SERVER variable in our bench.sh script.

The results are:

init netstat:1000
1000 10 /: 9588.37
1000 10 /: 9666.97
1000 10 /: 6758.90
10000 10 /: 9157.32
10000 10 /: 9604.04
10000 10 /: 9035.37
50000 10 /: 9263.16
50000 10 /: 9190.90
50000 10 /: 9282.17
10000 100 /: 8777.69
10000 100 /: 9333.50
10000 100 /: 9321.82
50000 100 /: 9053.99
50000 100 /: 9184.82
50000 100 /: 9048.74
10000 500 /: 9040.19
10000 500 /: 9056.06
10000 500 /: 9171.56
50000 500 /: 8993.50
50000 500 /: 8887.31
50000 500 /: 9041.68
10000 1000 /:
10000 1000 /:
10000 1000 /:
50000 1000 /:
50000 1000 /:
50000 1000 /:

Fapws4: observations

Here the consistency between runs is a bit lower. We have one run at 6758 requests/second, a few others at roughly 8,800 requests/second, and some at 9,200 requests/second.

Like for the other tests, 1000 concurrent requests generates errors.

But why do we have between 2x and 8x the speed of the other tools?

httpd: benchmark

Just for fun, I've decided to compare the results with the httpd server provided with every OpenBSD install.
So, it's the one shipped with OpenBSD 7.5.

I've created a file in /var/www/htdocs/ called hello.html containing the famous phrase: "Hello world!".

To run it, I've done:

$ doas rcctl -f start httpd

I've adapted the SERVER variable of bench.sh, but also the URL, which becomes /hello.html.

And the results are:

init netstat:1000
1000 10 /hello.html: 7467.03
1000 10 /hello.html: 7558.98
1000 10 /hello.html: 7474.40
10000 10 /hello.html: 7364.44
10000 10 /hello.html: 7204.13
10000 10 /hello.html: 7292.36
50000 10 /hello.html: 6982.97
50000 10 /hello.html: 7247.39
50000 10 /hello.html: 7235.13
10000 100 /hello.html: 6902.97
10000 100 /hello.html: 7233.37
10000 100 /hello.html: 7295.96
50000 100 /hello.html: 7364.34
50000 100 /hello.html: 7560.10
50000 100 /hello.html: 7634.14
10000 500 /hello.html: 7557.44
10000 500 /hello.html: 6824.47
10000 500 /hello.html: 6722.13
50000 500 /hello.html: 7220.75
50000 500 /hello.html: 7217.92
50000 500 /hello.html: 7331.89
10000 1000 /hello.html:
10000 1000 /hello.html:
10000 1000 /hello.html:
50000 1000 /hello.html:
50000 1000 /hello.html:
50000 1000 /hello.html:

So, here too 1000 concurrent connections does not work, so this is definitely something related to my machine. But surprisingly, we see some variation in the results, a bit like with Fapws4, this time between roughly 6,700 and 7,600 requests/second.

It's worth noting that on OpenBSD, httpd is pre-forked with 4 instances.

Conclusions

After all these years, it's amazing to see that Fapws4 is still one of the fastest webservers.
It's even at the level of httpd, a pure C webserver.

Fapws definitely does not have all the bells and whistles that other WSGI servers have, but what it does, it does fast ;).

I remind you that Fapws4 is based on libuv, and the performance of Fapws4 is surely linked to the performance of that library.

If you have a suggestion for a WSGI server I could install and test, feel free to leave me a comment below ;)

Next test within 5 or 6 years?


