Mailing List

For the website and the games I will build over time, I wanted a mailing list so my future players could register and receive updates about the development process, releases, etc. I could have signed up for some online mailing list service, but it looked like a good project to test an approach I had wanted to experiment with for some time: building a web service in C.

When I thought about building web services in C, or talked with other people about it, I would usually get comments like:

After some time, I decided to experiment and see whether these concerns were real or just something that had propagated over time as people repeated it.

Initial Investigation

From the problems raised, I could dismiss the following ones, as they aren't concrete problems:

For the remaining concerns, I tried to identify their root causes and find techniques that could prevent those problems.

When managing memory manually you will always get vulnerabilities: people usually say this because they allocate things individually everywhere and then forget to free the allocated memory somewhere. We can prevent this by not doing any individual allocations. Instead, we will implement memory allocators, so we can group the allocations and free them in just two or three places.

Handling strings is hard in C. I would say this is caused by a mixture of zero-terminated strings, the standard library string handling functions, and the same problem of allocating memory individually everywhere. Using allocators to improve memory management also gives us a simpler way to manage strings, so we will just go with that.

General Requirements

With my motivation settled I started identifying the features that I would need to have a working mailing list:

So the system needs to interact with any user on my site who wants to register for the mailing list, and it also needs more restricted endpoints that only allow me to send emails to the registered users.

To send emails I need an email server. I have my own server, so I will go for a self-hosted setup. To store the data I will use a database. I also want a simple interface to check the mailing lists and send the emails, and for this I will build a desktop application.

Tools Choices

I'm used to working with web servers in multiple programming languages, and my deployment setup usually uses nginx as a reverse proxy, so I can have a secure configuration for the user-facing connection and then use any application server I want behind it.

For the application server, since I was going to use C, I selected the libhttp library. For the database I went with PostgreSQL, and to interact with it, its C library libpq.

For the desktop application I followed the conclusions from my memory comparison exploration. I went with GTK since I found it the easiest to work with and I think it looks good. To communicate from the desktop application with the web server and to send emails, I chose libcurl.

Finally, for the self-hosted mail server I chose Postfix for simplicity, along with the awesome emailwiz script created by Luke Smith. This script helps a lot with the Postfix setup and also configures SpamAssassin and OpenDKIM. With these choices, the resulting high-level structure of the system was something like this:

Mailing List Architecture

Implementation

User Registration

The first feature to implement is registering users in the mailing list. To do this I need to set up the server, handle the registration request, decide whether to approve the registration, and finally email the user confirming the registration.

Setting up the HTTP server with libhttp was pretty standard. I just had to define configuration like the number of threads and the ports, and set up the request handler for each endpoint. Differences started to appear when implementing the specific endpoints: the code that reads the request and each variable has to specify the expected sizes in bytes.

post_data_len = httplib_read(context, conn, post_data, sizeof(post_data));
httplib_get_var(post_data, post_data_len, "email", email, EMAIL_SIZE);
httplib_get_var(post_data, post_data_len, "game_id", game_id, INT_MAX_DIGITS);

In scripting languages this can be done at the validation step, but it's optional. Here I was forced to put limits on the size of the variables from the beginning, which led me to check the RFCs for emails and similar things to know their maximum possible size. This small difference gives us fixed sizes, which lead to a predictable maximum memory requirement. It also gives us concrete numbers to use for the database and other parts instead of dynamic sizes.
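To make the "fixed sizes" point concrete, here is a small sketch of how the limits could be written down. The post uses the names EMAIL_SIZE and INT_MAX_DIGITS but doesn't show their values, so the values below are my assumptions; POST_DATA_SIZE, RegistrationRequest and registration_request_bytes are hypothetical.

```c
#include <stddef.h>

/* Assumed limits for illustration: RFC 5321 caps a path at 256 octets
 * including the angle brackets, which leaves 254 for the address. */
#define EMAIL_MAX_LEN 254
#define EMAIL_SIZE (EMAIL_MAX_LEN + 1)   /* +1 for the NUL terminator */

/* "-2147483648" is 11 characters for a 32-bit int, +1 for the NUL. */
#define INT_MAX_DIGITS 12

/* Hypothetical cap on the raw POST body for this endpoint. */
#define POST_DATA_SIZE 1024

/* Every buffer the registration handler needs, with compile-time sizes. */
typedef struct RegistrationRequest {
  char post_data[POST_DATA_SIZE];
  char email[EMAIL_SIZE];
  char game_id[INT_MAX_DIGITS];
} RegistrationRequest;

/* The worst-case memory for one request is known before the server starts. */
size_t registration_request_bytes(void){
  return sizeof(RegistrationRequest);
}
```

Because every field is a fixed-size array, the worst case for one request is a single compile-time number that can be used to size the endpoint's arena.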

Continuing with this memory-focused thinking, I started implementing the arena allocator. The implementation just allocates a big array when the arena is created, which is then used as the memory source for each individual allocation. This means individual allocations don't request memory from the OS, since it was requested in one batch at the start. At the same time, we only need to free all the memory once, at the end.

Also, since our memory usage for the endpoint is fixed, we don't need to change the arena size after the server starts. We can just reset the offset where the next allocation happens back to the beginning and reuse the same memory for each new request to the same endpoint. As a result, we do just one memory allocation when the program starts, and never again.

typedef struct Arena {
  unsigned char * buffer;
  size_t buffer_length;
  size_t offset;
} Arena;

void reset_arena(Arena * arena){
  arena->offset = 0;
}
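The post only shows the Arena struct and reset_arena. A minimal sketch of the rest could look like the following. alloc_in_arena is the name used later in the post, but its body here is my assumption, and create_arena and destroy_arena are hypothetical names. Each allocation is rounded up to the alignment of max_align_t so any type can be stored, and there is no locking, so this assumes each worker thread has its own arena.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Arena {
  unsigned char * buffer;
  size_t buffer_length;
  size_t offset;
} Arena;

/* One malloc for the arena's whole lifetime. Returns 1 on success. */
int create_arena(Arena * arena, size_t size){
  arena->buffer = malloc(size);
  arena->buffer_length = arena->buffer ? size : 0;
  arena->offset = 0;
  return arena->buffer != NULL;
}

/* Bump allocation: round the offset up to max_align_t's alignment and hand
 * out the next `size` bytes, or return NULL when the arena is full. */
void * alloc_in_arena(Arena * arena, size_t size){
  size_t align = _Alignof(max_align_t);
  size_t start = (arena->offset + align - 1) & ~(align - 1);
  if(start > arena->buffer_length || size > arena->buffer_length - start){
    return NULL;
  }
  arena->offset = start + size;
  return arena->buffer + start;
}

/* Everything allocated so far becomes reusable in one step. */
void reset_arena(Arena * arena){
  arena->offset = 0;
}

/* The only free, done once at shutdown. */
void destroy_arena(Arena * arena){
  free(arena->buffer);
  arena->buffer = NULL;
  arena->buffer_length = 0;
  arena->offset = 0;
}
```

In this design an endpoint would call create_arena once at startup, reset_arena at the start of each request, and destroy_arena only when the server shuts down.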

For this case, with this approach, I think it's hard for us to leak memory. Also, if any problem appears, there is just one place allocating memory, so it should be easy to find!

With the data from the registration request, next we need to insert it into the database. libpq is the standard C interface to PostgreSQL and is close to the interfaces in other languages. We just need to know SQL and check the documentation for which functions to use.

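const char * query = "INSERT INTO registration(game_id, email) VALUES ($1, LOWER($2))";
const char * values[2] = {game_id, email};
// For text-format parameters libpq ignores these lengths:
// the values only need to be NUL-terminated strings.
int lengths[2] = {INT_MAX_DIGITS, EMAIL_SIZE};
int formats[2] = {0, 0}; // 0 = send each parameter as text

// The final argument 0 asks for the results in text format.
db->result = PQexecParams(db->conn, query, 2, NULL, values, lengths, formats, 0);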

Once the email is validated against the email RFC and inserted in the database, we consider the operation a success and email the new user to confirm that everything went OK. To send the email I used libcurl.
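The post doesn't show the validation itself. As a hedged sketch, here is a deliberately simplified check rather than a full RFC 5322 parser. It applies the RFC 5321 length limits (64 octets for the local part, 254 characters for the address) plus a few structural rules, and it rejects some valid but rare addresses, such as quoted local parts containing "@" or domains without a dot. is_plausible_email is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified plausibility check, not full RFC validation:
 * exactly one '@', non-empty local part of at most 64 characters,
 * total length at most 254, no whitespace or control characters,
 * and a domain containing a dot that is neither first nor last. */
bool is_plausible_email(const char * email){
  size_t len = strlen(email);
  if(len == 0 || len > 254){
    return false;
  }
  const char * at = strchr(email, '@');
  if(at == NULL || strchr(at + 1, '@') != NULL){
    return false;
  }
  size_t local_len = (size_t)(at - email);
  size_t domain_len = len - local_len - 1;
  if(local_len == 0 || local_len > 64 || domain_len == 0){
    return false;
  }
  for(size_t i = 0; i < len; i++){
    unsigned char c = (unsigned char)email[i];
    if(c <= ' ' || c == 0x7f){
      return false;
    }
  }
  const char * domain = at + 1;
  const char * dot = strchr(domain, '.');
  return dot != NULL && dot != domain && domain[domain_len - 1] != '.';
}
```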

I used libcurl's easy API, which simplifies things, but it requires implementing a read callback that decides how the message data is copied out over time. This gives more control but is more involved than email libraries in Python or other languages, where we just provide the content and the subject. In the end, following their examples, I was still able to implement it.

#include <string.h>

// libcurl calls this repeatedly to pull the message.
// Returning 0 tells libcurl the whole message has been sent.
static size_t read_callback(char *ptr, size_t size, size_t nmemb, void *userp){
  size_t room = size * nmemb;
  if(room == 0) {
    return 0;
  }

  Email * email = (Email *)userp;
  // The part of the NUL-terminated body that has not been sent yet.
  const char *data = &(email->body[email->progress]);
  size_t len_to_read = strlen(data);
  if(len_to_read > room) {
    len_to_read = room;
  }
  memcpy(ptr, data, len_to_read);
  email->progress += len_to_read;
  return len_to_read;
}
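Part of what makes libcurl more involved than higher-level email libraries is that, for SMTP, the data returned by the read callback is the whole message, so headers like From, To and Subject have to be written into the body buffer. A hedged sketch of a hypothetical build_email_payload helper, assuming an ASCII subject and an HTML body:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes a complete message (headers, blank line, body) into `out`,
 * using CRLF line endings. Returns the payload length, or -1 if a header
 * value contains CR/LF (which could inject extra headers) or the message
 * does not fit in the fixed-size buffer. */
int build_email_payload(char * out, size_t out_size,
                        const char * from, const char * to,
                        const char * subject, const char * body){
  if(strpbrk(from, "\r\n") || strpbrk(to, "\r\n") || strpbrk(subject, "\r\n")){
    return -1;
  }
  int written = snprintf(out, out_size,
                         "From: %s\r\n"
                         "To: %s\r\n"
                         "Subject: %s\r\n"
                         "MIME-Version: 1.0\r\n"
                         "Content-Type: text/html; charset=UTF-8\r\n"
                         "\r\n"
                         "%s\r\n",
                         from, to, subject, body);
  if(written < 0 || (size_t)written >= out_size){
    return -1;
  }
  return written;
}
```

The envelope sender and recipients are configured separately through CURLOPT_MAIL_FROM and CURLOPT_MAIL_RCPT. The To header here only controls what the reader's client displays.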

With email sending working, I faced a problem. Sending an email sometimes took a good amount of time, and since this happened in the middle of handling the request, it delayed the answer to the user and blocked the thread, preventing it from picking up other users' requests.

To solve this problem, I created a dedicated module to send emails. It runs in a separate thread, allows new emails to be registered in a queue, and then manages sending them. This way, if sending is delayed, only this module is affected and not the rest of the code.

int queue_email(EmailDaemon * daemon, Email * email);

Thinking about memory again, there are some things to consider in this module. We need to be careful since emails can be large. As a hard limit, I used 100KB as the maximum size. I can do this without worries since I'm the one who will send the emails, and images and similar assets will come from external sources.

Then we need to consider that the memory usage pattern in this case is not linear: we may send one email and want to free its memory while other threads register three new emails to be sent. This doesn't work well with the arena, since it frees everything in bulk and only at the end of its scope, which for this module is the end of the program. We could grow the arena indefinitely as needed, but since we reserve 100KB for each email we would get into trouble fast.

We know that all the allocations for emails have the same size, which allows us to implement a specialized allocator called a Pool Allocator. This allocator splits the allocated buffer into chunks of the same size and then uses a linked list of free chunks to allocate and free chunks in constant O(1) time. All with a fixed memory usage and no allocations after the first one.

typedef struct Pool {
  unsigned char * buffer;
  size_t total_chunks;
  size_t chunk_size;

  PoolFreeNode * head;
} Pool;
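The post shows the Pool struct but not PoolFreeNode or the functions around it. A minimal sketch, assuming the free-list node is stored inside each free chunk. create_pool, pool_alloc, pool_free and destroy_pool are hypothetical names. The pool does no locking, so in the email module these calls would need to happen under the queue's mutex.

```c
#include <stddef.h>
#include <stdlib.h>

/* A free chunk stores the link to the next free chunk inside itself,
 * so the free list needs no memory beyond the chunks. */
typedef struct PoolFreeNode {
  struct PoolFreeNode * next;
} PoolFreeNode;

typedef struct Pool {
  unsigned char * buffer;
  size_t total_chunks;
  size_t chunk_size;

  PoolFreeNode * head;
} Pool;

/* One malloc, then every chunk is threaded onto the free list. */
int create_pool(Pool * pool, size_t chunk_size, size_t total_chunks){
  size_t align = _Alignof(max_align_t);
  if(chunk_size < sizeof(PoolFreeNode)){
    chunk_size = sizeof(PoolFreeNode);
  }
  chunk_size = (chunk_size + align - 1) & ~(align - 1);  /* keep chunks aligned */

  pool->buffer = malloc(chunk_size * total_chunks);
  if(pool->buffer == NULL){
    return 0;
  }
  pool->chunk_size = chunk_size;
  pool->total_chunks = total_chunks;
  pool->head = NULL;
  for(size_t i = total_chunks; i > 0; i--){  /* reverse, so chunk 0 is handed out first */
    PoolFreeNode * node = (PoolFreeNode *)(pool->buffer + (i - 1) * chunk_size);
    node->next = pool->head;
    pool->head = node;
  }
  return 1;
}

/* O(1): pop the first free chunk, or NULL when every chunk is in use. */
void * pool_alloc(Pool * pool){
  PoolFreeNode * node = pool->head;
  if(node != NULL){
    pool->head = node->next;
  }
  return node;
}

/* O(1): push the chunk back onto the free list. */
void pool_free(Pool * pool, void * chunk){
  PoolFreeNode * node = (PoolFreeNode *)chunk;
  node->next = pool->head;
  pool->head = node;
}

void destroy_pool(Pool * pool){
  free(pool->buffer);
  pool->buffer = NULL;
  pool->head = NULL;
}
```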

With the pool implemented, I built the queue as a linked list of pointers to emails allocated from the pool. With the queue ready, sending an email meant moving the initial implementation from the registration endpoint into the module and taking each email from the queue. Other than that, there's some thread synchronization around queue inserts and removals, and error handling for the email sending.
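A minimal sketch of such a sender thread using POSIX threads, a mutex and a condition variable. Only the queue_email signature comes from the post. The Email and EmailDaemon fields, the other functions and the stubbed send_email are hypothetical, and the pool is left out, so the emails here are plain caller-owned structs.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical email record; the real one would hold the 100KB body. */
typedef struct Email {
  const char * to;
  struct Email * next;   /* link used by the queue */
} Email;

typedef struct EmailDaemon {
  pthread_mutex_t lock;
  pthread_cond_t has_work;
  Email * head;
  Email * tail;
  int stopping;
  size_t sent;
  pthread_t thread;
} EmailDaemon;

/* Stub standing in for the libcurl send; only the sender thread touches `sent`. */
static void send_email(EmailDaemon * daemon, Email * email){
  (void)email;
  daemon->sent++;
}

static void * daemon_loop(void * arg){
  EmailDaemon * daemon = arg;
  pthread_mutex_lock(&daemon->lock);
  for(;;){
    while(daemon->head == NULL && !daemon->stopping){
      pthread_cond_wait(&daemon->has_work, &daemon->lock);
    }
    if(daemon->head == NULL){  /* stopping and the queue is drained */
      break;
    }
    Email * email = daemon->head;
    daemon->head = email->next;
    if(daemon->head == NULL){
      daemon->tail = NULL;
    }
    pthread_mutex_unlock(&daemon->lock);
    send_email(daemon, email);  /* the slow part runs without holding the lock */
    pthread_mutex_lock(&daemon->lock);
  }
  pthread_mutex_unlock(&daemon->lock);
  return NULL;
}

int start_email_daemon(EmailDaemon * daemon){
  pthread_mutex_init(&daemon->lock, NULL);
  pthread_cond_init(&daemon->has_work, NULL);
  daemon->head = NULL;
  daemon->tail = NULL;
  daemon->stopping = 0;
  daemon->sent = 0;
  return pthread_create(&daemon->thread, NULL, daemon_loop, daemon) == 0;
}

/* Called from request handlers: O(1) append, then wake the sender.
 * Returns 0 on success, -1 if the daemon is shutting down (assumed convention). */
int queue_email(EmailDaemon * daemon, Email * email){
  email->next = NULL;
  pthread_mutex_lock(&daemon->lock);
  if(daemon->stopping){
    pthread_mutex_unlock(&daemon->lock);
    return -1;
  }
  if(daemon->tail != NULL){
    daemon->tail->next = email;
  } else {
    daemon->head = email;
  }
  daemon->tail = email;
  pthread_cond_signal(&daemon->has_work);
  pthread_mutex_unlock(&daemon->lock);
  return 0;
}

/* Sends whatever is still queued, then joins the thread. */
void stop_email_daemon(EmailDaemon * daemon){
  pthread_mutex_lock(&daemon->lock);
  daemon->stopping = 1;
  pthread_cond_signal(&daemon->has_work);
  pthread_mutex_unlock(&daemon->lock);
  pthread_join(daemon->thread, NULL);
  pthread_cond_destroy(&daemon->has_work);
  pthread_mutex_destroy(&daemon->lock);
}
```

The mutex is held only while linking or unlinking an email and is released during the slow send, so request handlers calling queue_email never wait on the network.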

With this done, we need to build the UI for users to register. I did this with simple HTML and some JavaScript to send the request to the server and display the result of the registration.

Mailing List Web UI

If the registration succeeds, we also need to email the user. At first I tried to embed images and similar content in the email using base64, but email clients like Gmail didn't display them as intended. In the end I just used links to images stored on my website. This is supported by more clients and has the added advantage that emails are much smaller.

Mailing List Registration Email

With these steps done, user registration now works as intended. It was a bit of extra work since it was my first time using these memory management techniques, but the next features should be simpler since these pieces are already implemented.

User Unregistration

As the email image above shows, there is a link at the bottom for users who want to unregister from the mailing list. This link contains a UUID that the database generated automatically when the user registered. This is done by using gen_random_uuid as the column default in the table definition; the function is built into PostgreSQL since version 13.

CREATE TABLE registration (
  id SERIAL PRIMARY KEY,
  game_id INTEGER REFERENCES game(id),
  email VARCHAR(254) NOT NULL,
  unregister_token UUID DEFAULT gen_random_uuid(),
  UNIQUE (email, game_id)
);

The deletion is simple. We take the UUID from the request, check whether it exists in the database, and if it does, we delete the registration entry.

The UUID parameter in the request has a maximum size, and the intermediate database operations and the code that answers the user use a fixed amount of memory, so we know exactly the maximum memory needed from the start. As before, we can allocate it from the arena at the start and reuse it forever without worry.
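Because the token always has the canonical 36-character text form, its buffer size is fixed, and malformed tokens can be rejected before any query runs. PostgreSQL would already reject invalid UUID input with an error, so this mainly lets the handler answer early. The constant names and is_canonical_uuid are hypothetical, and the check is stricter than PostgreSQL, which also accepts forms like braces or no hyphens.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Canonical text form: 8-4-4-4-12 hex digits, 36 characters. */
#define UUID_TEXT_LEN 36
#define UUID_SIZE (UUID_TEXT_LEN + 1)   /* +1 for the NUL terminator */

/* Accepts only the canonical hyphenated form, e.g. as produced by gen_random_uuid(). */
bool is_canonical_uuid(const char * s){
  if(strlen(s) != UUID_TEXT_LEN){
    return false;
  }
  for(size_t i = 0; i < UUID_TEXT_LEN; i++){
    if(i == 8 || i == 13 || i == 18 || i == 23){
      if(s[i] != '-'){
        return false;
      }
    } else if(!isxdigit((unsigned char)s[i])){
      return false;
    }
  }
  return true;
}
```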

Check Registered Users

This feature will only be available to me, or to people I allow to work with the mailing list. It needs a desktop client and more restricted endpoints on the server side. To validate that a request comes from a trusted person, I will use an API token stored on the server that the client also knows and sends with each request. For now this should be enough, since I will be the only person using it. If I need to expand later, I can create a token for each client and look the tokens up in the database on the server.
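The post doesn't say how the token is compared. One detail worth considering is comparing it without stopping at the first mismatch, so the response time doesn't reveal how many leading characters were right. A best-effort sketch with a hypothetical token_matches helper, assuming both values are NUL-terminated strings such as those read into fixed-size buffers:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Compares every byte and accumulates the differences instead of
 * returning early. Only the length can leak, which is acceptable for a
 * token of fixed, known length. */
bool token_matches(const char * received, const char * expected){
  size_t len = strlen(expected);
  if(strlen(received) != len){
    return false;
  }
  unsigned char diff = 0;
  for(size_t i = 0; i < len; i++){
    diff |= (unsigned char)(received[i] ^ expected[i]);
  }
  return diff == 0;
}
```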

After the request is validated, we fetch the registered entries from the database. This operation can return any number of results and therefore uses an undetermined amount of memory. For now, I reserved a bigger initial size for the arena that serves this endpoint, but the proper solution is to add pagination to the endpoint, which would make the result size fixed.

const char * query = "SELECT email, unregister_token FROM registration WHERE game_id=$1";
const char * values[1] = {game_id};
int lengths[1] = {INT_MAX_DIGITS}; // ignored by libpq for text-format parameters
int formats[1] = {0};              // 0 = text format
db->result = PQexecParams(db->conn, query, 1, NULL, values, lengths, formats, 0);
// The row count is only known after the query runs, so this allocation is unbounded.
int total_rows = PQntuples(db->result);
list_out->registrations = alloc_in_arena(arena, sizeof(Registration) * total_rows);
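A sketch of the pagination mentioned above, assuming keyset pagination on the id column (the post doesn't say which kind it would use). The client sends the last id it received, and the server returns at most one page of rows after it. PAGE_SIZE, the field sizes in this Registration and registrations_page_bytes are hypothetical.

```c
#include <stddef.h>

/* Assumed sizes: 254-character email and 36-character UUID, each + NUL. */
#define EMAIL_SIZE 255
#define UUID_SIZE 37
#define PAGE_SIZE 100   /* hypothetical page length */

typedef struct Registration {
  char email[EMAIL_SIZE];
  char unregister_token[UUID_SIZE];
} Registration;

/* $1 = game id, $2 = last id the client received, $3 = page length.
 * The id is returned so the client can ask for the next page. */
const char * const page_query =
  "SELECT id, email, unregister_token FROM registration "
  "WHERE game_id = $1 AND id > $2 ORDER BY id LIMIT $3";

/* With a fixed page length, the arena for this endpoint has a fixed upper bound. */
size_t registrations_page_bytes(void){
  return sizeof(Registration) * PAGE_SIZE;
}
```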

Once retrieved, the registrations are converted to JSON and sent to the client. That completes the server side. Next I built the view on the desktop client where the registered emails are displayed, and fetched the emails from the new endpoint with an HTTP request using libcurl.
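The post doesn't show how the JSON is produced. If it is written by hand into a fixed buffer, the strings need escaping, since valid email addresses can contain characters like a double quote inside a quoted local part. A hedged sketch of a hypothetical json_append_string helper:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Appends `s` as a quoted JSON string at out + *pos, escaping quotes,
 * backslashes and control characters, and keeps `out` NUL-terminated.
 * Returns 0 on success, or -1 if the buffer is too small, in which case
 * *pos is left unchanged and the response should be discarded. */
int json_append_string(char * out, size_t cap, size_t * pos, const char * s){
  size_t p = *pos;
  if(p + 1 >= cap){
    return -1;
  }
  out[p++] = '"';
  for(; *s != '\0'; s++){
    unsigned char c = (unsigned char)*s;
    char esc[7];
    int n;
    if(c == '"' || c == '\\'){
      n = snprintf(esc, sizeof esc, "\\%c", c);
    } else if(c < 0x20){
      n = snprintf(esc, sizeof esc, "\\u%04x", (unsigned)c);
    } else {
      esc[0] = (char)c;
      n = 1;
    }
    if(p + (size_t)n >= cap){
      return -1;
    }
    memcpy(out + p, esc, (size_t)n);
    p += (size_t)n;
  }
  if(p + 1 >= cap){
    return -1;
  }
  out[p++] = '"';
  out[p] = '\0';
  *pos = p;
  return 0;
}
```

Each registration object can then be assembled by writing the fixed parts (braces, keys, commas) directly and passing every field value through this helper.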

Having received the data in JSON format, I used jsmn to parse it. Since it's a single-header library, I just needed to include its jsmn.h file in the project. With the data processed, I added it to the newly created UI, as shown below.

Desktop GUI Registered Emails

For memory management in the desktop client I also used the same Arena implementation: each arena's memory is allocated in one batch at the beginning and then reused. Only one allocation and one free are done for each Arena during the whole execution of the GUI.

In terms of memory, the place where I needed the most care was the GTK elements, where I had to know each one's lifecycle and remember to free them individually.

Send Email to Registered Users

The endpoint to send emails to the registered users receives the needed email fields: subject and body. With these, an email is prepared and inserted into the email queue module to be sent to all the registered users. The maximum sizes of the fields and the email are known from the beginning, which keeps the memory usage fixed and lets us easily reuse our arena forever.

The UI is mainly two labeled boxes, one for the subject and another for the body of the email. These fields are sent to the email sending endpoint on the server, and from there the email goes out to the associated users.

Desktop GUI Send Email

Conclusions

With this, all the mailing list features are ready. The main thing I wanted to test was the memory management approach, and I really liked it. It was some extra work to implement the arena and the pool the first time, but after that memory management became much simpler. It became more about lifecycles and scopes than about individual allocations.

I also get to reuse memory and never allocate again after the initial allocations. This makes allocations fast and prevents possible bugs, since each malloc can fail and programs usually don't have a strategy to handle each individual allocation failure.

Of course, this approach also has tradeoffs. Since I allocate all the needed memory at the beginning, it stays reserved even when I'm not using it. I think the tradeoff is completely worth it, and in some cases the operating system helps us: thanks to demand paging, the parts of a large allocation that I never touch are not backed by physical memory. The kernel reserves the virtual address space, but until we write to those pages, the program doesn't actually use that memory.

Another worry was the string handling code. I mostly used the standard library functions with an "n" in their name, which take a maximum size parameter, and that helps. I think better string support would be preferable. But one big way the allocators helped was that I was always sure all the intermediate temporary strings and buffers I allocated were freed at the end.
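One pitfall with the "n" functions is that strncpy leaves the destination without a NUL terminator when the source is as long as the destination or longer. A small sketch of a hypothetical copy_string helper built on snprintf, which always terminates (for a non-zero size) and reports truncation:

```c
#include <stddef.h>
#include <stdio.h>

/* snprintf returns the length it wanted to write, so comparing it with
 * the buffer size detects truncation. Returns 0 on success, -1 if `src`
 * was truncated (dst still holds a terminated prefix). */
int copy_string(char * dst, size_t dst_size, const char * src){
  int needed = snprintf(dst, dst_size, "%s", src);
  return (needed < 0 || (size_t)needed >= dst_size) ? -1 : 0;
}
```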

Actually, the place that scared me the most in terms of failing to manage memory was GTK. Its documentation is good, but since many individual elements with different lifecycles have to be freed, I was always afraid I had missed something.

So in general, programming for the web in C was not that difficult. With the allocators, managing memory by scope was simple. While I would like native string support, where I think I suffered a bit more was the small number of examples and the small community. In the end, though, I found libraries for everything I needed (web server, database access, sending emails, JSON handling), so maybe it's not that much of a problem.

Repository

If you want to check the code and play with it you can check the following repository.

Other

If you want to receive updates when there are new games or development stories join the mailing list