Note

In this series, we will explore the world of Django Channels and how it simplifies handling asynchronous tasks in Django applications. We will start with the basics of web servers and WSGI, then gradually move to more advanced topics like ASGI and WebSockets. (This probably sounds like gibberish to you right now, but don’t worry, we will get there step by step.)

This series is meant to be super beginner-friendly. Think of it as a casual Q&A, just like chatting with a friend over coffee. Yes, I’m your friend now 😇. So, grab your favorite drink, sit back, and let me put cool information in your brain.


Q: What is a web server?

  • A web server is a program like any other program running as a process on the operating system.
Info
For the sake of simplicity, let’s assume that we have a single core machine. This assumption might sound a bit strange and unnecessary right now, but it will help us understand the concepts in this series better.

Q: So okay, what is this web server process doing?

  • The web server process listens on a socket, which the operating system exposes to the process as a file descriptor. So you can think of it as a process waiting for data in a file, and when data comes in, it reads it from that file.

Q: Oh, so the process is just waiting for incoming requests?

  • Yes, exactly. The web server process is waiting for incoming requests on that socket, and when a request comes in, it reads the request data from the socket.
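
To make this less abstract, here is a minimal sketch of that "wait on a socket, then read" loop using Python's standard `socket` module. It is not how a real web server is written; the names `serve_one_request` and `demo` are made up for this example, and it handles exactly one request on a local port picked by the OS.

```python
import socket
import threading


def serve_one_request(server_sock: socket.socket) -> bytes:
    """Block until one client connects, read its request, and reply."""
    conn, _addr = server_sock.accept()   # the process sleeps here until a client connects
    with conn:
        request = conn.recv(4096)        # read the raw request bytes from the socket
        body = b"hello"
        conn.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(body)).encode() + b"\r\n"
            b"Connection: close\r\n\r\n" + body
        )
    return request


def demo() -> tuple[bytes, bytes]:
    """Start a listener on a free local port and send it one request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server_sock.listen()
        port = server_sock.getsockname()[1]
        # The socket really is just a small integer handle to the process.
        print("listening on file descriptor", server_sock.fileno())

        received = []
        worker = threading.Thread(
            target=lambda: received.append(serve_one_request(server_sock))
        )
        worker.start()

        # Play the role of the browser: connect, send a request, read the reply.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
            response = b""
            while chunk := client.recv(4096):
                response += chunk
        worker.join()
    return received[0], response
```

Calling `demo()` returns the raw request the server read and the raw response the client got back, so you can print both and see that a "request" is just bytes arriving on a socket.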

Q: Wait a minute, you said we have a single core machine, so won’t the web server process block everything else while it waits for incoming requests?

  • Good question! No, it won’t block anything else. This is where the operating system comes into play. The operating system is responsible for managing processes and their execution. When the web server process is waiting for incoming requests, the operating system can switch to another process and let it run while the web server is waiting. This is called multitasking. That’s how you can have multiple apps running on your machine at the same time, even if you have a single core CPU.

Q: Can you elaborate more on how the operating system does this multitasking thing?

  • Operating systems have what’s called a scheduler, which is responsible for deciding which process to run at any given time. When the web server process is waiting for incoming requests, it goes into a SLEEP state, so the scheduler can switch to another process and let it run. When a request comes in, the operating system marks the web server process as READY again, and the scheduler eventually picks it and lets it handle the request.

flowchart LR
    a((READY)) -- scheduler chooses web server process to run --> b((RUNNING))
    b -- web server process wants data from socket --> c((SLEEP))
    c -- data arrived and OS notifies web server process --> a
Figure: Web Server Process State Transition

Q: Ah so the web server process is simply waiting to be notified by the operating system and it’s not really consuming any CPU cycles while it’s waiting?

  • Exactly. While it sleeps, the web server process uses no CPU cycles. Operating systems do this with a mechanism called “context switching”: the operating system saves the state of the web server process, lets another process run, and restores that state later when the web server process needs to run again, for example when a request comes in.
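
If you want to see this for yourself, here is a small experiment, offered as a sketch rather than a benchmark. It blocks on a socket read while a timer sends data a moment later, then compares wall-clock time with the CPU time the process actually used. The function name `measure_blocked_wait` and the 0.3 second delay are arbitrary choices for this example.

```python
import socket
import threading
import time


def measure_blocked_wait(delay: float = 0.3) -> tuple[float, float]:
    """Block on a socket read and return (wall_seconds, cpu_seconds)."""
    reader, writer = socket.socketpair()
    # A timer plays the role of the client: it sends data after `delay` seconds.
    timer = threading.Timer(delay, writer.sendall, args=(b"ping",))

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    timer.start()
    data = reader.recv(16)   # blocks: the OS puts us to sleep until data arrives
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start

    timer.join()
    reader.close()
    writer.close()
    assert data == b"ping"
    return wall_used, cpu_used
```

On a typical machine the wall-clock time comes out close to the delay, while the CPU time is a tiny fraction of it, because the process spent almost all of that time asleep.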

Q: Okay, so the web server process finally gets the request. What does it do with it?

  • In the past, the web server process used to handle the request directly. It would read the request data, process it, and then send a response back to the client. Back then, websites were mostly static HTML files, so the web server would simply read the requested file from disk and send it back to the client.
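
Here is a rough sketch of that "read the file from disk and send it back" step. The function `serve_static` and its parameters are invented for this example; real servers also deal with content types, caching, partial reads, and much more.

```python
from pathlib import Path


def serve_static(doc_root: Path, request_path: str) -> bytes:
    """Return a full HTTP response for a file under doc_root, or a 404."""
    root = doc_root.resolve()
    target = (root / request_path.lstrip("/")).resolve()
    # Refuse anything that resolves outside the document root (e.g. "/../secret").
    if root not in target.parents or not target.is_file():
        body = b"Not Found"
        status = b"404 Not Found"
    else:
        body = target.read_bytes()
        status = b"200 OK"
    return (
        b"HTTP/1.1 " + status + b"\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body
    )
```

Notice there is no application logic at all: the same path always produces the same bytes until someone edits the file on disk.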

Q: What changed?

  • People wanted their websites to be more dynamic, so they started using server-side programming languages like PHP, Python, Ruby, etc. to generate dynamic content (Web Apps!). It started as simple scripts that would read the request data, process it, and then send a response back to the client. But as web applications became more complex, this approach became difficult to manage.

Q: How did the scripts get the request data from the web server process?

  • In the past, web servers would pass request data to scripts using environment variables (for headers and metadata) and standard input (for POST data), following the CGI (Common Gateway Interface) standard. The script would then read from these sources to extract the relevant information. This approach worked fine for simple scripts, but it became difficult to manage as web applications became more complex.
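
To give you a feel for it, here is a sketch of what a tiny CGI-style script could look like in Python. `REQUEST_METHOD`, `QUERY_STRING`, and `CONTENT_LENGTH` are variable names from the CGI spec; the function `handle_cgi_request` and the plain-text response format are just assumptions for this example.

```python
import os
import sys
from typing import BinaryIO, Mapping


def handle_cgi_request(environ: Mapping[str, str], stdin: BinaryIO) -> bytes:
    """Build a CGI response from the environment variables and standard input."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else b""   # the POST data, if any

    text = f"method={method} query={query} body={body.decode('utf-8', 'replace')}\n"
    # A CGI response is header lines, a blank line, then the body.
    return b"Content-Type: text/plain\r\n\r\n" + text.encode("utf-8")


if __name__ == "__main__":
    # When run by a web server as a CGI script, the real sources are used.
    sys.stdout.buffer.write(handle_cgi_request(os.environ, sys.stdin.buffer))
```

The script writes its response to standard output, and the web server forwards that back to the client.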

Q: I don’t get it… What are environment variables? And what is standard input?

  • Environment variables are key-value pairs that belong to a process. When a process starts another process, it decides which environment variables the new process gets, so they are a simple way to pass information from a parent process to its child. For example, the web server process can set an environment variable called REQUEST_METHOD for the script to indicate the HTTP method used in the request (GET, POST, etc.). The script can then read this variable to determine how to handle the request.
  • Standard input (stdin) is an input stream that every process has. Again, you can think of it as a file that the process can read from. The web server process can write the POST data to the standard input of the script, and the script can read from it to extract the relevant information.
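
Here is the same idea from the web server's side, as a sketch using Python's `subprocess` module. The parent starts a child process, hands it a `REQUEST_METHOD` environment variable, and writes some data to its standard input. `CHILD_SCRIPT` and `run_script` are made-up names, and the child is a stand-in for a real CGI script.

```python
import os
import subprocess
import sys

# Stand-in for a CGI script: it reads one environment variable and its stdin.
CHILD_SCRIPT = (
    "import os, sys\n"
    "method = os.environ['REQUEST_METHOD']\n"
    "body = sys.stdin.read()\n"
    "print(f'{method}:{body}')\n"
)


def run_script(method: str, post_data: str) -> str:
    """Start the child with REQUEST_METHOD set and post_data on its stdin."""
    # Copy our own environment and add one variable; ours stays untouched.
    child_env = dict(os.environ, REQUEST_METHOD=method)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=child_env,          # environment variables the child will see
        input=post_data,        # written to the child's standard input
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, `run_script("POST", "name=alice")` gives back the string the child printed after reading both sources.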

Q: Okay, and now that the script has the request data, it can generate HTML and responses dynamically. I get it. So why is this a thing of the past? I see nothing wrong with the whole CGI thing.

  • Well, CGI was a good start, but it had some limitations. For example, it was slow because the web server had to start a new process (aka the script) for each request. This meant that if you had a lot of requests coming in at the same time, the web server would have to start a lot of processes, which would consume a lot of resources and slow down the server.
  • CGI scripts received input via environment variables and standard input. While this was part of the spec, poor input sanitization in many scripts (especially early ones written in shell or C) led to security vulnerabilities.
  • CGI was standardized (see RFC 3875), but different servers implemented it in slightly different ways or added their own quirks. This made it hard to write truly portable CGI scripts that would run identically on different servers such as Apache and IIS.
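
To get a feel for the first point, the cost of starting a process per request, here is a rough timing sketch. It compares calling a function inside an already running process with launching a fresh Python interpreter for every "request". The function names are invented for this example, and the exact numbers will vary a lot from machine to machine.

```python
import subprocess
import sys
import time


def handle(name: str) -> str:
    """The 'application logic': trivial on purpose."""
    return f"hello {name}"


def time_in_process(requests: int) -> float:
    """Seconds needed to handle all requests inside this process."""
    start = time.perf_counter()
    for _ in range(requests):
        handle("world")
    return time.perf_counter() - start


def time_process_per_request(requests: int) -> float:
    """Seconds needed when every request starts a new process, CGI style."""
    start = time.perf_counter()
    for _ in range(requests):
        # Like CGI: start a fresh interpreter for every single request.
        subprocess.run(
            [sys.executable, "-c", "print('hello world')"],
            capture_output=True,
            check=True,
        )
    return time.perf_counter() - start
```

Try printing `time_in_process(20)` next to `time_process_per_request(20)`. The work done per request is the same, so nearly all of the difference is process startup.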

Q: What did they do about it then?

  • Since CGI was slow and had some limitations, people started looking for a better way to handle requests. This is where WSGI comes into play. See part 2 of this series (WIP).