Multiplexing is the HTTP/2 mechanism by which multiple HTTP requests can be sent, and their responses received, asynchronously over a single TCP connection. Multiplexing is the heart of the HTTP/2 protocol.
Frames and Streams
HTTP/2 is a binary protocol. Every HTTP/2 request and its response are given a unique ID called the stream ID, and each request and response is divided into frames. Frames are binary pieces of data. The stream ID identifies which request or response a frame belongs to. A stream is the collection of frames that share the same stream ID.
To make an HTTP request, the client first establishes a TCP connection with the server, if one is not already open. It then divides the request into binary frames, assigns the request's stream ID to each frame, and starts sending the frames to the server. Once the server has the response ready, it divides the response into frames and gives them the same stream ID as the request. The server then sends the response frame by frame.
The stream ID is necessary because multiple requests to an origin are made over a single TCP connection, so the stream ID is what makes it possible to tell which request or response a frame belongs to.
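As a rough illustration of how a frame carries its stream ID, the sketch below packs and unpacks the 9-byte HTTP/2 frame header (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream ID) and splits a body into DATA frames. The helper names (encode_frame, decode_frame, split_into_data_frames) are made up for this example, and it ignores HEADERS frames and header compression entirely, so treat it as a sketch rather than a working HTTP/2 implementation.

```python
import struct

FRAME_HEADER_LEN = 9
TYPE_DATA = 0x0
FLAG_END_STREAM = 0x1
DEFAULT_MAX_FRAME_SIZE = 16384  # default maximum payload size per frame


def encode_frame(frame_type, flags, stream_id, payload):
    """Serialize one frame: 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload too large for the 24-bit length field")
    if not 0 <= stream_id < 1 << 31:
        raise ValueError("stream ID must fit in 31 bits")
    # The 24-bit length is packed as 1 byte + 2 bytes, big-endian.
    header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                         frame_type, flags, stream_id)
    return header + payload


def decode_frame(data):
    """Parse one frame from the start of data.

    Returns (type, flags, stream_id, payload, remaining_bytes).
    """
    hi, lo, frame_type, flags, raw_id = struct.unpack(
        ">BHBBI", data[:FRAME_HEADER_LEN])
    length = (hi << 16) | lo
    stream_id = raw_id & 0x7FFFFFFF  # the top bit is reserved
    end = FRAME_HEADER_LEN + length
    return frame_type, flags, stream_id, data[FRAME_HEADER_LEN:end], data[end:]


def split_into_data_frames(stream_id, body, max_payload=DEFAULT_MAX_FRAME_SIZE):
    """Split a message body into DATA frames that all carry the same stream ID."""
    chunks = [body[i:i + max_payload]
              for i in range(0, len(body), max_payload)] or [b""]
    frames = []
    for index, chunk in enumerate(chunks):
        last = index == len(chunks) - 1
        # Only the final frame is flagged as ending the stream.
        flags = FLAG_END_STREAM if last else 0
        frames.append(encode_frame(TYPE_DATA, flags, stream_id, chunk))
    return frames
```

Calling split_into_data_frames(1, body) yields frames that can be decoded independently, and each one reports stream ID 1, which is how the receiver knows they belong together.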
Multiple HTTP Requests
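A single TCP connection is normally used to make HTTP requests to a single origin, and requests to multiple origins generally require multiple TCP connections. The exception is that a client may reuse a connection for another origin when that origin resolves to the same IP address and is covered by the server's certificate.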
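A client could keep track of this with a table keyed by origin (scheme, host and port). The following is a minimal sketch under that assumption; ConnectionPool, origin_of and the connect callback are hypothetical names, and no real sockets are opened.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


class ConnectionPool:
    """Hypothetical pool that keeps one connection object per origin."""

    def __init__(self, connect):
        # connect is an assumed callable: origin -> connection object
        self._connect = connect
        self._by_origin = {}

    def get(self, url):
        origin = origin_of(url)
        if origin not in self._by_origin:
            # First request to this origin: open a new connection.
            self._by_origin[origin] = self._connect(origin)
        # Later requests to the same origin reuse it.
        return self._by_origin[origin]
```

With this pool, requests for https://example.com/index.html and https://example.com:443/style.css share one connection, while https://cdn.example.com/app.js gets its own.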
Once a TCP connection is established, all the requests for that origin are made over that connection. Multiple HTTP/2 requests are divided into frames and assigned their respective stream IDs. The frames from the different streams are sent asynchronously, interleaved with one another, and the server sends its responses asynchronously as well. Therefore, if one response is taking too long, the others don't have to wait for it to finish. The client receives the frames and groups them by stream ID.
Requests and responses travel in parallel, i.e., while the client is sending frames, the server is also sending frames back to the client.
The above snapshot shows how requests and responses happen in parallel. It also shows how multiple requests and responses are split into individual frames and sent one by one asynchronously.
When the requests for a particular origin are done, the TCP connection is kept open for some time before it is closed.
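The interleaving described in this section can be simulated without any networking. In the sketch below a frame is simplified to a (stream_id, payload, end_stream) tuple, the sender takes one frame from each stream in turn, and the receiver regroups frames by stream ID. The round-robin order and the function names are assumptions made for illustration; a real server is free to interleave frames differently.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id, body, size):
    """Cut a body into (stream_id, payload, end_stream) tuples."""
    chunks = [body[i:i + size] for i in range(0, len(body), size)] or [b""]
    return [(stream_id, chunk, i == len(chunks) - 1)
            for i, chunk in enumerate(chunks)]


def interleave(*streams):
    """Round-robin sender: take one frame from each stream in turn."""
    wire = []
    for group in zip_longest(*streams):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def demultiplex(wire):
    """Receiver: regroup frames by stream ID.

    Returns the reassembled bodies and the order in which streams finished.
    """
    buffers = defaultdict(bytearray)
    finished = []
    for stream_id, payload, end_stream in wire:
        buffers[stream_id] += payload
        if end_stream:
            finished.append(stream_id)
    return {sid: bytes(buf) for sid, buf in buffers.items()}, finished


# Three responses of different sizes share one connection.
# Client-initiated streams use odd IDs.
bodies = {1: b"A" * 10, 3: b"B" * 3, 5: b"C" * 6}
wire = interleave(*(to_frames(sid, body, 3) for sid, body in bodies.items()))
messages, finished = demultiplex(wire)
print([frame[0] for frame in wire])  # [1, 3, 5, 1, 5, 1, 1]
print(finished)                      # [3, 5, 1]
```

The short responses on streams 3 and 5 complete before the long response on stream 1, even though stream 1 started first.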
Prioritization
A stream (or request) can have a priority value. The server can use the priority value to decide how much memory, CPU time and bandwidth to give to a request. By default, the server processes multiple requests in parallel and sends the frames of their responses asynchronously, in no particular order. A priority asks the server to send the frames of one response before those of other responses. It is a hint rather than a command, so the server is not obliged to follow it.
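One simple way a server might act on priorities is weighted round-robin: in each round a stream may send as many frames as its weight. This is only a sketch of one possible policy, not how any particular server schedules frames, and the schedule function and the small integer weights are invented for the example.

```python
from collections import deque


def schedule(streams):
    """Weighted round-robin over pending frames.

    streams maps stream_id -> (weight, list_of_frames).
    Returns the send order as a list of (stream_id, frame) pairs.
    """
    queues = {sid: deque(frames) for sid, (_, frames) in streams.items()}
    weights = {sid: weight for sid, (weight, _) in streams.items()}
    order = []
    while any(queues.values()):  # an empty deque is falsy
        for sid, queue in queues.items():
            # A stream sends up to `weight` frames per round.
            for _ in range(weights[sid]):
                if not queue:
                    break
                order.append((sid, queue.popleft()))
    return order


pending = {
    1: (3, ["a1", "a2", "a3", "a4"]),  # higher weight, e.g. a stylesheet
    3: (1, ["b1", "b2"]),              # lower weight, e.g. an image
}
print([sid for sid, _ in schedule(pending)])  # [1, 1, 1, 3, 1, 3]
```

Stream 1 gets three frames on the wire for every one frame of stream 3, so it finishes sooner without starving the lower-priority stream.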
Settings Frame
Flow control is a method of controlling the rate of data transmission between client and server. The client and server agree on the transmission settings once the connection is established and before the actual data transfer begins.
TCP flow control and HTTP/2 flow control are two different things. TCP flow control applies to the whole connection, whereas HTTP/2 flow control applies to individual streams as well as to the whole connection.
When the TCP connection for HTTP/2 is established, the client and server first exchange SETTINGS frames. These indicate how many streams can be open at a time (that is, how many parallel requests are allowed) and the initial window size, i.e., how many bytes each side is ready to receive on a stream before the sender must pause and wait for the window to be replenished. A separate window of the same kind limits the whole connection.
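The window bookkeeping can be sketched as a pair of counters on the sending side: one per stream and one for the connection. Sending data shrinks both, and a WINDOW_UPDATE from the receiver grows one of them again. The class and method names below are hypothetical; the default window of 65,535 bytes and the use of stream ID 0 to mean the whole connection follow the HTTP/2 specification.

```python
DEFAULT_INITIAL_WINDOW = 65_535  # default window size in bytes


class FlowControlledSender:
    """Hypothetical sender-side window accounting for one connection."""

    def __init__(self, initial_window=DEFAULT_INITIAL_WINDOW):
        # Per-stream windows start at the size the peer advertised
        # in its SETTINGS frame.
        self.initial_window = initial_window
        # The connection window starts at the default and changes
        # only through WINDOW_UPDATE.
        self.connection_window = DEFAULT_INITIAL_WINDOW
        self.stream_windows = {}

    def open_stream(self, stream_id):
        self.stream_windows[stream_id] = self.initial_window

    def sendable(self, stream_id):
        """Bytes that may be sent now: limited by both windows."""
        return max(0, min(self.connection_window,
                          self.stream_windows[stream_id]))

    def send_data(self, stream_id, data):
        """Send as much as the windows allow; return the unsent remainder."""
        n = min(len(data), self.sendable(stream_id))
        self.connection_window -= n
        self.stream_windows[stream_id] -= n
        return data[n:]

    def window_update(self, stream_id, increment):
        """Apply a WINDOW_UPDATE; stream ID 0 means the whole connection."""
        if stream_id == 0:
            self.connection_window += increment
        else:
            self.stream_windows[stream_id] += increment
```

For example, with an initial stream window of 10 bytes, a sender holding 25 bytes can send only 10 of them and must keep the other 15 until the receiver grants more room with a WINDOW_UPDATE.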