While generally the available documentation on how the OpenGL matrices work is quite good, there are some missing bits. Although not necessary for your everyday rendering, they give one some insight on how rasterization in general and OpenGL in special works.

W coordinate after perspective divide

After conversion to normalized device coordinates(ndc) by applying a matrix like

\begin{bmatrix}A & 0 & B & 0\\ 0 & C & D & 0\\ 0 & 0 & E & F\\ 0& 0 & -1 & 0\end{bmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}

one might think that projection is applied on each vertex looks like

\vec{v}_{ndc} = \frac{1}{w} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} \tfrac{x}{w} \\ \tfrac{y}{w} \\ \tfrac{z}{w} \\ 1 \end{pmatrix}

however it actually looks like

\vec{v}_{ndc} = \begin{pmatrix} \tfrac{x}{w} \\ \tfrac{y}{w} \\ \tfrac{z}{w} \\ \tfrac{1}{w} \end{pmatrix}

the $w$ coordinate is not divided by itself, but is inverted instead. This is done because the interpolation between vertices still needs to take place and for perspective correct interpolation one needs the camera space depth $z = -w_{cam}$ .

\begin{aligned} \vec{v}_{\alpha} &= \frac{(1-\alpha) \tfrac{\vec{v}_0}{-z_0} + \alpha \tfrac{\vec{v}_1}{-z_1}}{(1 - \alpha)\tfrac{1}{-z_0} + \alpha \tfrac{1}{-z_1}} \\[1.5em] &= \frac{(1-\alpha)\vec{v}_0 w_{0_{ndc}} + \alpha\vec{v}_1 w_{1_{ndc}}}{(1-\alpha) w_{0_{ndc}} + \alpha w_{1_{ndc}}} \end{aligned}

instead of dividing by $-z$ we can multiply with $w_{ndc}$ as multiplication is faster than division.

Note that for brevity the given formula assumes a scanline based rasterizer as it interpolates only between two vertices. The general approach is to use barycentric coordinates to interpolate between all three vertices simultaneously.

Row major or column major

Even though even Wikipedia says OpenGL is column major, it is actually storage agnostic. However by default it interprets your 16 element array as:

\begin{bmatrix}m_0 & m_4 & m_8 & m_{12}\\ m_1 & m_5& m_9 & m_{13}\\ m_2 & m_6 & m_{10} & m_{14}\\ m_3 & m_7 & m_{11} & m_{15}\end{bmatrix}

Yet most OpenGL functions dealing with matrices offer a transpose parameter which you can use to specify the used order. For a comparison of storage orders see the Eigen documentation.

Notably however, GLSL matrices do neither follow C nor mathematical notation; the mat2x4 M type has 2 columns and 4 rows and thus $M \in \mathbb{R}^{4 \times 2}$ mathematically.
Consequently though – albeit breaking with the C notation – M[0] will return the first column (vec4).

Now, if you use the transpose parameter mentioned before, prepare to think hard about the data you are actually getting in the Shader.