Notes on algebra, probability theory, and linear algebra
A binary operation on a set $S$ is a map
\(*:S\times S\to S\) that takes an ordered pair of elements of $S$ and returns an element of $S$.
The operation is associative if
\((a*b)*c=a*(b*c)\) for all $a,b,c\in S$.
A set with one associative binary operation is called a semigroup.
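On a finite set the definition can be checked by brute force over all triples. The following is a minimal sketch; the helper name `is_associative` and the two sample operations on $\{0,\dots,4\}$ are illustrative choices, not part of the notes.

```python
from itertools import product

def is_associative(elements, op):
    """Return True if (a*b)*c == a*(b*c) for every triple drawn from elements."""
    return all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    )

Z5 = range(5)
add_mod5 = lambda a, b: (a + b) % 5   # addition modulo 5
sub_mod5 = lambda a, b: (a - b) % 5   # subtraction modulo 5

print(is_associative(Z5, add_mod5))  # True
print(is_associative(Z5, sub_mod5))  # False
```

So $(\{0,\dots,4\}, +)$ is a semigroup, while the same set with subtraction is not: for example $(0-0)-1=4$ but $0-(0-1)=1$ modulo 5.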
Associativity is the axiom that turns a binary rule into a coherent rule for combining any finite non-empty sequence of elements.
Without associativity, a binary operation only tells us how to combine two objects. It does not tell us how to combine three without extra data.
For three elements there are two possible bracketings:
\((a*b)*c, \qquad a*(b*c).\) If they are not equal, then the expression $abc$ is ambiguous.
So the real meaning of associativity is:
Associativity says that the result of combining a finite sequence depends on the order of the elements, but not on the way we insert parentheses.
This is not merely notational convenience. It is a coherence condition.
If the operation is associative, then expressions such as
\(a_1a_2\cdots a_n\) make sense without specifying parentheses.
Without associativity, the expression is incomplete: it needs a bracketing tree.
Associativity lets us define
\(a^n=\underbrace{a*a*\cdots *a}_{n\text{ times}}\) unambiguously.
Without associativity, even $a^3$ could mean either
\((a*a)*a \quad\text{or}\quad a*(a*a).\)
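A small sketch makes the ambiguity concrete. The helpers `left_power` and `right_power` are hypothetical names for the two extreme bracketings of $a^n$; multiplication stands in for an associative operation and exponentiation for a non-associative one.

```python
def left_power(op, a, n):
    """Compute ((a*a)*a)*...*a with n factors, bracketed to the left."""
    result = a
    for _ in range(n - 1):
        result = op(result, a)
    return result

def right_power(op, a, n):
    """Compute a*(a*(...*(a*a))) with n factors, bracketed to the right."""
    result = a
    for _ in range(n - 1):
        result = op(a, result)
    return result

mul = lambda x, y: x * y    # associative
exp = lambda x, y: x ** y   # not associative

print(left_power(mul, 3, 3), right_power(mul, 3, 3))  # 27 27
print(left_power(exp, 3, 3), right_power(exp, 3, 3))  # 19683 7625597484987
```

For multiplication the two bracketings agree, so $3^3$ is well defined. For exponentiation, $(3^3)^3$ and $3^{(3^3)}$ differ, so a "cube" with respect to that operation has no single meaning.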
Associativity makes it possible to speak about elements generated by finite products of chosen generators.
This is fundamental in semigroups, monoids, groups, rings, categories, formal languages, and many other areas.
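As an illustration, here is one way to compute the set generated by some elements under an associative operation. It is a sketch under two assumptions: the generated set is finite (otherwise the loop does not terminate), and the operation really is associative, which is what allows every finite product to be reached by repeatedly multiplying on the right by a generator. The name `generated_subsemigroup` is illustrative.

```python
def generated_subsemigroup(generators, op):
    """Return all finite products of the generators under an associative op."""
    generators = list(generators)
    elements = set(generators)
    frontier = set(generators)
    while frontier:
        # Extend each newly found element by one more generator on the right.
        candidates = {op(x, g) for x in frontier for g in generators}
        frontier = candidates - elements
        elements |= frontier
    return elements

mul_mod10 = lambda a, b: (a * b) % 10

print(sorted(generated_subsemigroup([2], mul_mod10)))  # [2, 4, 6, 8]
print(sorted(generated_subsemigroup([3], mul_mod10)))  # [1, 3, 7, 9]
```

Under multiplication modulo 10, the element 2 generates $\{2,4,8,6\}$ and the element 3 generates $\{3,9,7,1\}$.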
The most important source of associativity is composition.
If $f,g,h$ are functions, then
\((f\circ g)\circ h=f\circ(g\circ h).\) Both sides mean: first do $h$, then $g$, then $f$. The parentheses only describe how we mentally group the calculation, not the underlying process.
This is why associativity appears everywhere actions are composed sequentially.
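The same point in code, as a minimal sketch: `compose` is a hypothetical helper, and the three sample functions are arbitrary. Both groupings produce the function $x\mapsto f(g(h(x)))$.

```python
def compose(f, g):
    """Return the composite f ∘ g, i.e. the function x -> f(g(x))."""
    return lambda x: f(g(x))

f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x ** 2

left = compose(compose(f, g), h)    # (f ∘ g) ∘ h
right = compose(f, compose(g, h))   # f ∘ (g ∘ h)

# Both apply h first, then g, then f.
print(left(3), right(3))  # 19 19
print(all(left(x) == right(x) for x in range(-5, 6)))  # True
```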
Without associativity, expressions are no longer sequences. They are sequences plus parentheses.
For four elements there are five bracketings:
\(((ab)c)d, \quad (ab)(cd), \quad (a(bc))d, \quad a((bc)d), \quad a(b(cd)).\) In a non-associative structure these may all be different.
So non-associativity creates a more tree-like world. That world is important, but it is more complex and usually requires other identities to keep it manageable.
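The bracketing trees can be enumerated directly. The sketch below uses two illustrative helpers: `bracketings` lists every full parenthesization of a sequence (outermost parentheses included), and `all_values` collects the results obtained by evaluating every bracketing with a given operation.

```python
def bracketings(symbols):
    """Yield every full parenthesization of the sequence as a string."""
    if len(symbols) == 1:
        yield symbols[0]
        return
    for i in range(1, len(symbols)):          # position of the outermost split
        for left in bracketings(symbols[:i]):
            for right in bracketings(symbols[i:]):
                yield f"({left}{right})"

def all_values(xs, op):
    """Return the set of results over all bracketings of xs under op."""
    if len(xs) == 1:
        return {xs[0]}
    return {
        op(l, r)
        for i in range(1, len(xs))
        for l in all_values(xs[:i], op)
        for r in all_values(xs[i:], op)
    }

print(list(bracketings("abc")))
# ['(a(bc))', '((ab)c)']
print([len(list(bracketings("a" * n))) for n in range(1, 7)])
# [1, 1, 2, 5, 14, 42]
print(sorted(all_values([8, 4, 2, 1], lambda x, y: x + y)))  # [15]
print(sorted(all_values([8, 4, 2, 1], lambda x, y: x - y)))  # [1, 3, 5, 7]
```

The counts 1, 1, 2, 5, 14, 42 are the Catalan numbers. With addition every bracketing of $8,4,2,1$ gives 15; with subtraction the five bracketings give four different values.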
The functions $X\to X$ form a semigroup under composition.
Matrix multiplication is associative:
\((AB)C=A(BC).\) It is generally not commutative, but it is associative.
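A quick numerical check, written as a sketch in plain Python with an illustrative `matmul` helper and three arbitrary $2\times 2$ integer matrices:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

print(matmul(matmul(A, B), C))  # [[5, 3], [11, 9]]
print(matmul(A, matmul(B, C)))  # [[5, 3], [11, 9]]
print(matmul(A, B))             # [[2, 1], [4, 3]]
print(matmul(B, A))             # [[3, 4], [1, 2]]
```

Here $(AB)C=A(BC)$, while $AB\neq BA$.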
For strings,
("ab" + "c") + "de" = "ab" + ("c" + "de") = "abcde".
Concatenation is associative because grouping substrings does not change the final string.
Addition and multiplication of numbers are both associative:
\((a+b)+c=a+(b+c), \qquad (ab)c=a(bc).\)
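These identities hold for exact arithmetic. As a side remark, and only as a sketch: floating-point addition rounds after each step, so the identity can fail for machine numbers, whereas exact rational arithmetic (here Python's `fractions.Fraction`) satisfies it.

```python
from fractions import Fraction

# Floating-point addition: each intermediate sum is rounded.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
print(float_left == float_right)  # False

# Exact rational addition: no rounding, so the two groupings agree.
a, b, c = Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)
print((a + b) + c == a + (b + c))  # True
```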
Not all algebra is associative. Important non-associative structures exist, Lie algebras among them.
In those theories, associativity is usually replaced by some other controlling identity, such as the Jacobi identity for Lie algebras.
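A concrete instance, chosen here for illustration: the cross product on $\mathbb{R}^3$ is a Lie bracket. It is not associative, but it satisfies the Jacobi identity $a\times(b\times c)+b\times(c\times a)+c\times(a\times b)=0$. The sketch below checks both facts on sample integer vectors; `cross` and `add` are illustrative helpers.

```python
def cross(a, b):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def add(*vectors):
    """Componentwise sum of 3-tuples."""
    return tuple(sum(components) for components in zip(*vectors))

i, j = (1, 0, 0), (0, 1, 0)

# Associativity fails: (i x i) x j is zero, but i x (i x j) is -j.
print(cross(cross(i, i), j))  # (0, 0, 0)
print(cross(i, cross(i, j)))  # (0, -1, 0)

# The Jacobi identity holds on arbitrary sample vectors.
a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
jacobi = add(cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b)))
print(jacobi)  # (0, 0, 0)
```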
Associativity is the first major algebraic threshold:
A binary operation becomes a law of finite composition.
From here we can add more structure: