I recently sent an e-mail to the MLS working group with some comments on the draft. Because there appears to be little interest in improving the draft, I want to use this post to elaborate on the issues I see with the draft and the working group.
This is an opinion piece.
MLS
Let’s first recap what MLS is. According to messaginglayersecurity.rocks, Messaging Layer Security (MLS)
is a security layer for encrypting messages in groups of size two to many.
It can be used to solve the issue of end-to-end encrypted (e2ee) group messaging. There are two main approaches to e2ee group messaging. The first is to use pairwise one-to-one connections, where every message is encrypted separately for each other member. This is obviously very inefficient for larger groups. Anyone who’s used Signal or Wire with larger groups will have noticed this. The other option is to use a single key for the entire group, which obviously comes with a number of security issues.
MLS is supposed to solve the issue by providing a way to negotiate keys with a large number of other clients. These keys can then be used for e2ee group messaging.
The Working Group
MLS is driven by the IETF MLS working group (WG).
As with all IETF WGs, communication should be done on the mailing list.
State of the draft
While trying to implement the MLS protocol from scratch I ran into a couple of issues, and I think there’s a general problem regarding the structure of the draft. Note that these are all editorial issues.
Right now I see no straightforward way to implement the spec, as there’s no clear structure. To implement even the basic tree structure it’s necessary to jump back and forth through the entire document. There’s also a lot of prose and examples, and very few implementable parts.
Let me give some examples:
Section 5 (Ratchet Trees) reads like a non-normative section (or paper). The really interesting bit related to trees is in Appendix A. I suggest moving the formal description of the tree from Appendix A to this section. It’s impossible to implement this from the handful of examples here.
There’s no description in here on how the tree actually works. For example, adding a node to the tree is defined in 10.1.1 (Add).
The ratchet tree nodes as described in Section 5 don’t hold a KeyPackage. But Section 7 says “As the KeyPackage is a structure which is stored in the Ratchet Tree and updated depending on the evolution of this tree”.
The tree nodes have a parent hash field, which is of course part of the ratchet tree but shouldn’t be necessary for the description of the tree structure.
Also, the description of the tree hash (Section 7.5) follows the definition of the parent_hash extension in Section 7.4 such that it is not entirely clear which parent_hash is used there (extension or node value).
Section 5 also talks about commit messages without linking to them or introducing them. Having to talk about commit messages in this section is also a little confusing.
The term “handshake message” is first used in Section 7.8 (Key Schedule) but never introduced. This should probably happen in Section 10 (Group Evolution), which doesn’t talk about handshake messages at all until 10.3 (Ratchet Tree Extension).
There are probably plenty more issues like this. But I don’t think addressing any of these issues individually makes sense. I therefore suggest having an overhaul of the document structure.
The way I see MLS, there is a set of high-level structures and components that should be clearly described and build upon each other. There’s a data structure (tree) that is used to represent a group. Then there is the MLS group that implements specific semantics on the tree. Operations to modify the group are specified in the handshake protocol. And application messages define messages encrypted with keys derived in the handshake protocol with the key schedule.
The tree part describes a basic left-balanced binary tree and defines functions to create a tree and to add and remove nodes.
A group is the main part of MLS, a tree with nodes of a certain type. So this part should describe what nodes look like and how the tree is used to compute tree invariants and keys.
The handshake protocol defines all messages needed to perform operations on the MLS group.
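To illustrate what I mean by a self-contained tree layer, here is a minimal Python sketch. It assumes the array-based numbering I understand Appendix A to use (leaves at even indices, parent nodes at odd indices). The class `LeftBalancedTree` and its methods `add_leaf`, `remove_last_leaf` and `direct_path` are hypothetical names of mine, not taken from the draft, and the sketch deliberately knows nothing about key packages, hashes or commits.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


def level(x: int) -> int:
    """Height of node x: the number of trailing 1 bits of its index."""
    k = 0
    while (x >> k) & 1 == 1:
        k += 1
    return k


def node_width(n_leaves: int) -> int:
    """Number of array slots used by a tree with n_leaves leaves."""
    return 0 if n_leaves == 0 else 2 * (n_leaves - 1) + 1


def root(n_leaves: int) -> int:
    """Index of the root node of a tree with n_leaves leaves."""
    if n_leaves == 0:
        raise ValueError("an empty tree has no root")
    w = node_width(n_leaves)
    return (1 << (w.bit_length() - 1)) - 1


def parent_step(x: int) -> int:
    """Parent of x in an (imagined) complete tree of unbounded size."""
    k = level(x)
    b = (x >> (k + 1)) & 1
    return (x | (1 << k)) ^ (b << (k + 1))


def parent(x: int, n_leaves: int) -> int:
    """Parent of x, skipping indices that fall outside the actual tree."""
    if x == root(n_leaves):
        raise ValueError("the root has no parent")
    p = parent_step(x)
    while p >= node_width(n_leaves):
        p = parent_step(p)
    return p


class LeftBalancedTree(Generic[T]):
    """Hypothetical generic tree layer; node contents are opaque to it."""

    def __init__(self) -> None:
        self.nodes: List[Optional[T]] = []

    @property
    def leaf_count(self) -> int:
        return (len(self.nodes) + 1) // 2

    def add_leaf(self, value: T) -> int:
        """Append a leaf at the right edge and return its node index."""
        if self.nodes:
            self.nodes.append(None)  # blank slot for the new parent node
        self.nodes.append(value)
        return len(self.nodes) - 1

    def remove_last_leaf(self) -> Optional[T]:
        """Drop the rightmost leaf (and the parent slot before it)."""
        if not self.nodes:
            raise IndexError("tree is empty")
        value = self.nodes.pop()
        if self.nodes:
            self.nodes.pop()
        return value

    def direct_path(self, x: int) -> List[int]:
        """Indices of all ancestors of x, from its parent up to the root."""
        path = []
        r = root(self.leaf_count)
        while x != r:
            x = parent(x, self.leaf_count)
            path.append(x)
        return path
```

A group layer could then instantiate this tree with its own node type and define hashes and key derivation on top, without the tree section ever having to mention them.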
This is certainly not exhaustive and would need further refinement. But it’s the high-level structure I’d expect from the document. I’m sure other structures would work as well. But what we currently have isn’t what is needed to implement MLS, imho.