PSA: Your ProtonMail backups might not be safe

5 Jun 2022 12:16 | email

I was a fan of the ProtonMail email service until someone casually linked me to
this issue while we were discussing something else:

https://github.com/ProtonMail/proton-bridge/issues/220

TL;DR: message UIDs returned by proton-bridge are unstable and subject to change
*without* UIDVALIDITY changing. This is bad not only because it violates the IMAP
RFC (RFC 3501) but also because it can lead to data loss in at least two
scenarios:

1) Incremental backup tools that match on message UID, such as offlineimap, will
eventually end up with local duplicates of some messages while other messages are
never downloaded, because the UIDs can't be relied on. Although this is not
technically direct data loss, you might assume your backups are complete, migrate
to another service, and only later realise that some data was missing.

2) An IMAP client communicating with the bridge could be asked to delete a
message. If the UIDs change underneath the client, the message that gets deleted
could be the wrong one. There is at least one report of this happening to someone
in the GitHub thread.
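To see why unstable UIDs produce both symptoms from point 1 at once, here is a toy Python model of a UID-keyed incremental backup. It is a hypothetical simulation, not offlineimap's actual code: `incremental_sync` and the snapshot data are made up, and UIDVALIDITY is assumed to stay the same between runs, which is the bridge behaviour described above.

```python
from __future__ import annotations


def incremental_sync(backup: dict[int, str], server: dict[int, str]) -> dict[int, str]:
    """Copy only the messages whose UID is not already in the backup.

    This is only correct if a UID keeps referring to the same message for
    as long as UIDVALIDITY is unchanged, which is the guarantee the bridge
    breaks.
    """
    updated = dict(backup)
    for uid, body in server.items():
        if uid not in updated:  # only fetch UIDs we have never seen before
            updated[uid] = body
    return updated


# First run: the server exposes three messages with UIDs 1-3.
first_snapshot = {1: "msg A", 2: "msg B", 3: "msg C"}
backup = incremental_sync({}, first_snapshot)

# Later, with UIDVALIDITY unchanged, the server renumbers: new message D is
# handed UID 2 and B reappears under UID 4.
second_snapshot = {1: "msg A", 2: "msg D", 3: "msg C", 4: "msg B"}
backup = incremental_sync(backup, second_snapshot)

print(sorted(backup.values()))  # ['msg A', 'msg B', 'msg B', 'msg C']
```

Message B ends up in the backup twice and message D is never downloaded, even though each run appears to succeed.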

The ProtonMail team have apparently decided that completely rewriting the
backend is the solution to this issue, which was reported as far back as
September last year. I don't agree with this strategy, but it is what it is. A
much more fundamental failing, however, is that they haven't notified users that
the bridge is unsafe. I think this is really terrible, as many people might be
relying on IMAP backup tools or clients without realising they could be losing
data.

Additionally, while investigating this problem I discovered another one with the
service: orphaned messages. It's not entirely clear to me all the ways this can
happen, but somehow messages can end up in no folder at all. They exist in your
account and appear under the "All Mail" folder, but aren't in any of the real
folders returned over the IMAP interface. I discovered I had 23,793 messages in
this state, which consequently weren't being backed up by my backup system, as I
had wrongly assumed that everything would at least be in Archive or another
folder by default.

Apparently this is a longstanding issue that they've neglected to fix.

So what to do? I considered simply migrating my email to another service, and I
may yet do that. Before taking such drastic action, however, I wanted to try to
fix the issues at hand and configure sane backups that work properly.

The first thing to do was to fix the orphaned messages and set up some kind of
monitor so I'd know if it happened again. Fixing this was a bit convoluted.
First, I needed a way to *reliably* retrieve all of the messages via the IMAP
interface exposed by the bridge. Matching on headers seemed the sensible way to
do this, and ProtonMail conveniently includes an X-Pm-Internal-Id header in every
message, which appears to be globally unique. Unfortunately offlineimap doesn't
support matching on headers, so I set up a local Dovecot instance and switched to
imapsync (https://github.com/imapsync/imapsync).
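Fetching just that one header over IMAP is straightforward with Python's standard imaplib. The sketch below is illustrative, not the code used for this post: `pm_internal_id` and `folder_ids` are hypothetical helpers, and it assumes an already logged-in `imaplib.IMAP4` connection, for example to a local Dovecot instance.

```python
from __future__ import annotations

import email
import imaplib
from email import policy

# Fetch only the one header we care about; BODY.PEEK avoids setting \Seen.
_FETCH_ITEMS = "(BODY.PEEK[HEADER.FIELDS (X-PM-INTERNAL-ID)])"


def pm_internal_id(raw_headers: bytes) -> str | None:
    """Return the X-Pm-Internal-Id value from raw header bytes, if present."""
    msg = email.message_from_bytes(raw_headers, policy=policy.default)
    value = msg.get("X-Pm-Internal-Id")  # header lookup is case-insensitive
    return str(value).strip() if value is not None else None


def folder_ids(imap: imaplib.IMAP4, folder: str) -> set[str]:
    """Collect the X-Pm-Internal-Id of every message in `folder`."""
    # imaplib sends arguments verbatim, so "All Mail" needs explicit quotes.
    typ, data = imap.select(f'"{folder}"', readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}: {data!r}")
    if int(data[0]) == 0:
        return set()  # avoid UID FETCH 1:* on an empty mailbox
    typ, data = imap.uid("FETCH", "1:*", _FETCH_ITEMS)
    if typ != "OK":
        raise RuntimeError(f"fetch failed in {folder!r}: {data!r}")
    ids: set[str] = set()
    for item in data:
        # Header literals arrive as (envelope, header_bytes) tuples; the bare
        # b")" entries between them are just response terminators.
        if isinstance(item, tuple):
            pm_id = pm_internal_id(item[1])
            if pm_id:
                ids.add(pm_id)
    return ids
```

Because the result is a set of header values rather than UIDs, comparing two runs or two folders stays meaningful even if UIDs are renumbered in between.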

Using imapsync I was then able to reliably retrieve all of my messages that were
in folders, as well as the "All Mail" folder, which contained absolutely
everything, including the orphaned messages. Then I wrote a quick bit of Python
to scan the X-Pm-Internal-Id header of every message in order to determine which
ones were orphaned. At this point I had identified 23,793 orphaned messages,
15,000 of which were messages I had actually thought were deleted (more on this
below).

Then I had to figure out how best to fix them. My first thought was to contact
ProtonMail support, so I did that, but on further reflection I wondered what
would happen if I dropped one of the orphaned messages into my local Archive
folder and then synced it back into ProtonMail via the bridge using imapsync. My
theory was that the message would be matched on its X-Pm-Internal-Id header and
assigned to the Archive folder without being duplicated. This worked, and I was
then able to do the same for the rest of the messages.

Once this was done, I ran imapsync with the --justfoldersizes option and was able
to verify that the message count in the All Mail folder now equalled the sum of
the messages in the other folders, which means there are no longer any orphans.
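The same comparison could also serve as the monitor mentioned earlier, run unattended (for example from cron). As a swapped-in alternative to --justfoldersizes, this sketch asks for each folder's count with the standard IMAP STATUS command via imaplib. `message_count` and `orphan_gap` are hypothetical names, and you supply the folder list yourself.

```python
from __future__ import annotations

import imaplib
import re

_MESSAGES = re.compile(rb"MESSAGES (\d+)")


def message_count(imap: imaplib.IMAP4, folder: str) -> int:
    """Ask the server for a folder's message count via IMAP STATUS."""
    typ, data = imap.status(f'"{folder}"', "(MESSAGES)")
    if typ != "OK":
        raise RuntimeError(f"STATUS failed for {folder!r}: {data!r}")
    match = _MESSAGES.search(data[0])  # reply looks like b'"INBOX" (MESSAGES 42)'
    if match is None:
        raise RuntimeError(f"unexpected STATUS reply: {data!r}")
    return int(match.group(1))


def orphan_gap(imap: imaplib.IMAP4, folders: list[str], all_mail: str = "All Mail") -> int:
    """All Mail count minus the sum of the given folders' counts.

    Zero means every message is filed somewhere, *assuming* the folders are
    mutually exclusive (each message lives in exactly one of them).
    """
    filed = sum(message_count(imap, name) for name in folders)
    return message_count(imap, all_mail) - filed
```

A positive gap means some messages sit only in All Mail; an alert on any non-zero value is enough to know the orphan problem has come back.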

So why did I have 15,000 orphaned messages that should have been deleted? Well,
there's something odd about the way IMAP is implemented on the bridge. The normal
way to delete a message is to set the \Deleted flag on it and then EXPUNGE. With
ProtonMail's bridge, however, this merely removes the message from any folders;
it's still there in All Mail. I thought perhaps moving it to the Trash folder
before deleting it might work, but the result was the same. I'm waiting on
ProtonMail support to tell me how to permanently delete messages via the IMAP
interface.
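For reference, the standard two-step deletion looks like this with Python's imaplib. It is shown only to make the protocol concrete, not as a fix: against the bridge, as described above, this removes the message from its folder but leaves it in All Mail. `delete_message` is a hypothetical helper.

```python
from __future__ import annotations

import imaplib


def delete_message(imap: imaplib.IMAP4, folder: str, uid: str) -> None:
    """Standard IMAP deletion: flag the message \\Deleted, then EXPUNGE."""
    typ, data = imap.select(f'"{folder}"')  # read-write, so EXPUNGE is allowed
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}: {data!r}")
    typ, data = imap.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise RuntimeError(f"STORE failed: {data!r}")
    # Plain EXPUNGE removes *every* \Deleted message in the folder, not just this one.
    typ, data = imap.expunge()
    if typ != "OK":
        raise RuntimeError(f"EXPUNGE failed: {data!r}")
```

Servers that support the UIDPLUS extension (RFC 4315) also offer UID EXPUNGE, which limits the expunge to specific UIDs.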

So beware: if you're currently backing up your ProtonMail account with tools that
don't match on headers, you might be at risk of losing data.

This is the full imapsync command I'm using:

imapsync --host1 localhost --port1 1143 --user1 user1 --passfile1 ~/.pass1 \
    --host2 localhost --port2 143 --user2 user2 --passfile2 ~/.pass2 \
    --useheader X-Pm-Internal-Id --delete2 --folder Archive --folder Drafts \
    --folder INBOX --folder Sent --folder Spam --folder Trash \
    --include 'Folders.*' --noemailreport1 --noemailreport2