PSA: your ProtonMail backups might not be safe
5 Jun 2022 12:16 | email
I was a fan of the ProtonMail email service until I was casually linked to this issue while discussing something else:
TL;DR: message UIDs returned by proton-bridge are unstable and can change without UIDVALIDITY changing. This is bad not only because it violates the IMAP RFC but also because it can lead to data loss in at least a couple of scenarios:
1) Incremental backups that match on message UID, such as offlineimap, will eventually end up with local duplicates of some messages and other messages never downloaded because the UIDs can’t be relied on. Although not technically direct loss of data, you might assume your backups are complete and try to migrate to another service and only later realise some data was missing.
2) An IMAP client communicating with the bridge could be asked to delete a message. If the UIDs are changing underneath it, the message that gets deleted could be the wrong one. There is at least one report of this happening to someone on the GitHub thread.
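To make the first scenario concrete, here is a rough sketch of the decision a UID-based incremental sync makes. The function names and the cache layout are made up for illustration; this is not offlineimap's actual code.

```python
def cached_uids_usable(cached_uidvalidity, server_uidvalidity):
    """Return True if UIDs remembered from an earlier sync may still be trusted."""
    return cached_uidvalidity is not None and cached_uidvalidity == server_uidvalidity


def plan_sync(cache, server_uidvalidity, server_uids):
    """Decide which UIDs to download.

    cache is a hypothetical dict: {"uidvalidity": int or None, "uids": set of int}.
    """
    if not cached_uids_usable(cache["uidvalidity"], server_uidvalidity):
        # UIDVALIDITY changed: the cached UIDs mean nothing, fetch everything.
        return set(server_uids)
    # UIDVALIDITY unchanged: the client assumes every known UID still names
    # the same message, so it only fetches UIDs it has not seen before.
    return set(server_uids) - cache["uids"]
```

If the server hands an already-seen UID to a different message while UIDVALIDITY stays the same, plan_sync() skips it, and that message never reaches the backup.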
The ProtonMail team have apparently decided that completely rewriting the backend is the solution to this issue, which was reported as far back as September last year. I don’t agree with this strategy, but it is what it is. A much more fundamental failing is that they haven’t notified any users that the bridge is unsafe. I think this is really terrible, as many people might be relying on IMAP backup tools or clients and not even realise they could be losing data.
Additionally, during the course of investigating this problem I discovered another problem with the service - orphaned messages. It’s not entirely clear to me in what ways this can occur, but somehow messages can end up not belonging to any folder. They exist in your account and appear under the “All Mail” folder but aren’t actually in any of the real folders returned over the IMAP interface. I discovered I had 23,793 messages in this state that were consequently not being backed up by my backup system, as I had wrongly assumed that everything would at least be in Archive or another folder by default.
Apparently this is a longstanding issue that they’ve neglected to fix for quite some time.
So what to do? I considered just migrating my email to another service, and I may yet do that. However, before taking such drastic action I wanted to try to fix the issues at hand and configure sane backups that work properly.
The first thing to do was to fix the orphaned messages and set up some kind of monitor so I’d know if it happened again. Fixing this was a bit convoluted. First I needed a way to reliably retrieve all of the messages via the IMAP interface exposed by the bridge. Matching on headers seemed to be the sensible way to do this, and the ProtonMail system conveniently adds an X-Pm-Internal-Id header to every message, which seems to be globally unique. Unfortunately offlineimap doesn’t support matching on headers, so I rolled a local dovecot instance and switched to imapsync.
Using imapsync I was then able to reliably retrieve all of my messages that were in folders, plus the “All Mail” folder, which contained absolutely everything, including the orphaned messages. Then I wrote a quick bit of Python to scan the X-Pm-Internal-Id header of every message in order to determine which ones were orphaned. At this point I had identified 23,793 orphaned messages, 15,000 of which were messages that I had actually thought were deleted - more on this below.
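The orphan scan boils down to a set difference on the X-Pm-Internal-Id header. The following is only a minimal sketch of that idea, not the script actually used; the function names are invented, and loading the raw messages (from the local dovecot store or via IMAP FETCH) is left out.

```python
from email import message_from_bytes

HEADER = "X-Pm-Internal-Id"


def internal_ids(raw_messages):
    """Collect the X-Pm-Internal-Id value of each raw message (bytes)."""
    ids = set()
    for raw in raw_messages:
        value = message_from_bytes(raw).get(HEADER)
        if value is not None:
            ids.add(value.strip())
    return ids


def find_orphans(all_mail, folders):
    """Return IDs that are in All Mail but in none of the real folders.

    all_mail: iterable of raw messages from the "All Mail" folder.
    folders:  dict mapping folder name -> iterable of raw messages.
    """
    in_folders = set()
    for raw_messages in folders.values():
        in_folders |= internal_ids(raw_messages)
    return internal_ids(all_mail) - in_folders
```

Messages without the header are ignored here; a real script would probably want to report them separately.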
Then I had to figure out how best to fix them. My first thought was to contact ProtonMail support, so I did that. But then I thought about it some more and wondered what would happen if I dropped one of the orphaned messages into my local Archive folder and then synced it back into ProtonMail via the bridge using imapsync. My theory was that it would match the message on the X-Pm-Internal-Id header and assign it to the Archive folder without duplicating it. This worked, and I was then able to do the same for the rest of the messages.
After this had completed I ran imapsync with the --justfoldersizes option and was able to verify that the message count in the All Mail folder was now equal to the sum of the message counts in the other folders, which means there are now no orphans.
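The same count comparison could serve as the monitor mentioned earlier. Below is a small sketch, assuming the counts come from IMAP STATUS replies (for example via Python's imaplib, whose status() call returns reply lines like the one in the docstring). The function names are invented, and the arithmetic only holds if every message sits in exactly one of the counted folders.

```python
import re

STATUS_RE = re.compile(rb"\(MESSAGES (\d+)\)")


def parse_message_count(status_line):
    """Extract N from an IMAP STATUS reply such as b'"INBOX" (MESSAGES 42)'."""
    match = STATUS_RE.search(status_line)
    if match is None:
        raise ValueError("no MESSAGES count in %r" % (status_line,))
    return int(match.group(1))


def orphan_estimate(all_mail_count, folder_counts):
    """All Mail total minus the sum of the per-folder totals.

    folder_counts: dict mapping folder name -> message count.
    A result above zero suggests that many messages are in no folder.
    """
    return all_mail_count - sum(folder_counts.values())
```

Running this from cron and alerting on a non-zero result would be one way to notice new orphans early.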
So why did I have 15,000 orphaned messages that should have been deleted? Well, there’s something funny about the way the IMAP protocol is implemented on the bridge. The normal method of deleting a message is to set the \Deleted flag on the message and then EXPUNGE. However, with ProtonMail’s bridge IMAP interface this merely removes the message from any folders; it’s still there in All Mail. I thought perhaps moving it to the Trash folder before deleting it might work, but the result was the same. I’m waiting on ProtonMail support to provide an answer as to how to permanently delete messages via the IMAP interface.
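For reference, the normal delete sequence looks roughly like this with Python's imaplib. It is a sketch only: delete_by_uid is a made-up helper, and it assumes a mailbox has already been selected read-write on the connection.

```python
def delete_by_uid(conn, uid):
    """Flag one message as \\Deleted, then expunge the selected mailbox.

    conn: an imaplib.IMAP4 (or compatible) connection with a mailbox selected.
    Note that EXPUNGE removes every message flagged \\Deleted in that mailbox,
    not just this one.
    """
    typ, _ = conn.uid("STORE", str(uid), "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise RuntimeError("STORE failed for UID %s" % uid)
    typ, _ = conn.expunge()
    if typ != "OK":
        raise RuntimeError("EXPUNGE failed")
```

On a typical IMAP server the message is gone after this. Against the bridge, as described above, it only disappears from the folder and remains in All Mail.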
So beware: if you’re currently backing up your ProtonMail account with tools that don’t match on headers, you might be at risk of losing data.
This is the full imapsync command I’m using:
imapsync --host1 localhost --port1 1143 --user1 user1 --passfile1 ~/.pass1 \
--host2 localhost --port2 143 --user2 user2 --passfile2 ~/.pass2 \
--useheader X-Pm-Internal-Id --delete2 --folder Archive --folder Drafts \
--folder INBOX --folder Sent --folder Spam --folder Trash --include \
Folders.* --noemailreport1 --noemailreport2
