2013-04-30
Backing up Gmail messages with offlineimap
A while back I realised I had a ton of email archived on Gmail which I would be sad to lose if I lost access to my Google account or couldn't access the internet for some reason. I also wanted a backup in case I decided to migrate away from Gmail to use another service.
The approach I took was to use offlineimap to download the contents of my mail using Gmail's IMAP support. I set it up to download a few days of email at a time so I wouldn't encounter any bandwidth limiting from Google or risk getting my account temporarily suspended for aggressive use.
I chose to use 'Maildir' format for the downloaded mail so I could use notmuch locally to read and search.
Dealing with Gmail folders is a bit tricky. Gmail exposes its labels as IMAP folders, and a message carrying several labels shows up in each of them, so if you're not careful you end up downloading the same email once per folder. I didn't really want the folder structure. I just wanted all the emails, and I'd use notmuch's tagging mechanism to add tags after the fact.
The secret to ignoring folders is to create a folderfilter entry in the .offlineimaprc file. This is a lambda function that, given a folder name, should return true if it's a folder you want offlineimap to download. I use:
folderfilter = lambda foldername: foldername in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']
This downloads "All Mail" and "Sent Mail". This way I get everything in my Gmail without the folder structure.
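Since folderfilter is ordinary Python, its behaviour can be checked in an interpreter before running a sync. The sketch below applies the same lambda to a made-up folder listing; the names in remote_folders are illustrative assumptions, not taken from a real account.

```python
# The same predicate that is placed in .offlineimaprc.
folderfilter = lambda foldername: foldername in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']

# Hypothetical folder names, for illustration only.
remote_folders = ['INBOX', '[Gmail]/All Mail', '[Gmail]/Sent Mail', '[Gmail]/Spam', 'work']

# Keep only the folders the filter accepts.
synced = [name for name in remote_folders if folderfilter(name)]
print(synced)
```

Only the two whitelisted folders survive the filter, so label folders such as "work" are never downloaded a second time.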
I chose to add a nametrans entry so that the downloaded folders in the Maildir have more relevant names. nametrans is a lambda function that, given a folder name, returns the name that should be used locally for that folder. Here I translate "All Mail" to "all" and "Sent Mail" to "sent":
nametrans = lambda foldername: re.sub(r'^\[Gmail\]/All Mail$', 'all',
    re.sub(r'^\[Gmail\]/Sent Mail$', 'sent', foldername))
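A nametrans expression is easy to sanity-check in a Python interpreter. One thing to watch for is that [ and ] are regular-expression metacharacters, so the literal "[Gmail]" prefix only matches when the brackets are escaped. The sketch below compares an escaped and an unescaped pattern; the function names are just for illustration.

```python
import re

def nametrans(foldername):
    # Escaped brackets match the literal text "[Gmail]".
    return re.sub(r'^\[Gmail\]/All Mail$', 'all',
                  re.sub(r'^\[Gmail\]/Sent Mail$', 'sent', foldername))

def nametrans_unescaped(foldername):
    # Unescaped, [Gmail] is a character class matching ONE of G, m, a, i, l,
    # so the real folder name never matches and is returned unchanged.
    return re.sub('^[Gmail]/All Mail$', 'all', foldername)

print(nametrans('[Gmail]/All Mail'))
print(nametrans('[Gmail]/Sent Mail'))
print(nametrans('INBOX'))
print(nametrans_unescaped('[Gmail]/All Mail'))
```

The first two calls give the short local names, other folder names pass through untouched, and the unescaped variant silently leaves "[Gmail]/All Mail" as it was.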
To connect to Gmail the following entries are used in the remote repository section:
type = Gmail
remotehost = imap.gmail.com
realdelete=no
maxconnections=1
ssl = yes
cert_fingerprint = 6d1b5b5ee0180ab493b71d3b94534b5ab937d042
remoteport = 993
remoteuser = ...
remotepass = ...
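A certificate fingerprint is tied to the certificate the server presented when it was recorded, so the value needs refreshing whenever the server's certificate changes. The sketch below is one way to compute a current value with Python's standard library. It assumes that cert_fingerprint is the hex SHA-1 digest of the DER-encoded certificate, which the 40 hex digits above suggest. The function names are my own.

```python
import hashlib
import ssl

def fingerprint(der_bytes):
    """Hex SHA-1 digest of a DER-encoded certificate."""
    return hashlib.sha1(der_bytes).hexdigest()

def fetch_fingerprint(host='imap.gmail.com', port=993):
    """Fetch the server's certificate and return its SHA-1 fingerprint.

    Needs network access, so it is not called at import time.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Calling print(fetch_fingerprint()) on a connected machine prints a value that can be compared with, or pasted into, the cert_fingerprint entry.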
My local repository section is:
type = Maildir
localfolders = ~/.Mail
To avoid having to run offlineimap for a long time on the initial sync, I spread it over a series of days using the maxage setting in the Account section. When it is set, mail older than this number of days is not synced. So I'd set it to 100 days and do a sync, then increase it by 100 the next day and do another sync. Over a series of days and weeks I ended up with all my email. Once completely synced I removed the entry from the .offlineimaprc file. I'm not sure what the best value is, and maybe it doesn't matter, but this worked for me.
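The daily bump can be automated rather than edited by hand. The following is a minimal sketch, assuming maxage holds a plain integer number of days and appears on a line of its own. bump_maxage is a hypothetical helper, not part of offlineimap.

```python
import re

def bump_maxage(config_text, step=100):
    """Return config_text with the first 'maxage = N' line raised by step days."""
    def _raise(match):
        # Keep the original "maxage = " prefix and write the larger value.
        return '%s%d' % (match.group(1), int(match.group(2)) + step)
    return re.sub(r'^(maxage[ \t]*=[ \t]*)(\d+)[ \t]*$', _raise,
                  config_text, count=1, flags=re.MULTILINE)

sample = ("[Account gmail]\n"
          "localrepository = gmailLocal\n"
          "remoterepository = gmailRemote\n"
          "maxage = 100\n")

print(bump_maxage(sample))
```

Reading ~/.offlineimaprc, passing its text through bump_maxage and writing it back before each day's sync reproduces the 100, 200, 300, ... schedule described above.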
My .offlineimaprc then looks like:
[general]
accounts = gmail
ui = TTY.TTYUI
[Account gmail]
localrepository = gmailLocal
remoterepository = gmailRemote
maxage = 1000
[Repository gmailLocal]
type = Maildir
localfolders = ~/.Mail
[Repository gmailRemote]
type = Gmail
remotehost = imap.gmail.com
realdelete=no
maxconnections=1
ssl = yes
cert_fingerprint = 6d1b5b5ee0180ab493b71d3b94534b5ab937d042
remoteport = 993
remoteuser = ...
remotepass = ...
nametrans = ...shown above...
folderfilter = ...shown above...
I used notmuch to process and search the Maildir locally. By setting synchronize_flags=true in my .notmuch-config file I could read the offline email in notmuch, incrementally sync with offlineimap, and have the 'read', 'replied', etc. flags synchronized between them.
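The synchronization works because Maildir stores message flags in the filename, after a ":2," suffix, and notmuch maps those letters to tags. The sketch below illustrates that mapping as I understand it from the notmuch documentation (D draft, F flagged, P passed, R replied, and "unread" when S is absent). The filenames are invented and tags_for is a hypothetical helper, not notmuch code.

```python
# Maildir flag letter -> notmuch tag.
FLAG_TO_TAG = {'D': 'draft', 'F': 'flagged', 'P': 'passed', 'R': 'replied'}

def tags_for(filename):
    """Return the flag-derived tags for a Maildir filename, sorted."""
    _, sep, flags = filename.rpartition(':2,')
    if not sep:
        # No ":2," info section, so the message carries no flags.
        flags = ''
    tags = {FLAG_TO_TAG[f] for f in flags if f in FLAG_TO_TAG}
    if 'S' not in flags:
        # A message without the "seen" flag is tagged unread.
        tags.add('unread')
    return sorted(tags)

print(tags_for('1367300000.1234_1.host:2,RS'))
print(tags_for('1367300000.1234_2.host:2,'))
```

When a message is read in notmuch the file is renamed to carry the S flag, and the next offlineimap run pushes that flag back to Gmail.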
To tag with notmuch I run a script after syncing with offlineimap that tags based on certain criteria. Something like:
offlineimap
notmuch new
notmuch tag +sent -- folder:sent and not tag:sent
notmuch tag +bugzilla -inbox -- tag:inbox and from:bugzilla-daemon@mozilla.org
notmuch tag +ats -inbox -- tag:inbox and to:ats-lang-users
...etc...
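As the list of rules grows, it can be easier to keep them in a table and generate the notmuch commands from it. The following is a sketch of the same steps in Python, not the script I actually run. RULES, tag_command and sync_and_tag are names made up for this example, and the runner is injectable so the commands can be inspected without executing anything.

```python
import subprocess

# (tag changes, notmuch search query) pairs, mirroring the shell script.
RULES = [
    ('+sent', 'folder:sent and not tag:sent'),
    ('+bugzilla -inbox', 'tag:inbox and from:bugzilla-daemon@mozilla.org'),
    ('+ats -inbox', 'tag:inbox and to:ats-lang-users'),
]

def tag_command(changes, query):
    """Build the argv list for one 'notmuch tag' invocation."""
    return ['notmuch', 'tag'] + changes.split() + ['--', query]

def sync_and_tag(run=subprocess.check_call):
    """Sync mail, index it, then apply every tagging rule in order."""
    run(['offlineimap'])
    run(['notmuch', 'new'])
    for changes, query in RULES:
        run(tag_command(changes, query))

# Dry run: show the tagging commands without executing them.
for changes, query in RULES:
    print(' '.join(tag_command(changes, query)))
```

Calling sync_and_tag() with no arguments runs the real commands and stops at the first one that fails.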