A while back I realised I had a ton of email archived on Gmail which I would be sad to lose if I lost access to my Google account or couldn't access the internet for some reason. I also wanted a backup in case I decided to migrate away from Gmail to use another service.
The approach I took was to use offlineimap to download my mail through Gmail's IMAP support. I set it up to download a limited date range of email per sync, spread over several days, so I wouldn't run into bandwidth limits from Google or risk having my account temporarily suspended for aggressive use.
I chose to use 'Maildir' format for the downloaded mail so I could use notmuch locally to read and search.
Dealing with Gmail folders is a bit tricky. Gmail exposes its labels as IMAP folders, so if you're not careful you can end up downloading the same email once for every folder it appears in. I didn't really want the folder structure. I just wanted all my emails, and I'd use notmuch's tagging mechanism to add tags after the fact.
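Because every Gmail label appears as a separate IMAP folder, it helps to see the exact folder names the server reports before deciding which ones to keep. A minimal sketch using Python's standard-library imaplib, assuming credentials that allow a plain IMAP login (for Gmail that typically means an app password). The helpers parse_list_line and list_folders are hypothetical names of my own, and the parser only handles the common one-line form of a LIST response.

```python
import imaplib
import re

# One LIST response line looks like: (\All \HasNoChildren) "/" "[Gmail]/All Mail"
_LIST_RE = re.compile(r'^\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)$')


def parse_list_line(line: bytes) -> tuple[list[str], str]:
    """Split one IMAP LIST response line into (flags, folder name)."""
    # Names arrive in IMAP's modified UTF-7, which is plain ASCII; not decoded further here.
    match = _LIST_RE.match(line.decode("ascii"))
    if match is None:
        raise ValueError(f"unexpected LIST line: {line!r}")
    name = match.group("name")
    if name.startswith('"') and name.endswith('"'):
        # Drop the quotes and undo backslash escapes inside the quoted string.
        name = re.sub(r'\\(.)', r'\1', name[1:-1])
    return match.group("flags").split(), name


def list_folders(user: str, password: str) -> list[tuple[list[str], str]]:
    """Log in to Gmail over IMAPS and return (flags, name) for every folder."""
    with imaplib.IMAP4_SSL("imap.gmail.com", 993) as conn:
        conn.login(user, password)
        typ, data = conn.list()
        if typ != "OK":
            raise RuntimeError(f"LIST failed: {typ}")
        # Names sent as IMAP literals come back as tuples; this sketch skips them.
        return [parse_list_line(line) for line in data if isinstance(line, bytes)]
```

Gmail also marks its system folders with special-use flags such as \All and \Sent, which makes the "All Mail" and "Sent Mail" folders easy to spot in the output.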
The secret to ignoring folders is to create a folderfilter entry in the .offlineimaprc file. This is a lambda function that, given a folder name, returns true if offlineimap should download that folder. I use:
folderfilter = lambda foldername: foldername in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']
This downloads only "All Mail" and "Sent Mail", so I get everything in my Gmail without the folder structure.
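offlineimap evaluates folderfilter as ordinary Python, so the same lambda can be exercised on its own to preview which folders would be synced. A minimal sketch; the list of folder names is a made-up example of what a Gmail account might report, not real output.

```python
# The folderfilter lambda from .offlineimaprc, unchanged.
folderfilter = lambda foldername: foldername in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']

# Hypothetical folder list; real names come from the server's LIST response.
remote_folders = [
    "INBOX",
    "[Gmail]/All Mail",
    "[Gmail]/Sent Mail",
    "[Gmail]/Starred",
    "Work",  # a user label, exposed as its own IMAP folder
]

synced = [name for name in remote_folders if folderfilter(name)]
print(synced)  # ['[Gmail]/All Mail', '[Gmail]/Sent Mail']
```

Everything else, including INBOX and user labels, is skipped, which is what keeps a message from being downloaded once per label.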
I chose to add a nametrans entry so that the downloaded folders in the Maildir have more relevant names.
nametrans is a lambda function that, given a folder name, returns the name that should be used locally for that folder. Here I translate "All Mail" to "all" and "Sent Mail" to "sent":
nametrans = lambda foldername: re.sub(r'^\[Gmail\]/All Mail$', 'all', re.sub(r'^\[Gmail\]/Sent Mail$', 'sent', foldername))
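Because nametrans uses regular expressions, the square brackets in "[Gmail]" have to be escaped. Unescaped, [Gmail] is a character class matching a single letter (G, m, a, i or l), so the pattern never matches a name starting with "[" and the folder keeps its remote name. A quick Python check of both versions; like the config line, it relies on the re module being available when the lambda runs.

```python
import re

# Unescaped: "[Gmail]" is a regex character class, not literal text.
broken = lambda foldername: re.sub('^[Gmail]/All Mail$', 'all',
                                   re.sub('^[Gmail]/Sent Mail$', 'sent', foldername))

# Escaped brackets; raw strings keep the backslashes intact.
nametrans = lambda foldername: re.sub(r'^\[Gmail\]/All Mail$', 'all',
                                      re.sub(r'^\[Gmail\]/Sent Mail$', 'sent', foldername))

for folder in ["[Gmail]/All Mail", "[Gmail]/Sent Mail"]:
    print(f"{folder!r}: broken -> {broken(folder)!r}, fixed -> {nametrans(folder)!r}")
# '[Gmail]/All Mail': broken -> '[Gmail]/All Mail', fixed -> 'all'
# '[Gmail]/Sent Mail': broken -> '[Gmail]/Sent Mail', fixed -> 'sent'
```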
To connect to Gmail, the following entries are used in the remote repository section:
type = Gmail
remotehost = imap.gmail.com
realdelete=no
maxconnections=1
ssl = yes
cert_fingerprint = 6d1b5b5ee0180ab493b71d3b94534b5ab937d042
remoteport = 993
remoteuser = ...
remotepass = ...
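A 40-hex-digit cert_fingerprint like this one is the SHA-1 digest of the server's certificate in DER form, so it has to be updated whenever Google rotates that certificate. A minimal sketch for computing the current value with Python's standard library; the function names are my own. Note that ssl.get_server_certificate does not validate the certificate, so this trusts whatever the network presents at that moment.

```python
import hashlib
import ssl


def der_fingerprint(der_cert: bytes) -> str:
    """SHA-1 hex digest of a DER-encoded certificate (the 40-hex-digit form)."""
    return hashlib.sha1(der_cert).hexdigest()


def server_fingerprint(host: str = "imap.gmail.com", port: int = 993) -> str:
    """Fetch the certificate the server presents right now and fingerprint it."""
    pem = ssl.get_server_certificate((host, port))
    return der_fingerprint(ssl.PEM_cert_to_DER_cert(pem))
```

Calling server_fingerprint() and pasting the result into cert_fingerprint is one way to refresh the setting after a rotation.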
My local repository section is:
type = Maildir
localfolders = ~/.Mail
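Each synced folder becomes its own Maildir (with cur, new and tmp subdirectories) under localfolders, so Python's standard mailbox module can read them directly. A minimal sketch for counting what has arrived, assuming the translated folder names all and sent; count_messages is a hypothetical helper.

```python
import mailbox
import os


def count_messages(root: str = "~/.Mail", folders=("all", "sent")) -> dict[str, int]:
    """Count messages in each Maildir folder under root (folder names assumed)."""
    counts = {}
    for name in folders:
        path = os.path.join(os.path.expanduser(root), name)
        # create=False: raise NoSuchMailboxError instead of creating an empty Maildir.
        box = mailbox.Maildir(path, factory=None, create=False)
        counts[name] = len(box)  # counts messages in both new/ and cur/
    return counts
```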
To avoid one very long offlineimap run for the initial sync, I spread it over a series of days using the maxage setting in the Account section. When set, mail older than this number of days is not synced. I'd set it to 100 days and do a sync, then increase it by 100 the next day and sync again. Over a series of days and weeks I ended up with all my email. Once everything was synced I removed the entry from the .offlineimaprc file. I'm not sure what the best step size is, and maybe it doesn't matter, but this worked for me.
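Rather than editing .offlineimaprc each day, the stepping can also be scripted. A minimal Python sketch under stated assumptions: maxage_schedule produces 100, 200, ... up to a limit, and offlineimap_command builds one run with maxage overridden on the command line. The function names are hypothetical, and the override relies on offlineimap's -k [section:]option=value option (underscores in the section name stand for spaces), with -a to pick the account and -o to run once; check offlineimap --help on your version before relying on it.

```python
import subprocess


def maxage_schedule(step: int = 100, limit: int = 1000) -> list[int]:
    """maxage values for successive runs: step, 2*step, ... up to and including limit."""
    return list(range(step, limit + 1, step))


def offlineimap_command(maxage: int, account: str = "gmail") -> list[str]:
    """One sync run with maxage overridden for the given account.

    Assumes offlineimap's -k syntax, where "Account_gmail" names the
    [Account gmail] section (the underscore stands for the space).
    """
    return ["offlineimap", "-o", "-a", account,
            "-k", f"Account_{account}:maxage={maxage}"]


def run_step(day: int, step: int = 100, limit: int = 1000) -> None:
    """Run the sync for a given day (0-based), one step per day."""
    maxage = maxage_schedule(step, limit)[day]
    subprocess.run(offlineimap_command(maxage), check=True)
```

Running run_step(0) on the first day, run_step(1) on the next, and so on walks through the same 100-day increments; with the default limit of 1000 that is ten runs.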
My .offlineimaprc then ends up looking like this:
[general]
accounts = gmail
ui = TTY.TTYUI

[Account gmail]
localrepository = gmailLocal
remoterepository = gmailRemote
maxage = 1000

[Repository gmailLocal]
type = Maildir
localfolders = ~/.Mail

[Repository gmailRemote]
type = Gmail
remotehost = imap.gmail.com
realdelete=no
maxconnections=1
ssl = yes
cert_fingerprint = 6d1b5b5ee0180ab493b71d3b94534b5ab937d042
remoteport = 993
remoteuser = ...
remotepass = ...
nametrans = ...shown above...
folderfilter = ...shown above...
I use notmuch to process and search the Maildir locally. By setting synchronize_flags=true in the [maildir] section of my .notmuch-config file, I can read the offline email in notmuch and sync incrementally with offlineimap, and flags such as 'read' and 'replied' stay synchronized between them.
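With synchronize_flags enabled, notmuch maps the single-letter flags that Maildir stores at the end of each filename (after ":2,") to tags: D to draft, F to flagged, P to passed, R to replied, and a missing S (seen) flag to unread. A minimal Python sketch of that mapping, handy for seeing roughly which tags a given file implies; tags_for_filename is my own illustration of the documented rules, not notmuch code.

```python
# Maildir info flags and the notmuch tags they correspond to.
FLAG_TAGS = {"D": "draft", "F": "flagged", "P": "passed", "R": "replied"}


def tags_for_filename(filename: str) -> set[str]:
    """Tags implied by a Maildir filename such as '1234.host:2,RS'."""
    # Flags follow the last ':2,'; a filename without that suffix has no flags.
    _, sep, flags = filename.rpartition(":2,")
    if not sep:
        flags = ""
    tags = {FLAG_TAGS[f] for f in flags if f in FLAG_TAGS}
    if "S" not in flags:
        tags.add("unread")  # no 'seen' flag means the message is unread
    return tags
```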
To tag mail with notmuch, I run a script that syncs with offlineimap and then tags messages based on certain criteria. Something like:
#!/bin/sh
offlineimap
notmuch new
notmuch tag +sent -- folder:sent and not tag:sent
notmuch tag +bugzilla -inbox -- tag:inbox and from:email@example.com
notmuch tag +ats -inbox -- tag:inbox and to:ats-lang-users
# ...etc...
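If the list of rules grows, it can help to keep them as data and preview each one with notmuch count before tagging. A minimal Python sketch of that idea: the RULES table mirrors the examples in the shell script, and the function names are hypothetical. It uses only the notmuch count and notmuch tag subcommands, passing each query as a single argument (notmuch joins its search-term arguments into one query, so this matches the unquoted form in the script).

```python
import subprocess

# (tag changes, query) pairs mirroring the shell script; extend as needed.
RULES = [
    (["+sent"], "folder:sent and not tag:sent"),
    (["+bugzilla", "-inbox"], "tag:inbox and from:email@example.com"),
    (["+ats", "-inbox"], "tag:inbox and to:ats-lang-users"),
]


def count_command(query: str) -> list[str]:
    return ["notmuch", "count", query]


def tag_command(changes: list[str], query: str) -> list[str]:
    # "--" separates the tag changes from the search terms, as in the script.
    return ["notmuch", "tag", *changes, "--", query]


def apply_rules(rules=RULES, dry_run: bool = True) -> None:
    """Print how many messages each rule matches; tag them unless dry_run."""
    for changes, query in rules:
        count = subprocess.run(count_command(query), capture_output=True,
                               text=True, check=True).stdout.strip()
        print(f"{' '.join(changes)}: {count} message(s) match {query!r}")
        if not dry_run:
            subprocess.run(tag_command(changes, query), check=True)
```

Calling apply_rules() after offlineimap and notmuch new only reports the counts; apply_rules(dry_run=False) applies the tags.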