Getting csv files from your gmail

TL;DR

  • If you want to skip all the details in this post I’ve put together a Github repo that makes getting csv data from gmail as simple as:
from gmail import *
service = get_gmail_service()

# get all attachments from e-mails containing 'test'
search_query = "test"
service = get_gmail_service()
csv_dfs = query_for_csv_attachments(service, search_query)
print(csv_dfs)

Heres how it works under the hood – directly using google-api-python-client and oauth2client

  • Follow this link and click on the button: “ENABLE THE GMAIL API”

https://developers.google.com/gmail/api/quickstart/python

After the set up you will download a file called credentials.json

  • install the needed python packages

$ pip install --upgrade google-api-python-client oauth2client

  • The following code snippet will allow you to connect to your gmail account via python
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools

GMAIL_CREDENTIALS_PATH = 'credentials.json' # downloaded
GMAIL_TOKEN_PATH = 'token.json' # this will be created

store = file.Storage(GMAIL_TOKEN_PATH)
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets(GMAIL_CREDENTIALS_PATH, SCOPES)
    creds = tools.run_flow(flow, store)
service = build('gmail', 'v1', http=creds.authorize(Http()))

  • Now with this service you can read your emails and read any attachments you may have in your e-mails
  • First you can query your e-mails with a search string to find the e-mail ids you need that have the attachments:
search_query = "ABCD"
result = service.users().messages().list(userId='me', q=search_query).execute()
msgs = results['messages')
msg_ids = [msg['id'] for msg in msgs]
  • now for each messageId you can find the associated attachments in the email.
  • This part is a little messy so bare with me. First we obtain a list of “attachment parts” (and attachment filenames) from the e-mail. These are components of the email that contain attachments:
messageId = 'XYZ'
msg = service.messages().get(userId='me', id=messageId).execute()
parts = msg.get('payload').get('parts')
all_parts = []
for p in parts:
    if p.get('parts'):
        all_parts.extend(p.get('parts'))
    else:
        all_parts.append(p)

att_parts = [p for p in all_parts if p['mimeType']=='text/csv']
filenames = [p['filename'] for p in att_parts]
  • Now we can obtain the attached csv from each part:
messageId = 'XYZ'
data = part['body'].get('data')
attachmentId = part['body'].get('attachmentId')
if not data:
    att = service.users().messages().attachments().get(
    userId='me', id=attachmentId, messageId=messageId).execute()
    data = att['data']
  • Now you have the csv data but its in an encoded format, so finally we change the encoding and convert the result into a pandas dataframe
import base64
import pandas as pd
from StringIO import StringIO
str_csv = base64.urlsafe_b64decode(data.encode('UTF-8'))
df = pd.read_csv(StringIO(str_csv))
  • and thats it! you have a pandas dataframe with the contents of the csv attachment. You can work with this dataframe. Or you could write it to disk with pd.DataFrame.to_csv if you simply want to download the csv. You can use the list of filenames we obtained earlier if you want to preserve the filename

Leave a comment