TL;DR
- If you want to skip all the details in this post I’ve put together a Github repo that makes getting csv data from gmail as simple as:
from gmail import * service = get_gmail_service() # get all attachments from e-mails containing 'test' search_query = "test" service = get_gmail_service() csv_dfs = query_for_csv_attachments(service, search_query) print(csv_dfs)
- here is the repo: https://github.com/robertdavidwest/google_api
- Just follow the instructions in the
READMEand have fun and please feel free to contribute!
Heres how it works under the hood – directly using google-api-python-client and oauth2client
- Follow this link and click on the button: “ENABLE THE GMAIL API”
https://developers.google.com/gmail/api/quickstart/python
After the set up you will download a file called credentials.json
- install the needed python packages
$ pip install --upgrade google-api-python-client oauth2client
- The following code snippet will allow you to connect to your gmail account via python
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
GMAIL_CREDENTIALS_PATH = 'credentials.json' # downloaded
GMAIL_TOKEN_PATH = 'token.json' # this will be created
store = file.Storage(GMAIL_TOKEN_PATH)
creds = store.get()
if not creds or creds.invalid:
flow = client.flow_from_clientsecrets(GMAIL_CREDENTIALS_PATH, SCOPES)
creds = tools.run_flow(flow, store)
service = build('gmail', 'v1', http=creds.authorize(Http()))
- Now with this service you can read your emails and read any attachments you may have in your e-mails
- First you can query your e-mails with a search string to find the e-mail ids you need that have the attachments:
search_query = "ABCD" result = service.users().messages().list(userId='me', q=search_query).execute() msgs = results['messages') msg_ids = [msg['id'] for msg in msgs]
- now for each
messageIdyou can find the associated attachments in the email. - This part is a little messy so bare with me. First we obtain a list of “attachment parts” (and attachment filenames) from the e-mail. These are components of the email that contain attachments:
messageId = 'XYZ'
msg = service.messages().get(userId='me', id=messageId).execute()
parts = msg.get('payload').get('parts')
all_parts = []
for p in parts:
if p.get('parts'):
all_parts.extend(p.get('parts'))
else:
all_parts.append(p)
att_parts = [p for p in all_parts if p['mimeType']=='text/csv']
filenames = [p['filename'] for p in att_parts]
- Now we can obtain the attached csv from each part:
messageId = 'XYZ'
data = part['body'].get('data')
attachmentId = part['body'].get('attachmentId')
if not data:
att = service.users().messages().attachments().get(
userId='me', id=attachmentId, messageId=messageId).execute()
data = att['data']
- Now you have the csv data but its in an encoded format, so finally we change the encoding and convert the result into a pandas dataframe
import base64
import pandas as pd
from StringIO import StringIO
str_csv = base64.urlsafe_b64decode(data.encode('UTF-8'))
df = pd.read_csv(StringIO(str_csv))
- and thats it! you have a pandas dataframe with the contents of the csv attachment. You can work with this dataframe. Or you could write it to disk with
pd.DataFrame.to_csvif you simply want to download the csv. You can use the list offilenameswe obtained earlier if you want to preserve the filename