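【Question title】: Automating Date Range while extracting
【Posted】: 2017-07-17 15:32:28
【Question】:

I use the following script to extract data from Google Analytics. Here I am pulling the last week's data. I want to automate the date range so that I don't have to change date_range every week. I also want to avoid GA sampling the data. Please guide me on the right way to automate this.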

__author__ = 'test@gmail.com (test)'

import argparse
import sys
import csv
import string
import datetime
import json
import time

from apiclient.errors import HttpError
from apiclient import sample_tools
from oauth2client.client import AccessTokenRefreshError

cam_name = sys.argv[1:]

class SampledDataError(Exception): pass

def main(argv):
  # Authenticate and construct service.
  service, flags = sample_tools.init(
      argv[0], 'analytics', 'v3', __doc__, __file__,
      scope='https://www.googleapis.com/auth/analytics.readonly')

  # Try to make a request to the API. Print the results or handle errors.
  try:
    profile_id = profile_ids[profile]
    if not profile_id:
      print ('Could not find a valid profile for this user.')
    else:      
      metrics = argv[1]
      dimensions = argv[2]
      reportName = argv[3]
      sort = argv[4]
      filters = argv[5]

      for start_date, end_date in date_ranges:
        limit = ga_query(service, profile_id, 0,
                                 start_date, end_date, metrics, dimensions, sort, filters).get('totalResults')
        for pag_index in range(0, limit, 10000):
          results = ga_query(service, profile_id, pag_index,
                                     start_date, end_date, metrics, dimensions, sort, filters)
          # if results.get('containsSampledData'):

            # raise SampledDataError
          print_results(results, pag_index, start_date, end_date, reportName)

  except TypeError as error:    
    # Handle errors in constructing a query.
    print ('There was an error in constructing your query : %s' % error)

  except HttpError as error:
    # Handle API errors.
    print ('Arg, there was an API error : %s : %s' %
           (error.resp.status, error._get_reason()))

  except AccessTokenRefreshError:
    # Handle Auth errors.
    print ('The credentials have been revoked or expired, please re-run '
           'the application to re-authorize')

  except SampledDataError:
    # force an error if ever a query returns data that is sampled!
    print ('Error: Query contains sampled data!')


def ga_query(service, profile_id, pag_index, start_date, end_date, metrics, dimensions, sort, filters):

   return service.data().ga().get(
      ids='ga:' + profile_id,
      start_date=start_date,
      end_date=end_date,
      metrics=metrics,
      dimensions=dimensions,
      sort=sort,
      filters=filters,
      samplingLevel='HIGHER_PRECISION',
      start_index=str(pag_index+1),
      max_results='10000').execute()  # page size; the API caps this at 10000 rows per request


def print_results(results, pag_index, start_date, end_date, reportName):
  """Prints out the results.

  This prints out the profile name, the column headers, and all the rows of
  data.

  Args:
    results: The response returned from the Core Reporting API.
  """

  # New write header
  if pag_index == 0:
    if (start_date, end_date) == date_ranges[0]:
      print  ('Profile Name: %s' % results.get('profileInfo').get('profileName'))
      columnHeaders = results.get('columnHeaders')
      cleanHeaders = [str(h['name']) for h in columnHeaders]
      writer.writerow(cleanHeaders)
    print (reportName,'Now pulling data from %s to %s.' %(start_date, end_date))


  # Print data table.
  if results.get('rows', []):
    for row in results.get('rows'):
      for i in range(len(row)):
        old, new = row[i], str()
        for s in old:
          new += s if s in string.printable else ''
        row[i] = new
      writer.writerow(row)

  else:
    print ('No Rows Found')

  limit = results.get('totalResults')
  print (pag_index, 'of about', int(round(limit, -4)), 'rows.')
  return None

# Uncomment this line & replace with 'profile name': 'id' to query a single profile
# Delete or comment out this line to loop over multiple profiles.

#Brands

profile_ids = {'abc-Mobile': '12345',
                'abc-Desktop': '23456',
                'pqr-Mobile': '34567',
                'pqr-Desktop': '45678',
                'xyz-Mobile': '56789',
                'xyz-Desktop': '67890'}

date_ranges = [
('2017-01-24','2017-01-24'),
('2017-01-25','2017-01-25'),
('2017-01-26','2017-01-26'),
('2017-01-27','2017-01-27'),
('2017-01-28','2017-01-28'),
('2017-01-29','2017-01-29'),
('2017-01-30','2017-01-30')
]

for profile in sorted(profile_ids):
  print("Sequence 1",profile)
  with open('qwerty.json') as json_data:
    d = json.load(json_data)
    for getThisReport in d["Reports"]:
      print("Sequence 2",getThisReport["ReportName"])
      reportName = getThisReport["ReportName"]
      metrics = getThisReport["Metrics"]
      dimensions = getThisReport["Dimensions"]
      sort = getThisReport["sort"]
      filters = getThisReport["filter"]

      path = 'C:\\Projects\\DataExport\\test\\' #replace with path to your folder where csv file with data will be written

      today = time.strftime('%Y%m%d')

      filename = profile+'_'+reportName+'_'+today+'.csv' # file name pattern: <profile>_<report name>_<YYYYMMDD>.csv
      with open(path + filename, 'wt') as f:
        writer = csv.writer(f,delimiter = '|', lineterminator='\n', quoting=csv.QUOTE_MINIMAL)
        args = [sys.argv,metrics,dimensions,reportName,sort,filters]
        if __name__ == '__main__': main(args)
      print ( "Profile done. Next profile...")

print ("All profiles done.")

【Comments】:

    Tags: python-2.7 google-analytics google-api google-analytics-api google-api-python-client


    【Solution 1】:

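    The Core Reporting API supports some interesting things when it comes to dates.

    All Analytics data requests must specify a date range. If you do not include start-date and end-date parameters in the request, the server returns an error. Date values can be for a specific date by using the pattern YYYY-MM-DD, or relative by using today, yesterday, or the NdaysAgo pattern. Values must match [0-9]{4}-[0-9]{2}-[0-9]{2}|today|yesterday|[0-9]+(daysAgo).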
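
As a rough illustration of the rule quoted above, a date string can be checked locally before it is sent to the API. This is only a sketch: the helper name `is_valid_ga_date` is made up for this example, and the pattern is the one from the quoted documentation, anchored so the whole string has to match.

```python
import re

# Pattern quoted from the Core Reporting API documentation, anchored at both
# ends so that partial matches such as '7daysAgoX' are rejected.
GA_DATE_RE = re.compile(
    r'\A(?:[0-9]{4}-[0-9]{2}-[0-9]{2}|today|yesterday|[0-9]+(daysAgo))\Z')


def is_valid_ga_date(value):
    """Return True if value looks like an accepted start-date/end-date string.

    Hypothetical helper: it checks the format only, not whether the
    calendar date actually exists.
    """
    return GA_DATE_RE.match(value) is not None
```

Checking the format up front lets a typo such as `'7 daysAgo'` fail before any request is made.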

    So do something like

    start_date = '7daysAgo' 
    end_date   = 'today'
    

    Keep in mind that data has not finished processing for 24 to 48 hours, so your data for today, yesterday, and the day before may not be 100% accurate.
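
If the script should keep issuing one request per day, as the hard-coded `date_ranges` list in the question does, the list can also be built with `datetime` instead of being edited by hand. The sketch below is one possible way to do it, not part of the original answer; the function name `daily_date_ranges` and its `today` parameter are assumptions added for illustration.

```python
import datetime


def daily_date_ranges(days=7, today=None):
    """Return one (start_date, end_date) pair per day, oldest first.

    Covers the `days` full days before `today`, so today itself is excluded.
    `today` defaults to the local date of the machine running the script.
    """
    if today is None:
        today = datetime.date.today()
    ranges = []
    for offset in range(days, 0, -1):
        day = (today - datetime.timedelta(days=offset)).isoformat()
        ranges.append((day, day))
    return ranges


# Drop-in replacement for the hard-coded list in the question.
date_ranges = daily_date_ranges(7)
```

Run on 2017-01-31, this yields the same seven single-day pairs as the question's list, from `('2017-01-24', '2017-01-24')` to `('2017-01-30', '2017-01-30')`. Because the result is still a list of tuples, the `date_ranges[0]` comparison in `print_results` keeps working.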

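    【Discussion】:

    • I don't understand what you mean by "Last & days". Depending on how much data is in your account and how much data the request returns, there is no way to avoid sampling completely. If you are getting sampling with only 7 days of data, you must have a very large account.
    • If I enter start_date and end_date as mentioned above, GA will sample the data. I need data from yesterday back through the past 7 days. Can I set date_range as follows: date_ranges = [('yesterday','yesterday'), ('2daysAgo','2daysAgo'), ('3daysAgo','3daysAgo'), ('4daysAgo','4daysAgo'), ('5daysAgo','5daysAgo'), ('6daysAgo','6daysAgo'), ('7daysAgo','7daysAgo')]
    • You can do it however you like, but remember that every request counts against your quota. That said, what you are suggesting is what I currently use for some large customers.
    • Note: if you switch to the Google Analytics Reporting API v4, the response includes a field named isDataGolden that tells you whether the data in the request has finished processing.
    • So is there any difference in the data you get when you set date_range the way I mentioned above (manual date_range versus automatic date_range)?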
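
The sampling and processing checks discussed in these comments can be sketched as two small helpers that inspect a response that has already been parsed into a dict. `containsSampledData` is the v3 field the question's script already refers to in its commented-out check, and `isDataGolden` is the v4 field from the note above. The helper names `ensure_unsampled` and `is_golden` are hypothetical, and the v4 helper assumes it is given a single entry of the response's `reports` list.

```python
class SampledDataError(Exception):
    """Raised when a response reports that it was built from sampled data."""


def ensure_unsampled(results):
    """Return a v3 Core Reporting response unchanged, or raise if sampled."""
    if results.get('containsSampledData'):
        raise SampledDataError('Query returned sampled data')
    return results


def is_golden(report):
    """Return True if a v4 report says its data has finished processing.

    A missing field is treated as "not golden" (an assumption of this
    sketch, chosen to err on the side of caution).
    """
    return bool(report.get('data', {}).get('isDataGolden', False))
```

In the question's script, `ensure_unsampled(results)` could take the place of the commented-out check before `print_results`, since `main` already handles `SampledDataError`. Querying one day at a time keeps each request small, which makes sampling less likely but does not rule it out.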