Using Google Analytics as a database for API usage statistics



This is one of a series of articles about Going serverless with Firebase. You may want to read about Firebase cloud functions and Custom domains and ssl with Firebase hosting before this article.

The dashboard

Ephemeral Exchange provides a history of API usage through its console. For now it's of academic interest, but if it were a chargeable service, you'd need some evidence to base those charges on. The previous architecture used Redis to store such historical data, since the overhead is small with an in-memory cache. Using Firebase cloud functions introduces a new problem: they are stateless, and the overhead of recording to a database that a transaction had happened could be more than the effort of doing and storing the transaction itself.

The dashboard shows this kind of info for an account

along with a breakdown by the access key that generated the activity
Over time that can be quite a set of data, so I started to think about whether Google Analytics could be used as a free way to store this. Some advantages are
  • it's free
  • it's very fast with minimal latency
  • it has built-in real-time reports for management and troubleshooting
In the end I used the Analytics measurement protocol to record events, where an event is any kind of API access. That means I can look at real-time analytics like this, where I can even see the exact operation that's being performed and how busy the API is

Or I can use the regular analytics reports to get more detailed stats such as usage by account code

Custom definitions

This all becomes possible through the use of custom definitions, which you can find in the settings page of your analytics property.
I've set up these custom dimensions

and this custom metric

Hitting analytics

The measurement protocol allows you to send messages to Analytics with some standard parameters that are used in canned Analytics reports, plus some custom values that you can access later in custom reports or, in my case, through the Analytics reporting API. These custom dimensions and metrics are identified by their index number, mapping back to those set up earlier in the custom definitions section.

The efx API Cloud function uses these definitions to map back to those index numbers.
  // map of custom dimensions
  const cd= {
    account:2,
    key:1,
    method:3,
    operation:4,
    status:5,
    sampleFloor:6
  };
  const cm = {
    size:1
  };
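
The same index number shows up on both sides: as cdN or cmN when sending a hit through the measurement protocol, and as ga:dimensionN or ga:metricN when asking the reporting API for the data. Here's a small sketch of that mapping. The helper names (hitName, reportName and so on) are mine, purely for illustration, and are not part of the efx code.

```javascript
// the index maps from the article
const cd = { account: 2, key: 1, method: 3, operation: 4, status: 5, sampleFloor: 6 };
const cm = { size: 1 };

// hypothetical helpers: parameter names used when sending a hit
const hitName = (name) => "cd" + cd[name];
const hitMetricName = (name) => "cm" + cm[name];

// hypothetical helpers: names used when requesting a report
const reportName = (name) => "ga:dimension" + cd[name];
const reportMetricName = (name) => "ga:metric" + cm[name];

console.log(hitName("account"), reportName("account"));
console.log(hitMetricName("size"), reportMetricName("size"));
```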

Requests to the measurement protocol are fired like this
  ns.hit = (options) => {

    // now hit analytics - only one, but use batch mode anyway
    // return the promise so the caller can wait for the post to finish
    return ns.batch().hit (options).commit();

  };


The batch function returns an object that can be used to send multiple requests in one go
  ns.batch = () => {

    const cloth = {
      batch_:null
    };
      
    cloth.hit = (options) => {
      cloth.batch_ = cloth.batch_ || [];
      cloth.batch_.push (ns.hitParams(options));
      return cloth;
    };
      
    cloth.clear= () => {
      cloth.batch_ = null;
      return cloth;
    };
      
    cloth.commit = ()=> {
      if (!cloth.batch_) return Promise.reject("no data in batch");
      
      return post_( cloth.batch_.map (d=> ns.joinParams (d)).join("\n"))
        .then (result=>{
          cloth.batch_ = null;
          return result;
        });
      };
      
    return cloth;
  };
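
To show how chaining several hits into one commit might look, here is a standalone sketch of the same pattern. It is a simplification rather than the efx code: makeBatch and fakeSend are hypothetical names, and the sender is a stand-in for the real post so the example runs without touching the network.

```javascript
// Standalone sketch of the batch pattern. `makeBatch` is a hypothetical name;
// `send` stands in for the real POST so this can run offline.
const makeBatch = (send) => {
  let hits = [];
  const api = {
    // queue one payload line and return the batch so calls can be chained
    hit: (line) => {
      hits.push(line);
      return api;
    },
    // send all queued lines as one newline-separated body and empty the queue
    commit: () => {
      if (!hits.length) return Promise.reject(new Error("no data in batch"));
      const body = hits.join("\n");
      hits = [];
      return send(body);
    }
  };
  return api;
};

// a fake sender that just resolves with the body it would have posted
const fakeSend = (body) => Promise.resolve(body);

const pending = makeBatch(fakeSend)
  .hit("v=1&t=event&ea=read")
  .hit("v=1&t=event&ea=write")
  .commit();

pending.then((body) => console.log(body.split("\n").length + " hits in one body"));
```

Injecting the sender keeps the queueing logic separate from the HTTP call, which also makes it easy to test.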

and a utility to convert an object to url parameters
// values are url encoded, as the measurement protocol expects
ns.joinParams = (params) => Object.keys(params).map(d=>d + '=' + encodeURIComponent(params[d])).join("&");
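
Measurement protocol payloads need to be URL encoded, which matters as soon as a value contains something like a space, & or =. This standalone sketch shows what an encoded payload line looks like. toPayload is a hypothetical name used for illustration and the values are made up.

```javascript
// Hypothetical helper (not from the efx source): build one payload line
// from an object, URL-encoding each key and value.
const toPayload = (params) =>
  Object.keys(params)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .join("&");

// an example hit whose label contains characters that need escaping
const line = toPayload({ v: "1", t: "event", ea: "read", el: "read & write=201" });

console.log(line);
```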

Parameters are set according to the measurement protocol. Appropriate fields are sent using the standard parameters (remember that no personally identifiable data can be sent to Analytics), and the custom dimensions are used to send additional stats about API usage, which I'll retrieve later. 
ns.hitParams = (options) => {


    // many of those arguments will be missing , so fill them in
    const key = options.key  || "nokey";
    const accountId = options.accountId || "anon";
    const method = options.method || "get";
    const size = options.size || 0;
    const status = options.status || 0; // this'll mean unknown
    const action = options.action || "unknown";
    const eventDate = options.eventDate || new Date().getTime();
    
    const sampleFloor = Math.floor (eventDate/secrets.analytics.floorWidth) * secrets.analytics.floorWidth;
    
    // the parameters for measurement protocol
    const params = {
      v:"1",
      t:"event",
      tid:secrets.analytics.trackingCode,
      cid:accountId,
      ds:configs.platform,
      qt:0,
      uid:key,
      an:configs.apiName,
      av:configs.version,
      ec:method,
      ev:size,
      el:action + "-" + status,
      ea:action
    };
    
    // add custom dims & metrics
    params['cd'+cd.key] = key;
    params['cd'+cd.account] = accountId;
    params['cd'+cd.method] = method;
    params['cd'+cd.operation] = action;
    params['cd'+cd.status] = status; 
    params['cd'+cd.sampleFloor] = sampleFloor;
    params['cm'+cm.size] = size;

   
    return params;
  };
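
The sampleFloor dimension in the code above rounds each event time down to the start of a fixed-width slot, so that events can later be grouped by slot in the report. Here's a small sketch of that arithmetic. The one hour width is an assumption for illustration; the real value is held in secrets.analytics.floorWidth and isn't shown here. toSampleFloor is a hypothetical name.

```javascript
// Assumption for illustration only: a slot width of one hour, in milliseconds
const floorWidth = 60 * 60 * 1000;

// round a timestamp down to the start of its slot
const toSampleFloor = (eventDate) => Math.floor(eventDate / floorWidth) * floorWidth;

const a = Date.UTC(2018, 0, 1, 10, 5, 0);   // 10:05:00 UTC
const b = Date.UTC(2018, 0, 1, 10, 59, 59); // 10:59:59 UTC
const c = Date.UTC(2018, 0, 1, 11, 0, 0);   // 11:00:00 UTC

// a and b land in the same slot, c starts the next one
console.log(toSampleFloor(a) === toSampleFloor(b));
console.log(toSampleFloor(b) === toSampleFloor(c));
console.log(new Date(toSampleFloor(a)).toISOString());
```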

and finally a post request to analytics
  const post_ = (body) => {
    return axios.post (debug ? debugUrl : baseUrl, body)
      .then (result=>{
        if (result.status !== 200) console.error(result.status + " failed to write analytics " + body);
        if(debug) console.log (result.data);
        return result;
      });
  };

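at this address (this endpoint takes a single hit per request; the measurement protocol has a separate /batch endpoint for posting several hits at once)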
  const baseUrl = "https://www.google-analytics.com/collect";

Retrieving the data

There's a very handy node module which provides a wrapper to access all the Google APIs from Node. It's in alpha but it works very well
https://www.npmjs.com/package/googleapis

With that it's just a case of authentication using a service account downloaded from my cloud console, using this code
  ns.auth = () => {
    
    // if we've been here before then nothing to do
    if (anapi_ )return Promise.resolve(manage.goodPack());
    
    // get the service account info and auth
    const gp = require('googleapis');
    const sa = require('./private/efxfbanalytics.json');
    jwt_ = new gp.auth.JWT (
      sa.client_email,null,sa.private_key, 
      ['https://www.googleapis.com/auth/analytics.readonly'],
      null
    );
    
    return new Promise ((resolve, reject) => 
      jwt_.authorize((err, tokens) => {
        const pack = manage.errify(!err , "UNAUTHORIZED" , err );
        if (pack.ok) anapi_ = gp.analyticsreporting('v4');
        resolve(pack);
      })
    );


  };

and then constructing the rather long-winded Analytics reporting request resource. I won't go into the details of that here, but note how the custom dimensions and metrics can be requested using their index numbers. This is how to get back all the data that was posted using the measurement protocol, so it can be used to create the usage stats in the efx dashboard.
 ns.getStats = (params) => {

    const accountId = params.accountId;
    const start = params.start ? parseInt(params.start,10) : new Date (2017, 10 , 15).getTime();
    const finish = params.finish ? parseInt(params.finish,10) : new Date().getTime();
    const pack = manage.goodPack ({
      start:start,
      finish:finish,
      accountId: accountId || ""
    });
    
    return ns.auth()
      .then (result=> {
        
        if (!result.ok) return result;
        
        //  single request
        const rr = {
          viewId: secrets.analytics.viewId,
          samplingLevel: "LARGE"
        };
        
        // metrics
        rr.metrics = ["ga:metric" + cm.size,"ga:uniqueEvents" ].map (d=> {
          return {
            expression: d
          };
        });
        
        // dimensions
        rr.dimensions = ["account" , "key" , "method", "sampleFloor" ].map (d=> {
          return {
            name: "ga:dimension" + cd[d]
          };
        });

        // date ranges TODO - date ranges
        rr.dateRanges =  [{
          startDate: new Date(start).toISOString().split('T')[0],
          endDate: new Date(finish).toISOString().split('T')[0]
        }];
       
        // filters
        if (accountId) {
          rr.dimensionFilterClauses  = [{
            filters:[{
              dimensionName: "ga:dimension" + cd.account,
              expressions: [accountId],
              operator:"IN_LIST"
            }]
          }];
        }

      
        return new Promise ((resolve, reject) => {
          anapi_.reports.batchGet ({
            auth:jwt_,
            resource:{"reportRequests":[rr]}
          }, (e,r) => {
            if(e) {
              manage.errify (
                false,
                "INTERNAL",
                "failed to get analytics",
                pack);
              reject (pack);
            }
            else {
              
              // now we have the result, normalize them
              manage.errify (
                r.reports.length === 1,
                "INTERNAL",
                "got " + r.reports.length + " reports instead of 1",
                pack);
                
              if (!pack.ok) {
                reject (pack);
              }
              else {
                const rows = r.reports[0].data.rows;
                // may need to adjust if sampling happened
                const src = r.reports[0].data.samplesReadCounts;
                const sss = r.reports[0].data.samplingSpaceSizes;
                const sampleAdjust = sss && sss[0] && src && src[0] ? sss[0]/src[0] : 1;
                // pass in a compressed way using this map
                const columns = ["accountId", "coupon","method","slot","size","count","floorWidth"];
                
                const t = (rows || []).map (d =>{
                  return [
                    d.dimensions[0],
                    d.dimensions[1],
                    d.dimensions[2],
                    parseInt(d.dimensions[3],10),
                    Math.round(sampleAdjust*parseInt(d.metrics[0].values[0],10)),
                    Math.round(parseInt(d.metrics[0].values[1],10)),
                    secrets.analytics.floorWidth
                  ];
                });
                pack.value = {
                  sampleAdjust:sampleAdjust,
                  columns:columns,
                  rows:t
                };
                resolve (pack);
              }
            }
          });
        });

      })
      .catch (err=>Promise.resolve (manage.errify(false, "INTERNAL", err, pack)));
    
  };
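
The result comes back in a compressed form, a list of column names plus positional rows. A consumer such as the dashboard could unpack it along these lines. This is only a sketch: the shape follows the pack.value built above, but the row values are invented and toObjects is a hypothetical helper, not part of the efx code.

```javascript
// Shape follows pack.value from getStats; the row values are invented for illustration
const value = {
  sampleAdjust: 1,
  columns: ["accountId", "coupon", "method", "slot", "size", "count", "floorWidth"],
  rows: [
    ["acc1", "key-a", "get", 1510704000000, 2048, 4, 3600000],
    ["acc1", "key-b", "post", 1510704000000, 512, 1, 3600000]
  ]
};

// hypothetical helper: turn each positional row into an object keyed by column name
const toObjects = ({ columns, rows }) =>
  rows.map((row) =>
    columns.reduce((obj, name, i) => {
      obj[name] = row[i];
      return obj;
    }, {})
  );

// total size per access key (the "coupon" column)
const sizeByKey = toObjects(value).reduce((totals, r) => {
  totals[r.coupon] = (totals[r.coupon] || 0) + r.size;
  return totals;
}, {});

console.log(sizeByKey);
```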

The complete code for the cloud function, including the analytics piece is available on github

For other articles on this topic see Going serverless with Firebase


