Fetching measurements from the API: Migrating from v1 to v2

How to use the v2 API's ability to fetch recent measurements by datasource without duplication, and to download large historical datasets as files in multiple formats.

January 2024

Introduction

In the v1 API, a measurement time-series is identified by the device that generated those measurements. That is problematic because it doesn't account for these possibilities:

  • The device may have failed and been replaced by another device. What we want is a time-series that is stable across replacements.
  • The device may have been deployed to another location. We want a time-series specific to one placement. Datasources can be used to track separate placements as independent time-series.
  • Different kinds of devices may generate differently shaped time-series of measurements. From a programming perspective, it is useful to have a common abstraction over all devices so that time-series are easy to work with, regardless of the source that generated those measurements.

For these reasons, v2 introduces the idea of an abstract datasource that generates datasource-measurements. At different moments in time, the datasource may have been powered by a different source device.

Comparing the two APIs

v1

In v1 it made sense to list the available devices (if you did not already know the device codes) and then for a particular device code, fetch a time-series of measurements from that device.

Some endpoint parameters from v1 are changed in the nearest corresponding v2 endpoint. These details were described in an earlier article (March 2023) here: What's next for the Clarity API

Fetching measurements in v1:

1:  Fetch devices   See API Guide

Endpoint    GET  {baseUrl}/v1/devices ? org=myorg1234

In the response you can find the code for each Clarity Node. For example: A1234567
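As a sketch, here is what that call might look like in Python with the requests library. The base URL is a placeholder and the x-api-key auth header is an assumption; check the API Guide for the actual authentication scheme.

    import requests

    BASE_URL = "https://your-base-url"       # substitute your actual {baseUrl}
    HEADERS = {"x-api-key": "your-api-key"}  # assumed auth scheme; see API Guide

    # List the devices in the org and print each Clarity Node's code.
    resp = requests.get(f"{BASE_URL}/v1/devices",
                        params={"org": "myorg1234"},
                        headers=HEADERS)
    resp.raise_for_status()
    for device in resp.json():
        print(device["code"])  # e.g. "A1234567"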

2:  Fetch a device’s measurements   See API Guide

Endpoint   GET  {baseUrl}/v1/measurements ? org=myorg1234 & code=A1234567

Additional query parameters let you tune the request. The most common ones select the output frequency and the time period.

Output frequency governs what a metric means:

The attributes returned depend on the output frequency. For example, if you ask for individual measurements (outputFrequency: minute), you get back pm2_5ConcMass.value, which represents that moment's value of PM2.5 Mass Concentration. If you ask for an hourly aggregation (outputFrequency: hour), you get back the same attribute, pm2_5ConcMass.value, which now represents the 1-hour mean of those individual measurements. The name of the attribute is the same, but it means different things under different output frequencies. In v2 this changes.
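As a sketch, an hourly-aggregated request might look like this (same assumed base URL and auth header as above; the nesting of pm2_5ConcMass under characteristics is also an assumption, so check a real response against the Guide):

    import requests

    BASE_URL = "https://your-base-url"       # substitute your actual {baseUrl}
    HEADERS = {"x-api-key": "your-api-key"}  # assumed auth scheme; see API Guide

    # Fetch hourly-aggregated measurements for one device.
    resp = requests.get(f"{BASE_URL}/v1/measurements",
                        params={"org": "myorg1234",
                                "code": "A1234567",
                                "outputFrequency": "hour"},
                        headers=HEADERS)
    resp.raise_for_status()
    for m in resp.json():
        # Under outputFrequency=hour this value is the 1-hour mean of PM2.5.
        print(m["time"], m["characteristics"]["pm2_5ConcMass"]["value"])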

v2

In v2, instead of individual devices, you start by listing all the datasources. You can then work with a time-series that spans device replacements, or one restricted to a datasource set up for a particular placement.

For each datasource you can learn many things, like which actual source device was powering that datasource during different periods of time. Because a datasource can produce a time-series of measurements that came from different actual source devices over time, we call these datasource-measurements (to distinguish them from Clarity node-measurements, which are restricted to one device). The v2 API divides measurement fetching into two endpoints that align with two different use cases.

In the first use case, you are asking for a small number of recent measurements. This use case aligns with an application that is tracking what is happening right now and just wants a traditional JSON reply. Not much data is returned.

In the second use case, you are analyzing a large historical time-series of measurements. You may be using a data science application or rolling your own analysis with a library like Pandas. In this use case you are getting back a lot of data, so it makes sense to deliver a file in CSV or a consumable tabular format like Apache Parquet.

A historical request is a request for a report: a dataset in a file. There are three phases to generating a report: requesting the report, checking whether the report is ready, and then downloading it.

    Datasources

    Fetch datasource summary for the org   See API Guide

    Endpoint    GET  {baseUrl}/v2/datasources ? org=myorg1234

    In the response you can find the datasourceId for each datasource. For example: DDDYN3547

    Note that in v1 a device is identified by code, while in v2 a device in a datasource is identified by currentSourceId. You can retrieve the source history of all devices in a datasource, over time, using the per-datasource details endpoint.
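    A minimal sketch of listing datasources in Python, under the same assumptions as the v1 examples (placeholder base URL, x-api-key auth header, and a JSON list in the response body):

        import requests

        BASE_URL = "https://your-base-url"       # substitute your actual {baseUrl}
        HEADERS = {"x-api-key": "your-api-key"}  # assumed auth scheme; see API Guide

        # List the datasources in the org and print each datasourceId.
        resp = requests.get(f"{BASE_URL}/v2/datasources",
                            params={"org": "myorg1234"},
                            headers=HEADERS)
        resp.raise_for_status()
        for ds in resp.json():
            print(ds["datasourceId"])  # e.g. "DDDYN3547"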

    Recent measurements

    Fetch recent measurements   See API Guide

    Endpoint   POST  {baseUrl}/v2/recent-datasource-measurements-query

    The body is described in the API Guide. At a minimum, you specify the org, and you either ask for all the datasources in the org or pass in a list of specific datasource IDs. Example responses are shown in the API Guide here.
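    As a sketch, the query might look like this in Python. The body field datasourceIds is illustrative rather than the confirmed schema; the API Guide has the real body definition.

        import requests

        BASE_URL = "https://your-base-url"       # substitute your actual {baseUrl}
        HEADERS = {"x-api-key": "your-api-key"}  # assumed auth scheme; see API Guide

        # Request recent measurements for specific datasources.
        body = {"org": "myorg1234",
                "datasourceIds": ["DDDYN3547"]}  # illustrative field name
        resp = requests.post(f"{BASE_URL}/v2/recent-datasource-measurements-query",
                             json=body,
                             headers=HEADERS)
        resp.raise_for_status()
        print(resp.json())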

    Historical measurements

    1.  Request report   See API Guide

    Endpoint   POST  {baseUrl}/v2/report-requests

    With a body as described in the API Guide. At a minimum, you specify the org, and you either ask for all the datasources in the org or pass in a specific list of datasource IDs. In the response, you get back a report Job ID that you can then poll to see when the report is ready to be downloaded. Example Job ID: JBDLPB37C6
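    A sketch of the request in Python; the body fields and the response field holding the Job ID (reportId below) are illustrative, so check the API Guide for the actual schema.

        import requests

        BASE_URL = "https://your-base-url"       # substitute your actual {baseUrl}
        HEADERS = {"x-api-key": "your-api-key"}  # assumed auth scheme; see API Guide

        body = {"org": "myorg1234",
                "datasourceIds": ["DDDYN3547"]}  # illustrative field name
        resp = requests.post(f"{BASE_URL}/v2/report-requests",
                             json=body,
                             headers=HEADERS)
        resp.raise_for_status()
        # "reportId" is a hypothetical field name for the report Job ID.
        report_id = resp.json()["reportId"]      # e.g. "JBDLPB37C6"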

    2.  Poll report   See API Guide

    Endpoint   GET  {baseUrl}/v2/report-request/JBDLPB37C6

    The response will tell you the status of the request. If the request has succeeded, the response will also give you a list of URLs to download. A report would need to be huge before it exceeds the size of one file, so you will almost always see a single URL in the list.
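    Polling could be sketched like this; the status value and the field holding the URL list are hypothetical, so compare against the example responses in the Guide.

        import time
        import requests

        BASE_URL = "https://your-base-url"       # substitute your actual {baseUrl}
        HEADERS = {"x-api-key": "your-api-key"}  # assumed auth scheme; see API Guide
        report_id = "JBDLPB37C6"                 # the Job ID from step 1

        while True:
            resp = requests.get(f"{BASE_URL}/v2/report-request/{report_id}",
                                headers=HEADERS)
            resp.raise_for_status()
            report = resp.json()
            # "status" and "urls" are hypothetical field names; see the Guide.
            if report["status"] == "succeeded":
                urls = report["urls"]
                break
            time.sleep(30)  # reports can take a while to generate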

    3.  Pull down the report using the URL

    There is no endpoint for this; you just read the bytes from the URL you got back in step 2. There is example Python code showing this here: How to request and retrieve reports
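    For completeness, a sketch of that download in Python, assuming the returned URLs are pre-signed and need no auth header (the linked article has the authoritative version):

        import requests

        urls = ["https://example.com/report-file.csv"]  # the list from step 2

        for i, url in enumerate(urls):
            resp = requests.get(url)
            resp.raise_for_status()
            # Write each file to disk; almost always there is just one.
            with open(f"report-part-{i}.csv", "wb") as f:
                f.write(resp.content)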