Migrating to Current M-Lab Tables and Views
In March 2016, M-Lab launched new M-Lab BigQuery Fast tables which offered faster performance and a simpler data schema than the previous per-month legacy tables. In late 2017 to early 2018, M-Lab transitioned to new BigQuery tables as a part of updating to an open-source pipeline.
This guide walks users through the process of converting BigQuery SQL queries from Legacy Monthly or Fast Tables to take advantage of the new features of our most current tables and views. For reference, our past tables were stored in a special project plx.google
and had the structure below:
- Fast Tables - Example:
plx.google:m_lab.<TEST NAME>.all
- Legacy Monthly Tables -
Example: plx.google:m_lab.2016_01.all
Release Datasets and Views
M-Lab now publishes a series of three datasets for each version release:
- rc
- Beta test version of the next release of views.
- These views represent the most recent release-candidate.
- release_vX_Y_Z
- Recent stable releases.
- release
- An alias to current supported release.
- This is the set of views that most people should use.
M-Lab recommends using the release dataset for querying NDT data. This will make future table schema transitions less impactful since queries can be pinned to the release views instead of to a specific table name.
Beginning with the upcoming v3.1 release, in the rc, release_vX_Y_Z, and release datasets we will publish the following views:
- ndt_all
- ndt_all_legacysql
- ndt_downloads
- ndt_downloads_legacysql
- ndt_uploads
- ndt_uploads_legacysql
The views above ending in _legacysql
require you to use legacySQL queries, and those labeled without it (standardSQL)
require you the use of standardSQL queries. Additionally, the views which include ‘uploads’ or ‘downloads’ provide a subset of NDT data that are valid, completed tests which meet the criteria described on our page, Calculating Common Metrics. M-Lab highly recommends using standardSQL as this will be required to query all M-Lab tables in the future.
Sidestream and Paris Traceroute
If your queries are for Sidestream and Paris Traceroute data, please use the following tables:
plx.google.m_lab.sidestream.all
plx.google.m_lab.traceroute.all
Please note that Sidestream and Paris Traceroute data will eventually be published in the base_tables dataset:
measurement-lab:base_tables.sidestream
measurement-lab:base_tables.traceroute
Converting Queries from Legacy Montly or Fast Tables to M-Lab’s release Views
The converted examples below are written in standardSQL, indicated by #standardSQL
at the top of the code block for full example queries. If you are querying using the BigQuery website M-Lab recommends adding #standardSQL
at the top of your queries to set SQL Dialect option.
If your query uses the Legacy Monthly tables, the FROM portion of the query will look something like this:
FROM
plx.google:m_lab.2016_01.all
If your query uses Fast Tables, the FROM portion of the query will look something like this:
FROM
plx.google:m_lab.ndt.all
Update the table name
Replace the table name with the M-Lab release of interest. For a query over NDT data, this becomes:
FROM
`measurement-lab.release.ndt_all`
Note that there are multiple release views for your use, and provide the filtered results as described below:
measurement-lab.release.ndt_all
- all NDT data where
blacklist_flags
= 0 or NULL - StandardSQL syntax required.
- all NDT data where
measurement-lab.release.ndt_all_legacysql
- all NDT data where
blacklist_flags
= 0 or NULL - LegacySQL syntax required.
- all NDT data where
measurement-lab.release.ndt_downloads
- all NDT download tests where
blacklist_flags
= 0 or NULL - limited to tests matching optimal measurement conditions
- StandardSQL syntax required.
- all NDT download tests where
measurement-lab.release.ndt_downloads_legacysql
- all NDT download tests where
blacklist_flags
= 0 or NULL - limited to tests matching optimal measurement conditions
- LegacySQL syntax required.
- all NDT download tests where
measurement-lab.release.ndt_uploads
- all NDT download tests where
blacklist_flags
= 0 or NULL - limited to tests matching optimal measurement conditions
- StandardSQL syntax required.
- all NDT download tests where
measurement-lab.release.ndt_uploads_legacysql
- all NDT download tests where
blacklist_flags
= 0 or NULL - limited to tests matching optimal measurement conditions
- LegacySQL syntax required.
- all NDT download tests where
Update WHERE
statements to select data by date/time ranges
Legacy Monthly tables were published monthly, and making time series selection was done by querying multiple tables. Since v3.1, M-Lab’s BigQuery tables are date-partitioned making time series selection much easier.
Add a selector using the field partition_date
to the WHERE
portion of your query to limit selected data to a range of dates:
WHERE
partition_date > '2018-01-01' AND partition_date < '2018-01-24'
Limiting results to final test values
Many M-Lab tests gather TCP snapshots incrementally while running. While all interim snapshot values are available in our raw data, we currently only parse the final test values into our BigQuery tables. This meets the needs of most researchers. In future releases, we are exploring how to best add a repeating record to provide interim snapshot values.
If your previous queries included statements such as the one below, they can be removed from your query to release views:
WHERE
web100_log_entry.is_last_entry = TRUE
Limit results using the blacklist_flags field
The blacklist_flags
field was introduced to mark test results that could be impacted by site configuration issues, or otherwise communicate potentially relevant information about the state of the platform at the time of the test. This field was created to mark tests affected by the “switch discard issue” identified in 2015-2016, but M-Lab may use the field for other use cases in the future.
By default, M-Lab’s release views limit results to tests where blacklist_flags
is set to 0
or NULL
. If you wish to query data where the blacklist_flag is another value, you will need to query the NDT base table directly, measurement-lab:base_tables.ndt
and limit your results according to your needs.
Remove deprecated fields from WHERE
statements
M-Lab’s legacy monthly tables combined data for several different M-Lab projects (NDT, NPAD, SideStream, and Paris Traceroute) into the same table. As such, queries for a particular project’s data required the query author to add a WHERE project=XX
clause to restrict the query to a particular project. The field project
was deprecated in the transition to Fast tables in 2016, and our current tables deprecate additional fields as well.
Remove any reference to these deprecated fields in your queries:
project
web100_log_entry.is_last_entry
type
web100_log_entry.group_name
Remove limits for optimal download/upload test conditions in WHERE
statements
As mentioned above, you no longer need to check for optimal values for upload and download tests, as we describe on our page Calculating Common Metrics, if you are querying any of the following views:
measurement-lab.release.ndt_downloads
measurement-lab.release.ndt_downloads_legacysql
measurement-lab.release.ndt_uploads
measurement-lab.release.ndt_uploads_legacysql
If your query for download tests contains the following WHERE
conditions, these lines can be removed if you query the download views above:
AND IS_EXPLICITLY_DEFINED(web100_log_entry.connection_spec.remote_ip)
AND IS_EXPLICITLY_DEFINED (web100_log_entry.connection_spec.local_ip)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.HCThruOctetsAcked)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.SndLimTimeRwin)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.SndLimTimeCwnd)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.SndLimTimeSnd)
AND project = 0
AND IS_EXPLICITLY_DEFINED(connection_spec.data_direction)
AND connection_spec.data_direction = 1
AND IS_EXPLICITLY_DEFINED(web100_log_entry.is_last_entry)
AND web100_log_entry.is_last_entry = True
AND web100_log_entry.snap.HCThruOctetsAcked >= 8192
AND (web100_log_entry.snap.SndLimTimeRwin +
web100_log_entry.snap.SndLimTimeCwnd +
web100_log_entry.snap.SndLimTimeSnd) >= 9000000
AND (web100_log_entry.snap.SndLimTimeRwin +
web100_log_entry.snap.SndLimTimeCwnd +
web100_log_entry.snap.SndLimTimeSnd) < 3600000000
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.CongSignals)
AND web100_log_entry.snap.CongSignals > 0
AND (web100_log_entry.snap.State == 1 OR
(web100_log_entry.snap.State >= 5 AND
web100_log_entry.snap.State <= 11))
AND blacklist_flags == 0
Similarly, if your query for upload tests contains the following WHERE
conditions, these lines can be removed if you query the upload views above:
AND IS_EXPLICITLY_DEFINED(web100_log_entry.connection_spec.remote_ip)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.connection_spec.local_ip)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.HCThruOctetsReceived)
AND IS_EXPLICITLY_DEFINED(web100_log_entry.snap.Duration)
AND IS_EXPLICITLY_DEFINED(connection_spec.data_direction)
AND connection_spec.data_direction = 0
AND IS_EXPLICITLY_DEFINED(web100_log_entry.is_last_entry)
AND web100_log_entry.is_last_entry = True
AND web100_log_entry.snap.HCThruOctetsReceived >= 8192
AND web100_log_entry.snap.Duration >= 9000000
AND web100_log_entry.snap.Duration < 3600000000
AND (web100_log_entry.snap.State == 1
OR (web100_log_entry.snap.State >= 5
AND web100_log_entry.snap.State <= 11))
AND blacklist_flags == 0
Complete Example
To tie it all together, we will look at a complete example where we convert an existing query to take advantage of the new M-Lab release views.
The query below calculates the total number of NDT tests performed against M-Lab servers for each day in the last quarter of 2015.
Original query
SELECT
STRFTIME_UTC_USEC(web100_log_entry.log_time * 1000000,
'%Y-%m-%d') AS day,
COUNT(*) AS num_tests
FROM
[plx.google:m_lab.2015_10.all],
[plx.google:m_lab.2015_11.all],
[plx.google:m_lab.2015_12.all]
WHERE
project = 0
AND web100_log_entry.is_last_entry = TRUE
AND web100_log_entry.log_time IS NOT NULL
GROUP BY
day
ORDER BY
day ASC
Converted to use release view
#standardSQL
SELECT
COUNT(test_id) AS numtests,
partition_date AS day
FROM
`measurement-lab.release.ndt_all`
WHERE
partition_date >= '2015-10-01'
AND partition_date <= '2015-12-31'
GROUP BY
day
ORDER BY
day
Questions / Feedback
If you have questions or feedback about using M-Lab’s new tables, please send an email to support@measurementlab.net.