BigQuery Schema

Background

  • Each M-Lab tool consists of a client and a server.
  • Whenever an M-Lab user starts a test, the client and server interact to measure different aspects of that user’s connection.
  • A single user request triggers one or more tests (e.g., client-to-server test, server-to-client test).
  • For each test, a server collects a log, and the test can be uniquely identified by its log filename.
  • Timestamp fields are stored in our schema in UTC
  • Time only fields are stored in milliseconds (ms)

BigQuery Storage and Processing Datasets

For each release, M-Lab publishes tables and views in two groups of datasets. One group of datasets is used primarily for storage and processing, and another group which we recommended that most researchers use to query for M-Lab data.   Four storage and processing datasets are published for each release, beginning with v3.1:  

  • base_tables
    • Contains the raw NDT and Switch tables.
    • This dataset will eventually hold all M-Lab data, from 2009 to present, for NDT, Sidestream, Switch, and Paris Traceroute.  
  • batch
    • Destination for batch processed data prior to dedupping.  
  • intermediate_vX_Y_Z
    • Provides working, intermediate BigQuery Views for each release version
    • These views adapt data from tables in the base_tables dataset to make release views available.
    • Intermediate dataset views are publicly visible for transparency, but we discourage most people from using them directly.  
  • legacy
    • Contains exact copies of the data stored in our previous NDT fast tables, for use in the intermediate and release views.
    • This dataset will eventually include all legacy Sidestream and Paris Traceroute tables.

The datasets in the previous section outline where raw M-Lab test data is parsed and stored. While they are also queryable in the above locations, M-Lab recommends that most researchers instead begin querying the views published in our release dataset. This will make future table schema transitions less impactful since queries can be pinned to the release views instead of to a specific table name.   Three researcher query datasets are published for each new release:

  • rc
    • Beta test version of the next release of views.
    • These views represent the most recent release-candidate.  
  • rc_vX_Y_Z
    • Recent stable releases.  
  • release
    • An alias to current supported release.
    • This is the set of views that most people should use.
    • These views are updated to include any refinements to the recommended filtering, and may include changes to the schema over time.

Beginning with the upcoming v3.1 release, in the rc, rc_vX_Y_Z, and release datasets we will publish the following views:  

  • ndt_all
  • ndt_all_legacysql
  • ndt_downloads
  • ndt_downloads_legacysql
  • ndt_uploads
  • ndt_uploads_legacysql

The views above ending in _legacysql require you to use legacySQL queries, and those labeled without it (standardSQL) require you the use of standardSQL queries. Additionally, the views which include ‘uploads’ or ‘downloads’ provide a subset of NDT data that are valid, completed tests which meet the criteria described on our page, Calculating Common Metrics. We highly recommend using standardSQL as this will be required to query all M-Lab tables in the future.   Researchers interested in querying unfiltered NDT data should use the table in the base_tables dataset. Sidestream and Paris Traceroute tables will also be added to the base_tables dataset in the coming weeks:  

  • base_tables
    • ndt - measurement-lab.base_tables.ndt
    • sidestream - measurement-lab.base_tables.sidestream
    • switch - measurement-lab.base_tables.switch
    • traceroute - measurement-lab.base_tables.traceroute

Summary of BigQuery Tables (v3.1)

The current version of M-Lab BigQuery tables is v3.1, and are listed below. Tables are bolded and table views are listed underneath where applicable.

Base Tables for Historical and Current Data

If you have queries that reference historical M-Lab tables, please review Migrating Queries to learn how to convert them to use M-Lab’s release views.

Please also review the table/view schema version history and changelog at the end of this page for more information.

Equivalent BigQuery and Web100 Field Types

tcp-kis.txt defines each Web100 variable with a specific SNMP type. This table shows how to map each SNMP type to a BigQuery type.

BigQuery Type Corresponding SNMP Type
integer Integer32, Integer, INTEGER, Gauge32, ZeroBasedCounter32, Unsigned32, Unsigned16, Counter32, ZeroBasedCounter64
string Ip_Address
bool TruthValue

Blacklist Flags Field

The field blacklist_flags was used to mark test affected by the “switch discard issue” identified in 2015-2016. M-Lab NDT data from 2010-01-01 to 2015-10-02 was marked using this field. M-Lab may use the field for other use cases in the future for tests potentially impacted by site configuration issues, or otherwise communicate potentially relevant information about the state of the platform at the time of the test.

In the NDT v3.1 schema, we have added a sub-record, anomalies, which contains an equivalent field, anomalies.blacklist_flags.   Currently, the following values are present in NDT data from 2010-01-01 to 2015-10-02 in these fields:  

Field Name Value Description
blacklist_flags or anomalies.blacklist_flags 0 or NULL unaffected tests
  1 tests affected by switch discards
  2 tests not shown to be unaffected by switch discards

All views in the release dataset limit to tests where blacklist_flags is 0 or NULL.  

Table Schemas (v3.1)

The schemas for NDT, Sidestream, and Paris Traceroute are provided below. Note that the schemas for release views of NDT are largely the same as listed below with the exception of the date partition field.

NDT - measurement-lab.base_tables.ndt

Field name Type Description
_PARTITIONTIME timestamp This pseudo column contains a timestamp for the start of the day (in UTC) in which the data was loaded. For the YYYYMMDD partition, this pseudo column will contain the value TIMESTAMP(‘YYYY-MM-DD’).
test_id string ID of the test. It represents the filename of the log that contains the data generated during the test (e.g. 20090819T02:01:04.507508000Z_189.6.232.77:3859.c2s_snaplog.gz).
task_filename string The raw data file in Google Cloud Storage from which the test row was parsed.
parse_time timestamp Timestamp of when test data was parsed into BigQuery from Google Cloud Storage.
log_time timestamp Timestamp of when test log was created (in seconds since Unix epoch).

For NDT and NPAD, this is derived from the “Date/Time” field in the .meta file (for NDT and NPAD, prefer the web100_log_entry.log_time field, as it is more reliable).

For SideStream and Paris Traceroute, this is the timestamp as represented in the test log file’s filename.
blacklist_flags integer Deprecated field formerly used to store blacklist flags. Superceded by anomalies.blacklist_flags.
anomalies record A sub-record section containing anomalies fields.
anomalies.no_meta boolean If a test record did not contain a metadata file, this field is set to 1.
anomalies.snaplog_error boolean If there are erros in the snaplogs for a test, this field is set to 1.
anomalies.num_snaps integer A count of the number of snaplogs captured during a given test.
anomalies.blacklist_flags integer Used to mark test results that could be impacted by site configuration issues, or otherwise communicate potentially relevant information about the state of the platform at the time of the test.
connection_spec record Sub-record section containing fields describing the client which initiated a test and the M-Lab server through which the test was conducted.
connection_spec.client_af integer Address family of the client’s IP address. (This field is optional.)
AF_UNSPEC = 0
AF_INET (IPv4) = 2
AF_INET6 (IPv6) = 10
connection_spec.client_application string Client application that ran the test. (This field is optional.)
connection_spec.client_browser string Client’s browser. (This field is optional.)
connection_spec.client_hostname string Client’s hostname. (This field is optional.)
connection_spec.client_ip string IP address of the user’s client. (This field is optional. It’s preferable to use web100_log_entry.connection_spec.remote_ip.)
connection_spec.client_kernel_version string Client’s kernel version. (This field is optional.)
connection_spec.client_os string Client’s operating system. (This field is optional.)
connection_spec.client_version string Client’s version. (This field is optional.)
connection_spec.data_direction integer Direction of the data sent during the test:
CLIENT_TO_SERVER = 0
SERVER_TO_CLIENT = 1
connection_spec.server_af integer Address family of the server’s IP address. (This field is optional. It’s preferable to use web100_log_entry.connection_spec.local_af.)
AF_UNSPEC = 0
AF_INET (IPv4) = 2
AF_INET6 (IPv6) = 10
connection_spec.server_hostname string Server’s hostname. (This field is optional.)
connection_spec.server_ip string Server’s IP address. (This field is optional. It’s preferable to use web100_log_entry.connection_spec.local_ip.)
connection_spec.server_kernel_version string Server’s kernel version. (This field is optional.)
connection_spec.tls boolean  
connection_spec.websockets boolean  
connection_spec.client_geolocation record Sub-record section containing geolocation annotations for the client that initiated the test. Geolocation fields extracted from open dataset created by MaxMind and available at www.maxmind.com. (These fields are optional.)
client_geolocation.area_code integer  
client_geolocation.city string  
client_geolocation.continent_code string  
client_geolocation.country_code string  
client_geolocation.country_code3 string  
client_geolocation.country_name string  
client_geolocation.latitude float  
client_geolocation.longitude float  
client_geolocation.metro_code integer  
client_geolocation.postal_code string  
client_geolocation.region string  
connection_spec.server_geolocation record Sub-record section containing geolocation annotations for the M-Lab server that received the test. Geolocation fields extracted from open dataset created by MaxMind and available at www.maxmind.com. (These fields are optional.)
server_geolocation.area_code integer  
server_geolocation.city string  
server_geolocation.continent_code string  
server_geolocation.country_code string  
server_geolocation.country_code3 string  
server_geolocation.country_name string  
server_geolocation.latitude float  
server_geolocation.longitude float  
server_geolocation.metro_code integer  
server_geolocation.postal_code string  
server_geolocation.region string  
web100_log_entry record Sub-record section containing the web100 variables collected during the test. See tcp-kis.txt for more information about these fields.
web100_log_entry.log_time integer  
web100_log_entry.version string  
web100_log_entry.connection_spec.local_af integer IPv4 = 0
IPv6 = 1
web100_log_entry.connection_spec.local_ip string  
web100_log_entry.connection_spec.local_port integer  
web100_log_entry.connection_spec.remote_ip string  
web100_log_entry.connection_spec.remote_port integer  
web100_log_entry.snap.AbruptTimeouts integer  
web100_log_entry.snap.ActiveOpen integer  
web100_log_entry.snap.CERcvd integer  
web100_log_entry.snap.CongAvoid integer  
web100_log_entry.snap.CongOverCount integer  
web100_log_entry.snap.CongSignals integer  
web100_log_entry.snap.CountRTT integer  
web100_log_entry.snap.CurAppRQueue integer  
web100_log_entry.snap.CurAppWQueue integer  
web100_log_entry.snap.CurCwnd integer  
web100_log_entry.snap.CurMSS integer  
web100_log_entry.snap.CurRTO integer  
web100_log_entry.snap.CurReasmQueue integer  
web100_log_entry.snap.CurRetxQueue integer  
web100_log_entry.snap.CurRwinRcvd integer  
web100_log_entry.snap.CurRwinSent integer  
web100_log_entry.snap.CurSsthresh integer  
web100_log_entry.snap.CurTimeoutCount integer  
web100_log_entry.snap.DSACKDups integer  
web100_log_entry.snap.DataSegsIn integer  
web100_log_entry.snap.DataSegsOut integer  
web100_log_entry.snap.DupAcksIn integer  
web100_log_entry.snap.DupAcksOut integer  
web100_log_entry.snap.Duration integer  
web100_log_entry.snap.ECN integer  
web100_log_entry.snap.FastRetran integer  
web100_log_entry.snap.HCDataOctetsIn integer  
web100_log_entry.snap.HCDataOctetsOut integer  
web100_log_entry.snap.HCThruOctetsAcked integer  
web100_log_entry.snap.HCThruOctetsReceived integer  
web100_log_entry.snap.LimCwnd integer  
web100_log_entry.snap.LimRwin integer  
web100_log_entry.snap.LocalAddress string  
web100_log_entry.snap.LocalAddressType integer  
web100_log_entry.snap.LocalPort integer  
web100_log_entry.snap.MSSRcvd integer  
web100_log_entry.snap.MaxAppRQueue integer  
web100_log_entry.snap.MaxAppWQueue integer  
web100_log_entry.snap.MaxMSS integer  
web100_log_entry.snap.MaxRTO integer  
web100_log_entry.snap.MaxRTT integer  
web100_log_entry.snap.MaxReasmQueue integer  
web100_log_entry.snap.MaxRetxQueue integer  
web100_log_entry.snap.MaxRwinRcvd integer  
web100_log_entry.snap.MaxRwinSent integer  
web100_log_entry.snap.MaxSsCwnd integer  
web100_log_entry.snap.MaxSsthresh integer  
web100_log_entry.snap.MinMSS integer  
web100_log_entry.snap.MinRTO integer  
web100_log_entry.snap.MinRTT integer  
web100_log_entry.snap.MinRwinRcvd integer  
web100_log_entry.snap.MinRwinSent integer  
web100_log_entry.snap.MinSsthresh integer  
web100_log_entry.snap.Nagle integer  
web100_log_entry.snap.NonRecovDA integer  
web100_log_entry.snap.OctetsRetrans integer  
web100_log_entry.snap.OtherReductions integer  
web100_log_entry.snap.PostCongCountRTT integer  
web100_log_entry.snap.PostCongSumRTT integer  
web100_log_entry.snap.PreCongSumCwnd integer  
web100_log_entry.snap.PreCongSumRTT integer  
web100_log_entry.snap.QuenchRcvd integer  
web100_log_entry.snap.RTTVar integer  
web100_log_entry.snap.RcvNxt integer  
web100_log_entry.snap.RcvRTT integer  
web100_log_entry.snap.RcvWindScale integer  
web100_log_entry.snap.RecInitial integer  
web100_log_entry.snap.RemAddress string  
web100_log_entry.snap.RemPort integer  
web100_log_entry.snap.RetranThresh integer  
web100_log_entry.snap.SACK integer  
web100_log_entry.snap.SACKBlocksRcvd integer  
web100_log_entry.snap.SACKsRcvd integer  
web100_log_entry.snap.SampleRTT integer  
web100_log_entry.snap.SegsIn integer  
web100_log_entry.snap.SegsOut integer  
web100_log_entry.snap.SegsRetrans integer  
web100_log_entry.snap.SendStall integer  
web100_log_entry.snap.SlowStart integer  
web100_log_entry.snap.SmoothedRTT integer  
web100_log_entry.snap.SndInitial integer  
web100_log_entry.snap.SndLimBytesCwnd integer  
web100_log_entry.snap.SndLimBytesRwin integer  
web100_log_entry.snap.SndLimBytesSender integer  
web100_log_entry.snap.SndLimTimeCwnd integer  
web100_log_entry.snap.SndLimTimeRwin integer  
web100_log_entry.snap.SndLimTimeSnd integer  
web100_log_entry.snap.SndLimTransCwnd integer  
web100_log_entry.snap.SndLimTransRwin integer  
web100_log_entry.snap.SndLimTransSnd integer  
web100_log_entry.snap.SndMax integer  
web100_log_entry.snap.SndNxt integer  
web100_log_entry.snap.SndUna integer  
web100_log_entry.snap.SndWindScale integer  
web100_log_entry.snap.SpuriousFrDetected integer  
web100_log_entry.snap.StartTimeStamp integer  
web100_log_entry.snap.StartTimeUsec integer  
web100_log_entry.snap.State integer  
web100_log_entry.snap.SubsequentTimeouts integer  
web100_log_entry.snap.SumRTT integer  
web100_log_entry.snap.TimeStamps integer  
web100_log_entry.snap.Timeouts integer  
web100_log_entry.snap.WinScaleRcvd integer  
web100_log_entry.snap.WinScaleSent integer  
web100_log_entry.snap.X_OtherReductionsCM integer  
web100_log_entry.snap.X_OtherReductionsCV integer  
web100_log_entry.snap.X_Rcvbuf integer  
web100_log_entry.snap.X_Sndbuf integer  
web100_log_entry.snap.X_dbg1 integer  
web100_log_entry.snap.X_dbg2 integer  
web100_log_entry.snap.X_dbg3 integer  
web100_log_entry.snap.X_dbg4 integer  
web100_log_entry.snap.X_rcv_ssthresh integer  
web100_log_entry.snap.X_wnd_clamp integer  

Sidestream - measurement-lab.base_tables.sidestream

Field name Type Description
_PARTITIONTIME timestamp This pseudo column contains a timestamp for the start of the day (in UTC) in which the data was loaded. For the YYYYMMDD partition, this pseudo column will contain the value TIMESTAMP(‘YYYY-MM-DD’).
test_id string ID of the test. It represents the filename of the log that contains the data generated during the test (e.g. 20090819T02:01:04.507508000Z_189.6.232.77:3859.c2s_snaplog.gz).
project integer  
log_time timestamp  
type integer  
web100_log_entry.log_time integer  
web100_log_entry.version string  
web100_log_entry.group_name string  
web100_log_entry.connection_spec.local_af integer IPv4 = 0
IPv6 = 1
web100_log_entry.connection_spec.local_ip string  
web100_log_entry.connection_spec.local_port integer  
web100_log_entry.connection_spec.remote_ip string  
web100_log_entry.connection_spec.remote_port integer  
web100_log_entry.connection_spec.local_geolocation.area_code integer  
web100_log_entry.connection_spec.local_geolocation.city string  
web100_log_entry.connection_spec.local_geolocation.continent_code string  
web100_log_entry.connection_spec.local_geolocation.country_code string  
web100_log_entry.connection_spec.local_geolocation.country_code3 string  
web100_log_entry.connection_spec.local_geolocation.country_name string  
web100_log_entry.connection_spec.local_geolocation.latitude float  
web100_log_entry.connection_spec.local_geolocation.longitude float  
web100_log_entry.connection_spec.local_geolocation.metro_code integer  
web100_log_entry.connection_spec.local_geolocation.postal_code string  
web100_log_entry.connection_spec.local_geolocation.region string  
web100_log_entry.connection_spec.remote_geolocation.area_code integer  
web100_log_entry.connection_spec.remote_geolocation.city string  
web100_log_entry.connection_spec.remote_geolocation.continent_code string  
web100_log_entry.connection_spec.remote_geolocation.country_code string  
web100_log_entry.connection_spec.remote_geolocation.country_code3 string  
web100_log_entry.connection_spec.remote_geolocation.country_name string  
web100_log_entry.connection_spec.remote_geolocation.latitude float  
web100_log_entry.connection_spec.remote_geolocation.longitude float  
web100_log_entry.connection_spec.remote_geolocation.metro_code integer  
web100_log_entry.connection_spec.remote_geolocation.postal_code string  
web100_log_entry.connection_spec.remote_geolocation.region string  
web100_log_entry.snap.AbruptTimeouts integer  
web100_log_entry.snap.ActiveOpen integer  
web100_log_entry.snap.CERcvd integer  
web100_log_entry.snap.CongAvoid integer  
web100_log_entry.snap.CongOverCount integer  
web100_log_entry.snap.CongSignals integer  
web100_log_entry.snap.CountRTT integer  
web100_log_entry.snap.CurAppRQueue1 integer  
web100_log_entry.snap.CurAppWQueue integer  
web100_log_entry.snap.CurCwnd integer  
web100_log_entry.snap.CurMSS integer  
web100_log_entry.snap.CurRTO integer  
web100_log_entry.snap.CurReasmQueue integer  
web100_log_entry.snap.CurRetxQueue integer  
web100_log_entry.snap.CurRwinRcvd integer  
web100_log_entry.snap.CurRwinSent integer  
web100_log_entry.snap.CurSsthresh integer  
web100_log_entry.snap.CurTimeoutCount integer  
web100_log_entry.snap.DSACKDups integer  
web100_log_entry.snap.DataOctetsIn integer  
web100_log_entry.snap.DataOctetsOut integer  
web100_log_entry.snap.DataSegsIn integer  
web100_log_entry.snap.DataSegsOut integer  
web100_log_entry.snap.DupAckEpisodes integer  
web100_log_entry.snap.DupAcksIn integer  
web100_log_entry.snap.DupAcksOut integer  
web100_log_entry.snap.Duration integer  
web100_log_entry.snap.ECESent integer  
web100_log_entry.snap.ECN integer  
web100_log_entry.snap.ECNNonceRcvd integer  
web100_log_entry.snap.ECNsignals integer  
web100_log_entry.snap.ElapsedMicroSecs integer  
web100_log_entry.snap.ElapsedSecs integer  
web100_log_entry.snap.FastRetran integer  
web100_log_entry.snap.HCDataOctetsIn integer  
web100_log_entry.snap.HCDataOctetsOut integer  
web100_log_entry.snap.HCSumRTT integer  
web100_log_entry.snap.HCThruOctetsAcked integer  
web100_log_entry.snap.HCThruOctetsReceived integer  
web100_log_entry.snap.InRecovery integer  
web100_log_entry.snap.IpTosIn integer  
web100_log_entry.snap.IpTosOut integer  
web100_log_entry.snap.IpTtl integer  
web100_log_entry.snap.LimCwnd integer  
web100_log_entry.snap.LimMSS integer  
web100_log_entry.snap.LimRwin integer  
web100_log_entry.snap.LimSsthresh integer  
web100_log_entry.snap.LocalAddress string  
web100_log_entry.snap.LocalAddressType integer  
web100_log_entry.snap.LocalPort integer  
web100_log_entry.snap.MSSRcvd integer  
web100_log_entry.snap.MSSSent integer  
web100_log_entry.snap.MaxAppRQueue integer  
web100_log_entry.snap.MaxAppWQueue integer  
web100_log_entry.snap.MaxCaCwnd integer  
web100_log_entry.snap.MaxMSS integer  
web100_log_entry.snap.MaxPipeSize integer  
web100_log_entry.snap.MaxRTO integer  
web100_log_entry.snap.MaxRTT integer  
web100_log_entry.snap.MaxReasmQueue integer  
web100_log_entry.snap.MaxRetxQueue integer  
web100_log_entry.snap.MaxRwinRcvd integer  
web100_log_entry.snap.MaxRwinSent integer  
web100_log_entry.snap.MaxSsCwnd integer  
web100_log_entry.snap.MaxSsthresh integer  
web100_log_entry.snap.MinMSS integer  
web100_log_entry.snap.MinRTO integer  
web100_log_entry.snap.MinRTT integer  
web100_log_entry.snap.MinRwinRcvd integer  
web100_log_entry.snap.MinRwinSent integer  
web100_log_entry.snap.MinSsthresh integer  
web100_log_entry.snap.Nagle integer  
web100_log_entry.snap.NonRecovDA integer  
web100_log_entry.snap.NonRecovDAEpisodes integer  
web100_log_entry.snap.OctetsRetrans integer  
web100_log_entry.snap.OtherReductions integer  
web100_log_entry.snap.PipeSize integer  
web100_log_entry.snap.PostCongCountRTT integer  
web100_log_entry.snap.PostCongSumRTT integer  
web100_log_entry.snap.PreCongSumCwnd integer  
web100_log_entry.snap.PreCongSumRTT integer  
web100_log_entry.snap.QuenchRcvd integer  
web100_log_entry.snap.RTTVar integer  
web100_log_entry.snap.RcvNxt integer  
web100_log_entry.snap.RcvRTT integer  
web100_log_entry.snap.RcvWindScale integer  
web100_log_entry.snap.RecInitial integer  
web100_log_entry.snap.RemAddress integer  
web100_log_entry.snap.RemPort integer  
web100_log_entry.snap.RetranThresh integer  
web100_log_entry.snap.SACK integer  
web100_log_entry.snap.SACKBlocksRcvd integer  
web100_log_entry.snap.SACKsRcvd integer  
web100_log_entry.snap.SampleRTT integer  
web100_log_entry.snap.SegsIn integer  
web100_log_entry.snap.SegsOut integer  
web100_log_entry.snap.SegsRetrans integer  
web100_log_entry.snap.SendStall integer  
web100_log_entry.snap.SlowStart integer  
web100_log_entry.snap.SmoothedRTT integer  
web100_log_entry.snap.SndInitial integer  
web100_log_entry.snap.SndLimBytesCwnd integer  
web100_log_entry.snap.SndLimBytesRwin integer  
web100_log_entry.snap.SndLimBytesSender integer  
web100_log_entry.snap.SndLimTimeCwnd integer  
web100_log_entry.snap.SndLimTimeRwin integer  
web100_log_entry.snap.SndLimTimeSnd integer  
web100_log_entry.snap.SndLimTransCwnd integer  
web100_log_entry.snap.SndLimTransRwin integer  
web100_log_entry.snap.SndLimTransSnd integer  
web100_log_entry.snap.SndMax integer  
web100_log_entry.snap.SndNxt integer  
web100_log_entry.snap.SndUna integer  
web100_log_entry.snap.SndWindScale integer  
web100_log_entry.snap.SoftErrorReason integer  
web100_log_entry.snap.SoftErrors integer  
web100_log_entry.snap.SpuriousFrDetected integer  
web100_log_entry.snap.SpuriousRtoDetected integer  
web100_log_entry.snap.StartTimeStamp integer  
web100_log_entry.snap.State integer  
web100_log_entry.snap.SubsequentTimeouts integer  
web100_log_entry.snap.SumOctetsReordered integer  
web100_log_entry.snap.SumRTT integer  
web100_log_entry.snap.ThruOctetsAcked integer  
web100_log_entry.snap.ThruOctetsReceived integer  
web100_log_entry.snap.TimeStampRcvd integer  
web100_log_entry.snap.TimeStampSent integer  
web100_log_entry.snap.TimeStamps integer  
web100_log_entry.snap.Timeouts integer  
web100_log_entry.snap.WAD_CwndAdjust integer  
web100_log_entry.snap.WAD_IFQ integer  
web100_log_entry.snap.WAD_MaxBurst integer  
web100_log_entry.snap.WAD_MaxSsthresh integer  
web100_log_entry.snap.WAD_NoAI integer  
web100_log_entry.snap.WillSendSACK integer  
web100_log_entry.snap.WillUseSACK integer  
web100_log_entry.snap.WinScaleRcvd integer  
web100_log_entry.snap.WinScaleSent integer  
web100_log_entry.snap.X_OtherReductionsCM integer  
web100_log_entry.snap.X_OtherReductionsCV integer  
web100_log_entry.snap.X_Rcvbuf integer  
web100_log_entry.snap.X_Sndbuf integer  
web100_log_entry.snap.X_dbg1 integer  
web100_log_entry.snap.X_dbg2 integer  
web100_log_entry.snap.X_dbg3 integer  
web100_log_entry.snap.X_dbg4 integer  
web100_log_entry.snap.X_rcv_ssthresh integer  
web100_log_entry.snap.X_wnd_clamp integer  
web100_log_entry.snap.ZeroRwinRcvd integer  
web100_log_entry.snap.ZeroRwinSent integer  

Paris Traceroute - measurement-lab.base_tables.traceroute

Field name Type Description
_PARTITIONTIME timestamp This pseudo column contains a timestamp for the start of the day (in UTC) in which the data was loaded. For the YYYYMMDD partition, this pseudo column will contain the value TIMESTAMP(‘YYYY-MM-DD’).
test_id string ID of the test. It represents the filename of the log that contains the data generated during the test (e.g. 20090819T02:01:04.507508000Z_189.6.232.77:3859.c2s_snaplog.gz).
project integer  
log_time integer  
connection_spec.client_af integer  
connection_spec.client_application string  
connection_spec.client_browser string  
connection_spec.client_hostname string  
connection_spec.client_ip string  
connection_spec.client_kernel_version string  
connection_spec.client_os string  
connection_spec.client_version string  
connection_spec.data_direction integer  
connection_spec.server_af integer  
connection_spec.server_hostname string  
connection_spec.server_ip string  
connection_spec.server_kernel_version string  
connection_spec.client_geolocation.area_code integer  
connection_spec.client_geolocation.city string  
connection_spec.client_geolocation.continent_code string  
connection_spec.client_geolocation.country_code string  
connection_spec.client_geolocation.country_code3 string  
connection_spec.client_geolocation.country_name string  
connection_spec.client_geolocation.latitude float  
connection_spec.client_geolocation.longitude float  
connection_spec.client_geolocation.metro_code integer  
connection_spec.client_geolocation.postal_code string  
connection_spec.client_geolocation.region string  
connection_spec.server_geolocation.area_code integer  
connection_spec.server_geolocation.city string  
connection_spec.server_geolocation.continent_code string  
connection_spec.server_geolocation.country_code string  
connection_spec.server_geolocation.country_code3 string  
connection_spec.server_geolocation.country_name string  
connection_spec.server_geolocation.latitude float  
connection_spec.server_geolocation.longitude float  
connection_spec.server_geolocation.metro_code integer  
connection_spec.server_geolocation.postal_code string  
connection_spec.server_geolocation.region string  
paris_traceroute_hop.protocol string  
paris_traceroute_hop.src_ip string  
paris_traceroute_hop.src_af integer
AF_UNSPEC = 0
AF_INET (IPv4) = 2
AF_INET6 (IPv6) = 10
paris_traceroute_hop.src_hostname string  
paris_traceroute_hop.src_geolocation.area_code integer  
paris_traceroute_hop.src_geolocation.city string  
paris_traceroute_hop.src_geolocation.continent_code string  
paris_traceroute_hop.src_geolocation.country_code string  
paris_traceroute_hop.src_geolocation.country_code3 string  
paris_traceroute_hop.src_geolocation.country_name string  
paris_traceroute_hop.src_geolocation.latitude float  
paris_traceroute_hop.src_geolocation.longitude float  
paris_traceroute_hop.src_geolocation.metro_code integer  
paris_traceroute_hop.src_geolocation.postal_code string  
paris_traceroute_hop.src_geolocation.region string  
paris_traceroute_hop.dest_ip string  
paris_traceroute_hop.dest_af integer
AF_UNSPEC = 0
AF_INET (IPv4) = 2
AF_INET6 (IPv6) = 10
paris_traceroute_hop.dest_hostname string  
paris_traceroute_hop.dest_geolocation.area_code integer  
paris_traceroute_hop.dest_geolocation.continent_code string  
paris_traceroute_hop.dest_geolocation.country_code string  
paris_traceroute_hop.dest_geolocation.country_code3 string  
paris_traceroute_hop.dest_geolocation.country_name string  
paris_traceroute_hop.dest_geolocation.latitude float  
paris_traceroute_hop.dest_geolocation.longitude float  
paris_traceroute_hop.dest_geolocation.metro_code integer  
paris_traceroute_hop.dest_geolocation.postal_code string  
paris_traceroute_hop.dest_geolocation.region string  
paris_traceroute_hop.dest_geolocation.rtt float (repeated)  

Switch - measurement-lab.base_tables.switch

The switch table schema (also known as “DISCO”, named after the “DIScard COllection” service that records the switch data).

Field name Type Description
_PARTITIONTIME timestamp This pseudo column contains a timestamp for the start of the day (in UTC) in which the data was loaded. For the YYYYMMDD partition, this pseudo column will contain the value TIMESTAMP(‘YYYY-MM-DD’).
test_id string ID of the test. It represents the filename of the log that contains the data generated during the test (e.g. 20180608T05:00:00-to-20180608T06:00:00-switch.json.gz).
task_filename string The raw data file in Google Cloud Storage from which the test row was parsed.
parse_time timestamp Timestamp of when test data was parsed into BigQuery from Google Cloud Storage.
parser_version string The version of the parser that created this row.
log_time timestamp Never set for the switch data. The sample.timestamp should be used instead for the sample collection time.
sample record A repeated record with the value and timestamp of each 10 second observation. Typically, there will be 360 samples per hour. Due to system maintenance, or machine restarts, some intervals may contain more or less samples.
sample.timestamp timestamp Timestamp of the beginning of the 10 second time bin.
sample.value float Delta value of the metric during this 10 second time bin.
metric string The canonical metric name for samples, e.g. switch.discards.uplink.tx
hostname string The fully qualified domain name of the machine that collected the data, e.g. mlab2.abc01.measurement-lab.org.
experiment string The fully qualified domain name of the switch that produced the data, e.g. s1.abc01.measurement-lab.org.

v2 - March 2016

  • Began the publication of per project “fast tables” for NDT, NPAD, Paris Traceroute, and Sidestream.
    • plx.google:m_lab.ndt.all
    • plx.google:m_lab.npad.all
    • plx.google:m_lab.paris-traceroute.all
  • Continued the publication of v1 monthly tables, and published a migration guide.
  • Deprecated fields in v2 “fast tables”:
    • type
    • project
    • web100_log_entry.is_last_entry
    • web100_log_entry.group_name

v2.1 - November 2016

v3 - May 2017

  • Began publication to new date partitioned table and updated schema to support the new, open source, ETL pipeline.
  • Data publication to v2 tables stopped at this time.

v3.0.1 - October / November 2017

  • The schema for v3.0.1 tables was updated, removing an alpha feature called deltas, which attempted to log the differences between test snaplogs instead of the final test values. This feature will be revisited in future schema updates.
  • Newly released data annotation engine added geolocation and some metadata to tests from 2016 to present.
  • Published a series of beta BigQuery views for NDT data, to allow data queries across both v2 and v3.0.x tables.
  • Published traceroute and sidestream table to replace v2 versions, migrated data, re-annotated data.

v3.0.2 - December 2017

  • Standardized the naming scheme for BigQuery table and view names to be consistent with new semantic versioning.
  • All tables and views must be queried using StandardSQL, except for views with “legacysql” in the name.
  • Views for tests other than NDT may be published in the future using the same format:
    • <test>_all_<version> (standardSQL)
    • <test>_all_legacysql_<version>
  • Complete documentation for tables, views, the contents of views, and what data they limit (where applicable) will be published on this page.
  • Views will be published concurrently with new table schemas, such that all table versions will have corresponding views.
  • Previous versions of our tables will be referenced by versions 1.0, 2.0, etc. in our documentation but actual table names will not be changed.
  • Re-ran historical annotations for traceroute, npad, and sidestream data due to a bug where some geolocation annotations was not present in all past test data.

v3.1 - February 2018

  • First official release of v3 tables, with all historical data re-parsed, and annotated with geolocation metadata.

v3.1.1 - July 2018

  • Publish official Switch tables from the DISCO dataset.

Published tables and views are:

  • measurement-lab.legacy.ndt (data ~ 2015-01-01 - 2017-05-10)
  • measurement-lab.legacy.ndt_pre2015 (data ~ 2009-02-18 - 2014-12-31)
  • measurement-lab.base_tables.ndt
  • measurement-lab.base_tables.switch

  • measurement-lab.rc
  • measurement-lab.release_v3_1
  • measurement-lab.release
    • measurement-lab.release.ndt_all
    • measurement-lab.release.ndt_all_legacysql
    • measurement-lab.release.ndt_downloads
    • measurement-lab.release.ndt_downloads_legacysql
    • measurement-lab.release.ndt_uploads
    • measurement-lab.release.ndt_uploads_legacysql

Planned Additions/Enhancements for Future Releases

  • Begin publishing new Sidestream and Paris Traceroute data to tables in the base_tables dataset:
    • measurement-lab.base_tables.sidestream
    • measurement-lab.base_tables.traceroute
  • Publish release BigQuery Views for Sidestream and Paris Traceroute
  • Publish Sidestream and Paris Traceroute collected prior to the switch to base_tables in the legacy dataset
    • measurement-lab.legacy.sidestream
    • measurement-lab.legacy.traceroute
  • Move historical NPAD data into the legacy dataset
    • measurement-lab.legacy.npad
Back to Top