Overview
Measurement data from many experiments hosted on M-Lab are processed via the ETL pipeline and published in two forms:
- Google Cloud Storage
- M-Lab publishes raw output from many measurement tests on Google Cloud Storage as file archives.
- See M-Lab Google Cloud Storage documentation for more information.
- Google BigQuery
- M-Lab parses data for a subset of tests and publishes the data on BigQuery so that users can run SQL queries on the data.
- See M-Lab BigQuery QuickStart for more information.
Some M-Lab hosted tests do not use our ETL pipeline. Data for these tests are published independently by the test developers.
There is typically at least a 24-hour delay between data collection and data publication. Below we provide links to data for our Active tests and archival data from Inactive tests.
M-Lab also publishes public data sets about the M-Lab Platform, listed below.
Measurement Data (Active Tests)
- BISmark
- BISmark measures Internet service provider (ISP) performance and traffic inside home networks. BISmark data is not processed by the M-Lab ETL Pipeline.
- More information is available on the Project BISmark website
- BISmark Raw Data
- MobiPerf
- MobiPerf is an open source application for measuring network performance on mobile platforms. MobiPerf data is not processed by the M-Lab ETL Pipeline.
- More information is available on the MobiPerf website
- MobiPerf Raw Data
- Neubot
- Neubot measures the Internet to gather data useful for studying broadband performance, network neutrality, and Internet censorship. Neubot data is processed by the M-Lab ETL Pipeline.
- More information is available at Nexa Center and GitHub.
- Neubot Raw Data
- NDT
- Network Diagnostic Tool (NDT) measures characteristics of a TCP connection under heavy load. NDT data is processed by the M-Lab ETL Pipeline.
- More information is available at Internet2 and GitHub.
- NDT Raw Data - NDT BigQuery Dataset
- NPAD
- Network Path and Application Diagnosis (NPAD) diagnoses issues in a network path that can degrade network performance. NPAD data is processed by the M-Lab ETL Pipeline.
- More information is available from archived UCAR pages and GitHub.
- NPAD Raw Data
- OONI
- OONI measures censorship, surveillance, and traffic manipulation on the Internet. OONI data is not processed by the M-Lab ETL Pipeline.
- More information is available at OONI
- OONI Raw Data
- Paris Traceroute
- Paris Traceroute maps network topology between two points on the Internet. Paris Traceroute data is processed by the M-Lab ETL Pipeline.
- More information is available at Paris Traceroute
- Paris Traceroute Raw Data - Paris Traceroute BigQuery Dataset
- Reverse Traceroute
- Reverse Traceroute measures the network path back to a user from selected network endpoints, and provides a rich source of information on network routing and topology. Reverse Traceroute data is not processed by the M-Lab ETL Pipeline.
- More information is available at Reverse Traceroute
- Reverse Traceroute Raw Data
- SamKnows
- The SamKnows performance testing platform is used by the USA’s Federal Communications Commission (FCC), European Commission, UK government (Ofcom), Brazilian government (Anatel), Singapore’s IDA and other government-backed studies worldwide.
- SamKnows infrastructure includes off-net test servers hosted by M-Lab, and the M-Lab and SamKnows teams coordinate regularly to support the various regulatory reporting periods of data collection conducted by SamKnows.
- More information is available at the SamKnows website
- SideStream
- SideStream collects TCP state information about completed TCP connections on a system. SideStream data is processed by the M-Lab ETL Pipeline.
- More information is available on GitHub.
- SideStream Raw Data - SideStream BigQuery Dataset
Measurement Data (Platform Data)
- M-Lab Collectd
- M-Lab Collectd is a monitoring tool for M-Lab slices, which collects resource utilization information about all M-Lab servers.
- More information is available on GitHub.
- M-Lab Collectd Raw Data
- M-Lab DISCO Switch Telemetry Data
- Since June 2016, M-Lab has collected high-resolution switch telemetry for each M-Lab server and site uplink and published it as the DIScard COllection (a.k.a. DISCO) dataset.
- More information is available in the blog post announcing this dataset.
- M-Lab DISCO Raw Data - M-Lab DISCO BigQuery Dataset
Historical Data Sets (e.g. Retired Tests)
- Glasnost
- Glasnost detected prioritization or censorship of network traffic.
- More information is available at MPI SWS and GitHub.
- Glasnost Raw Data (archived)
- Pathload2
- Pathload2 measured the available bandwidth of an Internet connection.
- More information is available at https://code.google.com/p/pathload2-gatech/.
- Pathload2 Raw Data (archived)
- ShaperProbe
- ShaperProbe detected prioritization of network traffic.
- ShaperProbe Raw Data (archived)
- WindRider
- WindRider attempted to detect whether your mobile provider was performing application- or service-specific differentiation.
- More information is available at WindRider.
Data License and Citing M-Lab Data
All data collected by M-Lab tests are available to the public without restriction under a No Rights Reserved Creative Commons Zero Waiver.
Please cite M-Lab data sets as follows:
The M-Lab [test name] Data Set, [date range used]. [M-Lab test URL]
For example:
The M-Lab NDT Data Set, 2009-02-11–2015-12-21. https://measurementlab.net/tests/ndt
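For anyone generating citations programmatically, the format above could be sketched in Python roughly as follows. This is only an illustration: the helper name `format_mlab_citation` and its parameters are hypothetical and not part of any M-Lab tooling, and the comma after "Data Set" and the en dash between ISO-formatted dates are assumptions taken from the template and example above.

```python
from datetime import date


def format_mlab_citation(test_name: str, start: date, end: date, url: str) -> str:
    """Build a citation string following the M-Lab citation template.

    Hypothetical helper for illustration; not an official M-Lab API.
    """
    if start > end:
        raise ValueError("start date must not be later than end date")
    # Dates are written in ISO 8601 form and joined with an en dash (U+2013).
    date_range = f"{start.isoformat()}\u2013{end.isoformat()}"
    return f"The M-Lab {test_name} Data Set, {date_range}. {url}"


citation = format_mlab_citation(
    "NDT",
    date(2009, 2, 11),
    date(2015, 12, 21),
    "https://measurementlab.net/tests/ndt",
)
print(citation)
```

The date range should cover only the data actually used in the analysis, not the full lifetime of the test.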