New ETL Pipeline and Transition to New BigQuery Tables

Posted by Chris Ritzo on 2018-02-09
pipeline, bigquery, versioning

Since May 2017, the M-Lab team has been working on an updated, open-source pipeline that pulls raw data from our servers, saves it to Google Cloud Storage, and then parses it into our BigQuery tables. The team is particularly excited about this update because the new pipeline no longer relies on closed-source libraries.

Transitioning to a New Backend Pipeline and Data Availability

Posted by Chris Ritzo on 2017-05-02
bigquery, data, data analysis, gcs, performance, pipeline, research, platform

M-Lab data is collected from distributed experiments hosted on servers all over the world, processed in a pipeline, and published for free in both raw and parsed (structured) formats. The back-end processing component for this has served us well for many years, but it has been showing its age recently. As M-Lab collects an increasing amount of data thanks to new partnerships, we have become concerned that this component will not remain reliable.


M-Lab

Measurement Lab is a partnership between New America's Open Technology Institute, Google Inc., Princeton University's PlanetLab, and other supporting partners.
