A Bird’s-Eye View of Earth: Petabytes of satellite data at our fingertips

by Tianjia Liu
figures by Catherine Ding

On December 7, 1972, the Apollo 17 crew captured the iconic “Blue Marble” photo, a bird’s-eye view of Earth from 29,000 kilometers away. We have long been fascinated by space and the unknown, but here we stopped and looked back at our home. This snapshot was the first photo taken by a person in which the whole Earth fits neatly within the field of view.

White, wispy clouds swirl above dark blue oceans and the Antarctic ice sheet, and lush green tropical forests contrast with barren deserts and the fire-scarred African Sahel. This new perspective lets us see the interplay between geography, ecosystems, and humans at a continental scale. New satellite technology allows us to generate images that capture these geospatial dynamics in much finer detail.

This article describes how satellite data can be used for geospatial studies, and how policies and technical advances over the years have made these studies easier to carry out and helped them deliver the greatest social benefit.

The Landsat program

The year 1972 also marked the start of the successful Landsat program, a series of satellites launched by the U.S. Geological Survey (USGS) and NASA to image Earth’s land surface. Each pixel in a Landsat image corresponds to a 30-meter by 30-meter patch of ground, roughly the size of a baseball infield. So, even while orbiting hundreds of kilometers above Earth’s surface, Landsat can resolve small land features like city neighborhoods and agricultural fields.
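
To make the 30-meter resolution concrete, the short Python sketch below counts how many Landsat pixels a rectangular land feature covers. It assumes the feature lines up with the pixel grid and rounds partial pixels up; the 400 m by 800 m field in the usage line is a hypothetical example, not a real measurement.

```python
import math

PIXEL_SIZE_M = 30.0  # ground size of one Landsat pixel edge, in meters


def pixels_covering(width_m: float, height_m: float, pixel_size_m: float = PIXEL_SIZE_M) -> int:
    """Count the pixels needed to cover a width_m x height_m rectangle.

    Assumes the rectangle starts on a pixel boundary; partial pixels are rounded up.
    """
    cols = math.ceil(width_m / pixel_size_m)
    rows = math.ceil(height_m / pixel_size_m)
    return cols * rows


# Hypothetical 400 m x 800 m agricultural field
print(pixels_covering(400, 800))  # 14 columns x 27 rows = 378 pixels
```

A field of that size spans hundreds of pixels, which is why individual fields and neighborhoods are distinguishable in Landsat imagery.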

Much as you might take a panorama by slowly sweeping your phone across a landscape, Landsat sweeps Earth from pole to pole. It images the entire Earth once every 16 days, and its sun-synchronous orbit ensures that it always passes over a given location in daylight, at the same local time in mid-morning.
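
The 16-day repeat cycle means any given spot on the ground gets a fresh image 22 or 23 times a year. The sketch below lists those dates for one location. The first-pass date is a hypothetical value (real dates depend on the satellite and on which orbital path covers the location), and the sketch ignores the fact that clouds can spoil individual acquisitions.

```python
from datetime import date, timedelta
from typing import List

REPEAT_CYCLE_DAYS = 16  # Landsat's repeat cycle, from the text


def acquisition_dates(first_pass: date, end: date, cycle_days: int = REPEAT_CYCLE_DAYS) -> List[date]:
    """Return every date from first_pass through end on which the same spot is imaged."""
    dates = []
    current = first_pass
    while current <= end:
        dates.append(current)
        current += timedelta(days=cycle_days)
    return dates


# Hypothetical first overpass of some location in 2020
passes = acquisition_dates(date(2020, 1, 3), date(2020, 12, 31))
print(len(passes), passes[-1])  # 23 2020-12-20
```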

We can use Landsat observations in visible light to produce true-color images that show the land surface as it appears to the naked eye, much as an ordinary camera does. What sets Landsat apart from our eyes is its ability to see in infrared wavelengths, which scientists use to distinguish land features such as wildfires or flooded fields more clearly. This is because haze and thin clouds interfere more with visible than with infrared data. Combining visible and infrared bands gives scientists valuable information about land cover, crop yield, and deforestation.
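
One widely used way to combine a visible and an infrared band is the Normalized Difference Vegetation Index (NDVI), which the original text does not mention but which illustrates the idea well: healthy vegetation absorbs red light and strongly reflects near-infrared light, so NDVI = (NIR − Red) / (NIR + Red) is high over dense plants. On Landsat 8, red and near-infrared correspond to bands 4 and 5. The reflectance values below are made up for illustration, not measured.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    red and nir are surface reflectances between 0 and 1. The result lies in [-1, 1]:
    dense green vegetation is typically strongly positive, clear water is often negative.
    """
    total = nir + red
    if total == 0:
        return 0.0  # simplifying choice for empty/no-data pixels; NaN is another option
    return (nir - red) / total


# Hypothetical (red, near-infrared) reflectances for three kinds of pixels
samples = {
    "vegetation": (0.05, 0.45),
    "urban": (0.15, 0.20),
    "water": (0.04, 0.02),
}
for label, (red, nir) in samples.items():
    print(f"{label}: NDVI = {ndvi(red, nir):.2f}")
```

Applied to every pixel of a scene, the same formula produces a vegetation map that is hard to see in a true-color image alone.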

Unlocking Landsat’s full potential: open data

For a long time, Landsat data were locked up on bulky tapes in government data centers (Figure 1). Each Landsat scene, a subset of a Landsat image small enough to make downloading and processing practical, covers an area of about 185 km by 185 km, roughly the size of Vermont and New Hampshire, and cost end users between $600 and $4,000 to download. To put this in perspective, procuring enough Landsat images from 1999 to 2008 to cover the contiguous U.S. would have cost over $300,000. This high cost largely limited Landsat-based research to scientists with ample funding and resources. In 2008, however, USGS made all Landsat scenes available to the public at no cost through an online portal.

Figure 1. Blast to the past: storing Landsat data on tapes at a government data center.
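
The scene dimensions translate directly into pixel counts. This back-of-the-envelope Python calculation assumes a perfectly square 185 km scene and 30 m pixels; real scenes are slightly tilted and padded with empty margins, so actual image dimensions differ somewhat.

```python
SCENE_EDGE_KM = 185.0  # approximate scene width and height, from the text
PIXEL_SIZE_M = 30.0    # ground size of one pixel edge, from the text

pixels_per_side = SCENE_EDGE_KM * 1000 / PIXEL_SIZE_M  # about 6,167
pixels_per_band = pixels_per_side ** 2                 # about 38 million
scene_area_km2 = SCENE_EDGE_KM ** 2                    # 34,225 square kilometers

print(f"{pixels_per_side:,.0f} pixels per side")
print(f"{pixels_per_band / 1e6:.1f} million pixels per band")
print(f"{scene_area_km2:,.0f} square kilometers per scene")
```

Because Landsat records several spectral bands, each scene holds one such grid of roughly 38 million pixels per band.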

Making Landsat open access started to level the playing field across research institutions. The transition to an open-access data policy fueled a 20-fold increase in Landsat-related publications. Additionally, one study estimated that the roughly 2.4 million Landsat scenes downloaded for applications ranging from wetland restoration to monitoring volcanic hazards yielded $1.8 billion in economic benefits for the public good.

Growth and bottlenecks of satellite data

Nearly half a century after the launch of the first Landsat satellite, Landsat 1, we’re still taking pictures of our home planet from space; Landsat 9 is set to launch in 2021. Over the same period, the human population has doubled, altering Earth’s land surface at breakneck speed through urban growth, industrialization, and agricultural expansion. We have also gradually launched a large constellation of satellites besides Landsat into orbit to collect data for weather forecasting, disaster response, and ecosystem monitoring, all while feeding our insatiable appetite for beautiful images.

The main research bottleneck in working with Landsat data is our capacity to store, process, and analyze the enormous number of images, which total more than 1 petabyte, the equivalent of 3.4 years of nonstop HD video.

Each Landsat scene requires hundreds of megabytes of local storage, and thousands of scenes are needed to cover Earth’s land surface at each time step. Even today, commercial hard drives hold only a few terabytes, hundreds of times less than the total size of the Landsat archive. For decades, even after 2008, research using Landsat data was largely limited to regional studies using one or a few scenes.
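
A rough calculation shows why local storage runs out so quickly. Apart from the 1-petabyte archive size and the 16-day cycle taken from the text, every input below is a hypothetical round number chosen to match the text's "a few terabytes," "hundreds of megabytes," and "thousands of scenes."

```python
ARCHIVE_TB = 1000.0          # "more than 1 petabyte" from the text, as 1,000 TB (decimal units)
DRIVE_TB = 4.0               # hypothetical consumer hard drive ("a few terabytes")
SCENE_MB = 500.0             # hypothetical scene size ("hundreds of megabytes")
SCENES_PER_SNAPSHOT = 2000   # hypothetical scene count for one pass over Earth's land ("thousands")
CYCLES_PER_YEAR = 365 // 16  # complete 16-day repeat cycles in a year -> 22

drives_for_archive = ARCHIVE_TB / DRIVE_TB                # 250 drives
snapshot_tb = SCENES_PER_SNAPSHOT * SCENE_MB / 1_000_000  # megabytes -> terabytes
year_tb = snapshot_tb * CYCLES_PER_YEAR                   # one year of full-coverage snapshots

print(f"Drives needed for the archive: {drives_for_archive:.0f}")
print(f"One global land snapshot: {snapshot_tb:.1f} TB")
print(f"One year of snapshots: {year_tb:.0f} TB")
```

Under these assumptions, a single drive would fill up with well under a year of global coverage, and the full archive would need a few hundred drives.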

Beyond limited data storage, personal computers are too slow to scale up geospatial analysis efficiently. Researchers use Geographic Information Systems (GIS) software to analyze geospatial data, but the most popular GIS software, ArcGIS, can crash even when working with just a few scenes. Still, satellite data offer a cheap and efficient way to fill in gaps where on-the-ground information is sparse.

Let’s say we want to make a simple land cover map of New York City. Using GIS software and a single Landsat scene, we can start from a few training points and produce a detailed map that classifies the city’s land cover into three classes: water, vegetation, and urban (Figure 2). New cloud-based technology makes this task trivial by combining central data storage with fast computation.

Figure 2. Classification of land cover in New York City. The place marks represent training points for the three land cover classes: vegetation (green), water (blue), and urban (red). The bottom map shows the land cover for the three classes, extrapolated from the training points to the entire New York City area using Landsat data (modified from the Google Earth Engine code example for San Francisco).
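
The idea of extrapolating from a handful of training points can be sketched in a few lines of Python. This is not the Earth Engine code behind Figure 2; it swaps in a minimum-distance (nearest-centroid) classifier, one of the simplest supervised methods, and the three-band reflectance values (red, near-infrared, shortwave infrared) are made-up numbers for illustration.

```python
import math
from statistics import mean

# Hypothetical training points: (red, near-infrared, shortwave-infrared) reflectances
TRAINING = {
    "water":      [(0.04, 0.02, 0.01), (0.05, 0.03, 0.01)],
    "vegetation": [(0.05, 0.45, 0.20), (0.06, 0.40, 0.18)],
    "urban":      [(0.15, 0.20, 0.25), (0.18, 0.22, 0.28)],
}


def centroids(training):
    """Average each class's training pixels band by band."""
    return {
        label: tuple(mean(band) for band in zip(*pixels))
        for label, pixels in training.items()
    }


def classify(pixel, class_centroids):
    """Assign the pixel to the class whose centroid is nearest (Euclidean distance)."""
    return min(class_centroids, key=lambda label: math.dist(pixel, class_centroids[label]))


model = centroids(TRAINING)
# Hypothetical unlabeled pixels elsewhere in the scene
scene = [(0.045, 0.025, 0.012), (0.055, 0.42, 0.19), (0.16, 0.21, 0.26)]
print([classify(p, model) for p in scene])
```

Running `classify` over every pixel in a scene turns a few labeled points into a full land cover map; real workflows typically use more training points and more sophisticated classifiers.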

Paradigm shift to cloud computing with big data

In 2010, Google introduced Google Earth Engine, a free, web-based GIS platform, to tackle these data and computing challenges. Google centrally stores publicly available satellite data, including Landsat data, on its servers, so users can immediately access any Landsat scene without downloading anything themselves. Over the past few years, Earth Engine’s public data catalog has grown to over 20 petabytes of climate, land, ocean, and atmospheric datasets. This vast data catalog is coupled with Google’s computing resources to perform geospatial analyses rapidly and efficiently. In fact, Google used Earth Engine technology to create the Blue Marble composite for Google Earth, a composite of millions of Landsat pixels.
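
Building a composite like this generally means combining many images of the same place taken on different dates. A common approach, sketched below without claiming it is how Google built its composite, is to mask out cloudy observations and take the per-pixel median of what remains. The tiny 2x2 "images" are hypothetical, with `None` standing in for a pixel a cloud mask has removed.

```python
from statistics import median

# Three hypothetical 2x2 images of the same area on different dates.
# None marks a pixel that was masked out as cloud.
images = [
    [[0.10, 0.12], [None, 0.30]],
    [[0.11, None], [0.20, 0.31]],
    [[0.50, 0.13], [0.22, None]],  # 0.50 is a bright outlier, e.g. undetected haze
]


def median_composite(stack):
    """Per-pixel median over clear (non-None) observations; None if a pixel is never clear."""
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = []
    for r in range(rows):
        row = []
        for c in range(cols):
            clear = [img[r][c] for img in stack if img[r][c] is not None]
            row.append(median(clear) if clear else None)
        composite.append(row)
    return composite


print(median_composite(images))
```

The median is used rather than the mean because a single bright outlier, like the 0.50 value above, barely shifts it, so leftover haze or cloud edges tend to drop out of the final picture.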

Best of all, a web browser and an internet connection are the only prerequisites for using Earth Engine. Unlike with ArcGIS, there’s no need to buy a license or a personal computer with enough storage and computing power. This gives many more students and researchers the opportunity to learn GIS and work with satellite data.

Concluding remarks

From the original Blue Marble photo to Google Earth’s Blue Marble composite, we have come a long way in using satellite data. Dozens of satellites now take thousands upon thousands of photos for us each day, putting petabytes of data at our fingertips. In the past decade and a half, Landsat’s open data policy and cloud-based technology like Google Earth Engine have helped make remote sensing accessible to anyone with a web browser.

Satellite data can bring stories about pressing issues such as climate change, human conflict, and natural disasters to life by giving readers a palpable sense of how a landscape changes. With satellite data expected to grow exponentially over the next decade, we stand to reap the benefits as long as technology for data storage and analysis keeps pace.

Tianjia (Tina) Liu is a third-year PhD student in the Atmospheric Chemistry Modeling Group in the Department of Earth and Planetary Sciences. She studies fires and air quality in India and Indonesia. You can find her on Twitter at @TheRealPyroTina.

Catherine (Xiaoxiao) Ding is a second-year Applied Math PhD student in the School of Engineering and Applied Sciences at Harvard University, where she is studying programmable materials.
