Improved notebook to download buildings from Microsoft

2024-02-14·
Ka Ming FUNG
Ka Ming FUNG
· 3 min read

This notebook works with the Microsoft Building Footprint Dataset to download building footprints for a given area of interest (AOI). I modified the original notebook to download the data to output a parquet file of a GeoDataFrame of buildings, instead of GeoJSON. 🏗️

To find the Jupyter notebook version, please visit the improved notebook on GitHub.

# # """install packages if not already installed"""
# %pip install --upgrade pandas geopandas shapely mercantile tqdm folium

set up environment and local variables

import pandas as pd
import geopandas as gpd
import shapely.geometry
import mercantile
from tqdm import tqdm

output_fn = "example_building_footprints.parquet"

Step 1 - Define our area of interest (AOI)

We define our area of interest (or AOI) as a GeoJSON geometry, then use the shapely library to get the bounding box.

Note: the coordinate reference system for the GeoJSON should be “EPSG:4326”, i.e. in global lat/lon format.

# Geometry copied from https://geojson.io
aoi_geom = {
    "coordinates": [
        [
            [-122.16484503187519, 47.69090474454916],
            [-122.16484503187519, 47.6217555345674],
            [-122.06529607517405, 47.6217555345674],
            [-122.06529607517405, 47.69090474454916],
            [-122.16484503187519, 47.69090474454916],
        ]
    ],
    "type": "Polygon",
}
aoi_shape = shapely.geometry.shape(aoi_geom)
minx, miny, maxx, maxy = aoi_shape.bounds

Step 2 - Determine which tiles intersect our AOI

"""Get tiles for a given bounding box and zoom level"""

quad_keys = set()
for tile in list(mercantile.tiles(minx, miny, maxx, maxy, zooms=9)):
    quad_keys.add(int(mercantile.quadkey(tile)))

quad_keys = list(quad_keys)

print(f"The input area spans {len(quad_keys)} tiles: {quad_keys}")
The input area spans 1 tiles: [21230030]

Step 3 - Download the building footprints for each tile that intersects our AOI and crop the results

This is where most of the magic happens. ✨ We download all the building footprints for each tile that intersects our AOI, then only keep the footprints that are contained by our AOI.

Note: this step might take awhile depending on how many tiles your AOI covers and how many buildings footprints are in those tiles.

# read the quads catalog
df = pd.read_csv(
    "https://minedbuildings.blob.core.windows.net/global-buildings/dataset-links.csv"
)

# extract urls for the quads we want
urls = df.pipe(lambda x: x[x["QuadKey"].isin(quad_keys)])["Url"].unique()

# download the quads and read as one consolidated pandas dataframe
buildings_df_list = [pd.read_json(url, lines=True) for url in tqdm(urls)]

# concatenate the list of dataframes
buildings_df = pd.concat(buildings_df_list)

# concvert to geopandas dataframe
buildings_gdf = (
    buildings_df.pipe(
        lambda x: gpd.GeoDataFrame(
            data=x,
            crs=4326,
            geometry=x["geometry"].apply(shapely.geometry.shape),
        )
    )
    # retain the height information
    .assign(
        height_m=lambda x: x["properties"].apply(
            lambda p: p["height"],
        ),
    )
    # drop the raw column for height and type
    .drop(
        columns=[
            "properties",
            "type",
        ],
        errors="ignore",
    )
)
  0%|          | 0/1 [00:00<?, ?it/s]
# filter to buildings that intersect the AOI
buildings_in_aoi_gdf = buildings_gdf[buildings_gdf.intersects(aoi_shape)]

Step 4 - Save the resulting footprints to a parquet file

# write to parquet
buildings_in_aoi_gdf.to_parquet(output_fn)

Optional sense check

# interactive map
# import folium

# m = buildings_in_aoi_gdf.explore(
#     attr="Google",
#     tiles="https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}",
# )

# folium.TileLayer("OpenStreetMap").add_to(m)

# folium.LayerControl().add_to(m)

# m
# static map
buildings_in_aoi_gdf.plot()

png