PyrOSM: working with Open Street Map data

Efficient geospatial manipulations for OSM map data

Dea Bardhoshi
Towards Data Science

Photo by Tabea Schimpf on Unsplash

If you’ve worked with OSM data before, you know it’s not the easiest to extract. OSM data can be huge, and finding performant solutions for what you want to analyze is often a challenge. PyrOSM is a package that makes reading in and working with OSM data much more efficient. How? PyrOSM is built with Cython (Python code compiled to C), it uses faster libraries for deserializing OSM data, and it relies on smaller optimizations such as NumPy arrays, all of which let it process data quickly. If you’ve used OSMnx before (for very similar use cases), you know that large datasets can take a very long time to load into memory, and that is exactly where PyrOSM can help. Let’s get into what this library can do!

🌎 PBF Data

Let’s talk a bit about the specific file format that OSM data comes in. PBF stands for “Protocolbuffer Binary Format”, and it is a very efficient way to store OSM data. The data is organized in “fileblocks”, which are groups of data that can be independently encoded or decoded. Fileblocks contain PrimitiveGroups, which in turn include thousands of OSM entities, like nodes, ways and relations.

The data can be scaled according to the user’s desired level of granularity. For instance, the current OSM database’s resolution is around 1 cm. In fact, if you wanted, you could download the entirety of Open Street Map’s data in one file, known as Planet (around 1000 GB of data)!
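To make the 1 cm figure a bit more concrete, here is a minimal sketch of how a stored integer coordinate maps back to degrees. It assumes the decoding rule and the default granularity of 100 nanodegrees described on the OSM wiki page for the PBF format. The function names, the sample raw value, and the metres-per-degree constant (a rough value at the equator) are my own illustrative choices, not part of any library.

```python
# Illustrative sketch: decoding PBF integer coordinates into degrees.
# Assumes the rule from the OSM wiki: degrees = 1e-9 * (offset + granularity * raw)

NANO = 1e-9                    # one nanodegree, in degrees
DEFAULT_GRANULARITY = 100      # nanodegrees per stored unit (PBF default)
METERS_PER_DEGREE = 111_320    # rough length of one degree at the equator


def decode_coordinate(raw, offset=0, granularity=DEFAULT_GRANULARITY):
    """Convert a stored integer coordinate into decimal degrees."""
    return NANO * (offset + granularity * raw)


def resolution_in_meters(granularity=DEFAULT_GRANULARITY):
    """Approximate ground distance covered by one stored unit."""
    return granularity * NANO * METERS_PER_DEGREE


# Hypothetical raw latitude value, chosen to land near Berkeley
print(decode_coordinate(378_719_000))
print(resolution_in_meters() * 100, "cm per unit (approx.)")
```

With the default granularity, one stored unit works out to roughly a centimetre on the ground, which is where the resolution figure comes from. A coarser granularity would trade precision for smaller files.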

👩‍💻 PyrOSM Basics: reading in datasets

PyrOSM is a package that reads in Open Street Map’s PBF data from two main data distributors: Geofabrik (world and country-level data) and BBBike (city-level data). The package allows the user to access many types of features:

  • Buildings, POIs (points of interest), Land Use
  • Street Networks
  • Custom Filters
  • Exporting as networks
  • and more!

There are 235 cities across the world currently supported by BBBike, and you can get the full list easily through the “sources.cities.available” attribute. Getting started is easy enough: you simply initialize an OSM reader object and load in the data you want:
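Here is a minimal sketch of that setup, using Berkeley as the example city. It assumes pyrosm is installed and that "Berkeley" is one of the names in the BBBike list. The two wrapper functions are hypothetical helpers of mine, and the imports sit inside them only so the snippet can be pasted without triggering a download.

```python
def list_bbbike_cities():
    """Return the city names that pyrosm can fetch from BBBike."""
    from pyrosm.data import sources  # requires pyrosm to be installed
    return sources.cities.available


def load_city(name):
    """Fetch a city's PBF file (or reuse an existing copy) and open it."""
    from pyrosm import OSM, get_data
    fp = get_data(name)  # returns the local path of the .osm.pbf file
    return OSM(fp)       # the reader object used for all later queries


# Usage (downloads data, so it is left commented out here):
# print(list_bbbike_cities())
# osm = load_city("Berkeley")
```

Nothing is parsed when the reader is created. The heavy lifting happens later, when you ask for a specific feature type.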

From this point on, you use the OSM object to interact with the Berkeley data. Now let’s get the Berkeley street network for driving:
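A minimal sketch of that call is below. It assumes `osm` is a pyrosm `OSM` reader that was already initialized with the Berkeley file. The wrapper name is a hypothetical helper of mine; the actual work is done by pyrosm's `get_network` method.

```python
def get_driving_network(osm):
    """Return the drivable street segments as a GeoDataFrame.

    `osm` is expected to be a pyrosm.OSM reader, for example one
    created with OSM(get_data("Berkeley")).
    """
    return osm.get_network(network_type="driving")


# Usage (needs a real reader object, so it is left commented out here):
# street_network = get_driving_network(osm)
# print(street_network.head())
```

As far as I know, other accepted values for `network_type` include "walking" and "cycling", so the same pattern works for other travel modes.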

Dataframe for Berkeley’s OSM street network

Printing out the actual street_network object shows it is stored in a GeoPandas GeoDataFrame with all the OSM attributes like length, highway, maxspeed etc., which can be very handy for further analysis.

Side Note: BBBike (the source provider of this data) offers many more data formats of different sizes, including Organic Maps OSM, Garmin OSM or SVG Mapnik, depending on what your use case is.

🔍 Better Filtering

The results of the data loading above include all of Berkeley’s data and in fact even data from the cities neighboring it, which is not ideal. What if you want a much smaller or more specific area? That’s where using a bounding box comes in. To make a bounding box you can either:

  • Manually specify a list of 4 coordinates in the format of [minx, miny, maxx, maxy]
  • Pass in a Shapely geometry (e.g., a LineString or MultiPolygon)

To find bounding box coordinates, I typically use this bbox finder website that lets you make rectangles and then copy the coordinates. Here’s how to bound the area around UC Berkeley’s campus and get its walking network:
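A minimal sketch of the bounding box approach follows. It assumes you already have the Berkeley PBF file on disk and that the coordinates are longitude/latitude pairs. The campus box is an approximate, hand-picked rectangle and both helper functions are hypothetical names of mine. Only `OSM(..., bounding_box=...)` and `get_network` come from pyrosm.

```python
# Approximate box around the UC Berkeley campus (an assumption, not an
# official boundary), in the order [minx, miny, maxx, maxy] = [lon, lat, lon, lat]
CAMPUS_BBOX = [-122.2660, 37.8680, -122.2500, 37.8760]


def validate_bbox(bbox):
    """Check the [minx, miny, maxx, maxy] ordering before using the box."""
    minx, miny, maxx, maxy = bbox
    if not (minx < maxx and miny < maxy):
        raise ValueError("expected [minx, miny, maxx, maxy] with min < max")
    return [minx, miny, maxx, maxy]


def walking_network_in_bbox(pbf_path, bbox):
    """Read only the area inside `bbox` and return its walking network."""
    from pyrosm import OSM  # requires pyrosm to be installed
    osm = OSM(pbf_path, bounding_box=validate_bbox(bbox))
    return osm.get_network(network_type="walking")


# Usage (needs a local PBF file, so it is left commented out here):
# campus_walk = walking_network_in_bbox(fp, CAMPUS_BBOX)
```

The small validation step is there because swapping latitude and longitude is an easy mistake to make when copying coordinates from a website.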

Street network using a bounding box

🎯 Exporting and Working with Graphs

Another good thing about PyrOSM is how it allows for network processing and connecting to other network analysis libraries. In addition to saving street networks as geodataframes, PyrOSM lets you extract nodes and edges by storing them in 2 separate dataframes. Here’s the nodes one:
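As a rough sketch, the nodes and edges can be requested in one call by passing `nodes=True` to `get_network`. This assumes `osm` is an initialized pyrosm `OSM` reader. The wrapper function is a hypothetical helper of mine.

```python
def get_nodes_and_edges(osm, network_type="driving"):
    """Return (nodes, edges) GeoDataFrames for the chosen network type."""
    # With nodes=True, pyrosm returns a tuple instead of a single frame
    nodes, edges = osm.get_network(nodes=True, network_type=network_type)
    return nodes, edges


# Usage (needs a real reader object, so it is left commented out here):
# nodes, edges = get_nodes_and_edges(osm)
# print(nodes.head())
```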

Dataframe of nodes from the street network

If you have these graph representations, it’s very easy to export them to the formats used by OSMnx, igraph and Pandana, and to keep working with them there.
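Here is a minimal sketch of the export step, assuming `nodes` and `edges` came from `get_network(nodes=True, ...)` on the same reader. To my knowledge, pyrosm's `to_graph` accepts "igraph", "networkx" and "pandana" as `graph_type`, with the NetworkX output being the one meant for use with OSMnx. The wrapper and its validation are my own additions, and the target library needs to be installed for the export to work.

```python
# Graph types I believe pyrosm's to_graph() supports (an assumption)
SUPPORTED_GRAPH_TYPES = ("igraph", "networkx", "pandana")


def export_graph(osm, nodes, edges, graph_type="networkx"):
    """Build a routable graph from the nodes and edges GeoDataFrames."""
    if graph_type not in SUPPORTED_GRAPH_TYPES:
        raise ValueError(f"graph_type must be one of {SUPPORTED_GRAPH_TYPES}")
    return osm.to_graph(nodes, edges, graph_type=graph_type)


# Usage (needs a real reader object, so it is left commented out here):
# G = export_graph(osm, nodes, edges, graph_type="networkx")
```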

💭 Parting Thoughts

This was a short summary of what pyrosm can do for you in your geospatial work! I touched on some methods that can be very useful, like downloading the dataset for a specific area, narrowing it down with a bounding box, and connecting the results to other libraries. I think the best thing about pyrosm is exactly this: it bridges the gap between huge OSM datasets and the engineering or analytics questions you can answer with them.

Thanks for reading!

👩‍💻 Data Science UC Berkeley '23 | 🏙 Data Science, Urban Planning, Civic Technology | ✍️ Newsletter: https://deabardhoshi.substack.com/