, , ,

Exploring Graphs in Rust. Yikes.

I’ve been a dog licking my wounds for some time now. Over on my Substack newsletter, I’ve been doing a small series on DSA (Data Structures and Algorithms). I tackled some of the easier stuff first, like Linked Lists, Binary Search, and the like. What’s more, I actually did most of it in Rust, since I’ve possibly, maybe slightly, every so slightly, fallen in love with Rust.

Like most relationships, it vacillates between pure adoration and utter hatred, depending on the problem at hand. When I did a recent article on Graphs, Queues, and BSF, I attempted it in Rust, and was struck a mighty blow, that borrow checker had me down. It seemed doable, but at the time, under time pressure to get the Newsletter out, I reverted to Python and moved on.

Alas, I’m back again, a glutton for punishment. This time I thought I should try another crack at parsing a graph with Rust, but in a real-life situation, no more made-up stuff.  Actual data, actual graph, here we go. All code is on GitHub.

First, a real-life Graph.

In my quest to do a graph in Rust, I started by looking for a real-life data set to work on. Easy enough, Divvy Bike Trips provides an open-source data set, once that includes station_id s (that will become our nodes), as well as the station_ids that started and ended a bike trip (which is our edges).

Here is what the raw data looks like.

you can see the start_station_id and end_station_id are the starting and ending of each bike ride, these will be our nodes, and an edge will be the connection between the two. It’s going to take some massaging to turn this into a graph of data with Rust, especially with me, so let’s get started.

This is the entry point to my main Rust script that downloads and processes this Divvy Bike Trip data into a dataset that can be used for a network/graph.

Generally what happens is we download the zip file, and we then write and extract the zip file. We read the underlying csv file, throw out some duplicates, and write a csv file that will be used as our graph.

Here is my both horrid and beautiful (to me), Rust code to do all that.

Most of the code is downloading the zip file from s3, and then unpacking that zip file. I’m then using the wonderful Polars Rust Dataframe library to read and mess around with the data to get the final csv output I need for my graph/network.

I then put a little Python script together using pyviz to visualize the graph.

Here is the result, the html file output of pyviz that can be opened in a browser. I used the complete graph image above as the introduction image. Here is a zoomed-in version of one of the corners of the graph.

So with all that work out of the way, the real work can begin. Let’s try to solve a play problem with our new network/graph, and hopefully use a popular DSA graph search algorithm to do so.

Let’s pretend we have a use case to find out what is the “route” we can use to get from x node to y node with the least amount of moves. Say we want to suggest to a user who wants to go from one station to another, and we want to suggest a route that includes other stations that past users have taken.

Let’s pick two nodes.

There you have our problem. Node 20119 to node 20104, how can we get there with the least amount of jumps? It’s clear from the network diagram that you can get there in one jump, but that there are also other paths to get there as well that take more jumps.

Let’s say that we don’t know this beforehand though … we need a graph algorithm that can solve this problem. Two of the most popular are BSF (breath search first) and DSF (death search first). Do we start at a node and check every connection node and follow each node to its terminus, or do we scan horizontally so to say, working our way across the breadth?

The problem is, I’m not very good at Rust. In fact, most of the learning I’ve done so far is simply the data munging that I wrote in Rust to simply download the data and transform it into a graph/network in the first place.

This is most of the takeaway for me. I enjoy using Rust for a wide variety of tasks, downloading files, unpacking them, and using Polars for Dataframe stuff. The walking of the graph / network in Rust to find the shortest path? Eh. Either way, here is what I did.

I don’t think the result was/is totally correct.

warning: `graphrs` (bin “graphrs”) generated 4 warnings (run `cargo fix –bin “graphrs”` to apply 4 suggestions)
Finished release [optimized] target(s) in 1.36s
Running `target/release/graphrs`
[“20119”, “627”, “20104”, “20105”, “627”]

From 20119 to 627 to 20104This is a valid path if you look at the above picture of the graph. Code is on GitHub.

Musings

I always enjoy it when I get to play with Rust in real life, just to see in the dirty world of data munging, what it’s actually like to use it. What did I get to do doing this full little project?

  • some async functions
  • some s3 data stuff with aws.
  • zip archives and unpacking
  • dataframes with polars
  • structs vectors strings

I find learning a new language like Rust can be most difficult because you can read, re-read, and try to understand concepts … but I rarely can get those ideas to take root in my mind without doing some “real” task that trips me up and cements into my mind the concept.

I personally have a hard time seeing the average Data Engineering or Data practitioner picking up Rust. The future of Rust in the data world is with those Software Engineers who build distributed systems and other fast tools with Rust and wrap them in a Python layer. The truth is, 90% of data people will only ever touch Python.

 

3 replies
  1. Bart Massey
    Bart Massey says:

    As you say, two ways of exploring an unweighted graph are Depth-First Search (DFS) and Breadth-First Search (BFS). DFS is not guaranteed to find shortest paths, and generally won’t. BFS will. So the choice of BFS was correct.

    I took this on of an evening to see what I ended up with. https://github.com/BartMassey/bfs-rs is the result. Hope it will inspire some ideas for your code.

Comments are closed.