Implementing a flight search engine

Graph traversal with an IDFFS algorithm

Mar 17, 2026

The full code for Skymesh is publicly available at https://github.com/adriangabardo/skymesh

This article’s code specifically is available at https://github.com/adriangabardo/skymesh/releases/tag/v3.0.1

Overview

In previous articles in the series, we described the abstract idea of what we were trying to achieve, then laid the foundations for ingesting aviation data (airports, airlines, planes) and representing it as a graph model.

The graph structure is a great starting point, but it is ultimately just a way to organise data. On its own, it does not do anything. In this article, we take the next step and give the graph a concrete, real-world use case: building a simple flight search engine.

The goal is deliberately modest. Given an origin airport X and a destination airport Y, we want to enumerate all reasonable flight routes that connect the two. Not the cheapest routes, not the fastest routes, and not even the “best” routes - just routes that make structural sense in the context of the network.

Later in the series, we will introduce additional constraints such as travel dates, maximum journey duration, and mock cost models. And ultimately, we will use live data for these API calls. For now, we focus on the most fundamental capability: turning a static graph into a system that can answer questions.

What “Flight Search” Means at This Stage

At this stage in the project, “flight search” has a very specific and deliberately limited meaning.

We are not trying to determine the cheapest, fastest, or best route. Instead, we are answering a simpler question:

Given an origin and a destination, which routes through the network are reasonable enough to consider at all?

In a large aviation graph, the number of possible paths grows extremely quickly. Without constraints, a traversal will happily return routes that are technically valid but completely unrealistic from a human perspective.

To keep the search grounded, the implementation applies a small set of explicit constraints.

The caller can control:

the minimum number of routes to return
the maximum number of routes to return
the maximum number of legs per route

In addition, the system enforces a hard upper bound on the number of legs, regardless of user input. This prevents the search from blowing up in dense parts of the graph.

Together, these constraints ensure that direct routes are discovered first, multi-stop routes are explored gradually, and the result set stays small and usable.

Further constraints - such as minimum layover times and maximum total travel duration - are intentionally deferred and will be introduced later in the series, once this foundational search behaviour is in place.

The Final Version of the Search Algorithm

Without diving further into depth-first search (DFS), which is well documented by others, the figure below illustrates the order in which routes are explored and where branches are discarded.

Graph traversal using an Iterative Deepening Depth-First Search (IDDFS) strategy.

When told to find up to 3 routes, of up to 2 legs away from the starting node, the traversal will do the following:

Start in SYD, visit MEL, find SYD→MEL as a direct route
Start in SYD, visit ADL, visit MEL, find SYD→ADL→MEL as a route within the constraints
Start in SYD, visit AKL, then:
- Visit ZQN, discard the route as we are already 2 legs away and haven’t reached the desired destination
- Visit MEL, find SYD→AKL→MEL as a route within the constraints

The final version of the search algorithm is designed to produce sensible results without introducing any notion of optimisation or ranking.

It follows a few simple principles:

routes are discovered in order of increasing number of legs
the search stops as soon as enough routes are found
the total number of routes returned is capped
cycles are explicitly disallowed

The implementation lives in a single function, find_flight_routes(). Given an origin, a destination, and a small set of bounds, it returns a list of FlightRoute objects representing candidate routes through the network.

Rather than searching “up to N legs” in one pass, the algorithm iterates over leg counts and performs a constrained depth-first search for each one. Direct routes are explored first, followed by one-stop routes, then longer connections if required.

At a high level, the algorithm looks like this:

for legs in range(1, max_legs + 1):
    run DFS that only accepts routes with exactly `legs` hops
    collect routes that reach the destination

    if enough routes have been found:
        stop searching

    if hard maximum route count reached:
        stop immediately

This approach keeps the traversal predictable and prevents longer, lower-quality routes from overwhelming shorter and more obvious ones.

The search algorithm as presented in this article is available here: https://github.com/adriangabardo/skymesh/blob/v3.0.1/src/services/path_finder.py

Observing the Algorithm in Practice

With the implementation in place, the most useful thing to do is simply run it and inspect the output.

By running the search engine with different combinations of origin and destination airports, we can see how the algorithm behaves as the network changes. Direct routes appear first, followed by one-stop routes, then longer connections only when necessary.

$ ./run_local.sh
Skymesh graph loaded
Airports (nodes): 6072
Routes (edges): 37042

============================================================
Searching routes: SYD -> MEL
Search for flight routes complete.
SYD -> MEL (1 legs)
SYD -> ADL -> MEL (2 legs)
SYD -> AKL -> MEL (2 legs)


============================================================
Searching routes: YYC -> SYD
Search for flight routes complete.
YYC -> LAX -> SYD (2 legs)
YYC -> NRT -> SYD (2 legs)
YYC -> SFO -> SYD (2 legs)


============================================================
Searching routes: LHR -> JFK
Search for flight routes complete.
LHR -> JFK (1 legs)
LHR -> TXL -> JFK (2 legs)
LHR -> DEL -> JFK (2 legs)


============================================================
Searching routes: CDG -> DXB
Search for flight routes complete.
CDG -> DXB (1 legs)
CDG -> HAM -> DXB (2 legs)
CDG -> IST -> DXB (2 legs)

What’s Next

At this point, we have a working flight search engine in the most literal sense. Given an origin and a destination, the system can traverse the aviation graph and return a small, sensible set of candidate routes.

The next set of challenges are no longer about correctness, but about cost of computation.

So far, route discovery has been relatively cheap. Routes are purely structural, and each candidate can be evaluated with minimal work. That will change quickly as we start layering real-world constraints on top of the search.

In the next articles in the series, we will deliberately make the search engine heavier.

First, we’ll introduce temporal constraints. Routes will become time-aware paths with departure and arrival times, minimum connection windows, and an overall travel duration. Each candidate route will require additional validation, and many structurally valid routes will be rejected late in the process.

Next, we’ll introduce mock pricing and cost functions. Instead of simply enumerating routes, the system will start attaching weights to them, allowing us to reason about trade-offs between different options. At that point, route evaluation stops being trivial and becomes meaningfully expensive.

Only once this additional complexity is in place will we turn our attention to performance.

By first adding computationally heavy features and only then optimising, we get a clear before-and-after comparison. We can observe where the system slows down, which parts of the algorithm dominate runtime, and which optimisations actually matter.

That sets the stage for the next phase of the series: improving performance through caching, memoisation, and selective precomputation, without changing the core traversal logic.

Before moving forward, however, it’s worth reflecting on how this implementation evolved and why some of the earlier, more naive approaches fell short.

Lessons Along the Way

The Naive First Attempt

The first version of the search algorithm was a straightforward depth-first search with a maximum depth.

Starting from the origin airport, the DFS would explore outgoing edges, recursively expand paths, stop once a given number of legs was reached, and record any path that ended at the destination.

From a correctness standpoint, this worked. All valid paths up to the maximum number of legs were found.

In practice, however, the output quickly became unusable.

As a concrete example, querying a simple pair like SYD → MEL - which has plenty of direct connectivity - returned 45 different routes within a four-leg limit. Most of these routes were technically valid graph paths, but many of them went around the globe before eventually reaching Melbourne.

From the algorithm’s point of view, this behaviour was expected. A depth-first search with a depth limit will happily enumerate every path that fits the constraint. From a user’s point of view, however, the result was clearly not useful.

Rethinking the Search Strategy

The problem was not that DFS was the wrong tool, but that the search strategy was too permissive.

A depth limit alone does not meaningfully capture what “reasonable” means in the context of flight search. Treating all paths up to a given depth as equally interesting allows long, low-quality routes to drown out shorter and more obvious ones.

The key shift was to stop thinking in terms of “up to N legs” and instead search in order of increasing complexity. Shorter routes should always be discovered first, and longer routes should only be considered when necessary.

Bounding the Search in Practice

This change led naturally to the current implementation, which combines progressive deepening with a small set of hard bounds.

By enforcing:

a capped maximum number of legs
a minimum number of routes before early termination
a hard maximum on returned routes

the search becomes predictable and tractable, even in dense parts of the network.

These bounds are not optimisations in the traditional sense. They are structural limits that define the shape of the search space and keep the system aligned with real-world expectations.

Gabardo Engineering

Discussion about this post

Ready for more?