<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Gabardo Engineering]]></title><description><![CDATA[Gabardo Engineering is a technical publication showcasing applied software engineering work. Topics include system design, algorithm implementation and optimisation, and the practical trade-offs encountered in real-world systems.]]></description><link>https://writing.gabardo.engineering</link><image><url>https://substackcdn.com/image/fetch/$s_!kbes!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F457be54d-f4c6-40bf-bd5a-b96250eb2e4c_675x675.png</url><title>Gabardo Engineering</title><link>https://writing.gabardo.engineering</link></image><generator>Substack</generator><lastBuildDate>Fri, 10 Apr 2026 17:40:01 GMT</lastBuildDate><atom:link href="https://writing.gabardo.engineering/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Adrian Gabardo]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[gabardoengineering@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[gabardoengineering@substack.com]]></itunes:email><itunes:name><![CDATA[Adrian Gabardo]]></itunes:name></itunes:owner><itunes:author><![CDATA[Adrian Gabardo]]></itunes:author><googleplay:owner><![CDATA[gabardoengineering@substack.com]]></googleplay:owner><googleplay:email><![CDATA[gabardoengineering@substack.com]]></googleplay:email><googleplay:author><![CDATA[Adrian Gabardo]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The S3 at Scale Runbook]]></title><description><![CDATA[Detecting and fixing cardinality explosions in production 
buckets]]></description><link>https://writing.gabardo.engineering/p/the-s3-at-scale-runbook</link><guid isPermaLink="false">https://writing.gabardo.engineering/p/the-s3-at-scale-runbook</guid><dc:creator><![CDATA[Adrian Gabardo]]></dc:creator><pubDate>Tue, 31 Mar 2026 22:01:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1SAq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous article we analysed a production system that accumulated <strong>1.56 trillion objects in a single S3 bucket</strong>. The architecture scaled perfectly from a functional perspective &#8212; but nearly triggered a <strong>$7.2M lifecycle transition event</strong>.</p><p>The root cause was not storage volume. It was <strong>object cardinality</strong>.</p><p>This article is the operational companion: a runbook for diagnosing and fixing <strong>small-object explosions in production S3 systems</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1SAq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1SAq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 424w, https://substackcdn.com/image/fetch/$s_!1SAq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 848w, 
https://substackcdn.com/image/fetch/$s_!1SAq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 1272w, https://substackcdn.com/image/fetch/$s_!1SAq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1SAq!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:15047943,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://writing.gabardo.engineering/i/189961404?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1SAq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 424w, 
https://substackcdn.com/image/fetch/$s_!1SAq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 848w, https://substackcdn.com/image/fetch/$s_!1SAq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 1272w, https://substackcdn.com/image/fetch/$s_!1SAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa7c16ea-9a5d-4472-b65a-e3438df00fa6_4096x2304.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a>
<figcaption class="image-caption">Photo by <a href="https://unsplash.com/@magicunsplash?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Magic Fan</a> on <a href="https://unsplash.com/photos/a-pile-of-brown-paper-packages-WYJrRinnABY?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure></div><h1>The Operational Model</h1><p>Operating S3 at scale follows a simple lifecycle:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R3Ea!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R3Ea!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 424w, https://substackcdn.com/image/fetch/$s_!R3Ea!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 848w, https://substackcdn.com/image/fetch/$s_!R3Ea!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 1272w, https://substackcdn.com/image/fetch/$s_!R3Ea!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!R3Ea!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png" width="728" height="138.01790073230268" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:233,&quot;width&quot;:1229,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:44028,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://writing.gabardo.engineering/i/189961404?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R3Ea!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 424w, https://substackcdn.com/image/fetch/$s_!R3Ea!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 848w, https://substackcdn.com/image/fetch/$s_!R3Ea!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 1272w, 
https://substackcdn.com/image/fetch/$s_!R3Ea!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb89cca3e-f545-41d5-a520-e158beeaea1d_1229x233.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Most large S3 cost failures occur because <strong>observation is missing</strong>.</p><h1>Quick Triage</h1><p>When S3 costs increase unexpectedly, start with one simple check.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{AverageObjectSize} = \\frac{\\text{BucketSizeBytes}}{\\text{NumberOfObjects}}&quot;,&quot;id&quot;:&quot;SDDMCGIGKD&quot;}" data-component-name="LatexBlockToDOM"></div><p>If this number becomes too small, the system is accumulating fragmented artifacts.</p><h3>Rule of thumb</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">&gt; 10 MB     &#8594; Healthy
1&#8211;10 MB     &#8594; Acceptable
&lt; 1 MB      &#8594; Fragmentation risk
&lt; 128 KB    &#8594; CRITICAL</code></pre></div><p><strong>128 KB matters</strong> because:</p><ul><li><p>by default, lifecycle rules do not transition objects smaller than 128 KB</p></li><li><p>some storage classes (for example S3 Standard-IA, S3 One Zone-IA and S3 Glacier Instant Retrieval) bill small objects as if they were <strong>128 KB minimum</strong></p></li></ul><p>Once the average object size drops below this boundary, the system shifts into <strong>object-dominated pricing behaviour</strong>.</p><h1>Diagnosis</h1><h2>Step 1 &#8212; Check bucket metrics</h2><p>The quickest way to detect a cardinality issue is to compute <strong>average object size</strong> directly from CloudWatch metrics.</p><p>CloudWatch already exposes the required daily storage metrics:</p><ul><li><p>AWS/S3 BucketSizeBytes (reported per storage class)</p></li><li><p>AWS/S3 NumberOfObjects (reported for <code>AllStorageTypes</code>)</p></li></ul><h3>CloudWatch Dashboard Widget</h3><p>Paste the following JSON into a metric widget's <strong>Source</strong> view on a CloudWatch dashboard, then replace the bucket name and region.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">{
  "metrics": [
    [ "AWS/S3", "NumberOfObjects", "BucketName", "example-bucket", "StorageType", "AllStorageTypes", { "id": "m1", "stat": "Sum", "label": "objects", "visible": false } ],
    [ ".", "BucketSizeBytes", ".", ".", ".", "StandardStorage", { "id": "m2", "yAxis": "right", "label": "size", "visible": false, "stat": "Maximum" } ],
    [ { "expression": "(m2/m1) / 1024", "label": "average object size (KB)", "id": "e1" } ]
  ],
  "view": "timeSeries",
  "stacked": false,
  "region": "us-east-1",
  "period": 86400,
  "stat": "Average"
}</code></pre></div><p>This widget computes average object size and displays it in <strong>kilobytes</strong>. <code>BucketSizeBytes</code> is published per storage class, so the widget assumes the bucket is predominantly in S3 Standard. If other classes hold a meaningful share of the data, add their <code>BucketSizeBytes</code> series (for example <code>StandardIAStorage</code>) to the numerator.</p><blockquote><p><strong>This is the single most useful metric for detecting S3 cost drift.</strong></p></blockquote><p>If the graph trends downward toward <strong>128 KB</strong>, object count is growing faster than stored volume. The bucket is likely accumulating fragmented artifacts.</p><h2>Step 2 &#8212; Verify lifecycle eligibility</h2><p>Average object size tells you <strong>that fragmentation exists</strong>.</p><p>The next step is to determine whether <strong>lifecycle remediation will actually work</strong>, which requires analysing the <strong>distribution of object sizes</strong>.</p><p>Why this matters:</p><p>By default, S3 Lifecycle <strong>does not transition objects smaller than 128 KB</strong> to any storage class. The threshold can be lowered explicitly with an object size filter (<code>ObjectSizeGreaterThan</code>) on the rule.</p><p>If most objects fall below this threshold, there are two outcomes. Either the transitions will not run under the default behaviour, or, if the rule is configured to include smaller objects, they will incur large per-object transition costs.</p><h3>Example Athena query</h3><p>Using <strong>S3 Inventory</strong>:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">SELECT
  CASE
    WHEN size &lt; 32768 THEN '&lt;32KB'
    WHEN size &lt; 131072 THEN '32KB&#8211;128KB'
    WHEN size &lt; 1048576 THEN '128KB&#8211;1MB'
    ELSE '&gt;1MB'
  END AS size_bucket,
  COUNT(*) AS object_count
FROM s3_inventory_table
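-- Hypothetical snapshot filter, assuming the inventory table is partitioned
-- by dt as in AWS's Athena examples for S3 Inventory. Each inventory run
-- delivers a new snapshot, so without restricting the query to one snapshot
-- every object is counted once per delivery. 'YYYY-MM-DD-HH-MM' is a
-- placeholder: substitute a real dt value from your table.
WHERE dt = 'YYYY-MM-DD-HH-MM'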
GROUP BY 1
-- Order the buckets by object size rather than alphabetically by label
ORDER BY MIN(size);</code></pre></div><p>Example output (B = billion):</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">&lt;32 KB        : 420B objects
32KB&#8211;128KB    : 230B objects
128KB&#8211;1MB     : 50B objects
&gt;1MB          : 20B objects</code></pre></div><p>Interpretation:</p><pre><code>&#8594; ~650B objects below 128 KB
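&#8594; ~70B objects (50B + 20B) are 128 KB or larger and remain eligible by default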
&#8594; lifecycle transitions will be largely ineffective</code></pre><p>In this scenario, forcing transitions by lowering the size threshold would generate massive <strong>transition costs</strong> while producing little or no storage saving. Archive classes such as S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive also add 40 KB of billable overhead per object (32 KB at the archive rate plus 8 KB at the S3 Standard rate), which is larger than most of these objects.</p><h3>Monitoring</h3><p>Once the issue is diagnosed, implement continuous monitoring using the same average object size metric.</p><p>Tracking this value over time provides an <strong>early signal of fragmentation</strong> long before cost increases appear on billing dashboards.</p><h3>Alerting</h3><p>A practical alert triggers when <code>AverageObjectSize &lt; 128 KB</code> for <strong>three consecutive days</strong>.</p><p>Requiring multiple evaluation periods avoids alerts caused by temporary ingestion spikes or batch jobs.</p><h4>AWS CDK example</h4><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;typescript&quot;,&quot;nodeId&quot;:&quot;e092b84f-4672-4ef3-91e3-107e24a8f4ec&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-typescript">import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export class S3CardinalityAlarmStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const bucketName = 'your-bucket-name';

    // Core S3 storage metrics (published once per day)
    const objects = new cloudwatch.Metric({
      namespace: 'AWS/S3',
      metricName: 'NumberOfObjects',
      dimensionsMap: { BucketName: bucketName, StorageType: 'AllStorageTypes' },
      statistic: 'Sum',
      period: Duration.days(1),
    });

    const bytes = new cloudwatch.Metric({
      namespace: 'AWS/S3',
      metricName: 'BucketSizeBytes',
      dimensionsMap: { BucketName: bucketName, StorageType: 'StandardStorage' },
      statistic: 'Maximum',
      period: Duration.days(1),
    });

    // Average object size = bytes / objects.
    // MathExpression defaults to a 5-minute period that overrides the periods
    // of the metrics it uses, so set it to 1 day to match the daily S3 metrics.
    const avgSize = new cloudwatch.MathExpression({
      expression: 'bytes / objects',
      usingMetrics: { bytes, objects },
      label: 'avg object size (bytes)',
      period: Duration.days(1),
    });

    // Alert if the average stays below 128 KB for 3 consecutive days
    new cloudwatch.Alarm(this, 'S3CardinalityAlarm', {
      metric: avgSize,
      threshold: 128 * 1024,
      evaluationPeriods: 3,
      datapointsToAlarm: 3,
      comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
    });
  }
}</code></pre></div><h1>Remediation</h1><p>Once fragmentation is confirmed, plan remediation carefully. At large scale, <strong>fixing the dataset can itself be expensive</strong>.</p><h3>Step 1 &#8212; Estimate lifecycle transition cost</h3><p>Lifecycle transitions are priced <strong>per 1,000 transition requests</strong>, that is, per 1,000 objects transitioned. Estimate remediation cost before enabling transitions:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\text{TransitionCost} = \\frac{\\text{ObjectCount}}{1000} \\times \\text{PricePer1000}&quot;,&quot;id&quot;:&quot;OVVVEAHCOA&quot;}" data-component-name="LatexBlockToDOM"></div><p>Example:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;5e81a886-82d3-41cf-a3e0-aa1aa2d45423&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">objects = 720_000_000_000  # number of objects the rule would transition
price_per_1000 = 0.01  # USD per 1,000 transition requests; varies by target storage class

transition_cost = (objects / 1000) * price_per_1000
print(f"${transition_cost:,.0f}")</code></pre></div><p>Output:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\$7{,}200{,}000&quot;,&quot;id&quot;:&quot;VLKWGTRXRE&quot;}" data-component-name="LatexBlockToDOM"></div><p>Running this calculation before enabling a rule is how you catch a multi-million-dollar lifecycle event like the one narrowly avoided in the system analysed in the previous article.</p><h3>Step 2 &#8212; Execute remediation safely</h3><p>Avoid custom scripts for billion-object operations.</p><p>Instead, use <strong>S3 Batch Operations</strong>, which provides:</p><ul><li><p>automatic parallelisation</p></li><li><p>retry handling</p></li><li><p>distributed execution across AWS infrastructure</p></li></ul><p>Batch Operations is designed specifically for <strong>large-scale object-level changes</strong>. It is also billed per job and per object processed, so include it in the remediation estimate.</p><h1>Prevention</h1><p>If the system is still evolving, implement patterns that prevent future cardinality explosions.</p><h2>Aggregate small artifacts</h2><p>Bad pattern: <code>request &#8594; write 200 objects</code></p><p>Better pattern: <code>request &#8594; buffer &#8594; write 1 aggregated object</code></p><p>If artifacts are generated in the <strong>KB range</strong>, introduce a buffering layer that batches them into larger objects before writing to S3.</p><h2>Design operational prefixes</h2><p>Structure buckets around operational boundaries.</p><p>Example:</p><pre><code>s3://bucket/
  service-a/
  service-b/
  telemetry/
  snapshots/</code></pre><p>Benefits:</p><ul><li><p>targeted lifecycle rules</p></li><li><p>efficient Athena queries</p></li><li><p>easier remediation</p></li></ul><h2>Prefer metadata for read-time inspection</h2><p>If object attributes must be inspected during reads, use <strong>object metadata</strong>. It is returned in <code>GetObject</code> and <code>HeadObject</code> responses, so no additional API call is needed. Reading tags requires a separate <code>GetObjectTagging</code> request per object, which adds up quickly across large numbers of objects.</p><p>Use <strong>tags</strong> primarily for:</p><ul><li><p>lifecycle rules</p></li><li><p>governance policies</p></li></ul><h1>Operational Checklist</h1><h4>When S3 costs spike unexpectedly</h4><p>1&#65039;&#8419; Check object count<br>2&#65039;&#8419; Compute average object size<br>3&#65039;&#8419; If average is <strong>&lt;128 KB &#8594; fragmentation detected</strong><br>4&#65039;&#8419; Verify lifecycle eligibility (size distribution)<br>5&#65039;&#8419; Estimate remediation cost<br>6&#65039;&#8419; Use Batch Operations for large-scale fixes</p><h1>Conclusion</h1><p>The system analysed in the previous article scaled perfectly. It simply became <strong>economically unstable</strong>.</p><p>At petabyte scale, object storage is no longer purely a storage problem. It becomes a <strong>cardinality management problem</strong>.</p><p><strong>Treat object count as a first-class operational metric</strong>.</p><p>Otherwise, a perfectly functioning system can quietly drift into a <strong>multi-million-dollar bill</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Cost Modelling as an Architectural Constraint: An S3 Case Study]]></title><description><![CDATA[A trillion-object storage architecture that scaled functionally, and nearly triggered a $7.2M lifecycle event.]]></description><link>https://writing.gabardo.engineering/p/cost-modelling-as-an-architectural</link><guid isPermaLink="false">https://writing.gabardo.engineering/p/cost-modelling-as-an-architectural</guid><dc:creator><![CDATA[Adrian Gabardo]]></dc:creator><pubDate>Tue, 31 Mar 2026 22:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!pHJL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS S3 is widely viewed as inexpensive, effectively unbounded object storage. At moderate scale, this assumption holds. At extreme object cardinality, it fails.</p><p>This article analyses a production system that accumulated 5.6 PB of data across 1.56 trillion objects in a single bucket. Within one year, monthly storage cost increased from approximately $100k to over $400k, with forecasts exceeding $1M per month. The root cause was not data volume alone, but architectural fragmentation misaligned with S3&#8217;s pricing model.</p><p>By consolidating objects to align with S3&#8217;s economic structure, equivalent logical data could have been stored at up to 37&#215; lower monthly cost. 
This case demonstrates that cost modelling must be treated as an architectural constraint.</p><h2>A More Accurate AWS S3 Cost Model Representation</h2><p>Object storage is frequently approximated as purely volumetric:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;C \\propto V&quot;,&quot;id&quot;:&quot;WGZDBKVZVZ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Where:</p><ul><li><p><em>C</em> = monthly cost</p></li><li><p><em>V</em> = stored volume</p></li></ul><p>This mental model is incomplete. A more accurate representation of S3 pricing is:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;C =\nV \\cdot P_{GB}\n+\nN_{req} \\cdot P_{req}\n+\n\\frac{N_{transition}}{1000} \\cdot P_{1000}&quot;,&quot;id&quot;:&quot;JZWWIEBSTV&quot;}" data-component-name="LatexBlockToDOM"></div><p>Where:</p><ul><li><p><em>C</em> = Total monthly S3 cost</p></li><li><p><em>V</em> = Total stored volume (GB)</p></li><li><p><em>P</em><sub>GB</sub> = Price per GB-month of storage</p></li><li><p><em>N</em><sub>req</sub> = Total request count (PUT / GET / LIST)</p></li><li><p><em>P</em><sub>req</sub> = Price per request</p></li><li><p><em>s&#772;</em> = Average object size (GB)</p></li><li><p><em>N</em><sub>transition</sub> = Number of objects transitioned by lifecycle rules</p></li><li><p><em>P</em><sub>1000</sub> = Price per 1,000 object-level operations (e.g., lifecycle transitions)</p></li></ul><p>At small scale, object count is negligible relative to volume; at larger scales, it becomes a first-order variable. 
The structural quantity that determines whether object count matters is average object size:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\bar{s} = \\frac{V}{N}&quot;,&quot;id&quot;:&quot;EMUKEXRDFH&quot;}" data-component-name="LatexBlockToDOM"></div><p>Where:</p><ul><li><p><em>N</em> = total object count</p></li></ul><p>Rearranging, we can express object count in terms of total volume and average object size:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;N = \\frac{V}{\\bar{s}}&quot;,&quot;id&quot;:&quot;QEQUGPIIPN&quot;}" data-component-name="LatexBlockToDOM"></div><p>This allows us to restate the object-driven components of the cost model directly in terms of architectural granularity. For a fixed total volume, reducing average object size necessarily increases object count, and therefore amplifies any per-object pricing terms.</p><p>If a lifecycle rule transitions every object in the bucket, then <em>N</em><sub>transition</sub> = <em>N</em> = <em>V</em> / <em>s&#772;</em>, and total cost can be written explicitly as a function of average object size:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;C =\nV \\cdot P_{GB}\n+\nN_{req} \\cdot P_{req}\n+\n\\frac{1}{1000}\n\\cdot\n\\left(\n\\frac{V}{\\bar{s}}\n\\right)\n\\cdot\nP_{1000}&quot;,&quot;id&quot;:&quot;TYQYFRCSUS&quot;}" data-component-name="LatexBlockToDOM"></div><p>This formulation makes object granularity a first-class variable in the cost model. Object count stops being an opaque operational metric and becomes a direct consequence of architectural design.</p><p>When <em>s&#772;</em> falls into the kilobyte regime, per-object pricing dominates. 
The system no longer behaves like bulk storage. It behaves more like a massively distributed index, except that you pay storage-layer economics for index-layer behaviour.</p><h2>A Real-World Example</h2><p>In the system analysed:</p><ul><li><p>5.6 PB stored</p></li><li><p>1.56 trillion objects</p></li><li><p>~3.5 KB average object size</p></li><li><p>$400k/month storage cost</p></li><li><p>~$50k/month in request charges</p></li></ul><p>The architecture generated hundreds of small snapshot artifacts per back-end request. Over time, fragmentation compounded: object count grew faster than volume.</p><p>An empirical consolidation experiment showed that equivalent logical data could be stored in artifacts averaging ~2.5 MB.</p><p>This would have reduced storage cost from $457,000 to $11,900 per month for the same volume of data. This represents a 37&#215; structural reduction.</p><p>This reduction was not due to compression or deletion of data. The total logical volume remained constant. 
Only object granularity changed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pHJL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pHJL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 424w, https://substackcdn.com/image/fetch/$s_!pHJL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 848w, https://substackcdn.com/image/fetch/$s_!pHJL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!pHJL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pHJL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png" width="1400" height="1000" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca422df6-6444-480a-aedb-10d90f136114_1400x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1000,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:101448,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://writing.gabardo.engineering/i/188994721?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pHJL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 424w, https://substackcdn.com/image/fetch/$s_!pHJL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 848w, https://substackcdn.com/image/fetch/$s_!pHJL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!pHJL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca422df6-6444-480a-aedb-10d90f136114_1400x1000.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The $7M Lifecycle Event</h2><p>Another (nearly) expensive lesson came from our attempt to rein in exponentially growing storage costs. Deleting data would have meant permanently losing important insight into our service, so the only cost-saving option we believed we had was a lifecycle transition policy.</p><p>That option could have carried a seven-figure cost, because lifecycle transitions are priced per 1,000 objects. At one point, a single bucket contained approximately 720 billion objects. 
At $0.01 per 1,000 transitions, that works out to:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;720{,}000{,}000{,}000 / 1{,}000 \\times \\$0.01 = \\$7{,}200{,}000&quot;,&quot;id&quot;:&quot;PVVHYQCOGX&quot;}" data-component-name="LatexBlockToDOM"></div><p>That is roughly $7.2 million for a single lifecycle rule on a single bucket, before ongoing storage costs and retrieval penalties.</p><p>The transition never executed: most objects were smaller than 128 KB, and S3 lifecycle rules do not transition objects below that size by default.</p><p>Ironically, the same fragmentation that inflated our steady-state costs is also what spared us the transition bill.</p><h2>A Cost Calculation Framework</h2><p>To prevent similar failures, evaluate object storage systems against three explicit budgets.</p><h4>1. Volume Budget</h4><p>Projected monthly storage cost, driven by stored bytes.</p><h4>2. Cardinality Budget</h4><p>Total object count and average object size.</p><p>If average object size falls below a defined threshold (e.g. 1&#8211;10 MB for snapshot systems), treat object count as a primary risk indicator.</p><h4>3. Remediation Budget</h4><p>Cost of rewriting, transitioning, or migrating every object.</p><p>Before implementing lifecycle rules or structural migrations, compute the total remediation cost. For a lifecycle transition:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;C_{\\text{transition}} = \\frac{\\text{Objects}}{1000} \\cdot \\text{Price}_{1000}&quot;,&quot;id&quot;:&quot;TJVVMDRNGL&quot;}" data-component-name="LatexBlockToDOM"></div><p>If the remediation cost exceeds your acceptable monthly spend, the architecture is already broken.</p><p>Monitor object count alongside stored bytes. When the two diverge, for example when object count grows much faster than stored bytes, that is architectural drift, not growth.</p><h2>Conclusion</h2><p>Our system scaled flawlessly. 
It simply became unaffordable. By the end, we were facing:</p><ul><li><p>Multi-petabyte scale</p></li><li><p>Trillion-object cardinality</p></li><li><p>Monthly cost growth from $100k to $400k</p></li><li><p>A forecast exceeding $1M/month</p></li><li><p>A potential $7M lifecycle event</p></li></ul><p>Object storage is not priced purely by volume. It is priced across bytes, objects, and operations.</p><p>At extreme scale, pricing semantics become architectural constraints.</p><p><strong>Cost modelling must be treated as a first-class design discipline.</strong></p><p><em>Disclaimer: Based on public AWS pricing and production experience. Not an official AWS statement.</em></p>]]></content:encoded></item><item><title><![CDATA[Oncall Nihilism]]></title><description><![CDATA[Why Your Pager is a Design Failure]]></description><link>https://writing.gabardo.engineering/p/oncall-nihilism</link><guid isPermaLink="false">https://writing.gabardo.engineering/p/oncall-nihilism</guid><dc:creator><![CDATA[Adrian Gabardo]]></dc:creator><pubDate>Tue, 24 Mar 2026 22:01:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RDFq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>2:17 AM. The pager goes off.</p><p>An alarm fires: <em>SQS message delay &gt; 5 minutes</em>.</p><p>A batch job somewhere in the system has just spiked the queue from 100k messages per minute to 1 million per minute. Your workers are auto-scaling, but they need 10&#8211;15 minutes to catch up.</p><p>Nothing is broken. The processing rate is exactly what it was five minutes ago: stable, efficient, and maxed out. Nothing needs fixing. But someone once decided that a five-minute delay was worth waking another human being.</p><p>You acknowledge the alarm. You watch the queue drain. After all, you cannot make auto-scaling go any faster, and messages will stay delayed until the workers have drained the backlog. You are there simply to witness the inevitable. 
Ten minutes later, the system heals itself.</p><p>You go back to sleep &#8212; or try to &#8212; and spend the next 45 minutes staring at the ceiling while the rest of tomorrow quietly collapses.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RDFq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RDFq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 424w, https://substackcdn.com/image/fetch/$s_!RDFq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 848w, https://substackcdn.com/image/fetch/$s_!RDFq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!RDFq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RDFq!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png" width="1200" height="509.34065934065933" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:618,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:9153402,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://writing.gabardo.engineering/i/190788278?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RDFq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 424w, https://substackcdn.com/image/fetch/$s_!RDFq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 848w, https://substackcdn.com/image/fetch/$s_!RDFq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!RDFq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1f5c250-c83e-48d5-8c56-12ea02e3f7f6_3168x1344.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The On-Call Knight consulting the runbooks while the Pager Demon demands tribute &#8212; a modern myth of toil, entropy, and alarms.</figcaption></figure></div><p>After enough nights like this, a certain mindset begins to form.</p><p>You start acknowledging alarms with less urgency. You observe the system more than you intervene in it. You learn to distinguish between problems that require action and problems that simply require time. Usually, the system corrects itself.</p><p>Eventually you arrive at a quiet realisation:</p><p>Many pages exist not because the system is failing, but because someone once mistook temporary discomfort for catastrophe.</p><p>This is what I call a <strong>nihilist on-call</strong>.</p><p>Not laziness. Not negligence. 
Just a gradual understanding that many pages do not correspond to real failures, and that intervention often changes little. You watch the system. You wait. And most of the time, it fixes itself.</p><h2>The Heat Death of the Service</h2><p>Spend enough time on-call and another thought begins to surface. Most services do not fail catastrophically. They decay.</p><p>Dependencies drift. Dashboards fall out of date. Runbooks turn into archaeological artifacts documenting systems that no longer exist. This is operational entropy, and given enough time, every sufficiently complex service approaches its own version of heat death.</p><p><strong>This is where the nihilism takes root.</strong></p><p>Once you see the service as a system in slow-motion decay, the urgency of the pager starts to feel like a lie. You come to believe that you aren&#8217;t &#8220;saving&#8221; the system; you are merely performing penance for a design you didn&#8217;t choose. Many alerts imply that the system is moments away from collapse, but the reality is a slow, gray fade into obsolescence.</p><p>But heat death is only inevitable if you accept the role of the bystander.</p><p>The sense of futility is protective scar tissue, but it is also a choice. We treat entropy as a law of nature, but in software, entropy is a matter of priorities. Most pages do nothing to prevent the service&#8217;s heat death, because they target the symptoms of decay rather than the decay itself. 
To move past this futility, we have to stop treating the pager as a death knell and start using it as a diagnostic tool for restoration.</p><h2>Signal Dilution</h2><p>There is another consequence of this dynamic that only becomes obvious at scale. When a system produces enough meaningless pages, the meaningful ones begin to dissolve into the noise.</p><p>The pager still rings. The alarm still says SEV-2. But psychologically, it carries far less weight than it should, not because engineers have become careless, but because the system itself has trained them to treat alerts with skepticism.</p><p>Alarm inflation is almost inevitable in large organizations. Every team adds alerts to protect its own services, and very few alarms are ever removed. Self-healing systems page simply because a metric briefly crossed an arbitrary threshold. Over time, the system generates a constant background hum of operational noise.</p><p>This is the real danger of alarm fatigue. Reliability is rarely destroyed by a single catastrophic failure. Far more often, it is eroded gradually by a loss of trust in the signals meant to protect the system. 
The fastest way to make engineers ignore an alarm is to page them repeatedly for events that resolve themselves.</p><h2>The Myth of the Pager</h2><p>In the myth of Sisyphus, a man is condemned to push a boulder up a hill forever, only for it to roll back down each time he reaches the top.</p><p>On-call sometimes feels similar. The pager rings. You acknowledge the alarm. You click through dashboards. The system stabilizes. And somewhere in the background, another alert is already preparing to wake you tomorrow night.</p><p>But unlike Sisyphus, engineers are not actually condemned to this cycle. Most on-call pain is not inevitable. <strong>It is designed.</strong></p><p>To cure the futility of on-call, we should refuse absurd labor. We should stop engineering better ways to push the boulder and start questioning why it exists at all.</p><h3>Practical Rules for On-Call</h3><h4><strong>No Human Intervention, No Page.</strong></h4><p>If an alert fires and the resolution is simply &#8220;wait for it to clear&#8221; or &#8220;restart the service,&#8221; the boulder has rolled back to the bottom.</p><p><em>The Rule:</em> If fixing it doesn&#8217;t require a human to make a unique, creative decision, a computer should be doing it. Do not wake a human for a task a script could do.</p><h4><strong>Page on Symptoms, Not Causes.</strong></h4><p>We should let the system decay in silence if that decay doesn&#8217;t hurt the user.</p><p><em>The Rule:</em> High CPU is a <em>cause</em>, not a <em>symptom</em>. If your system is still serving requests and users aren&#8217;t experiencing errors or noticeable latency, the pager should stay silent. 
Only wake someone up when the system is actually broken.</p><h4><strong>Delete the &#8220;Flappy&#8221; Alerts.</strong></h4><p>Every recurring alert that you &#8220;acknowledge and ignore&#8221; is a high-priority bug in your monitoring system.</p><p><em>The Rule:</em> If an alert fires three times in a shift and requires no action, <strong>delete it.</strong> Don&#8217;t &#8220;tune&#8221; it. Kill it. If the system doesn&#8217;t break when the alert is gone, it was never an alert. It was noise.</p><h4><strong>Protect the Sleep of Others.</strong></h4><p>High-signal hygiene is a collective pact.</p><p><em>The Rule:</em> Before you create an alarm, ask yourself: <em>&#8220;Am I willing to wake up my best friend at 3:00 AM for this?&#8221;</em> If the answer is no, it shouldn&#8217;t be pageable.</p><h3>Conclusion</h3><p>We should refuse to treat the pager as a tool of penance. The struggle itself toward the heights is enough to fill a man&#8217;s heart, but only if the heights actually exist.</p><p>If the system is going to reach its heat death anyway, we might as well get some sleep.</p>]]></content:encoded></item><item><title><![CDATA[Implementing a flight search engine]]></title><description><![CDATA[Graph traversal with an IDDFS algorithm]]></description><link>https://writing.gabardo.engineering/p/implementing-a-flight-search-engine</link><guid isPermaLink="false">https://writing.gabardo.engineering/p/implementing-a-flight-search-engine</guid><dc:creator><![CDATA[Adrian Gabardo]]></dc:creator><pubDate>Tue, 17 Mar 2026 22:00:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!n6_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The full code for Skymesh is publicly available at <a href="https://github.com/adriangabardo/skymesh">https://github.com/adriangabardo/skymesh</a></p><p>The code for this article is available at <a href="https://github.com/adriangabardo/skymesh/releases/tag/v3.0.1">https://github.com/adriangabardo/skymesh/releases/tag/v3.0.1</a></p><h2>Overview</h2><p>Earlier articles in the series described what we were trying to achieve, then laid the foundations for ingesting aviation data (airports, airlines, planes) and representing it as a graph model.</p><p>The graph structure is a great starting point, but it is ultimately just a way to organise data. On its own, it does not <em>do</em> anything. 
In this article, we take the next step and give the graph a concrete, real-world use case: building a simple flight search engine.</p><p>The goal is deliberately modest. Given an origin airport <strong>X</strong> and a destination airport <strong>Y</strong>, we want to enumerate <em>reasonable</em> flight routes that connect the two. Not the cheapest routes, not the fastest routes, and not even the &#8220;best&#8221; routes; just routes that make structural sense in the context of the network.</p><p>Later in the series, we will introduce additional constraints such as travel dates, maximum journey duration, and mock cost models, and ultimately back the search with live data. For now, we focus on the most fundamental capability: <strong>turning a static graph into a system that can answer questions</strong>.</p><h2>What &#8220;Flight Search&#8221; Means at This Stage</h2><p>At this stage in the project, &#8220;flight search&#8221; has a very specific and deliberately limited meaning.</p><p>We are not trying to determine the cheapest, fastest, or best route. 
Instead, we are answering a simpler question:</p><blockquote><p><em>Given an origin and a destination, which routes through the network are reasonable enough to consider at all?</em></p></blockquote><p>In a large aviation graph, the number of possible paths grows roughly exponentially with the number of legs allowed. Without constraints, a traversal will happily return routes that are technically valid but completely unrealistic from a human perspective.</p><p>To keep the search grounded, the implementation applies a small set of explicit constraints.</p><p>The caller can control:</p><ul><li><p>the <strong>minimum number of routes</strong> to return</p></li><li><p>the <strong>maximum number of routes</strong> to return</p></li><li><p>the <strong>maximum number of legs</strong> per route</p></li></ul><p>In addition, the system enforces a hard upper bound on the number of legs, regardless of user input. This prevents the search from blowing up in dense parts of the graph.</p><p>Together, these constraints ensure that direct routes are discovered first, multi-stop routes are explored gradually, and the result set stays small and usable.</p><p>Further constraints, such as minimum layover times and maximum total travel duration, are intentionally deferred until later in the series, once this foundational search behaviour is in place.</p><h2>The Final Version of the Search Algorithm</h2><p>Depth-first search (DFS) is well documented elsewhere, so rather than revisit it here, the figure below illustrates the order in which routes are explored and where branches are discarded.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n6_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n6_z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 424w, https://substackcdn.com/image/fetch/$s_!n6_z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 848w, https://substackcdn.com/image/fetch/$s_!n6_z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 1272w, https://substackcdn.com/image/fetch/$s_!n6_z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n6_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png" width="1456" height="1487" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1487,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377796,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://writing.gabardo.engineering/i/187498645?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n6_z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 424w, https://substackcdn.com/image/fetch/$s_!n6_z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 848w, https://substackcdn.com/image/fetch/$s_!n6_z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 1272w, https://substackcdn.com/image/fetch/$s_!n6_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c111211-366c-4e16-bb07-6df733bf646b_2144x2189.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Graph traversal using an Iterative Deepening Depth-First Search (IDDFS) strategy.</figcaption></figure></div><p>When asked to find up to 3 routes of at most 2 legs each, the traversal proceeds as follows:</p><ul><li><p>Start in SYD, visit MEL, and record SYD&#8594;MEL as a direct route</p></li><li><p>Start in SYD, visit ADL, then MEL, and record SYD&#8594;ADL&#8594;MEL as a route within the constraints</p></li><li><p>Start in SYD, visit AKL, then:</p><ul><li><p>Visit ZQN and discard the route: we are already 2 legs out and have not reached the destination</p></li><li><p>Visit MEL and record SYD&#8594;AKL&#8594;MEL as a route within the constraints</p></li></ul></li></ul><p>The final version of the search algorithm is designed to produce sensible results without introducing any notion of 
optimisation or ranking.</p><p>It follows a few simple principles:</p><ul><li><p>routes are discovered in order of increasing number of legs</p></li><li><p>the search stops as soon as enough routes are found</p></li><li><p>the total number of routes returned is capped</p></li><li><p>cycles are explicitly disallowed</p></li></ul><p>The implementation lives in a single function, <code>find_flight_routes()</code>. Given an origin, a destination, and a small set of bounds, it returns a list of <code>FlightRoute</code> objects representing candidate routes through the network.</p><p>Rather than searching &#8220;up to N legs&#8221; in one pass, the algorithm iterates over leg counts and performs a constrained depth-first search for each one. Direct routes are explored first, followed by one-stop routes, then longer connections if required.</p><p>At a high level, the algorithm looks like this:</p><pre><code><code>for legs in range(1, max_legs + 1):
    run DFS that only accepts routes with exactly `legs` hops
    collect routes that reach the destination

    if enough routes have been found:
        stop searching

    if hard maximum route count reached:
        stop immediately</code></code></pre><p>This approach keeps the traversal predictable and prevents longer, lower-quality routes from crowding out shorter, more obvious ones.</p><p>The search algorithm as presented in this article is available here: <a href="https://github.com/adriangabardo/skymesh/blob/v3.0.1/src/services/path_finder.py">https://github.com/adriangabardo/skymesh/blob/v3.0.1/src/services/path_finder.py</a></p><h2>Observing the Algorithm in Practice</h2><p>With the implementation in place, the most useful thing to do is simply run it and inspect the output.</p><p>By running the search engine with different combinations of origin and destination airports, we can see how the algorithm behaves in different parts of the network. Direct routes appear first, followed by one-stop routes, then longer connections only when necessary.</p><pre><code><code>$ ./run_local.sh
Skymesh graph loaded
Airports (nodes): 6072
Routes (edges): 37042

============================================================
Searching routes: SYD -&gt; MEL
Search for flight routes complete.
SYD -&gt; MEL (1 legs)
SYD -&gt; ADL -&gt; MEL (2 legs)
SYD -&gt; AKL -&gt; MEL (2 legs)


============================================================
Searching routes: YYC -&gt; SYD
Search for flight routes complete.
YYC -&gt; LAX -&gt; SYD (2 legs)
YYC -&gt; NRT -&gt; SYD (2 legs)
YYC -&gt; SFO -&gt; SYD (2 legs)


============================================================
Searching routes: LHR -&gt; JFK
Search for flight routes complete.
LHR -&gt; JFK (1 legs)
LHR -&gt; TXL -&gt; JFK (2 legs)
LHR -&gt; DEL -&gt; JFK (2 legs)


============================================================
Searching routes: CDG -&gt; DXB
Search for flight routes complete.
CDG -&gt; DXB (1 legs)
CDG -&gt; HAM -&gt; DXB (2 legs)
CDG -&gt; IST -&gt; DXB (2 legs)</code></code></pre><h2>What&#8217;s Next</h2><p>At this point, we have a working flight search engine in the most literal sense. Given an origin and a destination, the system can traverse the aviation graph and return a small, sensible set of candidate routes.</p><p>The next challenges are no longer about <em>correctness</em> but about <em>computational cost</em>.</p><p>So far, route discovery has been relatively cheap. Routes are purely structural, and each candidate can be evaluated with minimal work. That will change quickly as we start layering real-world constraints on top of the search.</p><p>In the next articles of the series, we will deliberately make the search engine heavier.</p><p>First, we&#8217;ll introduce <strong>temporal constraints</strong>. Routes will become time-aware paths with departure and arrival times, minimum connection windows, and an overall travel duration. Each candidate route will require additional validation, and many structurally valid routes will be rejected late in the process.</p><p>Next, we&#8217;ll introduce <strong>mock pricing and cost functions</strong>. Instead of simply enumerating routes, the system will start attaching weights to them, allowing us to reason about trade-offs between different options. At that point, route evaluation stops being trivial and becomes meaningfully expensive.</p><p>Only once this additional complexity is in place will we turn our attention to performance.</p><p>By first adding computationally heavy features and only then optimising, we get a clear before-and-after comparison. 
We can observe where the system slows down, which parts of the algorithm dominate runtime, and which optimisations actually matter.</p><p>That sets the stage for the next phase of the series: improving performance through caching, memoisation, and selective precomputation, without changing the core traversal logic.</p><p>Before moving forward, however, it&#8217;s worth reflecting on how this implementation evolved and why some of the earlier, more naive approaches fell short.</p><h2>Lessons Along the Way</h2><h3>The Naive First Attempt</h3><p>The first version of the search algorithm was a straightforward depth-first search with a maximum depth.</p><p>Starting from the origin airport, the DFS would explore outgoing edges, recursively expand paths, stop expanding once a path reached the maximum number of legs, and record any path that ended at the destination.</p><p>From a correctness standpoint, this worked. All valid paths up to the maximum number of legs were found.</p><p>In practice, however, the output quickly became unusable.</p><p>As a concrete example, querying a simple pair like <code>SYD &#8594; MEL</code>, which has plenty of direct connectivity, returned 45 different routes within a four-leg limit. Most of these routes were technically valid graph paths, but many of them went around the globe before eventually reaching Melbourne.</p><p>From the algorithm&#8217;s point of view, this behaviour was expected. A depth-first search with a depth limit will happily enumerate every path that fits the constraint. From a user&#8217;s point of view, however, the result was clearly not useful.</p><h3>Rethinking the Search Strategy</h3><p>The problem was not that DFS was the wrong tool, but that the search strategy was too permissive.</p><p>A depth limit alone does not meaningfully capture what &#8220;reasonable&#8221; means in the context of flight search. 
Treating all paths up to a given depth as equally interesting allows long, low-quality routes to drown out shorter and more obvious ones.</p><p>The key shift was to stop thinking in terms of &#8220;up to N legs&#8221; and instead search in order of increasing complexity. Shorter routes should always be discovered first, and longer routes should only be considered when necessary.</p><h3>Bounding the Search in Practice</h3><p>This change led naturally to the current implementation, which combines progressive deepening with a small set of hard bounds.</p><p>By enforcing:</p><ul><li><p>a capped maximum number of legs</p></li><li><p>a minimum number of routes before early termination</p></li><li><p>a hard maximum on returned routes</p></li></ul><p>the search becomes predictable and tractable, even in dense parts of the network.</p><p>These bounds are not optimisations in the traditional sense. They are structural limits that define the shape of the search space and keep the system aligned with real-world expectations.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://writing.gabardo.engineering/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Gabardo Engineering is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Engineering an Aviation Graph: Data Structures and Design Decisions]]></title><description><![CDATA[The Graph Series, part 2]]></description><link>https://writing.gabardo.engineering/p/getting-started-and-ingesting-data</link><guid isPermaLink="false">https://writing.gabardo.engineering/p/getting-started-and-ingesting-data</guid><dc:creator><![CDATA[Adrian Gabardo]]></dc:creator><pubDate>Tue, 03 Mar 2026 22:01:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WPwl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The full code for Skymesh is publicly available at <a href="https://github.com/adriangabardo/skymesh">https://github.com/adriangabardo/skymesh</a></p><p>This article&#8217;s code specifically is available at <a href="https://github.com/adriangabardo/skymesh/releases/tag/v2.0.0">https://github.com/adriangabardo/skymesh/releases/tag/v2.0.0</a></p><h2>Overview</h2><p>The first article in The Graph Series framed the aviation industry as a graph problem: airports as nodes, flights as edges, and routing as constrained path optimisation. 
This article turns that abstraction into something concrete.</p><p>Here we focus on project setup, data ingestion, and domain modelling: the unglamorous but decisive groundwork that determines whether graph algorithms remain elegant on paper or survive contact with real data. Before any shortest paths, cost functions, or optimisations can exist, the graph must be constructed correctly, consistently, and with an understanding of its limitations.</p><p>We will walk through how raw aviation datasets (airports, routes, airlines, and supporting metadata) are transformed into a graph-ready representation. This includes decisions around node identity, edge directionality, temporal attributes, and how much of the real world to encode upfront versus defer to later computation. These choices directly affect correctness, performance, and extensibility in later stages of the system.</p><p>This article also introduces the data ingestion pipeline that underpins the rest of the series: how data is sourced, normalised, validated, and loaded in a way that supports iterative experimentation. The goal is not just to build a graph, but to build one that can evolve, supporting recalculation, enrichment, and re-modelling without collapsing under its own assumptions.</p><p>By the end of this article, we will have a working, queryable graph representation of the aviation network. It will be intentionally incomplete in terms of optimisation and routing intelligence, but structurally sound enough to support everything that follows: pathfinding algorithms, memoisation strategies, pre-computation, and dynamic updates.</p><p>This is the foundation. 
Every optimisation in later articles either benefits from, or is constrained by, the choices made here.</p><h2>Project Setup</h2><p>At this point, we have started implementing the project&#8217;s foundations. I have named it <strong>Skymesh</strong>, simply to make it easier to reference from here onwards. Right now, the project is intentionally small.</p><p>The goal at this stage is not to solve routing problems or optimise anything yet. It is to put a real system in place that we can build on incrementally. That means having a concrete codebase, real data, and something we can execute, inspect, and reason about.</p><p>What follows is a walkthrough of what has been implemented so far, starting from raw data acquisition and ending with a working, inspectable graph.</p><h2>Data Gathering</h2><p>Skymesh uses the OpenFlights dataset as its initial data source. Rather than pulling data dynamically or wrapping an API, it works with static, versioned input files. 
This makes experimentation reproducible and keeps ingestion simple.</p><p>The OpenFlights data lives in a public GitHub repository and is provided as a set of flat <code>.dat</code> files. Each file represents a different part of the aviation domain, such as airports, routes, airlines, and aircraft.</p><p>The files are downloaded into a local <code>data/</code> directory with <code>curl</code>. Because <code>-o</code> writes to the current working directory, the commands are run from inside <code>data/</code>:</p><pre><code><code>$ curl -L -o airports.dat https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat

$ curl -L -o routes.dat https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat

$ curl -L -o airlines.dat https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat

$ curl -L -o planes.dat https://raw.githubusercontent.com/jpatokal/openflights/master/data/planes.dat

$ curl -L -o countries.dat https://raw.githubusercontent.com/jpatokal/openflights/master/data/countries.dat</code></code></pre><p>Once downloaded, the directory looks roughly like this:</p><pre><code><code>$ tree ./data/
./data/
&#9500;&#9472;&#9472; airlines.dat
&#9500;&#9472;&#9472; airports.dat
&#9500;&#9472;&#9472; countries.dat
&#9500;&#9472;&#9472; planes.dat
&#9492;&#9472;&#9472; routes.dat

1 directory, 5 files</code></code></pre><p>At this stage, no preprocessing or cleaning is performed. The data is consumed in its raw form so that modelling decisions remain explicit in the code rather than hidden in one-off scripts.</p><h2>Project Layout and Separation of Concerns</h2><p>With the data in place, the implementation itself lives under the <code>src/</code> directory:</p><pre><code><code>$ tree ./src/
./src/
&#9500;&#9472;&#9472; graph_build.py
&#9500;&#9472;&#9472; graph_viz.py
&#9492;&#9472;&#9472; main.py

1 directory, 3 files</code></code></pre><p>Each file has a single responsibility:</p><ul><li><p><code>graph_build.py</code> contains all logic related to data ingestion and graph construction</p></li><li><p><code>graph_viz.py</code> contains utilities for inspecting the graph visually</p></li><li><p><code>main.py</code> acts as the entry point and orchestration layer</p></li></ul><p>This separation is deliberate. Graph construction should not depend on visualisation, and visualisation should not be required for the graph to exist. Keeping these concerns isolated makes the code easier to reason about and easier to extend later.</p><p>At this stage, the structure may feel slightly heavier than necessary, but it should pay off once optimisation, caching, or alternative graph backends are introduced.</p><h2>Graph Initialisation</h2><p>The core of the system lives in <code>graph_build.py</code>. This is where raw OpenFlights data is turned into a graph structure.</p><p>Graph construction begins by initialising a directed graph using NetworkX:</p><pre><code><code>graph = nx.DiGraph()</code></code></pre><p>Airports are ingested first. Each row in <code>airports.dat</code> is parsed, validated, and turned into a node in the graph. Only airports with a valid IATA code are included.</p><p>Routes are ingested next. Each route creates a directed edge from a source airport to a destination airport, but only if both airports already exist in the graph. This avoids implicit node creation (NetworkX&#8217;s <code>add_edge</code> would otherwise silently add any missing endpoint as a bare node) and makes ingestion deterministic.</p><p>All of this logic is wrapped in a single function:</p><pre><code><code>import networkx as nx


def build_graph() -&gt; nx.DiGraph:
    graph = nx.DiGraph()
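    load_airports(graph)  # nodes: one per airport with a valid IATA code
    load_routes(graph)    # edges: only between airports already in the graph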
    return graph</code></code></pre><p>Running the project at this point constructs the full aviation graph and prints some basic diagnostics:</p><pre><code><code>$ python src/main.py
Skymesh graph loaded
Airports (nodes): 3366
Routes (edges): 67663

Sample airport:
GKA {
    "name": "Goroka Airport",
    "city": "Goroka",
    "country": "Papua New Guinea",
    "icao": "AYGA",
    "latitude": -6.081689834590001,
    "longitude": 145.391998291,
    "altitude": 5282,
    "timezone": "10"
}

Sample route:
('GKA', 'HGU') {
    "airline": "CG",
    "airline_id": "1308",
    "codeshare": false,
    "stops": 0,
    "equipment": [
        "DH8",
        "DHT"
    ]
}</code></code></pre><h2>Graph Visualisation</h2><p>Attempting to visualise the entire graph immediately is neither practical nor especially helpful. We are working with thousands of nodes and tens of thousands of edges, and a naive render quickly turns into an unreadable cluster.</p><p>Instead, <code>graph_viz.py</code> provides a constrained visualisation focused on the most connected airports. We extract a hub-centric subgraph and project it directly onto real geographic coordinates. Because latitude and longitude were ingested as node attributes earlier, we can render the graph against an actual cartographic background rather than relying on an artificial layout algorithm.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WPwl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WPwl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 424w, https://substackcdn.com/image/fetch/$s_!WPwl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 848w, https://substackcdn.com/image/fetch/$s_!WPwl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 1272w, 
https://substackcdn.com/image/fetch/$s_!WPwl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WPwl!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png" width="1200" height="416.2087912087912" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:505,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:346787,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://writing.gabardo.engineering/i/187498086?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WPwl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 424w, https://substackcdn.com/image/fetch/$s_!WPwl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 848w, 
https://substackcdn.com/image/fetch/$s_!WPwl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 1272w, https://substackcdn.com/image/fetch/$s_!WPwl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83b3977e-b2e7-40a7-8888-8cd07061c237_1578x547.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Visualisation of the 50 most connected nodes (airports) on a cartographic background</figcaption></figure></div><p>With 
the rendering layered on top of a cartographic background, we can now visualise how our graph structure connects real-world airports based on the modelling decisions we made earlier. What was previously an abstract network of nodes and edges now maps directly onto the physical world. We can see transatlantic arcs forming naturally, dense European clusters emerging around major hubs, and the strong east&#8211;west connectivity across North America. This visual gives us some confidence that the data modelling choices are sound.</p><p>We intentionally limit the visualisation to a subset of hub airports. Rendering the entire network would obscure structure rather than clarify it. At this stage, our goal is not completeness but coherence. We want to ensure that the foundation we have built is structurally correct before we begin asking more demanding questions of it.</p><h2>Data Modelling Decisions</h2><p>Now that we have walked through the implementation so far, let&#8217;s step back and talk about the modelling decisions that shaped the graph.</p><h3>Node Identity</h3><p>OpenFlights provides multiple identifiers for airports, including numeric IDs, ICAO codes, and IATA codes. Skymesh uses <strong>IATA codes as node identifiers</strong>.</p><p>This is a deliberate trade-off. IATA codes are human-readable, widely used, and make the graph much easier to inspect and debug. A path such as <code>LHR &#8594; JFK &#8594; LAX</code> is immediately meaningful.</p><p>The downside is that some airports do not have IATA codes and are therefore excluded. At this stage, Skymesh optimises for clarity and interoperability rather than exhaustive coverage.</p><h3>Nodes as Data Carriers</h3><p>Nodes in Skymesh are not just identifiers. Each airport node carries metadata such as geographic coordinates, country, and timezone.</p><p>Some of this information is not used immediately. It is ingested early to preserve optionality. 
Latitude and longitude, for example, will later enable distance calculations and spatial heuristics without requiring a second ingestion pass.</p><h3>Directionality</h3><p>Routes are modelled as directed edges. This reflects the reality of aviation networks, where routes are not necessarily symmetric. Treating the graph as undirected would simplify the structure, but it would implicitly assume that every route can be flown in both directions, an incorrect assumption that would surface later during routing and optimisation.</p><p>At this stage, edges are unweighted. Cost functions and constraints are intentionally deferred to the next article.</p><h2>What&#8217;s Next</h2><p>At this point, Skymesh has a structurally sound representation of the aviation network. We can ingest real data, construct a directed graph with meaningful identifiers, and perform basic inspection to verify that the model matches our expectations.</p><p>What we do not yet have is any notion of <em>cost</em>.</p><p>All routes are currently treated as equal. There is no concept of distance, time, price, feasibility, or optimisation beyond the existence of a path. This is intentional. Before introducing algorithms, it is important that the underlying graph is trustworthy and easy to reason about.</p><p>In the next article, the focus will shift from construction to computation. We will begin asking questions of the graph rather than just building it. That includes introducing pathfinding algorithms, defining cost functions, and exploring why naive shortest-path approaches quickly become insufficient in real-world networks.</p>]]></content:encoded></item><item><title><![CDATA[The Aviation Industry as a Graph problem]]></title><description><![CDATA[The Graph Series, part 1]]></description><link>https://writing.gabardo.engineering/p/the-aviation-industry-as-a-graph</link><guid isPermaLink="false">https://writing.gabardo.engineering/p/the-aviation-industry-as-a-graph</guid><dc:creator><![CDATA[Adrian Gabardo]]></dc:creator><pubDate>Tue, 17 Feb 2026 10:24:04 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Overview</h2><p>In this article, we introduce a project that models the global aviation industry as a graph-theory problem. Airports are represented as nodes and direct flight routes as edges, forming a large, sparse, and highly non-uniform network.</p><p>The objective of this project is to explore how common aviation questions, such as route reachability, optimal paths between airports, and network-level efficiency, can be expressed as graph computations. As the project evolves, we will progressively introduce more realistic constraints and cost functions, and examine how these affect both correctness and computational performance.</p><p>This article serves as the foundation for The Graph Series. 
Subsequent articles will build on this model to investigate optimisation techniques for graph traversal and path-finding, including algorithmic trade-offs, memoisation strategies, and performance improvements at scale.</p><h2>The Aviation Industry</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3786" height="2130" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2130,&quot;width&quot;:3786,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a large jetliner sitting on top of an airport tarmac&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a large jetliner sitting on top of an airport tarmac" title="a large jetliner sitting on top of an airport tarmac" srcset="https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, 
https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1545488286-6fe608f23485?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxhaXJidXN8ZW58MHx8fHwxNzcwNzE5NDk3fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@kommumikation">Mika Baumeister</a> on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>The aviation industry is broad and never still. On any given day, tens of thousands of commercial flights operate worldwide, connecting several thousand passenger airports and transporting millions of people across the globe. Beyond leisure travel, aviation underpins a wide range of other sectors: corporate travel, cargo and retail logistics, emergency services, and military operations, to name a few.</p><p>Despite this breadth, the scope of this series is intentionally narrow. For the purposes of this project, the focus is limited to <strong>leisure-oriented passenger flights</strong>, and specifically to the structure of the global flight network itself. Cargo operations, private aviation, and military routes are treated as out of scope.</p><p>Even within commercial passenger aviation, there is an overwhelming number of variables that could be modelled. Airlines operate different fleets with varying ranges and capacities, routes are constrained by aircraft performance and permitted airspace, and hub airports play an outsized role in shaping global connectivity. Additional factors such as weather, crew availability, regulatory constraints, and geopolitical considerations further complicate the picture.</p><p>This project does not attempt to model all of these factors upfront. Instead, it treats the aviation industry as a layered system: starting with the existence of routes between airports, then progressively introducing additional dimensions such as airline-specific routes, aircraft constraints and, later in the series, temporal availability and scheduling. 
This allows individual modelling choices to be examined in isolation, while still grounding the work in a recognisably real-world system.</p><p>By clearly defining which parts of the aviation industry are being simulated, and which are intentionally ignored, we can focus on the graph problems themselves without losing sight of the domain they are inspired by.</p><h2>Methodology</h2><p>I have chosen Python as the language for this project. The graph itself is implemented using <strong><a href="https://networkx.org/en/">NetworkX</a></strong>, and the underlying data is sourced from <strong><a href="https://openflights.org/data">OpenFlights</a></strong>, which provides publicly available datasets covering airports, airlines, and direct flight routes.</p><p>The ingestion process focuses on building a clean and extensible representation of the aviation network. Airports are mapped to graph nodes, while direct routes between airports are represented as directed edges. At this stage, the emphasis is on establishing a structurally correct graph that can be easily extended with additional attributes and constraints in later iterations of the project.</p><p>Whilst the project is in active development, I plan to interact with the graph directly through a simple <code>__init__.py</code> entry point that exposes the graph as a first-class object. This allows for quick experimentation, ad-hoc inspection, and iterative refinement. Once the project reaches the benchmarking stage, interaction with the graph will shift to scripted, reproducible workflows designed to generate consistent and comparable performance measurements over time.</p><p>Alongside the core graph construction, a lightweight visualisation setup is introduced to generate visual representations of the network. 
These visualisations are not intended to be exhaustive or perfectly scaled, but rather to provide intuition about graph structure, connectivity, and the emergence of hubs within the aviation network. They also serve as a useful sanity check during development and a visual aid when discussing results later in the series.</p><p>This methodological foundation is intentionally kept simple. As the series progresses, the same setup will be reused to explore alternative cost functions, traversal strategies, and optimisation techniques, without changing the underlying data source or tooling.</p><h2>Assumptions &amp; Simplifications</h2><p>The aviation industry is, of course, far more complex than anything modelled in this project. This is a deliberately simplified, real-world-inspired example that allows us to explore graph representations, calculations, and optimisation techniques in a concrete setting.</p><p>The following assumptions and simplifications are made as part of this exercise:</p><ul><li><p><strong>Direct flights only</strong><br>Only direct flight routes are represented. Multi-leg journeys are expressed implicitly through graph traversal.</p></li><li><p><strong>Single edge per route (initially)</strong><br>Each route between two airports is represented by a single directed edge, independent of airline or aircraft. Later in the project, this will be expanded into a multi-edge model to capture airline, aircraft, and other route-level properties.</p></li><li><p><strong>No temporal dimension (initially)</strong><br>The initial graph is static. Routes represent existence, not schedule or availability. Temporal constraints and availability will be introduced after the transition to multi-edge routes.</p></li><li><p><strong>Uniform edge behaviour</strong><br>All edges are treated equivalently at this stage. 
Attributes such as cost, duration, or reliability are deferred to the cost functions introduced later in the series.</p></li><li><p><strong>No capacity or congestion modelling</strong><br>Airports and routes are assumed to have unlimited capacity. Operational constraints such as congestion or delays are out of scope.</p></li><li><p><strong>Airports as atomic nodes</strong><br>Each airport is represented as a single node, without modelling its internal structure.</p></li><li><p><strong>Dataset limitations</strong><br>Route data is sourced from <strong>OpenFlights</strong>, which relies on a third-party provider that ceased updates in June 2014. As a result, the <em>routes dataset</em> is of historical value only. The other datasets (airports, airlines) appear to be maintained and are treated as current for the purposes of this project.<br>As of June 2014, the routes dataset contains <strong>67,663 routes</strong> connecting <strong>3,321 airports</strong> across <strong>548 airlines</strong> worldwide, which is sufficient for structural analysis and optimisation experiments.</p></li></ul><p>These assumptions define the baseline model used in the early stages of the series and will be relaxed incrementally as additional complexity is introduced.</p><h2>What&#8217;s next</h2><p>Following this project abstract, we will look at the initial project setup, including the necessary data ingestion and modelling, and then implement the core algorithms we will be experimenting with.</p><p>Thanks for reading! Subscribe for free to receive new posts and support my work.</p>]]></content:encoded></item></channel></rss>