Commercial

Some software projects I have delivered in the last few years through my company Isogonal.

I'm interested in systems that take actions autonomously, as opposed to delivering 'insights' that humans must interpret and act on. I'm really excited about the challenges of integrating LLMs into automated systems over the next several years.

Automated logistics cost optimization

Linear programming
Combinatorial optimization
Reinforcement learning
SAP and Kinaxis integration
Multi-tenant enterprise SaaS

I built a multi-tenant enterprise SaaS platform that lowers costs in large transportation networks by smoothing shipping volumes over a month-long horizon. This is in use by multiple Fortune 500 organisations.

The platform integrates with client ERP systems and ingests planned stock transfers across hundreds of locations and thousands of lanes. The optimization engine creates incentives for reducing carrier costs, utilizing capacity effectively, keeping daily shipping volumes constant, and delivering on time.

It enforces constraints on daily location transport capacity, location storage capacity, and carrier and mode capacity on each lane, and incorporates information such as locations being closed on given days. It can accommodate unique penalties for individual STRs, and custom penalties and incentives for days early and late.

Alongside these incentives, the optimization engine manages hundreds of thousands of hard and soft constraints for time-dependent location, carrier, mode, and lane capacities.

At the core of the optimization engine is a linear program, constructed and solved using the low-level CPLEX interface. The code also uses a hand-written 3D bin packing heuristic to match pallets to trucks.
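
To give a flavour of the construction, here is a minimal sketch using the cplex Python package - the transfer data, penalty function, and single shared daily capacity are illustrative stand-ins, and the production model is far larger:

```python
import cplex

# Toy data: each stock transfer has a volume and a due day.
transfers = {"STR-1": (30.0, 2), "STR-2": (50.0, 3), "STR-3": (40.0, 3)}
days = range(5)
daily_capacity = 45.0

def penalty(due_day, ship_day):
    # Illustrative: unit cost per day shipped away from the due date.
    return float(abs(ship_day - due_day))

lp = cplex.Cplex()
lp.objective.set_sense(lp.objective.sense.minimize)

# One continuous variable per (transfer, day): volume shipped that day.
names = [f"x_{s}_{d}" for s in transfers for d in days]
costs = [penalty(due, d) for s, (_, due) in transfers.items() for d in days]
lp.variables.add(obj=costs, lb=[0.0] * len(names), names=names)

# Each transfer's full volume must ship somewhere in the horizon.
for s, (volume, _) in transfers.items():
    lp.linear_constraints.add(
        lin_expr=[cplex.SparsePair(ind=[f"x_{s}_{d}" for d in days],
                                   val=[1.0] * len(days))],
        senses=["E"], rhs=[volume])

# Shared daily capacity; the real engine has time-dependent versions
# of this per location, lane, carrier, and mode.
for d in days:
    lp.linear_constraints.add(
        lin_expr=[cplex.SparsePair(ind=[f"x_{s}_{d}" for s in transfers],
                                   val=[1.0] * len(transfers))],
        senses=["L"], rhs=[daily_capacity])

lp.solve()
for name in names:
    volume = lp.solution.get_values(name)
    if volume > 1e-6:
        print(name, round(volume, 1))
```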

The platform is deployed on GCP, using GKE, GCS, BigQuery, Cloud Functions, and PubSub. The API is Python/Flask, the web application is React, and authentication including SSO is handled with self-hosted Keycloak.

Automated industrial work order prioritization

Active learning
Natural language processing
Uncertainty quantification
Hierarchical modeling

The national electrical grid operator in New Zealand approached me to investigate the possibility of automatically prioritizing incoming maintenance work orders from external service providers, for equipment across the national grid.

This research project turned into a server application, a web application, and a conference paper. The software runs daily, reading information from the Maximo maintenance database, running the calculations, and feeding priorities back into the database.

The web application allows Transpower to update training data with new equipment, re-train, and investigate accuracy without external intervention. This is crucial for an ML system to continue to provide value long after the original authors have left and years have passed since the creation of the original training data - entire new types of equipment can be introduced into the grid!

This system was implemented in 2018, is still in use in 2023, has prioritized tens of millions of dollars in maintenance expenditure, and annual reviews have demonstrated significantly positive all-up ROI.

For each work order (100,000+), the system estimates a probability of failure across service performance, direct cost, public safety, worker safety, and environmental impact, as well as an estimated cost to remediate; these are eventually mapped to an overall work order priority.
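
The final mapping step has roughly this shape - the consequence weights and score formula below are hypothetical stand-ins, not Transpower's actual figures:

```python
# Hypothetical dollar-equivalent consequence weights per category.
CONSEQUENCE_COST = {
    "service": 50_000, "direct_cost": 10_000, "public_safety": 500_000,
    "worker_safety": 250_000, "environment": 100_000,
}

def priority_score(p_fail: dict[str, float], remediation_cost: float) -> float:
    """Expected consequence cost per dollar of remediation spend."""
    expected_loss = sum(p * CONSEQUENCE_COST[cat] for cat, p in p_fail.items())
    return expected_loss / max(remediation_cost, 1.0)

print(priority_score(
    {"service": 0.10, "direct_cost": 0.20, "public_safety": 0.01,
     "worker_safety": 0.02, "environment": 0.005},
    remediation_cost=8_000))
```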

Developing equipment and defect ontologies was key to this project. The existing equipment database was at the wrong granularity - units of equipment that fail often had a one-to-many relationship with parts in the equipment database. Unsupervised approaches were heavily used to explore the data and iteratively build the ontologies.

Another key piece of the project was capturing organization-wide experiential knowledge. We developed a web app for data entry, along with an active learning approach to make the best use of the available input time. Among other outputs, this was used to aggregate subjective evaluations of the likelihood of failure for an asset/defect combination into a quantitative estimate.
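
The general shape of that loop is uncertainty sampling: fit on what the experts have labelled so far, then queue the items the model is least sure about for the next session. A minimal sketch, with a stand-in model and random data in place of the real elicitation pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def next_items_to_label(model, X_pool, batch_size=10):
    """Pick the pool items the current model is least certain about."""
    proba = model.predict_proba(X_pool)
    # Predictive entropy as the uncertainty measure.
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    return np.argsort(entropy)[-batch_size:]

# One round of the loop with stand-in data.
rng = np.random.default_rng(0)
X_labelled, y_labelled = rng.normal(size=(40, 5)), rng.integers(0, 2, 40)
X_pool = rng.normal(size=(500, 5))
model = LogisticRegression().fit(X_labelled, y_labelled)
queue = next_items_to_label(model, X_pool)  # show these next session
```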

The NLP module involves a hierarchical system of Bayesian generalized linear models, quantifying uncertainty at each step, to assign each work order an equipment and fault ontology node. Subsequent ML modules then estimate the likelihood of failure and the repair cost, and calculate the final priority score.
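
As a compressed illustration of the modelling style - here in PyMC, which is an assumption rather than the project's actual stack, with feature extraction from the work-order text elided - a hierarchical logistic model with partial pooling across equipment classes looks like this:

```python
import numpy as np
import pymc as pm

# Stand-in data: text-derived features per work order, plus the
# equipment class each order was mapped to upstream.
rng = np.random.default_rng(1)
n_orders, n_features, n_classes = 200, 12, 6
X = rng.normal(size=(n_orders, n_features))
equip_class = rng.integers(0, n_classes, n_orders)
y = rng.integers(0, 2, n_orders)  # e.g. "defect confirmed on inspection"

with pm.Model():
    # Partial pooling: each equipment class gets its own intercept,
    # shrunk toward a grid-wide mean, so rare classes borrow strength.
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    intercept = pm.Normal("intercept", mu, sigma, shape=n_classes)
    beta = pm.Normal("beta", 0.0, 1.0, shape=n_features)

    logit = intercept[equip_class] + pm.math.dot(X, beta)
    pm.Bernoulli("y", logit_p=logit, observed=y)

    # The posterior gives calibrated uncertainty to pass downstream,
    # rather than a single point estimate per work order.
    idata = pm.sample(1000, tune=1000, chains=2)
```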

Geological data platform

Multi-tenant enterprise web app
Correcting X-Ray fluorescence measurements

This multi-tenant SaaS platform for managing the data produced by minerals exploration companies evolved out of an app I built for my friend, who was maintaining master spreadsheets of canonical data and merging in new data from the several new spreadsheets staff produced every day.

This also involved complex calculations over copied and derived data. Chasing down three-week-old off-by-one errors through each master data spreadsheet and all the derived calculation data was time-consuming.

This app allows all staff to upload their own spreadsheets with upload-time error checking, applies calculation pipelines automatically, and makes the latest processed data available for download to anyone with permissions, anywhere.
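
Upload-time checking has roughly this flavour - the column names and rules below are hypothetical, since the real pipelines are specific to the client's workflow:

```python
import pandas as pd

# Hypothetical schema for an assay upload; real sheets differ per client.
REQUIRED = {"hole_id": str, "depth_from": float, "depth_to": float,
            "au_ppm": float}

def validate_upload(path: str) -> list[str]:
    """Return human-readable errors; an empty list means accept the file."""
    df = pd.read_excel(path)
    errors = [f"missing column: {c}" for c in REQUIRED if c not in df.columns]
    if errors:
        return errors  # can't run further checks without the columns
    for col, typ in REQUIRED.items():
        try:
            df[col].astype(typ)
        except (ValueError, TypeError):
            errors.append(f"column {col} has values that are not {typ.__name__}")
    # Domain check: sample intervals must not be inverted.
    bad = df[df["depth_from"] >= df["depth_to"]]
    if not bad.empty:
        errors.append(f"{len(bad)} rows with depth_from >= depth_to")
    return errors
```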

It's a normalized database with all the advantages that brings, but it behaves like a spreadsheet for the users. Plus, it offloads much of the complex UI work to Excel. Win-win. Built with React, Flask, Postgres, and self-hosted Keycloak (auth) behind nginx, deployed to a single small VM.

Lab results delivery portal

PII health information
HL7 and FHIR healthcare data interoperability
White-box penetration testing

I developed a results delivery portal for a commercial toxicology lab with thousands of monthly active users.

This project involved a deep dive into the HL7 and FHIR health data interoperability standards. I had to reverse-engineer 397 tables in the LabEclair LIS system, figure out where the information relating to patient results was stored, and then replicate that information into a downstream database for rendering patient results as PDFs at request time.
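
For a sense of the data shapes HL7 involves, here is a toy example using the python-hl7 library and a fabricated ORU result message - an illustration only, since the portal reads from the LIS database rather than parsing live HL7 feeds:

```python
import hl7  # python-hl7

# A fabricated, minimal ORU^R01 result message (segments joined with \r).
msg = "\r".join([
    "MSH|^~\\&|LIS|LAB|PORTAL|CLINIC|202301150830||ORU^R01|1234|P|2.3",
    "PID|1||PAT001||DOE^JANE",
    "OBR|1|||TOX^Toxicology Panel",
    "OBX|1|NM|ETOH^Ethanol||12|mg/dL|<50|N|||F",
])

parsed = hl7.parse(msg)
for obx in parsed.segments("OBX"):
    # OBX-3 = test identifier, OBX-5 = value, OBX-6 = units.
    print(str(obx[3]), str(obx[5]), str(obx[6]))
```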

It also includes some nice git-like functionality enabling lab users to edit reports downstream of the original source, and then interleaving new data from the source with existing user edits, including the ability to roll back to any previous version while maintaining the edit history.
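
The rule at the heart of that is a three-way merge: a field keeps the user's edit unless the user never touched it, in which case fresh source data flows through. A stripped-down sketch in Python (the portal itself is Node.js), with versions appended rather than overwritten so rollback preserves history:

```python
def merge(base: dict, source: dict, edited: dict) -> dict:
    """Three-way merge of report fields: base is the version the user
    edited from, source is newly arrived lab data, edited is the user's
    current version. User edits win over source updates."""
    merged = {}
    for key in base.keys() | source.keys() | edited.keys():
        if edited.get(key) != base.get(key):  # user touched this field
            merged[key] = edited.get(key)
        else:                                 # untouched: take new source data
            merged[key] = source.get(key, base.get(key))
    return merged

# Versions are append-only; rolling back appends a copy of an older
# version as the new head, so the full edit history survives.
history = [{"result": "12 mg/dL", "comment": ""}]
history.append(merge(
    history[0],
    {"result": "14 mg/dL", "comment": ""},          # corrected lab data
    {"result": "12 mg/dL", "comment": "reviewed"},  # user's edit
))
print(history[-1])  # result updated to 14 mg/dL, user's comment kept
```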

Due to a desire to keep dependencies to an absolute minimum, I wrote the authentication/authorization and user management code from scratch; it subsequently passed an externally contracted white-box penetration test. This project is also built to be easily maintained for as long as possible, so the framework stack was kept minimal - Node.js, Passport, EJS templates, and hand-written CSS.

Open Source

Almost all of the code I've written in the last decade is closed-source. These are a few of my open-source projects.

OSISoft PI Historian bulk download tool - 2018

.NET Core

OSISoft PI Historian (now AVEVA PI Server) is a time-series database used by heavy industry worldwide. Downloading bulk data (millions of series, terabytes of data) for analytics or ML use is challenging - think Excel integrations with a maximum of 1,000 points per request.

It's now 2023 and this may have changed with an AVEVA cloud solution. However, last I checked, this tool was the only openly available solution for the bulk download use case. It has been used by engineers at large firms worldwide.

Interpolating a surface point cloud - 2010

Matlab

I wrote this code because I had a raster scan of a very old contour plot - pressure contours from a physical wind tunnel test of a large industrial building. I wanted to turn these pressure contours into distributed loads and finally bending moments along the roof trusses.

First I digitized the contours into a 3D point cloud, with points along the contour lines. Then I made a Delaunay triangulation of the point cloud. Finally I took a set of points along the beam and interpolated the TIN surface at these points. This code (Matlab) conducts the Delaunay triangulation, then efficiently locates the triangle an arbitrary point falls inside and interpolates the point value from the triangle surface.
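
A rough equivalent of the approach in Python/scipy (rather than the original Matlab) - triangulate, locate the containing triangle with find_simplex, then interpolate with barycentric coordinates:

```python
import numpy as np
from scipy.spatial import Delaunay

def tin_interpolate(xy, z, query):
    """Interpolate z at query points on the TIN defined by (xy, z)."""
    tri = Delaunay(xy)
    simplex = tri.find_simplex(query)  # containing triangle per query point
    out = np.full(len(query), np.nan)  # NaN for points outside the hull
    inside = simplex >= 0
    # Barycentric coordinates via scipy's precomputed affine transforms.
    T = tri.transform[simplex[inside]]
    b = np.einsum("ijk,ik->ij", T[:, :2], query[inside] - T[:, 2])
    bary = np.c_[b, 1.0 - b.sum(axis=1)]
    # Weighted average of the containing triangle's corner z-values.
    out[inside] = np.einsum("ij,ij->i", bary,
                            z[tri.simplices[simplex[inside]]])
    return out

# Stand-in data: digitized contour points, then samples along a truss line.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(200, 2))
z = np.sin(pts[:, 0]) + 0.1 * pts[:, 1]  # fake pressure field
truss = np.c_[np.linspace(1, 9, 25), np.full(25, 5.0)]
print(tin_interpolate(pts, z, truss)[:5])
```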