37

The solution for this one isn't nearly as amusing as the journey.

I was working for one of the largest retailers in NA as an architect. Said retailer had over a thousand big box stores, IT maintenance budget of $200M/year. The kind of place that just reeks of waste and mismanagement at every level.

They had installed a system to distribute training and instructional videos to every store, as well as recorded daily broadcasts to all store employees as a way of reducing management time spend with employees in the morning. This system had cost a cool 400M USD, not including labor and upgrades for round 1. Round 2 was another 100M to add a storage buffer to each store because they'd failed to account for the fact that their internet connections at the store and the outbound pipe from the DC wasn't capable of running the public facing e-commerce and streaming all the video data to every store in realtime. Typical massive enterprise clusterfuck.

Then security gets involved. Each device at stores had a different address on a private megawan. The stores didn't generally phone home, home phoned them as an access control measure; stores calling the DC was verboten. This presented an obvious problem for the video system because it needed to pull updates.

The brilliant Infosys resources had a bright idea to solve this problem:

- Treat each device IP as an access key for that device (avg 15 per store per store).
- Verify the request ip, then issue a redirect with ANOTHER ip unique to that device that the firewall would ingress only to the video subnet
- Do it all with the F5

A few months later, the networking team comes back and announces that after months of work and 10s of people years they can't implement the solution because iRules have a size limit and they would need more than 60,000 lines or 15,000 rules to implement it. Sad trombones all around.

Then, a wild DBA appears, steps up to the plate and says he can solve the problem with the power of ORACLE! Few months later he comes back with some absolutely batshit solution that stored the individual octets of an IPV4, multiple nested queries to the same table to emulate subnet masking through some temp table spanning voodoo. Time to complete: 2-4 minutes per request. He too eventually gives up the fight, sort of, in that backhanded way DBAs tend to do everything. I wish I would have paid more attention to that abortion because the rationale and its mechanics were just staggeringly rube goldberg and should have been documented for posterity.

So I catch wind of this sitting in a CAB meeting. I hear them talking about how there's "no way to solve this problem, it's too complex, we're going to need a lot more databases to handle this." I tune in and gather all it really needs to do, since the ingress firewall is handling the origin IP checks, is convert the request IP to video ingress IP, 302 and call it a day.

While they're all grandstanding and pontificating, I fire up visual studio and:

- write a method that encodes the incoming request IP into a single uint32
- write an http module that keeps an in-memory dictionary of uint32,string for the request, response, converts the request ip and 302s the call with blackhole support
- convert all the mappings in the spreadsheet attached to the meetings into a csv, dump to disk
- write a wpf application to allow for easily managing the IP database in the short term
- deploy the solution one of our stage boxes
- add a TODO to eventually move this to a database

All this took about 5 minutes. I interrupt their conversation to ask them to retarget their test to the port I exposed on the stage box. Then watch them stare in stunned silence as the crow grows cold.

According to a friend who still works there, that code is still running in production on a single node to this day. And still running on the same static file database.

#TheValueOfEngineers

Comments
  • 0
    I dint understand half of what you wrote, but the parts that I did understand were fascinating.
  • 1
    That private megaWAN is just.. Christ, and I thought that 200 VPN connections from each client device to 3 edge nodes was horrible (now reduced to 2 VPN connections by a single VPN gateway). Enterprisey indeed XD
Add Comment