S-aa-S vs AI-aa-S, yet again
In an earlier article we spoke about Software-as-a-Service mindsets that do not transfer to Data Science projects.
A big part of the challenge is that the business / investment decision-making chain continues to live in a "linear" world - so the right questions are not asked at the right time.
Today we will talk about it again, using light language and analogies to understand the differences between software and AI creation.
Failure Rates
It is very much worth noting the numbers for AI startups:
- 92% overall failure rate
A July 2024 survey of 100 U.S. tech founders by AI4SP found that 92% of AI and tech startups ultimately fail - driven especially by challenges finding product-market fit. [ai4sp.org]
- ≈90% fail in Year 1
AIM Research reports that roughly 90% of AI startups shut down within their very first year, underscoring the extreme early-stage risks. [aimresearch.co]
Software vs Data Science
Every time we go to the grocery store, we essentially execute a sequence of planned actions: we choose the target store and our mode of transportation, go to the store, follow either a written checklist or a mental map of "what is needed", put things into the trolley, pay, and head back.
What I wrote above is an informally written software program or an algorithm.
Every time we give instructions to somebody, we are passing them a program which we expect them to execute. In the human world we are so used to it that we don't even think about it as a program. Teaching a child how to peel potatoes is a way of instructing their motor functions to execute a sequence - to achieve the grand result of a well-peeled potato.
In the SaaS world, we take the informal instructions of a customer together with the real context of the situation, and we write a formal solution satisfying those instructions - limited by the constraints which naturally follow from the environment (technology limitations, legal limitations, etc).
Here, by formal we mean mathematically precise - in other words, there exists only a single, well-defined interpretation of the solution in reality. It is like a tractor: a mechanical machine which provides the user with ways to activate various functions, and with different interfaces to plug other agrarian machines into it in order to solve composite problems. The fact that SaaS rents the tractor out, or that a software tractor needs continuous updates, doesn't change its mechanical essence.
The point is that if all of us went to a particular farmland and walked up to the tractor - the tractor simply is.
There is only one physical reality of how it operates. And if we look through all of its parts, each one was formulated by measured thought.
Software, therefore, in terms of code is a formal description of a machine. A deployed software service is that machine made real. Like the tractor, it is mechanical in nature, built from the bottom up out of tiny mechanical solutions (layers of software, electronics, etc).
SaaS operation can be seen as:
(1) collecting customer’s informal expectations,
(2) translating informal expectations to formal ones by writing them as a code,
(3) serving the machine built from that code, satisfying the customer's needs as they expect them.
***
A musician plays the violin - most of us are able to tell whether that "it" is in the music or not.
Free-flow cooking - no recipes, just blending a little bit of this with a little bit of that to create some new taste, or just something edible from what is left in the fridge.
The reflexes of a martial artist during sparring - there is no time to "think" about where to move an arm or a leg; things just happen.
The sixth sense - an inner knowing that something is off, even though there are no specific details on the surface.
Intuition is probably the easiest self-reflective way to imagine how AI technology operates.
It is vague. It takes self-reflection or analysis to connect the dots on why intuition tells us what it does. If our intuition is not rooted in realism and true experience, it leads to wrong decisions: too much cinnamon in a sauce ruins the desired dish, freshly painted room colours give a very different vibe once the furniture is in, etc.
Traditional data science techniques revolve around accuracy: given a number of experiments, how often does the Model instance satisfy the given set of requirements? In other words, we look into the probability that the Model will perform as we expect it to. Not dissimilar to training human intuition, when we sanity-check our intuitions and spot the points where they assumed something irrelevant or incorrect.
We have mentioned the Model - here using it in the sense of a machine, an already formulated, operational entity. This is not dissimilar to the last stage of the software service described above.
However, unlike software, each Model is randomly built: while there is formal code describing the parameters of the model, and there is a versioned training set (the first basket of potatoes to peel) on which the model is trained, the very process of training includes randomness.
ML/AI solutions, therefore, are probabilistic in nature. There are formally defined parameters and processes, but there is also a training dataset which is too large for humans to grasp formally (for the cases where humans could do it, we write software).
The randomness is in the very process of building ML/AI solutions. Unlike code, where what we build follows exactly what we have formally written, the same model-training code will build a different model each time.
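This can be seen in a toy sketch: the training code below is identical between runs, yet different random seeds yield different model parameters that solve the same task. It is a deliberately minimal perceptron on made-up data, not a claim about any particular framework:

```python
import random

def train_perceptron(samples, seed, epochs=500, lr=0.1):
    """Same training code every time - only the seed differs.
    The seed controls weight initialisation and data shuffling."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            pred = 1 if w * x + b > 0 else 0
            w += lr * (label - pred) * x
            b += lr * (label - pred)
    return w, b

def accuracy(model, samples):
    w, b = model
    return sum((1 if w * x + b > 0 else 0) == y for x, y in samples) / len(samples)

# Toy separable data: class 0 below zero, class 1 above.
points = [(x / 10, 0) for x in range(-10, 0)] + [(x / 10, 1) for x in range(1, 11)]

model_a = train_perceptron(points, seed=1)
model_b = train_perceptron(points, seed=2)
# Both models solve the task, yet their learned parameters differ.
```

Two "versions" of the same solution, built by the same code from the same data, and still not identical - which never happens with ordinary software builds.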
Also, unlike software, Models drift over time: because real-world data changes, the skill frozen into a once-trained model slowly becomes less relevant.
A tractor stays a tractor; it might just become less up to date. But the ability to predict whether it will rain tomorrow is affected by climate change, and the symptoms our grandfathers read in their land in their time stop being applicable.
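A toy sketch of that drift: a model fitted once on historical weather data keeps its frozen threshold while the distribution underneath it shifts. All numbers are invented for illustration:

```python
import random

def fit_threshold(samples):
    """Fit the midpoint between the two class means - a deliberately tiny 'model'."""
    rain = [x for x, y in samples if y == 1]
    dry = [x for x, y in samples if y == 0]
    return (sum(rain) / len(rain) + sum(dry) / len(dry)) / 2

def accuracy(threshold, samples):
    return sum((x > threshold) == (y == 1) for x, y in samples) / len(samples)

def humidity_readings(rng, rain_mean, n=500):
    """Synthetic humidity data: rainy days (label 1) cluster above dry days (label 0)."""
    data = [(rng.gauss(rain_mean, 5), 1) for _ in range(n)]
    data += [(rng.gauss(rain_mean - 30, 5), 0) for _ in range(n)]
    return data

rng = random.Random(0)
old_world = humidity_readings(rng, rain_mean=80)
model = fit_threshold(old_world)          # trained once, then frozen

# Years later the climate shifted: rain now starts at lower humidity.
new_world = humidity_readings(rng, rain_mean=62)

acc_then = accuracy(model, old_world)     # high: the world the model knew
acc_now = accuracy(model, new_world)      # degraded: same model, drifted data
```

The model did not change; the world did - which is exactly why trained models need ongoing monitoring and retraining.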
***
Hopefully the analogy between coding and thinking, data science and intuition, is useful.
Coding is essentially very clear, precise, detailed, logically connected thought. Those thoughts are written in languages which exclude the possibility of double interpretation. Notably, if business meetings were truly productive and clear, they would result directly in the source code of the application. The fact that this never happens simply reflects that meetings are not truly clear.
Data Science is probabilistic, spanning volumes of data which are simply not possible to connect with formal logic within reasonable time. The methods we apply to it are therefore not the same as coding methods. Moreover, what the work itself is - is different.
Differences in what “work” means in a project
Software development for the most part follows the linear intuition s = Vt:
- there are customer requirements the system needs to satisfy [distance s],
- requirements are translated into formal ones and implemented - the work of the team [velocity V],
- which takes an amount of time t.
A SaaS project is therefore regulated by committing only to the “distance” the team has the capacity to “walk” within the “time” left before the deadline.
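The linear planning rule above can be sketched in a few lines. The 20% buffer is an assumed safety margin, not a universal constant:

```python
def committable_scope(velocity_per_week, weeks_to_deadline, buffer=0.2):
    """s = V * t, minus a safety buffer: commit only to the 'distance'
    the team can 'walk' before the deadline. The buffer fraction is an
    assumption, tuned per team and project."""
    return velocity_per_week * weeks_to_deadline * (1 - buffer)

# A team delivering 30 points/week with 8 weeks left commits to ~192 points.
scope = committable_scope(30, 8)
```

The whole point of the sections below is that this rule is only valid when the distance s is actually known - which, for AI projects, it is not.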
The quality of cross-role collaboration in the company ensures that the “translation” from informal (not technically precise) to formal (code, be it software or infrastructure as code) goes smoothly and without losing key information, and that it addresses genuine customer needs (even when customers themselves are not that clear in their language).
The nature of formal languages requires restructuring the solution as it grows: elements which were sufficient for the initial project requirements stop being sufficient when a new set of requirements arrives. This forms the classical scaling problem of SaaS - it follows from the fundamental math behind formality and can only be partially mitigated.
Importantly, though, refactors are predictable, and in a healthy organisation they are taken as part of the costs.
Data Science, on the other hand, starts with something Freudian - the human difficulty of admitting that our brains struggle to be specific enough to define the problem in formal terms in the first place.
The Freudian part is that somehow we associate the inability to define the problem with low self-worth. As a result, organisations often use a sweet "know-it-all" language that sounds right - a facade attempting to compensate for an “empty house”.
To give an example: “clients want a self-driving car” might sound like a specific requirement in a company meeting, but the roads of the actual world are not formally specific. While all roads adhere to some rules and requirements, there are deviations which make each metre of road unique on the globe. Those deviations create multitudes of technical details the solution needs to comply with. And the sheer magnitude of the details and their dependencies is the reason software couldn't be used to solve the problem in the first place.
In this example, being able to name the details which are relevant, and to set the boundary of assumptions, is the actual way forward: limit the autopilot to right-side driving only, separate the concern of road-sign recognition into its own module, think about the cost of a probable mistake and embed into the system ways to delegate the decision to the user in such cases. Even these are not specific enough, but notably - the separation into modules is software, and the interactions between modules are at the very least wrapped in software, formulating the steps which can be made with certainty.
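A minimal sketch of that module separation: the recogniser stands in for the probabilistic (ML) part, while the routing around it - including the hand-back to the driver - is plain, deterministic software. The threshold and all names are illustrative assumptions:

```python
# Hypothetical module boundary for the self-driving example.
CONFIDENCE_FLOOR = 0.90  # assumed; tuned to the cost of a probable mistake

def recognise_sign(image):
    """Stand-in for an ML road-sign recognition module: (label, confidence)."""
    return ("speed_limit_50", 0.72)  # a fake low-confidence result

def decide(image):
    label, confidence = recognise_sign(image)
    if confidence >= CONFIDENCE_FLOOR:
        return ("act", label)              # certain enough: act autonomously
    return ("delegate_to_driver", label)   # too uncertain: hand control back

result = decide(None)
```

Everything around the model call is exactly the kind of step that can be made with certainty; only what happens inside `recognise_sign` is intuition-like.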
Yet another way to look at AI/ML is to see models as auto-generated programs satisfying certain statistical criteria. Organisationally, it can look like Data Science automating two jobs at once: that of the person who created requirements for software projects (the business analyst) and that of the person who wrote the software (the engineer).
Each model retraining is pretty much a phoenix, burned to ashes to appear again in a new shape.
There are techniques to make the new shape resemble the old - but at its core there will be details which the previous “shape” solved better.
Differences in project scaling
Greedy Proof-of-Concept approach
The POC strategy is typical for new software features/products. We build something that solves the happy path, show it to our clients, calibrate, and cement it into an MVP.
For software projects this means we choose a technology to build and iterate the POC fast - as we don't know what features we will require in the end, we want our throw-away discovery work to be as cheap as possible. Once somebody is ready to pay for what we have shown in the POC, we grow it into an MVP (extra layers of testing, extra deployment processes, more solid platform management). We iterate and refactor the MVP later, as we get money from our clients to support the product.
Many of the known giants started their projects in PHP, to be rewritten a few years after the boom in more solid and quality-controlled languages.
If you have followed the thread carefully so far, you should be able to spot the caveat - the hidden assumption in this strategy. The assumption is that the distance between the POC state of the project and the MVP state is predictable (the linearity of software problems).
For Machine Learning this doesn't hold: most Computer Science students get a college task to write a model recognising hand-written text. Those projects do work, and the concept of visual recognition is proven (the POC stage is resolved) - but those projects are not scalable or accurate enough to be used in reality. OCR (text recognition) remains a not reliably solved problem to this day, even for giants like Google or Amazon.
You might be surprised to hear how many IT startups try to do exactly that: claim they have a POC and imply that the MVP is around the corner. With the small difference that they hire somebody with experience in Data Science instead of a student.
However, the problem is not in individual velocity (student vs expert), but in the implementation distance being unknown.
If we look again at the physics model s = Vt as a reference for the old-ways SaaS bias: for software projects we know that the distance s between the POC and MVP stages is finite, so we can regulate the delivery velocity V with the quantity and expertise level of team members to match the delivery deadline t.
Unfortunately, for AI projects, s is not defined. I can make a car drive itself in and out of my garage with some simple wires and a model running on a Raspberry Pi, but that does not mean I can estimate how many years it will take to make a self-driving car cruise safely through a city.
The “what would it take” approach
The “what would it take” approach - the scientific approach - is a step-by-step method of systematically building a map of what we know.
The map naturally highlights the boundary of what we don't know, or the concepts we cannot distinguish between. And just as in science we cannot be sure what fruits a particular research approach will bring, we cannot give guarantees of success for a chosen ML/AI approach.
But we obviously want to track our solutions to specific problems and build our “theory” (the ML solution) over time. So we need tooling and processes which provide certainty that we don't repeat ourselves and that our map of understanding expands.
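One minimal way to keep such a map, sketched as code: a registry of tried approaches and their measured outcomes, which refuses to silently repeat a known experiment. Keys and metrics are illustrative:

```python
# A tiny "map of what we know": every (approach, dataset) pair is
# recorded with its result, and re-running a known pair is refused -
# so the map only ever expands.
experiments = {}

def record(approach, dataset_version, metric):
    key = (approach, dataset_version)
    if key in experiments:
        raise ValueError(f"already tried {key}: metric was {experiments[key]}")
    experiments[key] = metric

record("cnn_baseline", "data_v1", 0.81)
record("cnn_baseline_augmented", "data_v1", 0.84)
```

Real teams use experiment-tracking tools for this, but the essence is the same: a shared, append-only record of the explored boundary.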
In the actual scientific field, the system is built around publications in conferences and journals. Conferences are often a way to broadcast to the community what we work on, with some primary highlights; journals serve as a deeper and more detailed research-tracking mechanism.
Another way to look at the same reasoning is from the client/investor perspective.
Let's assume they are on board with us on the non-linearity of AI problems. How can they trust that our team is progressing, and how can they be sure that if an opportunity arises, we have the capacity to take it?
Differences in product lifecycle & investment of development
Software. Investment into software projects has become pretty predictable today.
Investors need to check that the people starting the project are up to the task and that the product strategy reasonably matches reality, and see to it that the technical strategy is feasible and in place, with risk-reward managed by the technology choices. As the problem is finite, the date of completion is pretty much a matter of a time estimate plus the roll of a dice. Easy-peasy.
Data Science. It should work just like software above, but somehow it doesn't.
The first problem was already mentioned: the problem space (the requirements) is not known precisely and, to make it worse, it is practically changing every day.
To some degree this is true of software as well, but in data science the scale of this factor becomes overwhelming. The main consequence is that nobody honest can tell what it would take, or how long, to solve the whole problem.
There is a chance that the problem might still be technically unsolvable at the moment the project starts (an insufficient amount of existing quality digital data on the subject, or insufficient global compute power) - while on small-scale attempts it will look like it might work.
The second problem is that of the uncanny valley: as AI resembles human behaviour, we have an emotional response to AI outputs (on an unconscious level we want, or don't want, to believe).
As an AI (let's say a chat bot) develops and becomes more and more human-like, we find it cute, promising, etc. - up to the point of annoyance.
The annoyance follows from our dissonance: for some short period we were hooked into taking it for “real” with the emotional brain, just to discover a moment later that it is still a stupid machine. The early development of systems like Alexa and Siri can serve as a reference - the early excitement about an AI product became a disappointment for many once it was released and tried.
Even more frustratingly, for most product ideas people are happy to use them only once performance has crossed the uncanny valley (the AI really works as expected, with high accuracy) - but not just before it (when we expect human-like performance and get the mistakes of an AI model). This means that in most business cases people are ready to pay only at the very end of a long process.
The combination of these two problems implies that AI projects require much higher rates of investment to have a chance to succeed. And the more difficult part: they require people and leaders who truly understand the nature of the beast.
Last but not least, most models require continuous updates to stay relevant. Unlike software, where we can build an MVP and roll it out to different markets while pausing feature development, models for existing clients need to be maintained all the time.
The same is true for software, but the coefficient of support for ML models is much greater. With SaaS we can imagine the golden time of a happy AWS-like project (as a product, from Amazon's perspective), when our created machine (the cloud platform) just runs and delivers fruits (companies pay money) - and for most services we don't really need to touch the code; it is enough to hire maintainers to keep the dependencies up to date and implement requested features at snail speed.
A SaaS dream: almost like selling tractors made from thin air - but better, renting out tractors made from thin air.
With AI-aa-S (model as a service), the mutation of the data for most applications means that the core of the service needs to be continuously updated - which in some regard can be compared to rewriting the software.
LLMs and the era of ML/AI-engineering
Recapping what we have covered so far: software can be compared to formal thinking, and ML/AI models to intuition.
Problems which we can formally (precisely) define are therefore written in a formal language (programming code) and thus solved deterministically.
It is like human logical thinking building cause-and-effect chains of dependencies.
Software products are mechanical, reproducible and well-defined - like a tractor.
***
Problems which we cannot formally define - due to their chaotic nature or the sheer volume of details involved - we solve with data science, by building a formally defined model scope which deals with the situation. Notably, the solution is probabilistic.
It is like the human intuition which tells us in the store "those are the shoes I was looking for" - providing an insight which is not strictly logical, but which we can retrospectively decode.
ML/AI products are like an airport luggage-sniffing dog: it somehow works and can be embedded into processes, but no two models are the same, and we can never tell whether there is some new kind of dog snack in the sniffed suitcases - or something else.
Historically, Data Science was a branch of the organisation building custom models - companies would rarely use models provided by a third party, mostly because of the very specificity of the data and of the problem solved by the business.
Only over the past decade have Cloud Providers popularised the idea of model-as-a-service, shaped as a set of ‘standard’ tools like speech-to-text, OCR and similar model services, offered on a "use at your own risk" basis.
However, the fluid nature of such a service means that businesses which require high consistency and predictability of results couldn't rely on it fully - every time AWS or GCP updates their underlying models, all downstream systems of the business relying on the model are affected, causing unpredictability and an inability to meet SLAs because of cascading effects.
***
LLMs changed that historical status quo: for most businesses it is out of scope and budget to train their own feasibly useful LLM, and yet LLMs are fast becoming an evolution of interfaces, providing means to solve new types of problems with reasonable accuracy and predictability.
This led to the rise of a new field, ML Engineering - a grey area between Data Science and Software, focused on using existing models (like LLMs) to solve business needs.
Unlike the Data Science retraining problem - where we don't know for sure whether the effect will be achieved until we have a model solving it - an LLM is already trained by the third party: it is there, and it works how it works. Data drift doesn't affect the core capabilities of such models, and in some ways we are entering an era of "LLM configuration text as code" - it is possible to use a model "frozen in time", with its configuration "frozen in time", for very long periods in backend systems which don't face the end user but deal with system messages.
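A sketch of what "LLM configuration text as code" can mean in practice: the model snapshot and its prompt pinned together in one immutable, version-controlled object. The identifiers are hypothetical, not a real vendor API - the point is the "frozen in time" discipline:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LLMConfig:
    model_id: str              # exact dated snapshot, never "latest"
    system_prompt: str
    temperature: float = 0.0   # as deterministic as the vendor allows

INVOICE_TAGGER = LLMConfig(
    model_id="vendor-model-2024-06-01",  # hypothetical pinned snapshot
    system_prompt="Classify the invoice as one of: travel, hardware, other.",
)
```

As long as nobody bumps `model_id`, the backend behaves the same way year after year - much closer to the software world than to the retraining treadmill of custom models.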
There are of course many more applications and possibilities - hence ML Engineering appears less risky than Data Science and has some properties of software projects. However, it is still constrained by the same "intuitive" nature of models and by the boundaries of accuracy.
As in Data Science, the problem of finding with what accuracy/certainty a problem X can be solved remains. This means there are problems which a given LLM cannot solve - but we might not be able to tell until we try.