Why AI is Like Any Other Software (Except When It Isn’t)

One of my favorite films is The Leopard, which follows the story of an aristocratic family in Italy as they try to keep their wealth and power through revolutions and social upheaval in the late 19th century. As the family’s patriarch is faced with one painful compromise after another – betraying his fellow nobles to side with the revolutionaries, arranging marriages with wealthy but vulgar merchant families – he observes that “everything must change so that everything can stay the same.”
Today, with AI, many organizations are asking a version of the same question as Don Fabrizio Corbera: exactly how much will AI change about how we use technology at work – and the nature of work itself – and how much will stay the same?
Right now, there’s a massive disconnect between the hype about AI in the media and its actual impact in most workplaces. In some organizations, AI is still an experiment – something people tinker with in their spare time. In others, it’s a strategic priority that executives present to the board in slideshows but struggle to operationalize. And elsewhere, it’s something organizations are rushing to implement, often with disappointing results.
But we’ve seen this movie before. There was a similar disconnect with cloud computing’s impact on IT in the early 2000s. “Nobody’s going to trust their data to someone else’s servers,” the skeptics said. “Security! Compliance! Vendor lock-in!” And they were right to be concerned – those were real issues. But within a decade, the industry figured out how to make the new technology governable, shifting the question from “Should we move to the cloud?” to “What’s our cloud migration timeline?”
The same thing happened with social media’s impact on advertising. For years, CMOs treated Facebook and Twitter as experimental channels while pouring real money into television and print. Then, almost overnight, digital ad spend surpassed traditional media, and suddenly every brand needed a social media strategy as online influencers supplanted television networks in the public’s mindshare.
The pattern is always the same: a new technology emerges, skeptics dismiss it as hype, early adopters experiment cautiously (or recklessly), and then – often faster than anyone expects – the new thing becomes the default and the old way of doing things becomes quaint.
So the question isn’t whether AI will fundamentally change how organizations operate: it’s how, exactly, AI will be operationalized. What needs to change so that the things that matter can stay the same?
To answer that timeless question, we need to understand how AI is both radically different from traditional software and surprisingly similar – and what that means for how we build, deploy, and manage it.
Determinism vs. Non-Determinism: From Workflows to Judgment

On the surface, the difference between traditional software and AI seems like night and day.
Traditional software applications only “understand” a limited range of rigidly structured commands, and they do exactly what they are told (provided the instructions are phrased correctly: a single misplaced comma can render an entire system inoperable).
AI systems, by contrast, can follow plain-language instructions. Even if the prompt you type into an AI chatbot contains misspellings, or if your request is vaguely worded, most modern AI models are “smart” enough to infer what you meant and respond accordingly.
This can feel almost magical… until the AI system starts getting things wrong, at which point it becomes bewildering.
The intrinsic strangeness of AI compared to regular software comes down to one basic distinction: traditional software is deterministic, while AI is fundamentally non-deterministic / probabilistic.
With deterministic software, the same input produces the same output. If you type 1 + 1 into a calculator, the result will be 2 today, tomorrow, and ten years from now. That reliability is the foundation of modern workflows: when you write a script to calculate payroll taxes or generate invoices, you can assume it will follow the same steps every time.
AI systems don’t behave that way. When you give a generative AI agent an input, it doesn’t execute a fixed procedure. Instead, it analyzes your request, looks for similar patterns across vast amounts of reference text, and predicts the most likely response. This predictive capability is what lets AI handle open-ended tasks (drafting persuasive emails, brainstorming strategy, summarizing documents), but it can also produce surprising results, sometimes in ways you don’t want (misquoting sources, jumping to conclusions, or misunderstanding the request entirely).
The paradox of AI is that the same dynamics that make it useful can also make it unreliable. But this isn’t a fatal flaw.
First, the distinction between “deterministic software vs. non-deterministic AI” isn’t a strict either/or. Many real-world systems combine the two. For instance, you might use AI to spot defects on a factory’s production line, but rely on deterministic software to track defect rates over time and trigger alerts when they exceed an acceptable threshold.
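As a rough sketch of that hybrid pattern: the AI step below is a stand-in for a real vision model (the `detect_defect` function, threshold, and window size are all illustrative), while the monitoring layer around it is plain, deterministic bookkeeping.

```python
from collections import deque

def detect_defect(image: dict) -> bool:
    """Stand-in for a non-deterministic AI vision model.

    In production this would call a trained defect-detection model;
    here we just read a precomputed score for illustration.
    """
    return image.get("score", 0.0) > 0.5

class DefectMonitor:
    """Deterministic layer: tracks defect rates and raises alerts."""

    def __init__(self, window_size: int = 100, alert_threshold: float = 0.05):
        self.results = deque(maxlen=window_size)  # rolling window of outcomes
        self.alert_threshold = alert_threshold

    def record(self, image: dict) -> bool:
        is_defect = detect_defect(image)  # probabilistic AI step
        self.results.append(is_defect)    # deterministic bookkeeping
        return is_defect

    @property
    def defect_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def should_alert(self) -> bool:
        # The alert logic is ordinary software: same input, same output
        return self.defect_rate > self.alert_threshold
```

The AI model can be swapped or retrained without touching the alerting logic, which is exactly the kind of separation that makes the probabilistic part governable.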
Second, many of the challenges of AI stop being deal-breakers if we adjust our expectations for how “software” is supposed to behave.
Working with AI requires us to stop aiming for perfect precision that we can “set and forget” and instead enforce “bounded behavior” through continuous monitoring and guardrails. In traditional systems, errors aren’t supposed to happen: a single mistake can signal a bug that will repeat again and again at scale. With AI, an error might be a one-time aberration with no obvious cause, and it might never happen again (or not often enough to matter). The only way to evaluate an AI system is to monitor its performance over time, asking “Is this behavior within the bounds of acceptability? Even if it’s not perfect, is it ‘good enough’, often enough, to be useful?”
For example, our company once built an AI agent to author practice questions for a popular professional certification exam. About 8% of the questions it generated had to be edited or rejected outright by the organization’s reviewers, but that was still an improvement over the organization’s prior process, where 16% of questions written by human contributors were rejected.
And while AI’s non-deterministic behavior may feel strange compared to traditional software, the reality is that organizations already have plenty of experience dealing with sometimes brilliant, often unreliable, non-deterministic systems that require careful monitoring and review.
They’re called humans.
Operating Costs: From Fixed Infrastructure to “Paying for Work”

Traditionally, once you build a software application, it should be relatively cheap to run. Whether you have one employee typing a formula into Excel or five hundred employees typing formulas into Excel, the marginal cost of each interaction (in terms of computing power and electricity) is effectively zero. This “build once, use endlessly” paradigm helped drive the massive productivity gains of the information age, allowing organizations to treat the work done by software as a fixed investment (and software publishers to reap enormous margins).
AI systems don’t work this way. With AI, the more you use it, the more it costs, and without mechanisms for controlling those costs, things can get out of hand. A spreadsheet can total a budget ledger with 6,000 rows for essentially nothing. By contrast, an AI system asked to analyze that same sheet, explain tradeoffs, propose scenarios, and answer follow-up questions may incur measurable compute costs each time the user hits “submit.”
The reason for this comes down to energy consumption.
The math behind generative AI is mind-blowingly complex. Instead of executing a small set of mathematical operations involving a limited set of variables (like a spreadsheet), AI performs probabilistic inference across billions or even trillions of parameters. As a result, when you type “How should we allocate our budget?” into an AI system, it consumes orders of magnitude more electricity than when you ask a spreadsheet to calculate SUM(A2:A999).
A more useful mental model is to think of AI resource demands as closer to the human brain’s calorie consumption than to traditional software execution. Each AI interaction consumes real computational resources, and those costs scale with usage, complexity, and duration. Longer conversations cost more than short ones. More capable (“smarter”) models cost more than simpler ones. And tasks that require reasoning, context, or creativity cost more than narrowly defined, repetitive work.
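To make that scaling concrete, here is a back-of-the-envelope cost sketch. Usage-based AI pricing is typically quoted per million tokens of input and output; the specific prices and token counts below are purely illustrative, not any provider’s actual rates.

```python
def interaction_cost(input_tokens: int, output_tokens: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate the dollar cost of one AI call from token counts.

    Prices are per million tokens. All figures used below are
    illustrative assumptions, not real list prices.
    """
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# A short Q&A on a lightweight model...
short_chat = interaction_cost(500, 300,
                              price_in_per_m=0.15, price_out_per_m=0.60)

# ...versus a long document analysis on a large "reasoning" model
long_analysis = interaction_cost(40_000, 5_000,
                                 price_in_per_m=5.00, price_out_per_m=15.00)
```

Even with made-up numbers, the shape of the result holds: longer inputs, longer outputs, and more capable models multiply together, which is why per-interaction costs can vary by orders of magnitude across use cases.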
This can surprise organizations that are used to either (a) treating productivity software as a fixed expense, or (b) experiencing AI only through the generic, consumer-facing chatbots such as ChatGPT, Claude, and Gemini, which offer flat monthly rates (whereas those same AI model providers charge enterprise clients for the actual cost of computing power when building custom AI systems at scale).
When leaders realize that “yes, we pay each time an employee uses this purpose-built AI system,” it can feel unsettling… until you step back and consider that, whether it’s Google Sheets or Google Gemini, companies have always paid for the cost of the software plus the human user’s time. AI merely changes how costs are distributed between the two.
To give a real-world example, our team developed an AI agent to support staff at agencies that provide job placement assistance for people with disabilities. In reviewing their workflows, we found that staff spent an enormous amount of time searching, clicking, and reading job listings that ultimately proved unsuitable for their clients. By designing an AI agent that could scan dozens of postings at a time and compare them to multiple client profiles, we were able to give staff a curated shortlist and dramatically reduce time wasted on mismatches, while keeping the monthly cost of the solution far below the equivalent cost of human labor.
There are also practical measures organizations can take to control AI compute costs. One is matching the model to the task: delegating simpler tasks to lighter-weight models (the various ultra-low-cost AI models coming out of China are excellent for this), while reserving the high-cost / high-performance “reasoning” models for tasks where judgment and nuance matter most. Another is model orchestration, where larger tasks are broken into component steps and routed to different models, with tighter controls on how much data is fed to the models at each step. For high-volume, narrowly defined workloads, it can also make sense to invest in developing “small language models” that trade broad competence across a wide range of tasks for efficient execution of limited tasks (e.g. reviewing pharmacy paperwork).
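A minimal sketch of the “match the model to the task” idea: a router applies simple, auditable rules before any expensive compute is spent. The tier names and routing criteria here are assumptions for illustration, not a prescription for any particular provider’s lineup.

```python
def route_task(task: dict) -> str:
    """Route a task to a model tier using simple, auditable rules.

    Tier names ("reasoning-model", "mid-tier-model", "lightweight-model")
    and the routing criteria are illustrative assumptions.
    """
    if task.get("requires_judgment") or task.get("high_stakes"):
        return "reasoning-model"    # expensive, most capable
    if len(task.get("text", "")) > 20_000:
        return "mid-tier-model"     # long context, moderate cost
    return "lightweight-model"      # cheap, fast, good enough

# Example: classify incoming work before spending compute on it
tasks = [
    {"text": "Categorize this receipt"},
    {"text": "Draft a response to this regulator inquiry",
     "high_stakes": True},
]
assignments = [route_task(t) for t in tasks]
```

In real orchestration systems the routing step may itself be a small classifier model, but the principle is the same: decide how much intelligence a task deserves before paying for it.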
Costs can be further contained through interaction design. An AI system that encourages long, open-ended conversation will generally be more expensive than one designed around shorter, structured interactions. Likewise, an agent that continuously monitors work and proactively suggests next steps will have a different cost profile than one that only responds to a user’s questions when asked.
That said, there is a point where obsessing over token-level costs becomes counterproductive. Yes, it’s critical to set up systems to monitor and manage AI spend, but wishing AI could have the same zero marginal cost structure as traditional software distracts from the larger question: what is the value of the work being done?
The revolutionary thing about AI is that it can perform a wide range of tasks that would otherwise require human language comprehension, creativity, and judgment, turning “knowledge work” into something that can be mechanized at industrial scale.
Seen through that lens, the right way to think about AI operating costs is not as a variable tax on software, but as a massive savings on the human labor it augments or replaces. An AI agent that costs a few dollars per interaction may still be extraordinarily cost-effective if it replaces hours of expert time, accelerates repetitive decision-making, or makes high-quality guidance available on demand.
There are also cases where the speed of AI matters as much as the cost. We did a project for a financial services client that required reviewing a large set of regulatory information. The cost of the AI-supported approach (including double-, triple-, and quadruple-checking the output) was only modestly lower than the client’s prior human-only process—but it reduced delivery time from seven months to under two weeks.
Variable costs, then, are not a flaw of AI systems. They’re a reflection of the fact that AI is doing fundamentally different kinds of work than traditional software. And like any worker, the right question isn’t “Will it work for free?” but “Is the ROI worth it?”
Access Control

Just as most of us don’t spend a lot of time thinking about the plumbing or electrical systems in our home, office, or local airport, there are critical aspects of Internet and software architecture that we rarely consider as end users. Case in point: access control.
Our experience of the Internet is defined as much by what we cannot access as by what we can. You wouldn’t want strangers logging into your bank account, and while there are online documents you may want to share freely with colleagues, there are others you would prefer to keep private.
To enforce these boundaries, web applications constantly confirm and reconfirm users’ identities (both at login and silently in the background) and filter available content, data, and functionality based on a user’s role and permissions.
While access control can sound like a mundane plumbing detail, it is actually one of the hardest problems in IT and software development, especially at scale. Every modern organization relies on complex rules governing who can see what, when, and under what circumstances. Small mistakes can lead to data leaks or other security failures with outsized consequences.
AI adds another wrinkle. Now, it is no longer enough to think only about what an individual user is allowed to access. We also have to consider what data an AI system is allowed to access, process, and share when interacting with (or on behalf of) a specific user.
To give a simple example, an AI agent conducting job interviews would need access to an organization’s standard interview questions and scoring criteria while speaking with candidates, but that information should not be visible to the candidates themselves. Similarly, if every member of a sales team had an AI assistant, the assistant should have access to product information, and the Chief Revenue Officer might want visibility into how the team is using the assistant, but salespeople should not be able to use the AI to ask questions about their colleagues.
The major AI platforms such as Gemini, ChatGPT, and Claude provide mechanisms for sharing information in both individual and team settings. For example, they support group conversations that reference a shared set of uploaded files, and can connect to a user’s Google Workspace or Microsoft 365 environment, inheriting that user’s permissions to access online documents.
These capabilities are intentionally designed for simplicity, offering roughly the same level of control as sharing documents through OneDrive or Google Drive. However, as AI agents become more deeply embedded in organizational workflows, there is a need for more sophisticated ways to define, enforce, and audit access for both humans and AI, particularly when AI agents operate semi-autonomously or autonomously.
So what does this look like in practice?
Our team recently rolled out a set of AI agents across multiple social service agencies within the same U.S. state. These agents served several functions: interacting directly with agency clients, assisting social workers with administrative tasks, and generating reports for agency leaders and external stakeholders such as the state government and a coordinating organization.
Each of these use cases required distinct access boundaries. AI agents interacting with clients needed access to individual user profiles, without exposing internal notes about a client’s challenges or progress. Agents assisting social workers needed access to client interactions, along with the ability to write case notes back to the system. Agency managers required visibility into activity within their own organization, while the state government and coordinating body needed aggregated reporting across all agencies (either with or without individual clients’ identifying information).
Without getting into implementation details, the solution we developed treated both humans and AI agents as “users” within the system. Access to data was evaluated in the context of each interaction: in some cases, all participants in a conversation were required to have permission to a data source for it to be used, while in others, permission held by a single participant was sufficient for an AI agent to unlock the data for a specific task.
At the organizational level, we could also define whether data was allowed to flow between organizations and, if so, whether that flow was one-way or bidirectional.
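In very simplified form, the core idea (treating both humans and AI agents as principals, and evaluating data access per interaction) might look like the sketch below. The class and policy names are hypothetical; a production system would layer on roles, audit logging, and organization-level flow rules.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Principal:
    """A human user or an AI agent -- both are 'users' to the system."""
    name: str
    kind: str                       # "human" or "agent"
    permissions: frozenset = field(default_factory=frozenset)

def can_use_source(participants, source: str, policy: str = "all") -> bool:
    """Decide whether a data source may enter a given interaction.

    policy="all": every participant must hold the permission.
    policy="any": one participant's permission unlocks the source
                  for a specific task.
    """
    grants = [source in p.permissions for p in participants]
    return all(grants) if policy == "all" else any(grants)

# Hypothetical participants in one conversation
caseworker = Principal("dana", "human",
                       frozenset({"case_notes", "client_profile"}))
client = Principal("alex", "human", frozenset({"client_profile"}))
intake_agent = Principal("intake-bot", "agent",
                         frozenset({"client_profile"}))
```

Evaluating permissions at the level of the conversation, rather than the individual login, is what prevents an agent from quietly carrying one user’s data into another user’s session.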
The key takeaways are:
- Generic AI chatbots can provide a baseline level of access control, which is fine for ad hoc use, but insufficient for enterprise workflows where you need policy enforcement, audit logs, and provable separation of duties.
- Real-world AI deployments at scale typically require far more granular, policy-driven, and dynamic control.
- As with many other aspects of AI, access control becomes easier to understand if you treat AI agents less like traditional software applications and more like people operating within an organization.
The Human Element: When ‘Acceptability’ Matters More Than ‘Accuracy’

Back in high school, I landed what might sound like a dream job: quality assurance testing video games for a major software publisher. But, while it was fun to play video games for a living, I quickly learned that testing complex software for errors could be a tedious and often maddening exercise.
At one point, I replayed the same level in a Star Trek–themed game dozens of times just to figure out why a particular object kept disappearing. Eventually, we discovered the cause: the bug only surfaced if the player collected item B before item A, rather than the other way around. Once we identified that sequence, the problem was reproducible, diagnosable, and ultimately fixable.
That experience captures the core assumption behind traditional software testing: QA is a binary exercise. Either the system did exactly what it was instructed to do, or it didn’t. Given the same inputs, the same code should produce the same outputs every time. When it doesn’t, something is broken—but the root cause is usually traceable and the fix is logical.
If only AI were so simple.
As discussed earlier, AI systems are non-deterministic. Given the same input, an AI agent may respond slightly differently from one interaction to the next: sometimes in useful ways, such as noticing an important detail or inferring a key insight, and sometimes in counterproductive ways, like diverging from an expected output format or deciding that a request to summarize a research paper violates its copyright safeguards.
This changes QA from binary (did it work or didn’t it?) to statistical (how often, and to what extent, did the system behave as expected?). And 100 successful tests are never a guarantee that the 101st won’t fail.
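A statistical QA harness can be sketched in a few lines. Here `run_agent` is a stand-in for calling the AI system under test, each test case supplies its own grading function, and the pass-rate threshold is an illustrative assumption, not a universal standard.

```python
def evaluate_agent(run_agent, test_cases, min_pass_rate: float = 0.9) -> dict:
    """Statistical QA: grade many runs instead of a single pass/fail.

    `run_agent` stands in for a call to the AI system under test.
    Each case's `grade` function returns "pass", "minor", or "critical".
    The 0.9 threshold is illustrative.
    """
    grades = [case["grade"](run_agent(case["input"])) for case in test_cases]
    pass_rate = grades.count("pass") / len(grades)
    critical = grades.count("critical")
    return {
        "pass_rate": pass_rate,
        "critical_failures": critical,
        # "Good enough, often enough" AND never outrageously off base
        "acceptable": pass_rate >= min_pass_rate and critical == 0,
    }
```

Note the two-part acceptance test: a high pass rate alone is not enough if even one failure was severe, which mirrors how organizations actually judge human work.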
Once, our team presented a large ministry of health with a design for an AI agent intended to advise nurses on a specific type of medical procedure. During the discussion, one stakeholder declared: “The system must be 100% accurate. We cannot use it unless its advice has zero errors.”
In response, our team acknowledged the stakeholder’s concerns, but pointed out that the rationale for building the AI system in the first place was that the humans making these decisions already had an unacceptably high error rate. So the real question wasn’t whether the AI could be perfect, but whether an AI agent that was correct 94% of the time would be preferable to a human process that was correct 72% of the time. Otherwise, if we were to apply the same zero-tolerance standard to human decision-makers, hospitals would have to fire every nurse or physician who had ever given a colleague inaccurate or incomplete information.
None of this was to suggest that significant AI errors aren’t serious, or that they should be ignored when they occur. And we tell clients that if a use case requires near-perfect accuracy, that may mean double and triple checks (with the attendant usage costs), human-in-the-loop design (with the added labor costs), and audit logs. In extreme cases, it might rule out AI as a solution altogether.
But the conversation illustrates how, when it comes to human-like decision-making, AI systems need to be evaluated by human-like standards. Rather than asking, “Did the system behave exactly the same way every time?” the more useful questions become:
- Is the AI system’s output directionally correct more often than the humans who currently perform this task?
- Did it stay within policy, risk tolerance, and measurable quality bands – and can you prove all that after the fact?
- Are errors minor enough that they would not change a practitioner’s course of action or introduce new risks?
- Can users be trained to treat AI systems like a knowledgeable but not infallible colleague, and do reasonable due diligence?
After all, if an AI agent tells a nurse to escalate a patient’s care to a supervising physician because the patient’s vital signs are “trending outside the normal range”, the outcome may be the same even if the AI agent fails to acknowledge that those vital signs would be considered “normal” for most patients, but are concerning given this particular patient’s medical history.
In other words, when working with AI systems, a standard of being “correct 94% of the time and never outrageously off base” may be an acceptable and appropriate benchmark for quality assurance.
Build vs. Buy vs. “Get By”

My grandfather helped build the radio navigation systems used by U.S. Navy vessels after the Second World War, then joined the team at AT&T that replaced manually operated telephone switchboards with fully automated, computer-controlled systems. But working with new technology wasn’t just his job. In his spare time, he would show me how to take apart and reassemble radios, televisions, and early computers purely for the fun of learning how they worked.
While I share my grandfather’s curiosity about technology, I also recognize that most people do not. Most people just want computers to help them get work done, do some online shopping, and watch Netflix, not to build systems from the ground up.
History bears this out. In the early days of the web, many technologists argued against standardization in favor of a radically decentralized internet where every website was unique and everyone could be a creator, developer, and publisher. In practice, the opposite happened. Consumers and businesses gravitated toward a small number of centralized platforms (Google, YouTube, Microsoft 365, Salesforce) and users happily traded the creative freedom of hand-coded websites for templates (Facebook pages) and purpose-built tools (TurboTax).
AI is likely to follow a similar path.
One of the most common misconceptions about workplace AI is that it will turn everyone into a software developer. After all, if you can type “Embody the world’s most knowledgeable tax professional” or “Create a system to track expenses, generate invoices, and flag anomalies” into a chatbot, why would you still need an accountant or QuickBooks?
In theory, a clever enthusiast can use a general-purpose AI system (whether ChatGPT, Claude Code, or whatever model is popular next) as a kind of universal productivity tool. It can write, summarize, analyze, generate templates, stitch together lightweight workflows, and even “vibe code” simple applications. Organizations were already doing something similar before AI, using cloud software and low-code or no-code tools—for example, small businesses replacing CRM systems with spreadsheets and Zapier, or fast-moving teams tracking sprints in Trello.
AI makes this kind of “getting by” mentality even easier – and that’s exactly the problem. “Prompt sprawl”, inconsistent outputs, and ungoverned agents become operational debt faster than most organizations realize, especially if they naively assume employees aren’t using personal AI accounts for company work, outside management’s awareness.
In the end, we’re not asking whether homegrown solutions are possible (they are) or whether they can accelerate prototyping and solve small scale problems (they do). The real question is whether people want to keep investing the effort once the novelty wears off, and whether organizations want to rely on them long-term for critical aspects of operations.
In practice, most organizations won’t – and can’t – not if they care about compliance, repeatability, and cost discipline. Someone who figures out a clever process hack in ChatGPT today isn’t necessarily going to share it with the compliance department, let alone review and update it year over year. Likewise, ‘vibe-coded’ apps and workflows don’t survive audits, turnover, or scale.
Thus, while individuals will continue to use whatever general-purpose model is fashionable for ad hoc tasks, organizations will increasingly buy or build more complex AI tools that package intelligence into predefined workflows with consistent outputs, guardrails, and governance. This is especially true when:
- The task is high-stakes (compliance, finance, healthcare, HR)
- Processes must be executed consistently across the organization
- An audit trail is required
- AI systems must use approved data sources and follow specific policies
There’s also a more philosophical reason the future of workplace AI won’t be a free-for-all. In many business contexts, you don’t want to leave design and process decisions entirely up to individuals, because an individual’s personal goals don’t always align with the organization’s strategy.
Today, most AI assistants behave like pliant productivity software: they do whatever the current user asks, in whatever format the user prefers. Just as Google Docs doesn’t object if you start writing a science fiction novel instead of a quarterly report, a standard chatbot is just as happy to generate cupcake recipes and haikus as product catalog copy.
In many situations, that flexibility is useful. But, as AI systems become more autonomous and more deeply embedded in organizational workflows, flexibility can shift from being a feature to being a liability. An AI agent tasked with documenting patient interactions in a hospital, generating compliance reports in a factory, sending official communications, or reallocating resources cannot simply “do whatever the user asks.” Those systems need clearly defined responsibilities, constraints, and rules for how they operate.
This isn’t about diminishing human autonomy. It’s simply an acknowledgment that organizations depend on well-defined roles and expectations—and that applies to AI systems just as much as it does to people.
Conclusion

Microsoft’s CEO Satya Nadella provocatively proclaimed that AI models will soon become so powerful that they will eliminate the need for conventional software – though, as the head of a company that has invested hundreds of billions in AI technology, he has a vested interest in seeing that prophecy come true.
For the rest of us, the question is not “AI versus software” but rather: “Do we have the policies and technical infrastructure for using AI in production (governance, reporting, access control, traceability) – or are we still just demoing?”
Once again, we come back to the human model. The more AI systems start to resemble people in how they operate, the more our software systems need to resemble organizations. While work will be done by individuals and AI, we still need organizational structures and software architecture to define roles, standards, and governance.
The organizations that thrive in the age of AI won’t be the ones that treat AI as a magic wand or simply bolt a chatbot onto their existing systems and hope for the best. They’ll be the ones that treat AI as a fundamentally new kind of worker and apply the same principles of good management they use with human teams: clear expectations, appropriate oversight, continuous improvement, and the right software tools.


Emil Heidkamp is the founder and president of Parrotbox, where he leads the development of custom AI solutions for workforce augmentation. He can be reached at emil.heidkamp@parrotbox.ai.
Weston P. Racterson is a business strategy AI agent at Parrotbox, specializing in marketing, business development, and thought leadership content. Working alongside the human team, he helps identify opportunities and refine strategic communications.