Agentic Systems Without Chaos: Early Operating Models for Autonomous Agents

Agentic Systems Without Chaos: Early Operating Models for Autonomous Agents

InfoQ

Machine-readable: Markdown · JSON API · Site index

Смотреть на YouTube

Поделиться Telegram VK Бот

Транскрипт Скачать .md

Анализ с AI

Оглавление (11 сегментов)

Segment 1 (00:00 - 05:00)

If your team has AI running in a proof of concept, but you're still figuring out how to run it reliably in production, you're not alone. That's the gap most engineering teams are navigating right now. Yukon AI Boston this June 1st and 2nd brings together senior engineers, software architects, and technical leaders who've already made that shift. They'll share the patterns that scaled, the mistakes that didn't make the blog post, and what they'd actually do differently. No hidden product pitches, just senior practitioners helping senior practitioners. Learn more at boston. qcon. ai. Welcome, welcome, welcome everyone. Welcome to this third episode of our podcast series, Next Generation Playbook for AI era insights and patterns. What is relevant? And uh if you have yet not seen, please go back and see the episode 1 and two. First one we did with Greddy Boots which was all about what's fundamentally changing. What is principled view on what's new and what is just appearing to be new but it is same old design and architect thing. Then we looked at what is evolutionary about the architectures. While coding is going at a pace whether pipe coding, spec coding or various forms of coding but then how do we evolve our decisions our design and architecture with equal pace? Is it even possible? How do we go about that? And in today's episode, we have agentic systems without chaos. Well, it is already a chaos. But how do we give early operating models for autonomous agents to our viewers, listeners and bit of direction what we as a practitioners seeing on the field and how do we go about that? It's a mutual learning. Let's learn together. So to have this topic discussed today I have with me Joe Stain. Hey Joe, how are you doing? — Good. How's it going? — Good. Happy to you have you here and I know offline we have been talking about the subject but finally it's the day when we are going to hear you a lot and um maybe share a bit of it our experiences in the place. But what I would like to do is that hear from you in 60 seconds or maybe a little more about you Joe and what are you excited about agentic stuff around the industry. — Sure. So for me it's really a combination of unlocking and evolving new capabilities and being able to tackle problems that we weren't able to tackle before. And for me as an architect, as an engineer, and someone who's very creative, like a lot of us, like we love hard problems and new hard problems that just weren't even like, oh, you could do that, right? Is something now that's tangible. But there's also been a shift of the operating model. And as a architect, developer, security professional, and like, you know, I've been doing this since 97, like, you know, long-term like industry, you know, thoughtful engineer. There's a lot that goes into all of what changes and happens around that and how different roles happen for different people and where the autonomy goes in and takes tedious tasks away. I think it's a very exciting time. — 100%. And it's um I like saying it that it's 10x opportunity and 10x responsibility. — Exactly. — Because the the complications and the complexity with which this technology is increasing simple if you put the problem simply it's easy to solve. But if you make the problem statement itself complex it's difficult. But good to hear from you that you're excited and I know with your vast experience we will have lot to talk about it today. So to start with the problem space, what do you think is agentic AI a new shift or evolution or is it just the more enhancements to the same ML and automation world which we have been anyway carrying out from years? It's an entirely different domain space and there are connectivities to everything from microservices to classic ML that go into that new domain like everything else in it. It's a ven diagram. we just have a new circle, right? And that space has only been evolving now for like maybe the last year, right? Where folks like OASP have done like a great job um getting out there. Some organizations like the AI Alliance have been coming together trying to work through some problems from that perspective, but it's something that has a new encompassment of challenges and opportunities ahead. Well said. Uh how about we start with

Segment 2 (05:00 - 10:00)

agentic use cases because I see across industry there's lot of confusion which is not agentic. People are calling it as agentic. A lot of time confusion in terms of that which applications are good candidate to be uh evolved as agentic use cases. But let's start by defining agentic use cases. So what according to you maybe if you can give us one example clearly that this is agentic use case and this is not. — Sure. So to me an aogentic use case is something that would be like some type of incident production response system where some anomaly comes into the system and then some LLM is going and making some decision around what should happen based on tools calls that it's making to introspect what's happening with the system at that time. Right? So combination of anomaly or many anomalies coming in doing some logical decision that is going to occur and not a rules engine but actually some non-deterministic state and then making those tools calls which are really just API calls at the end of the day to gather information and flow that through a process that may or may not involve a human that then orchestrates to achieve a goal such as constraining and moving a server offline so it no longer talks on the network, right? And having a full use case end to end where you're accomplishing what someone might have had to do for 45 minutes, you can now get done in 30 seconds and corner off a security threat, right? and you re reduce the meantime for the security operations people for seeing and deciding this could be a security threat for non-agentic use cases. I see those the chat bots of the world, the deterministic systems of the world where you know for a fact that it's almost like you have a compiler and you know it's going to actually do what it needs to do that everything is going to be functional in nature from like a programming perspective that you're not going to have side effects that you're going to have item potency and all these things that we have today as software engineers that like just go away and either have to be built up and considered or not used and cornered off and they're very different systems from the way that I look at it. — Yeah. So if you may call it that uh the difference definitely one certain difference is determinism and non-determinism but on top of that the use cases which have loop back learning and dynamism to the level where it's not coded yet system is able to decide the path forward and take the decisions call the tooling etc is what makes it special and different from any normal other autonomous uh use cases or solutions etc. Have you heard about malt book open? Who does — Yeah, I heard I heard it leaked 1. 5 million API keys for 37,000 users. — That's insane. — I I have been following everything around openclaw and nanobot and picobot which can run on a raspberry pi. Now, you know, this is why I love open source, honestly, um is because people can come up with some idea and then yeah, it may not have like practical applications at my day job today, but like it's going to um you know, I've already seen VCs like just instead of investing in a startup just put their own money in and just spin up their own openclaw hosted sandbox company, right? It's wild what's going on out there, right? they just claw code into their own startup again. And it's going to be interesting like where and how fast that evolves. Like the guy who wrote it um now works for OpenAI. So um it's going to be a continual interesting opportunity to see who like the same way that we had the rush with who is going to own the app stores and device and mobile. that's not going to be the same here because it's the internet and the way you get to agents and work with them and that's a whole new experience that is going to I think evolve this year. — Yes. um open source makes it uh interesting, faster but at the same time crazy because uh — sure — people think projects not the system and

Segment 3 (10:00 - 15:00)

that's interesting part but I want to call out one use case which I heard from — Peter uh the author of this open claw the famous open claw and uh he was asked that which is the use case you would like to where you had that aha moment that yes open claw is different and doing different and he said that he was away from his system and uh this open claw had to take call for example the boundaries he created is that don't cross outside my computer do whatever you need to do figure it out from the local system and so it first check the local binaries if it has not found how it can install the those binaries from the local scope or clear the paths and figured it out and then start using that application when he's away so that is something smartness which system evolved itself that okay if I cannot do X let me see Y Y let me see Z — it has a soul file you know that is continuously something that like evolves itself and is self-replicating — um it it's like the most novel little thing but like sometimes simple is all you need right that's like sometimes the best engineering is a triangle to hold the bridge up — yeah so we have said that yes agent is real getting more and more real every day. We have said that yes we have agentic use cases and the use cases which are not so agentic but people are getting confused but what does it mean for designers and architects? What is that happens when system plans act and execute? What is they need to be careful about? — Yeah. So I think the industry is going from pioneering to more stability now around this. And all of the problems that you have to be thinking about are boundaries, right? My autonomous unit and what boundaries does it have? And then what are the boundaries that exist between that autonomous unit and the other autonomous units that it's going to be interact with to take its autonomous task. It's not just about one agent. It's about orchestrating many agents working in many different ways with many different APIs on a massive scale within an enterprise. You could have tens of thousands of agents running and working all at the same time making different API calls and any in any one time. You need verification and backup of evidence of what actions those are taking based on your requirements. Just like anything else that we would be doing facilitating the risk appetite for our organization and like what risk mitigations we actually do as engineers because every organization is different and they all have different risk appetites, right? But the threats here are different and there's a combination of setting that risk appetite and then once you understand the risk appetite because most people don't even know what the domain is like you got to tell them like here's your risks, right? like what do you want to do from a business perspective behind it? Don't ask me the engineer, you know, like right. So that's one aspect of it and then being able to have an additional set of metrics and observability for people who have thought in the industry about observability of our systems. Now, we need to have that exact same thing for our AI and we need to see what's happening with our AI, prompts, what's happening with our tools calls, what are the orchestrations that are occurring with those tools calls for the prompts that are coming in over time so we can SDLC them. And onto the SDLC, there's an entirely new SDLC that is emerging right now. I don't know how to put my finger on it. It is moving so fast. But the way that we are going to be working with and interacting and managing code bases is just going from co-pilot to command center. It's a radical shift and I'm excited about it. It's going to be difficult and there's going to be shifts and roles and responsibilities, but you know, I could focus on problems and I can hopefully be more impactful to my or for stuff that, you know, I may want to do on the side like for baseball stuff, whatever, because I'm a, you know, major league baseball nerd. But it's a completely different shift. And once you start doing that and change the SDLC, the CI/CD now has to change because finally Kubernetes doesn't have to be the answer for everything because Kubernetes isn't the answer for everything. But it is because the tooling is there, the people are there, and the hype is there and blah blah. But if I could just claim my way and deploy in a couple of instances and click, I'm in. And if I could then go and re reuse those plugins as skills and do all sorts of other agents that are set up now around autoscaling and different ways of handling it and building it up. I'm hoping we'll see a

Segment 4 (15:00 - 20:00)

new layer emerge around how that is used now, but we'll see how it goes. Um, so I think it's a complete shift across the entire spectrum of everything we're doing as architects and developers and engineers in everything we hit and in a day-to-day basis. — You're giving a lot of responsibility here and it's it sounds to me quite aspirational because I agree that responsibility is increasing, the risks are increasing and we need entire new way if this propels like this at this space. We need newer ways to think the whole things around the agentic systems which certainly has more. We need to talk about those explanability and observability you said but before that I want to doubleclick on the risks because that is what is immediately in hand that is what people can control people can watch out for when they're designing these agentic systems. So tell me what do you think are the newer risks which I'm not talking about the ML and GI other aspects of it which we with the agentic systems what are those newer risks which you are trying to call out — newer risks are prompt injection and hijacking of the control of an agent and it's interesting because what brought me to even understand this risk and this threat that um was some research that Bruce Schneider posted. um he's the author of applied cryptography and industry leader around security and it's all around what they call the Morris 2worm and basically if you have an email agent you are susceptible to having that email potentially hijack your orchestration tool layer based on the interactions that you could have going back and forth between your prompts and the tools calls where your tools calls can get either befuddled and the client can take over or just do things like a denial of service attack, right? So that unlike before, isn't just going to cost you downtime. It's now going to eat up tokens that cost money, right? So it's different blast radiuses around some of the same things that we had before but in new ways coming out with different experiences. It even goes so far where you look at things like you know supply chain security and how it applies to here. There's been papers and research done where you can train an LLM to have certain pieces of information of inside of it so that the prompts going in will be able to generate back doors in code when the code comes back from the LLM it will actually have malware in the code generated in the generated code in the model right like that's nuts right and you know I read these papers and I try it out of my machine and I'm just like wow right there's all sorts of new uh different attacks coming in around that and then you have things like tool chain escalation and to me you know MCP is just remote stored procedures they're just stored procedures that's all they are they're nothing EJBs whatever you want to call them you know they're the EGBs of 2026 okay right but you they still have a place and a purpose and tools are important to understand and to me it's like all about intent. But if they're just API direct calls where you're just hitting rate limits and not knowing what the different APIs are going and orchestrating around, you have a risk of having those tools being called in the wrong way because the LLM's are still not that smart based on their context window and what's coming in and what they've seen before, right? So doing things like trying to figure out how to cache your orchestrations and then start thinking about anything that's out of cache and how you handle the exceptions and the narratives around that, right? It's an entirely new pattern that you have to start thinking about when you're architecting your systems. And if you're a system who develops distributed systems of scale, like you always think about caching, right? So it's not like you're not thinking about caching anymore, but now you've got to think about it at a different layer of where it's interacting with how it's handling this non-deterministic system and storing nondeterministic data that's a cache miss. Ah, — right. So what do I do? Right? How do I create some embedding around it or something where I can go ahead and hold some set of floating points or something that is like something, right? So it's a hard problem to solve and it comes back to the non-deterministic systems right? Yeah, I think uh if I may summarize it, you're saying that some problems and some risks

Segment 5 (20:00 - 25:00)

are where we need higher level of abstractions for example if we had those um uh injection issues now it's prompt injection plus the layers are increasing plus second part you said is a bit more standardization bit more at least on the controlling parts which may be MCP server plus most more standards which will be forming. So more of decision making and the controls line there where we still be if I may call predictable with unpredictability. — Yeah. Yeah. — Probabilistically. — Yeah. And there are certain things which are which need newer approach and newer researches and uh more um solutions coming around. That's interesting. But I think you have called out very nice areas for us to delve into. If we may now look into the explanability bar part which you touched upon. We have explanability, we have human in loop and everybody is using these as jarens. But how much of it is because earlier we had like single prediction or single change decisions but now we have multiple predictions and multiple change decisions which are happening and every stage gives me the explanability. For me if I need to have more observability to observe more it's not helpful it's not insightful for me I mean it shouldn't be the case that for everything we get loads and loads of observability and then we start thinking how do we get insights out of it so what in your view is explanability and if any early insights you have from your work that okay how much is good — yeah so a lot of It comes down to the combination of use case and what set of action if any or actions that need to get performed and when. Sometimes things are active processes where the human in the loop is part of the workflow. Sometimes the human is a passive control where something might be going on and the user or the human might need to go take a look because now the workflow has been stopped. And sure, the human is still in the workflow but at a different swim lane, right? And has a different set of criteria of what they may need to see in order to adjust for that. You don't necessarily need to see potentially, unless you're an auditor, every single little piece of every single little touch system and IP address and user access and the whole DSPM, right? You don't need to necessarily see all that, but you need to have and I think it's going to be almost like a video game, right? where you're going to have 75 different things going on a day with one or more assistants and agents and they're going to be generating reports, taking out tasks. working through actions. They may have failed a boundary that you need to look at before it goes out and oh my god, it's for your boss. You've now dropped anything and you're like going like this, right? Um and then your to-do list is now filling up automatically for the AI is now your boss, right? like you are now getting a to-do list from your AI of things that you either need to take action on or need to go back to the AI about or something else. Like maybe you need to go talk to Suzie and you really need to take this out of the loop completely or maybe you just need to go ahead and switch the input box or radio button and click next, right? It's going to be so driven by use cases and the platform aspect of that is going to be I think interesting. It's a whole new user experience. It's a it's a new behavior that it's like all the mobile apps that we have on our phone and everything that we do on our phone but now like lots faster and has access to everything and makes decisions for us. — Absolutely. More democratization plus more responsibility and risk. I hope we may not have all the answers but yes we know that if people are listening to this and can start gearing it up for the better side of those explanability and those problems of operation sides of it as well it'll be really good and I believe that this is not decreasing the work this is increasing the work uh at different layers but the responsibilities are increasing in that sense and the work human work is do you think human work is reducing with all these. — I actually find my job to be more demanding and increasing than I do uh decreasing. I made the mistake in like a day turning something around that was

Segment 6 (25:00 - 30:00)

just like, you know, nearly impossible to do in like weeks of time and doing it and, you know, like poof, it was there, you know, and then the next day it was like, all right, how about this now, right? And it's like, wait, wait. I've got 750 other things to do. I just kind of dropped everything just for that one little prototype demo, right? So it like the way I think about it is like you know with great power comes great responsibility and those responsibilities are coming sure there's a reduction of work absolutely but the responsibilities now are becoming so much more powerful because the expectations are higher you know before the expectations of what is production quality is very much determinized based on all sorts of different politics. and religion and organizations and everything else, right? But now, if you wanted to, you could have every single threat written down by AI with every single known vulnerability pulled in from MITER, checked up in a box, and have architecture diagrams and user guides and FAQs and a continuous runbook that's an automated website that you build and have people log to it. And all of that is just a click of a button away, right? So, you've got to read that. You've got to understand it. You got to make sure that the AI isn't doing something like crazy, you know, like I've seen the AI just be like, "Oh, sure. I'm just going to log the key in the logs. Oops. " You know, like, right? Like, you can't ship things like that, right? So our responsibilities become very different in how we are now stewards and reviewers and and looking at the world from a lens that from my perspective I've always kind of like wanted to look at the world at like I have a very high bar for things and it's very hard for me and other people and how we all work and negotiate our different weaknesses and strengths together as a team to get to that bar that makes our software awesome and you know makes us good at what we do. I think we all want to be better at that. And I think people are really coming together to to be better. It's just going to mean now we're going to have a whole lot of new things that we're going to have to categorize and isolate on. Like we can come up with a hundred things, but we can't afford them, you know, like from a business perspective. So now all of a sudden you've got the idea ideation nightmare. You've got 1500 things you could do, but what's the thing to do, — right? What is the business to do? you could do anything now but now what do you had a strategic business plan like — yeah so I'm hearing same thing which uh goes in my head so good to know that things are matching on that side of thing that things are increasing and responsibilities are increasing but those who know me they know that I have written a lot about the coding platform engineering patterns and I'm a big fan of that whatever we can give to platforms and do it in standardized way while uh making it easier for the consumers the providers and the whole ecosystem we should do it whereas pattern I'm seeing currently because it's early age also early time so also for agentic systems it's mostly the team based implementations which is going in circles what's your view on the early platforms or doing it platform approach what's your view about that — so at my company we built it centrally and I'm really glad we did it that today because we still have a couple of decentralized systems that do run from prior to our system going live and you know we're having to negotiate ISO 420001 migrations now. — That sounds interesting. — Oh, it's fun. — Tell us more. Yeah. So our platform is focused around identities and key access based on geographical regions with open source models that we run across our GPUs and our private cloud data center. So essentially you get not just V1 chat completions and V1 embeddings but we also built an entire rag as a service system all built out around PG vector that does some amazing hybrid search that our data scientists came up with. We have like eight steps in our data pipelining for our rag system. It calls LLMs. It does um tokenization and hybrid searching and more calls to LLMs and all sorts of good stuff. um and works across you know any document type and it comes back with citations that has lineage around it and you're able to get from you know chatting with documents from source of document and and have it all tied back and it's all in one place and every business unit uses it it's tied into service now for our CMDB right so everyone uses service now and CMDB most likely right so everything is tied into CMDB

Segment 7 (30:00 - 35:00)

so whether it's our you know Geneva product or blue prison product or admin product. We got like 350 products like you know we made like 167 acquisitions when I started two years ago like you know we've grown through acquisition and growth. — So that makes me interrupt you here and ask you when did you start? — Yeah I started — if it is 350 products already integrated on a platform level for agentic systems. Well, no, no. So, when I started two years ago, there was two groups doing AI where we're like a eight billion dollar public on the NASDAQ listed 20,000 person 30,000 customer company or something like that. Okay. And when I started in the private cloud group and the our private cloud runs in multiple geographies and data centers and we're basically a fund manager for funds. We're a fund administrator. Excuse me, not manager. We're fund administrator. And then we have tax products, accounting products, automation products, health products. Oh my god, we just have products everywhere that do everything. Learning products. Like I don't even know like I I've seen some — So you're saying that it is integration of the products on your AI platform is what is centralized. — Yeah. So, and what that's done is all the systems that get built up from the V1 chat completions and the rag and doing service discovery and having a place for your A2A agent cards to go to being able to have from the ground up and everyone know that we have a center of excellence around that. We have one teams channel 24 by7 support that we run internally around that and everything is focused and just grows up from that one central place. And then we have a work HQ system um that is the agentic overlay on top of that does agent building. So that if you're not a coder, you could actually go ahead and build agents and wire them together and then have them run and orchestrate and integrate and process across data sets and do the different wiring and set your prompts and it's it's a really cool system. — It's in production. — Yeah, it's in production. It's in produ it's been in production for a while and yeah I mean billions of tokens like thousands of use cases uh UK US AWS our private cloud all sorts of fun — that's interesting so from your experience then if we get benefited so I hear you you're saying that platform approach early on because you've started and further building on layers on top of it and now that that's the reason you have agentic studios or environments and set up and people are using it actively. What is the operating model others can take from this autonomous system which can be built at scale? What is that we can learn from you? I think a lot of it is having to build the tooling for the organization and either having something where you can extend some system that will allow you to run and to do this or it's going to be something that you have to build yourself or something comes in open source. I don't think all of these systems are there yet, right? Like I I've seen a couple of other systems and folks in the industry who are doing this like there was an announcement with like Goldman Sachs and Anthropic around compliance and so they have a system now right so it's getting out there more around you know our systems for us for our AI gateway and our work HQ system you know besides us running them for our own internal products we offer we also have them like as products that we offer right like AWS has aentic components like there's a lot of different paths around where and and how those are starting to um come about and form and stabilize. I think that everything from user experience all the way down to dev sec ops have to be accounted for. Like you really have to think completely 360 about every stakeholder and every user that is now going to be interacting with and using your system. And you may have to cobble together a whole bunch of different things in order to build it. You know, you may get away with just using Envoy's gateway and writing a whole little services because you only have a shop with 25 people. Or maybe you're a large enterprise with 15,000 people that do. NET, Java, Go, and they do it in 38 different countries. And you know, who knows, right? The fundamental principles though are still the same. It's just how you build that platform out. And I don't think the platform engineering pieces are really much different than we have had before except for this new domain that has to get introduced for the new

Segment 8 (35:00 - 40:00)

things that we have to account for. So it's like platform engineering plus+ almost. Yeah, I think in terms of operating model, if I now put what you said, it's about the registration, the life cycle, the observability, the racy and some of those aspects which early on if we put together will be really helpful for the organizations to do it in the right way. — Yeah. And some when you go through your stakeholders and your systems, it's not always things that you do. It's a combination of functional and non-functional requirements. And as architects, you need to be the one responsible who says like, "Okay, we need this person to be able to go make this decision and there's this is their responsibility and the org has to go and like we need to go and build and do that and get operations people to help us, but like that has to get done here, right? It may not be an engineering task. Um, but it still is part of the overall architecture of what you're trying to accomplish. " — Agreed. I think same thing I'm hearing from uh all our guests that yes uh architects should uh take more responsibility in this case and uh help build that understanding to early on to engineers from in terms of system thinking and broader thinking that where it starts where it ends and what they need to be now more careful about those emergent behaviors. I want to touch upon now that uh what is that organizations should do whether they should wait for more standards to come platforms to come early experiment go in production what is with your experience where do you see — start yesterday um — you're talking like a now — it doesn't mean you have — like an architect to ship it to production But if you don't start to understanding what these tools can do, you'll never be able to have your mind be able to bridge the gap of what is actually now possible in your business with these tools and your competitors will. Full stop. To me, it's just that simple. You know, it's a market. Everyone's got competition. — Do you think uh they should wait for standards to emerge or support them? — I don't think so. And I'll tell you why. Like let's look at something like MCP and talk about MCP for a second. Let's say MCP is SOAP and there's going to be some new standard like REST that'll emerge that everyone's going to use. But SOAP was powerful when it came out. It allowed businesses to have financial fraud transactions and all sorts of interoperability for healthcare and HL7 in order to do computerto computer exchange and interaction of data and files and was a powerful solution back in 2002 or whatever it was. It was amazing and you know what SOAP is still around like it still exists. HL7 is still SOAP. It hasn't gone anywhere. and did a new something emerged and everybody finally went wild and everyone uses it now and everyone does data exchange. Sure, absolutely. But the folks who grasped the interactions and exchanges of data early on are the ones who could start to understand like what there may or may not be and apply it to those technologies. And sorry for the boat CXO now I'm going back there. But it's true because when you're thinking about like whether or not to start like you've got to try them out like just what's your attack surface really just start simple what's my attack surface you know is it like things on my desktop great spin up a box where you can go in Cisco and like you have a VM cloud box now go crazy and go spend a couple of days just going and trying to think about like the things you can do and how it could benefit and make things better. You don't have to go from like zero to hero, but to not use these tools to me is like saying like you don't want to use a computer when the internet is around, you know, like you don't want to use mobile phones anymore. But it's so much faster than those technologies were. Oh, you don't want to use and adopt the cloud, right? It's the same conversation we had over and over and over again. And now it's so much more fast and compact than it used to be. And it's moving so much faster. — Yeah. So stay in that zone and let's make it uh make it real. Not complicated. I said complicated but not complicated but let's make it real that now if we are wearing CXO hat we know that we have to keep the lights on. We have business to run. We have to make new things work with the existing. Right. So what are those guidances or what are those points from your experience where we merge and marry these existing with new while we play with the new what are those things which

Segment 9 (40:00 - 45:00)

you would suggest — so I think it's a combination of having some new things getting tried out from a feature perspective and while the engineers are doing that allow them to try out some new tooling at the same time. So that you're doing a combination of allowing the engineers to explore their needs and their creativity and what they need for them to be more productive, but also doing something that's critical for the business and building out a feature that maybe you can build out in, you know, three weeks instead of three months. Pretty good, right? or maybe even three days and you get to a demo and you start getting customer calls and get them excited about something. However your organization might roll, but your cycle whether it's a sales cycle or a marketing cycle or engineering life cycle, those cycles are going to be I think radically different by the end of the year of all the different things that we're going to have in the consumer marketplace that are going to start to get stabilized that we're going to start trusting. Once we trust it in the consumer, we trust it at the organization. More platforms are going to be built with more security and governance and those platforms are going to be available on the enterprise for either products or open source or just built internally as a system. — Yeah, I think uh that creativity and more ways of let engineers figure out the more ways they want to solve the problem with is an interesting part to look at. Sorry, I didn't mean to interrupt, but what you just said right there, that's going to be really hard for businesses to let go of the business owning the business requirement into the engineers's hand, — right? — You giving it in machines and forget about the engineers. — Yeah. Well, sure, however you want to look at it, say the same thing I said differently. But like maybe the engineers have a bigger problem with that than the business does. But like it's going to be problematic. That's going to be really something that every organization is going to have to deal with and they're all going to deal with it differently based on their people and culture and everything. — I see it more from the existing meeting new from the perspective that uh see I mean when it comes to agentic I always say that everything is not agentic. It is very context and situation and unless until you reimagine the whole problem space which is a new thing system you'll anyway have to carry out a new things uh new model around it but then see to it that where are those meeting points and how do you make things work together so that's uh the perspective and I hear you completely newer space newer uh opportunities more people to do that but then requirements are going in uh more broader perspective is in hands of men and machine. — With that said, let's touch upon the cost and sustainability because I know a lot of companies took this challenge of being green by 2030 or things like that with geni more cost more sustainability issues definitely yet we are not fully talking about it. What do you think is the change in terms of costing models and sustainability? Any early insights from your work? Because everybody's talking tokens. — Part of costing is also what you can afford for your requirements too. Sometimes there is no available like it's not even a money factor. Sometimes your data can't just go to another provider such as open AAI and you can use their tokens. But the way that I've looked at this and you know we run our own GPU and open source models but I look at it as from the perspective of you know we only have a fixed amount of GPU we could only run a certain amount of models and I've got 20,000 people who want 35,000 different models because they saw them on HackerNews and Reddit, right? So how do you serve the people? Right? How do you give them the models they want? And we try to tie it down and roll it around into use cases where different and certain sets of use cases will have different models and different regions that ultimately they have to get pinned to because they're in production. They can't have model drift, right? It can't be this continuous, oh, there's a new model out there and it's so much smarter and so much better. You know what? Maybe for your use case that changed the response of your prompt and you didn't want that because you liked the email that was going out and now all of a sudden your new email was so thinking and so smart that people are complaining. So the new model is not always the best model and sometimes you have to sustain models just like you do operating systems and treat them like end of life. You know I still have Llama 318B running. I still have use cases in production with Llama 318B.

Segment 10 (45:00 - 50:00)

You know I think it's running on like one chip with a couple of other models. So like no big deal but it's relative right? So we run Quen 383B 30B for thinking. Um we run Kimmy 25. We also have Kimmy 2 and Quen 235 vision instruct and thinking and we try to we have a smaller Quinn vision which is much faster because a lot of people who want vision don't need it to be smart but they need it to at least do what it needs to do from the structure perspective. So we have a whole bunch of smaller but good enough just good enough models that maybe don't have a PhD but they have a master's degree. So we run those and the master's degrees are just faster than those PhDs are. and we kind of break it down and that all comes back to cost because that time on the GPU that's waiting on the thinking model for the bigger Quen model is taking token time on the GPU and cycles from someone else coming into our platform queuing into our system waiting for that token to actually be able to process on that GPU and tick if it was a foundational model or something you're paying for at AWS or Anthropic or Azure or OpenAI or Gemini at Google, whatever. Those different models that you're paying for are going to be costs for models that maybe you can run in a different way and not have to expand the cost because you don't need to solve some muon experiment for, you know, some new theoretical physics equation. You're just doing invoice processing and that's all you need. So to me when I think about cost it has always come back down to what is the total cost of ownership of our GPUs and how do we create a multi-dimensional plane in between our GPUs for doing things like overs subscription for requests coming into different regions of what models they want to use and what tenants they have and prioritization around tenency and rate limiting. So we can maximize the four chips we have running our Quen 3 model for dev staging UAT production this application that application this boundary that boundary and it's still all funnels and works around the same four chips and we're isolating it that way. I think that you have to be thinking about it that way because that's how the big model providers thinking about it. How do they do their total cost of ownership? And I think it's now at a point where people are going to start looking at their token bills like they eventually started looking at their cloud bills where they're like, "Wait a minute, we just outsourced to AWS and it's costing us more. Oh my goodness, what happened? " — Absolutely — right. You know, that reckoning is going to come. I don't know what's coming. I have no prediction when like that will come. — I agree. I agree to your point because this is in my observation too that if you are let's say giving a platform or giving a service and until the time you expose that cost to someone and let them own their the cost of that total cost of ownership for their use case or their services what they're getting they don't get to realize it that where it lies in the chain. So it's pretty much like uh make your children uh learn early in the game that how to use money wisely. That's a very good point. With your experience, if you have done any mistakes or early principles or anything you want to give to the builders and engineers and architects who are in the moment designing those systems, what that principle or learning experience would be that you would say that take this early on with agentic systems in particular? Just right off the top of my head and something that almost gives me shivers down my spine, I would say that my biggest failure over the last year and a half of doing this has been my success. The system exploded with usage so fast because everyone was like, "Wait, we could do V1 check completions and all we have to do is go to a website and download a CLI and all of a sudden poof, we can make V1 check completions and there's an image model and we can start sending in all of our images that we were never able to do before and now we can start getting business in mail rooms and all this type of like new type of opportunities that just spraw crawled within the organization over like 3 to 6 months. It was exciting, it was fun, but it was uh constant firefighting, you know, like the train was rolling at 90 miles an hour and we were just trying to get enough track so that you know, there was no stopping at the station, right? like we're just trying to get enough tracks so we can loop to slow down maybe one day so we can stop at a station, you know, which we eventually did at the end of the year and that was good and you know, but it takes a while. Good problems to have kind of thing, right? But those were severe and it wasn't

Segment 11 (50:00 - 54:00)

hype. It wasn't just like everyone tried it out, you know? Everyone who tried it out was using it for something. they had like some tangible thing that they found that they could apply it to in their day-to-day that like helped them out and they became a user, you know, and it was exciting, but it was also a lot of incoming, a lot of structure that we didn't have, a lot of organizational support to run it that we had to put into place, a lot of new software that had to get built that we went live with an MVP. you know, we were we only had a couple of users like when we kicked this off, you know, they were just going to try it and then all of a sudden we had 250 users within three months and they were like half of them were in prod right you know and it was in red big letters with an underline and it says do not go into prod with this but you know it happens and you know DR gets set up and we build and we make things work and it becomes reliable. and it works. — Yeah. So, be ready to explore. Don't fret over it too much. Don't take stress and be prepared for scale early on is what I'm hearing. Am I right? — Yeah. The worst thing that could happen is someone actually takes what you've done and goes live with it. Um, there's so many new boundaries and things to be considering. And for all of my experience, I feel like I started just fresh all over again. So let's call it this way. Be in hurry to learn but don't rush to take halfhearted or half solution out there to create more problems. Wonderful. With that said, maybe a last question. If let's say you and I meet again in December, what new we will be discussing about? I'm sure there will be lot more happening during this time from February we are meeting to December. What is your prediction? ready prediction early on. — Yeah, I think we're going to start seeing the boundary coming into the workplace and the consumer with hardware just in what I'm seeing in what you can do now with the software automation on these simple hardware devices. Um, and I'm not saying robots, it's not about the robots, but you could even look at something like, you know, something like Alexa and like my Alexa, every day I say the same thing to Alexa. I'm not gonna say what it is now because all of a sudden you're gonna start hearing it, right? But like I should be able to do natural language programming with my Alexa and say like whenever you're playing a music station and I want to know the you know who's playing it, um you should just do that for me. I shouldn't have to ask you. You should always tell me who's playing it then play it for me. I don't want a programmer at Alexa, some music company to go and sit with a product manager and make that decision for me. I want to be the product manager for my own product of yours, not you obviously, but of of the new world, right? And be able to shape my interaction. And I think we're going to start seeing assistants working with assistants and agents working with agents. And as these companies start building their own agents, they're going to start working. Like you're not going to be emailing invoices from one agentic system over email to another agentic system in order for ARAP and get that all working. That transmission mechanism like fax went away and turned into email. Email will go away and maybe turn into AAA or some A2A or some new standard or something like that. And I see the world starting to grow and go crossorgganization. — Very interesting. And u I'm sure technology will surprise us but a lot more responsibility which also is our duty to call out at the end. Thank you Joe for joining. I'm glad you could do and we were doing a lot of discussions in the background but it's finally coming to a to this shape whatever form and shape it has come we will see it. Thank you so much. — Thank you. Bye.

Другие видео автора — InfoQ

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Лучшие методички за неделю — каждый понедельник