  • That’s a straw man.

    You don’t know how often we use LLM calls in our workflow automation, what models we are using, what our margins are, or what counts as a high cost for my organization.

    That aside, business processes solve for problems like this, and the business does a cost-benefit analysis.

    We monitor costs via LiteLLM and Langfuse, and we have budgets set on our providers.

    Similar architecture to the Open Source LLMOps Stack https://oss-llmops-stack.com/

    Also, your last note is hilarious to me. “I don’t want all the free stuff because the company might charge me more for it in the future.”

    Our design is decoupled; we do comparisons across models, and the costs are currently laughable anyway. The most expensive process is data loading, but good data lifecycles help contain costs.

    Inference is cheap and LiteLLM supports caching.

    Also for many tasks you can run local models.
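    If it helps make “decoupled” concrete, here’s a minimal sketch of the kind of model comparison I mean, using LiteLLM’s Python SDK. It is not our actual pipeline, and the model names and prompt are just placeholders:

    ```python
    # Minimal sketch, not our real pipeline: LiteLLM exposes one completion()
    # interface across providers, so the same prompt can be run against
    # several models (including a local Ollama model) and compared on output
    # and token usage. Model names below are placeholders.
    import litellm

    PROMPT = [{"role": "user", "content": "Summarize this support ticket in one sentence: ..."}]
    MODELS = ["gpt-4o-mini", "claude-3-5-haiku-20241022", "ollama/llama3.1"]

    for model in MODELS:
        resp = litellm.completion(model=model, messages=PROMPT)
        print(model)
        print("  answer:", resp.choices[0].message.content)
        print("  tokens:", resp.usage.total_tokens)  # feeds rough cost tracking
    ```

    Swapping out the model list is the whole point of the decoupling; nothing downstream cares which provider answered.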


  • It’s professional development of an emerging technology. You’d rather bury your head in the sand and say it’s not useful?

    The only reason not to take it seriously is to reinforce a worldview, instead of looking at how experts in the field are leveraging it or having a real discussion about the pitfalls you’ve encountered.

    The Marketing AI hype cycle did the technology an injustice, but that doesn’t mean the technology isn’t useful to accelerate deterministic processes.


  • It depends on the methodology. If you’re trying to do a direct port, you’re probably approaching it wrong.

    What matters most to the business is data; your business objects and business logic are what make the business money.

    If you focus on those parts and port a portion at a time, you can substantially lower your tech debt and improve the developer experience by generating greenfield code that you can verify and that follows your organization’s modern best practices.

    One of the main reasons many users are complaining about the quality of code edited by agents comes down to the current naive tooling: most use sloppy find/replace techniques with regex and generic user tools. As AI tooling improves, we are seeing agents given more IDE-like tools with intimate knowledge of your codebase, using things like code indexing and ASTs. Look into Serena, for example.
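    To make the AST point concrete, here’s a toy sketch (nothing to do with Serena’s internals) of what code indexing buys you over regex find/replace: the agent gets symbol names, kinds, and exact line ranges instead of text patterns.

    ```python
    # Toy illustration only: parse a Python file into an AST and emit a
    # structured symbol index an agent could edit against, rather than
    # pattern-matching raw text. The path below is made up.
    import ast

    def index_symbols(path: str):
        tree = ast.parse(open(path, encoding="utf-8").read())
        symbols = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                symbols.append({
                    "name": node.name,
                    "kind": type(node).__name__,
                    "lines": (node.lineno, node.end_lineno),
                })
        return symbols

    if __name__ == "__main__":
        for sym in index_symbols("app/models.py"):
            print(sym)
    ```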


  • Accelerated delivery. We use it for intelligent, verifiable code generation. It’s the same work the senior dev was going to complete anyway, but now they cut out a lot of the mundane, time-intensive parts.

    We still have design discussions that drive the backlog items the developers work off with their AI; we don’t just assign backlog items to bots.

    We have not let loose the SaaS agents that blindly pull from the backlog and open PRs, but we are exploring it carefully with older projects that only require maintenance.

    And yes, we also use deterministic chore bots for maintenance, but these are more for small changes the business needs.

    There are in fact changes these agents can make well.








  • We use a layered architecture following best practices and have guardrails, observability and evaluations of the AI processes. We have pilot programs and internal SMEs doing thorough testing before launch. It’s modeled after the internal programs we’ve had success with.

    We are doing this very responsibly, and deliver a product our customers are asking for, with the tools to help calibrate minor things based on analytics.

    We take data governance and security compliance seriously.


  • While it’s possible to see gains in complex problems through brute force, learning more about prompt engineering is a powerful way to save time, money, tokens and frustration.

    I see a lot of people saying, “I tried it and it didn’t work,” but did they read the guides or just jump right in?

    For example, if you haven’t read the Claude Code guide, you might never have set up MCP servers or taken advantage of slash commands.

    Your CLAUDE.md might be trash, and maybe you’re using @file references wrong, blowing tokens or biasing your context the wrong way.

    LLM context windows can only scale so far before you start seeing diminishing returns, especially if the model or tooling is compacting them.

    1. Plan first, using planning modes to help you, and decompose the plan into small steps.
    2. Have the model keep track of important context externally (like in markdown files with checkboxes) so it can recover when the context gets fucked up; there’s a toy sketch of this pattern below.
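    Here’s that toy sketch, just to show the shape of the pattern (the file name and format are arbitrary): the plan lives on disk as a checklist, so the agent can re-read it and pick up where it left off after a compaction.

    ```python
    # Toy sketch of the "external memory" pattern: persist the plan as a
    # markdown checklist the agent re-reads instead of trusting its window.
    from pathlib import Path

    PLAN = Path("PLAN.md")  # arbitrary file name

    def write_plan(steps):
        PLAN.write_text("\n".join(f"- [ ] {s}" for s in steps))

    def mark_done(step):
        PLAN.write_text(PLAN.read_text().replace(f"- [ ] {step}", f"- [x] {step}"))

    def remaining():
        return [line[6:] for line in PLAN.read_text().splitlines()
                if line.startswith("- [ ] ")]
    ```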

    https://www.promptingguide.ai/

    https://www.anthropic.com/engineering/claude-code-best-practices

    There are community guides that take this even further, but these are some starting references I found very valuable.




  • I get it. I was a huge skeptic 2 years ago, and I think that’s part of the reason my company asked me to join our emerging AI team as an Individual Contributor. I didn’t understand why I’d want a shitty junior dev doing a bad job… but the tools, the methodology, the gains… they all started to get better.

    I’m now leading that team, and we’re not only doing accelerated development, we’re building products with AI that have received positive feedback from our internal customers, with a launch of our first external AI product going live in Q1.


  • If you’re not already messing with MCP tools that do browser orchestration, you might want to investigate that.

    For example, if you set up Puppeteer, you can have a natural conversation about the website you’re working on, and the agent can orchestrate your browser for you. The implication is that the agent can get into a feedback loop on its own to verify the feature you’re asking it to build.

    I don’t want to make any assumptions about additional tooling, but this is a great one in this space https://www.agentql.com/
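    The loop it ends up running looks roughly like this. I’m using Playwright’s Python API here purely as a stand-in for the Puppeteer MCP tool, and the URL and selector are made up:

    ```python
    # Stand-in illustration of the agent's verification loop, written with
    # Playwright instead of the Puppeteer MCP server; URL and selector are
    # placeholders for whatever feature you asked it to build.
    from playwright.sync_api import sync_playwright

    def verify_feature(url="http://localhost:3000", selector="#signup-form"):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            present = page.locator(selector).count() > 0
            page.screenshot(path="after-change.png")  # evidence the agent can inspect
            browser.close()
            return present

    if __name__ == "__main__":
        print("feature rendered:", verify_feature())
    ```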




  • Cursor and Claude Code are currently top tier.

    GitHub Copilot is catching up, and at a $20/mo price point, it is one of the best ways to get started. Microsoft is slow-rolling some of the feature delivery, because they can just steal the ideas from other projects that do it first. VS Code also has extensions worth looking at: Cline and RooCode.

    Claude Code is better than just using Claude in Cursor or Copilot. Claude Code has next-level magic that dispels some of the myths being propagated here about “AI bad at thing”, because of the strong default prompts and validation they have built into it. You can say dumb, ignorant human shit, and it will implicitly do a better job than other tools you give the same commands to.

    To REALLY utilize Claude Code YOU MUST configure MCP tools… Context7 is a critical one that avoids one of those footguns: “the model was trained on older versions of these libraries.”

    Cursor hosts models with their own secret sauce that improves their behavior. They hard-forked VS Code to make a more deeply integrated experience.

    Avoid Antigravity (Google) and Kiro (Amazon). They don’t offer enough value over the others right now.

    If you already have an OpenAI account, Codex is worth trying; it’s like Claude Code, but not as good.

    JetBrains… not worth it for me.

    Aider is an honorable mention.