  • That’s a straw man.

    You don’t know how often we use LLM calls in our workflow automation, what models we are using, what our margins are, or what counts as a high cost for my organization.

    That aside, business processes solve for problems like this, and the business does a cost-benefit analysis.

    We monitor costs via LiteLLM and Langfuse, and we have budgets set on our providers.

    Similar architecture to the Open Source LLMOps Stack https://oss-llmops-stack.com/

    Also, your last note is hilarious to me. “I don’t want all the free stuff because the company might charge me more for it in the future.”

    Our design is decoupled; we do comparisons across models, and the costs are currently laughable anyway. The most expensive process is data loading, but good data lifecycles help contain costs.

    Inference is cheap and LiteLLM supports caching.

    Also for many tasks you can run local models.
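    If it helps make “decoupled” concrete, here’s a minimal sketch of the kind of model comparison I mean, using LiteLLM’s Python SDK. It is not our actual pipeline, and the model names and prompt are just placeholders:

    ```python
    # Minimal sketch, not our real pipeline: LiteLLM exposes one completion()
    # interface across providers, so the same prompt can be run against
    # several models (including a local Ollama model) and compared on output
    # and token usage. Model names below are placeholders.
    import litellm

    PROMPT = [{"role": "user", "content": "Summarize this support ticket in one sentence: ..."}]
    MODELS = ["gpt-4o-mini", "claude-3-5-haiku-20241022", "ollama/llama3.1"]

    for model in MODELS:
        resp = litellm.completion(model=model, messages=PROMPT)
        print(model)
        print("  answer:", resp.choices[0].message.content)
        print("  tokens:", resp.usage.total_tokens)  # feeds rough cost tracking
    ```

    Swapping out the model list is the whole point of the decoupling; nothing downstream cares which provider answered.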


  • It’s professional development of an emerging technology. You’d rather bury your head in the sand and say it’s not useful?

    The only reason not to take it seriously is to reinforce a worldview, instead of looking at how experts in the field are leveraging it or having a real discussion about the pitfalls you’ve encountered.

    The Marketing AI hype cycle did the technology an injustice, but that doesn’t mean the technology isn’t useful to accelerate deterministic processes.


  • It depends on the methodology. If you’re trying to do a direct port, you’re probably approaching it wrong.

    What matters most to the business is data; your business objects and business logic are what make the business money.

    If you focus on those parts and port a portion at a time, you can substantially lower your tech debt and improve the developer experience by generating greenfield code that you can verify and that follows your organization’s modern best practices.

    One of the main reasons many users are complaining about the quality of code edited by agents comes down to the current naive tooling: most use sloppy find/replace techniques with regex and generic user tools. As AI tooling improves, we are seeing agents given more IDE-like tools with intimate knowledge of your codebase, using things like code indexing and ASTs. Look into Serena, for example.
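    To make the AST point concrete, here’s a toy sketch (nothing to do with Serena’s internals) of what code indexing buys you over regex find/replace: the agent gets symbol names, kinds, and exact line ranges instead of text patterns.

    ```python
    # Toy illustration only: parse a Python file into an AST and emit a
    # structured symbol index an agent could edit against, rather than
    # pattern-matching raw text. The path below is made up.
    import ast

    def index_symbols(path: str):
        tree = ast.parse(open(path, encoding="utf-8").read())
        symbols = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                symbols.append({
                    "name": node.name,
                    "kind": type(node).__name__,
                    "lines": (node.lineno, node.end_lineno),
                })
        return symbols

    if __name__ == "__main__":
        for sym in index_symbols("app/models.py"):
            print(sym)
    ```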


  • Accelerated delivery. We use it for intelligent, verifiable code generation. It’s the same work the senior dev was going to complete anyway, but now they cut out a lot of the mundane, time-intensive parts.

    We still have design discussions that drive the backlog items the developers work off with their AI; we don’t just assign backlog items to bots.

    We have not let loose the SaaS agents that blindly pull from the backlog and open PRs, but we are exploring it carefully with older projects that only require maintenance.

    And yes, we also use deterministic chore bots for maintenance, but these are more for small changes the business needs.

    There are in fact changes these agents can make well.








  • We use a layered architecture following best practices and have guardrails, observability and evaluations of the AI processes. We have pilot programs and internal SMEs doing thorough testing before launch. It’s modeled after the internal programs we’ve had success with.

    We are doing this very responsibly, and deliver a product our customers are asking for, with the tools to help calibrate minor things based on analytics.

    We take data governance and security compliance seriously.


  • While it’s possible to see gains in complex problems through brute force, learning more about prompt engineering is a powerful way to save time, money, tokens and frustration.

    I see a lot of people saying, “I tried it and it didn’t work,” but did they read the guides or just jump right in?

    For example, if you haven’t read the Claude Code guide, you might never have set up MCP servers or taken advantage of slash commands.

    Your CLAUDE.md might be trash, and maybe you’re using @file references wrong, blowing tokens or biasing your context the wrong way.

    LLM context windows can only scale so far before you start seeing diminishing returns, especially if the model or tooling is compacting them.

    1. Plan first, using planning modes to help you, and decompose the plan into small steps.
    2. Have the model keep track of important context externally (like in markdown files with checkboxes) so it can recover when the context gets fucked up; there’s a toy sketch of this pattern below.
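    Here’s that toy sketch, just to show the shape of the pattern (the file name and format are arbitrary): the plan lives on disk as a checklist, so the agent can re-read it and pick up where it left off after a compaction.

    ```python
    # Toy sketch of the "external memory" pattern: persist the plan as a
    # markdown checklist the agent re-reads instead of trusting its window.
    from pathlib import Path

    PLAN = Path("PLAN.md")  # arbitrary file name

    def write_plan(steps):
        PLAN.write_text("\n".join(f"- [ ] {s}" for s in steps))

    def mark_done(step):
        PLAN.write_text(PLAN.read_text().replace(f"- [ ] {step}", f"- [x] {step}"))

    def remaining():
        return [line[6:] for line in PLAN.read_text().splitlines()
                if line.startswith("- [ ] ")]
    ```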

    https://www.promptingguide.ai/

    https://www.anthropic.com/engineering/claude-code-best-practices

    There are community guides that take this even further, but these are some starting references I found very valuable.




  • I get it. I was a huge skeptic 2 years ago, and I think that’s part of the reason my company asked me to join our emerging AI team as an Individual Contributor. I didn’t understand why I’d want a shitty junior dev doing a bad job… but the tools, the methodology, the gains… they all started to get better.

    I’m now leading that team, and we’re not only doing accelerated development, we’re building products with AI that have received positive feedback from our internal customers, with a launch of our first external AI product going live in Q1.


  • If you’re not already messing with MCP tools that do browser orchestration, you might want to investigate that.

    For example, if you set up Puppeteer, you can have a natural conversation about the website you’re working on, and the agent can orchestrate your browser for you. The implication is that the agent can get into a feedback loop on its own to verify the feature you’re asking it to build.

    I don’t want to make any assumptions about additional tooling, but this is a great one in this space https://www.agentql.com/
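    The loop it ends up running looks roughly like this. I’m using Playwright’s Python API here purely as a stand-in for the Puppeteer MCP tool, and the URL and selector are made up:

    ```python
    # Stand-in illustration of the agent's verification loop, written with
    # Playwright instead of the Puppeteer MCP server; URL and selector are
    # placeholders for whatever feature you asked it to build.
    from playwright.sync_api import sync_playwright

    def verify_feature(url="http://localhost:3000", selector="#signup-form"):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            present = page.locator(selector).count() > 0
            page.screenshot(path="after-change.png")  # evidence the agent can inspect
            browser.close()
            return present

    if __name__ == "__main__":
        print("feature rendered:", verify_feature())
    ```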




  • Cursor and Claude Code are currently top tier.

    GitHub Copilot is catching up, and at a $20/mo price point, it is one of the best ways to get started. Microsoft is slow-rolling some of the feature delivery, because they can just steal the ideas from other projects that do it first. VS Code also has extensions worth looking at: Cline and RooCode.

    Claude Code is better than just using Claude in Cursor or Copilot. Claude Code has next-level magic that dispels some of the myths being propagated here about “AI bad at thing”, because of the strong default prompts and validation they have built into it. You can say dumb, ignorant human shit, and it will implicitly do a better job than other tools you give the same commands to.

    To REALLY utilize Claude Code YOU MUST configure MCP tools… Context7 is a critical one that avoids one of those footguns: “the model was trained on older versions of these libraries.”

    Cursor hosts models with their own secret sauce that improves their behavior. They hard-forked VS Code to make a more deeply integrated experience.

    Avoid Antigravity (Google) and Kiro (Amazon). They don’t offer enough value over the others right now.

    If you already have an OpenAI account, Codex is worth trying; it’s like Claude Code, but not as good.

    JetBrains… not worth it for me.

    Aider is an honorable mention.