There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good ... The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away ... I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
-Andrej Karpathy, February 2, 2025
The writing's on the wall, if the narrative of Silicon Valley's giants is to be believed: nobody writes code anymore. An embellishment, perhaps, but it is undeniable that the rate of change in software development practices over the last year dwarfs anything in the history of the field. When I tell my colleagues it has only been a year since we were introduced to "vibe coding", I am met with skepticism and confusion—surely it has been here longer than that. We all feel the change. We know how long change takes. In a span too short for us to wrap our heads around, since chain-of-thought models became a mundane daily reality and the correctness of LLM output rapidly stopped being the first question asked of it, LLM code generation has gone from industry laughing stock to a tool as commonplace as an IDE.
And at the same time, if you are an experienced developer who remembers the time before, you are likely reading this with a strong sense of skepticism yourself. You have seen wildly incorrect generated code. You have seen diffs too long to reasonably read or comprehend. You have heard the horror stories of startup CEOs letting their agents drop their production tables. You know vibe coding is a path to decay and destruction, and that very well might be your perception of LLM code generation on the whole.
Let me break my tone for a moment to be very transparent about what you are reading, and by whom: I am a human, the kind who still actually writes blog posts. You may have written me off already because I used an em dash in my first paragraph or grouped a thought into three examples, but I will not compromise how I express my writing to differentiate myself from machines that would imitate me. I am a software engineer. I am not a pure beacon of anti-AI rejection of the new wave. I use code generation tools. At times, I move fast and break things. I work for a large company where LLMs are aggressively becoming part of my daily work, though perhaps still more slowly than in my personal projects. I share the concerns of my peers about code quality, pace, and the ethics problems found in our seemingly eternal AI summer. I did not welcome new tools with open arms, but with caution and a drive to test their efficacy.
The discourse surrounding the changes in software is inescapable, and seemingly draws hard lines. Are you vibe coding? Are you an artisan programmer? Are you the only thing stopping your agents from deploying a billion dollar startup? Do you continue to weigh the options for your approach, looking for the best solution? Do you know your architecture? Or does it just work?
From the moment giving in to the vibes was presented as an option, we were warned about incomprehensibly large codebases and errors our agents would have no chance of fixing. We knew this approach would give us "good enough," at costs we have only seen grow in the year since. Unsustainable development practices can be accepted in an environment where racing to market defines success, but they compound when the project needs to scale to meet that market.
Hope is not a strategy.
-Site Reliability Engineering: How Google Runs Production Systems, 2016
Software engineering is not a nebulous dark art that can only be understood through madness. It is a field forged through trial and error, with best practices propagated among its proponents and industry conventions shaped by what works. It is a skill, learnable and taught, that has been fundamental to constructing the technological infrastructure needed to realize our current landscape. The rise of vibe coding suggests that, for many, these skills are the barrier to development: why learn arcane best practices when GPT moves fast purely off of a vision? And still, the horror stories remain, the codebases remain unreadable, and the solutions prove subpar and brittle. By both subjective and metric-driven analysis, vibe coded output just kind of sucks. For those who do not have, and do not care to develop, software engineering skills, this doesn't matter; for veterans of the field, it is a compelling enough reason to reject the new way.
The alternative, the way to wield code generation tools effectively, may be more obvious than this lets on: reject vibe coding, and invite the LLMs into software engineering. Nobody understands the technical requirements of your service better than you, and nobody can make better planning decisions for it. To drive development, no matter how it is done, be it by hand, through designs handed down to other developers, or through a machine, you must understand what is being built in order to build it effectively. To build purely off of ideation is to surrender understanding and agency in the process of construction, and to let unaddressed weaknesses through. Vibes are not a strategy.
What does this look like in practice? Document designs for systems and features before building them. This is standard fare in software engineering, but an easily overlooked step in a time when implementation is free. For smaller changes, this design work can often be as simple as a paragraph or two extensively describing how I want a thing built, rather than simply what the result should be. I won't even say that you shouldn't use LLMs to assist with writing these, particularly for larger tasks. But always review what has been produced, regardless of who produced it, even yourself: Does it make sense? Are there potential problems it has not considered? Seek outside review, even if from LLMs. Do not immediately make changes in response to critique; think through the issue, consider the tradeoffs, and make the decision that seems best for what you need. Nobody understands the technical requirements better than you do.
Then, use these designs to prompt implementation. If it is a large design, break it up into chunks that can reasonably be implemented in a single step. Do not let the task be overscoped. Write tests if you aren't allergic, and generate tests and validate their accuracy personally if you are. Review the implementation. If it is too verbose to justify reading, if it produces significant redundancies, or if it blatantly does not fit the spec, then either the prompt is wrong or the task is overscoped. Vibe coding proponents may argue that this level of granularity is no faster than traditional software engineering. But implementation was never the hard part. Quality was.
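To make "validate their accuracy personally" concrete, here is a toy sketch of the kind of review I mean, using a hypothetical `slugify` helper of my own invention (nothing here comes from a real generated diff). The point is that each test case pins an exact observable output, so a lazy or reward-hacked implementation cannot slip past it:

```python
# Hypothetical helper under review; both the function and the test
# cases are illustrative assumptions, not a real project's code.

def slugify(title: str) -> str:
    # Candidate implementation: lowercase, replace non-alphanumerics
    # with spaces, then join the remaining words with hyphens.
    cleaned = "".join(c if c.isalnum() else " " for c in title.lower())
    return "-".join(cleaned.split())

def test_slugify_pins_observable_behavior():
    # Exact expected outputs, not vague property checks, so a trivial
    # implementation (e.g. returning the input unchanged) fails.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"
    assert slugify("already-a-slug") == "already-a-slug"
```

Reviewing a generated test means asking exactly this: would a wrong implementation still pass it? If every assertion could be satisfied by echoing the input or hardcoding a value, the test has told you nothing.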
Thou shalt not make a machine in the likeness of a human mind ... Beware the seeds you sow and the crops you reap. Do not curse God for the punishment you inflict upon yourself.
-The Orange Catholic Bible, Dune, 1965
So generated code quality sucks, and the way to improve it is through the same practices we've used to produce quality code for decades. I don't want to leave this post on just that note, though; I do not think this is an inevitability of our current methods, and I do think it is a problem that will eventually be solved. The fact that software engineering is a teachable skill, and that I can describe to you a process for improving output quality, is evidence that there is a process that can be replicated. How do we get there?
First, I think we continue developing objective measurements of what quality code looks like. Elegance is subjective, but SlopCodeBench provides some promising initial heuristics for grading the behaviors we should be avoiding. The only way we can train better-structured output is to understand what we are seeking.
Second, and this is less founded and more personal speculation, I think we will need to move away from a process where a request is translated directly into a chain of thought that produces the implementation. Quality human development requires planning, challenges to that plan, and testing. Tooling today is moving toward this with agents that write plans before implementing, but I think we could view this more as a process of compilation. Imagine a system capable of translating higher levels of abstraction into lower ones:
Ideas present a vision, and are translated into features.
Features present chunks of the idea that are needed to achieve it, and are translated into designs.
Designs describe how a feature is best approached, and are reviewed and iterated on until they are suitable for defining criteria and can be translated into tests.
Tests describe what the code should do and what the expected results are. A well-written test should be resilient to reward hacking, and give a clear path to the implementation that should satisfy it.
And implementation realizes the idea.
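The lowering process above can be sketched as a pipeline of stages, each translating one artifact into the next. This is a minimal toy, purely my own framing of the post's idea: the `translate` callables here are string placeholders standing in for LLM-driven (and human-reviewed) steps, not a real system.

```python
# Sketch of the idea -> features -> designs -> tests -> implementation
# "compilation" pipeline. Stage names come from the post; everything
# else is an illustrative assumption.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    translate: Callable[[str], str]  # lowers one artifact to the next

def compile_idea(idea: str, stages: list[Stage]) -> str:
    artifact = idea
    for stage in stages:
        # In a real system, review and iteration would happen at each
        # lowering step before the artifact is passed onward.
        artifact = stage.translate(artifact)
    return artifact

# Placeholder translators that just record which stage ran.
pipeline = [
    Stage("features", lambda a: f"features({a})"),
    Stage("designs", lambda a: f"designs({a})"),
    Stage("tests", lambda a: f"tests({a})"),
    Stage("implementation", lambda a: f"impl({a})"),
]

print(compile_idea("idea", pipeline))
# -> impl(tests(designs(features(idea))))
```

The shape matters more than the code: each level is a distinct artifact with its own review loop, rather than one opaque chain of thought from request to diff.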
To achieve a system that can follow such a process, each step would need thorough attention to train quality translation to the level below it. This may ultimately break down at some level; without a clearly defined feature set, the idea may never even match human expectations. But I am hopeful that such a process can be carried far enough to get better results from feature-request prompts, and that the compromises we now see in quality or development efficiency can eventually be resolved. But it must be treated as a priority. We cannot accept poor code quality as the path forward if we want it ever to improve.
Thanks for reading. Go make sure your agent isn't plotting character assassination against open source maintainers.