Building velocity is all that matters
At Keplogic, our flagship product is Kepler AI (getkepler.ai): enterprise-grade bioinformatics agents accelerating scientific discovery. Our codebase spans frontend, backend, and cloud compute infrastructure, serving hundreds of enterprise users at biotech and pharma companies. As a startup, our main advantages are laser focus and building speed. Over the past two months, we have accumulated more than 1,300 commits, encompassing over 1.3 million lines of code changes. This rapid development is a direct result of our foundation as an AI-native startup, where AI tools are integrated into every aspect of our workflow.
These integrations have unlocked massive quantitative and qualitative improvements in how we build the company:
Quantitative: Each of our team members, including interns, is operating at a pace significantly exceeding traditional manual development. The time elapsed from ideation to pull request (PR) to deployment has decreased from several days to a few hours.
Qualitative: AI tools enable every member to become directly integrated into the product development cycle. Traditionally, organizational structures imposed various boundaries: distinctions between technical and non-technical roles, frontend and backend development, and research and engineering. Within an AI-native startup, any issue or potential improvement can be addressed directly rather than adding to communication overhead. Instead of contacting the relevant team and waiting for action, individuals can implement code changes directly with AI coding agents. Over time, this will spread to other types of work, such as project management and design.
Show me some numbers!
Devin is already the #1 contributor in our repo by number of commits:
And the frequency of commits is growing over time:
The quality is increasing as well: our success rate from creating a Devin task to merging the resulting PR has increased from 20% to over 50%, even as the tasks themselves are getting harder:
As a result, our deployment frequency (number of frontend and backend builds) is growing rapidly:
The Strategic Momentum Behind AI-Native Building
This transformation isn't just about immediate productivity gains; it's about positioning ourselves to capture an exponentially growing advantage. Building an AI-native company allows us to harness momentum from two fundamental shifts:
Cost of intelligence dropping exponentially: As AI capabilities improve while costs plummet, an AI-native company can capture exponentially growing efficiency gains. Once the company is architected correctly, that AI readiness poises the business to take full advantage as AI gets 10x better next year.
Reshaped company structure: AI tools are diminishing the boundaries between roles and responsibilities, enabling us to build a fundamentally different organizational structure. An AI-native incentive system empowers high-performing employees to become 10x more capable than their traditional counterparts while gaining exposure to 10x more aspects of the product. We're building a company that embraces more "founders": individuals who truly care about the product and business and can directly contribute to all major aspects without traditional boundaries. This company culture creates an unbeatable competitive advantage that continuously reinforces itself.
(LLM cost is decreasing by 10x each year.)
(LLM coding capabilities keep growing.)
Remaining Challenges: Vibe coding doesn’t always work
Although vibe coding is a buzzword and it feels like every developer and every company is adopting AI tools for coding, there are serious challenges when you apply it to a large codebase in an enterprise environment:
Prototype ≠ Production: The biggest illusion of AI output is that it looks correct at first, but once you look into the details, many things are wrong. Various attempts have been made to build entire SaaS products using AI tools, and most of them failed due to security, architecture, and performance issues.
Controllability: Many times our short prompts lack enough information to convey what we actually need and how edge cases should be handled, and controllability dramatically decreases as the context grows larger. This means that beyond efforts to control the AI tool's output itself, we also need deterministic measures to ensure the quality of key outputs, and today there is a huge gap in this toolset.
Verifiability: Reviewing and guard-railing changes is even more difficult than writing new code. It's very common that: 1) the agent outputs hundreds to thousands of lines of changes in each PR, making it very time-consuming to review; 2) the agent's design patterns differ from the existing codebase: instead of following existing patterns, it comes up with various new structures, making the code hard to maintain; 3) unless documentation is very explicit, agents break many existing assumptions, causing various regressions.
These issues all stem from limitations in LLMs' core capabilities:
Long-term planning & reasoning: LLMs still can't reason coherently over sufficiently long content. Although their context length has surpassed what humans can read, human reasoning over long-horizon tasks is far more consistent and focused than that of LLMs.
Multi-modal & computer interface: Agents still can't easily recognize images and control computers; this is especially crucial for frontend-related testing.
Reward hacking: Agents game the result and produce workarounds instead of addressing the core issue.
A solution: Build an agent-friendly company from the ground up
Instead of prompting and praying every time, we need to fully understand where these tools work and where they don't, and how to architect the company to apply the right leverage and avoid the risks. Here is how we are architecting the company to be agent-friendly:
Mindset:
Don't trust agents, but use them as often as possible: you should try to use AI tools for every task in your work, but you alone are responsible for the quality of the outcome.
Be prepared for AI to mess up: intentionally or unintentionally, AI can mess up many things. The good news is that you can kick off agents in various directions, and even if they only get some of the tasks done, that's already a big saving. And even when agents mess up, their planning and exploration process is often super helpful context for the human engineer to figure out the correct approach.
Recruiting:
Recruiting is especially crucial for building an AI-native company. Humans are taking on more and more important roles in the product-building process, with each member contributing 10x compared to employees at non-AI-native companies. We especially value these characteristics in candidates with regard to using AI:
Navigating uncertainty: How open are you to exploring uncertainty, and how confident are you in gradually reducing it through iterative learning and adjustment?
Openness: How much do you embrace AI in your workflow, and how do you maintain a sustainable relationship between your judgment and AI's output?
Taste: How effectively can you review and judge AI output, and adjust AI tools to better align with your future intent?
We are actively hiring AI-native team members to further explore this ambitious and interesting topic. Please let me (quinn@keplogic.com) know if you are interested in joining us!
Infrastructure:
Accessibility: Connect to all essential data sources and workflows: GitHub, Slack, Linear, docs. This is the first step in preparing context for AI tools. Only when agents have the same level of access to human knowledge will they be able to work at human-level quality (see the first sketch after this list).
Access control: Give agents a separate role in every workflow, and grant only the permissions they actually need.
Isolation: Create separate environments for development and production. Together with access control, these two are the most essential safety measures against agents making mistakes (see the second sketch below).
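To make the accessibility point concrete, here is a minimal sketch of pulling labeled GitHub issues into one plain-text briefing an agent can ingest. The repository name, the `agent-task` label, and the briefing format are hypothetical; the endpoint is GitHub's standard REST issues API, and `GITHUB_TOKEN` is assumed to be set in the environment.

```python
import os

import requests

GITHUB_API = "https://api.github.com"
REPO = "acme/example-repo"  # hypothetical repository


def fetch_open_issues(repo: str, label: str) -> list[dict]:
    """Fetch open issues carrying a given label via GitHub's REST API."""
    resp = requests.get(
        f"{GITHUB_API}/repos/{repo}/issues",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        params={"state": "open", "labels": label},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def build_briefing(issues: list[dict]) -> str:
    """Flatten issues into one plain-text briefing for an agent's context."""
    chunks = []
    for issue in issues:
        chunks.append(f"#{issue['number']} {issue['title']}")
        chunks.append(issue.get("body") or "(no description)")
        chunks.append("---")
    return "\n".join(chunks)


if __name__ == "__main__":
    # "agent-task" is a hypothetical label marking agent-ready work.
    print(build_briefing(fetch_open_issues(REPO, "agent-task")))
```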
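And for access control plus isolation, a toy sketch of a separate, fail-closed agent role with an explicit permission allowlist. The role names and permission strings are illustrative, not our actual policy:

```python
# Illustrative roles: agents get a subset of engineer permissions.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "engineer": {"repo:write", "ci:trigger", "deploy:staging", "deploy:prod"},
    "agent": {"repo:write", "ci:trigger", "deploy:staging"},  # no prod access
}


class AccessDenied(Exception):
    pass


def require(role: str, permission: str) -> None:
    """Fail closed: unknown roles and missing permissions are both denied."""
    if permission not in ROLE_PERMISSIONS.get(role, set()):
        raise AccessDenied(f"role {role!r} lacks {permission!r}")


require("agent", "deploy:staging")  # fine: staging is the isolated environment
try:
    require("agent", "deploy:prod")  # raises: the agent role has no prod path
except AccessDenied as err:
    print(err)
```

The design choice that matters is that the agent role simply has no path to production: even a confused agent fails at the permission check, and its mistakes stay inside the isolated environment.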
Testing & CI/CD:
Test coverage from day one: Effective test coverage acts as natural guardrails. When agents make changes, those changes are verified against the core logic and outcomes. This means adopting Test-Driven Development: when working on a task, first write a design doc with the core requirements, materialize them as test cases (with the help of AI, of course), and then get agents to work on the actual implementation in a stepwise manner (see the sketch after this list). This both ensures that agents fully understand the requirements and keeps the implementation from derailing along the way.
The entire end-to-end software development lifecycle should be available to the agent: creating PRs, testing changes, triggering deployment, debugging issues.
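Here is a minimal pytest sketch of that workflow. The feature (`normalize_sample_id`) and its rules are hypothetical stand-ins for requirements lifted from a design doc:

```python
import pytest


# Step 1: materialize design-doc requirements as tests, before any code.
def test_sample_id_is_uppercased():
    assert normalize_sample_id("kx-001") == "KX-001"


def test_whitespace_is_rejected():
    with pytest.raises(ValueError):
        normalize_sample_id("kx 001")


# Step 2: the agent implements against the failing tests above.
def normalize_sample_id(raw: str) -> str:
    """Normalize a sample ID per the (hypothetical) design-doc rules."""
    if any(ch.isspace() for ch in raw):
        raise ValueError(f"sample id may not contain whitespace: {raw!r}")
    return raw.upper()
```

Because the tests exist before the implementation, CI verifies an agent's (re)implementation against the core requirements instead of trusting the diff.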
Our recommendations for using Devin & Cursor:
Humans setting the standard:
Due to limitations in LLMs' long-term planning capabilities, we still need human experts to set the standard for architecture, guidelines, and design principles. This means that for core designs, we still need experts in the loop to construct the initial requirements and review core design decisions.
Split it up:
We found that for Devin, 5 ACUs (which I suspect is where Devin's LLM starts to reach its context length limit) is where you start to feel the LLM losing focus/alignment on your task and requirements. The Devin team has also recommended that if a task is taking more than 5 ACUs, it is best to break it down into smaller tasks.
For Cursor, two pages of tool calls is roughly where it starts to hit the context limit. Try to break down the tasks, and ensure each task contains minimal but sufficient documents and descriptions before sending it off.
Collaboration between humans and agents:
Devin has been improving at interacting with the frontend: loading up the UI, interacting with basic elements, and debugging. For more advanced interactions, we found Vercel's preview feature especially useful. You can get Devin to complete the easier first half of a task and push a PR, and then a human can debug the rest of the frontend in the Vercel preview. For full-stack features, we have manually connected Render previews and Vercel previews so we can test the entire PR before merging.
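As a toy illustration, this is the kind of smoke check that can run against a connected preview pair before merging; both URLs and the `/healthz` endpoint are placeholders that the preview tooling would inject per PR:

```python
import requests

# Hypothetical per-PR preview URLs (frontend on Vercel, backend on Render).
FRONTEND_PREVIEW = "https://my-app-git-feature-x.vercel.app"
BACKEND_PREVIEW = "https://my-api-pr-123.onrender.com"


def check(url: str, path: str = "/") -> None:
    """Fail loudly if the preview doesn't respond with a 2xx."""
    resp = requests.get(url + path, timeout=30)
    resp.raise_for_status()
    print(f"OK {resp.status_code} {url}{path}")


check(BACKEND_PREVIEW, "/healthz")  # assumed health endpoint on the backend
check(FRONTEND_PREVIEW)             # frontend preview renders at all
```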
If you enjoyed this read, and want to join our AI-native startup building journey, please ping me at quinn@keplogic.com!