While CRISPR is great at cutting DNA, getting new sequences to slot neatly into the genome is trickier. Standard methods rely on the cell’s homology-directed repair pathway, which needs long stretches of matching DNA and works only in dividing cells. Other repair routes, like nonhomologous end joining and microhomology-mediated end joining, work in a wider range of cells but tend to cause unwanted deletions at the junction where new DNA meets the genome.
Looking for new ways to insert DNA precisely without unwanted damage, a research team hypothesized that the quirks of microhomology-mediated end joining (MMEJ) could be exploited. Past studies have shown that MMEJ doesn’t act randomly, often using short, matching sequences called microhomologies (µHs) near the break site, and this choice can be predicted by deep learning algorithms. The researchers believed that designing donor DNA that matched these predictable patterns would guide the repair process toward cleaner, more reliable integrations.
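To make the idea of a microhomology concrete, here is a minimal sketch that lists short sequences appearing on both sides of a cut site, together with the deletion that would result if the two copies were paired during repair. It is plain string matching, not the deep learning predictor the researchers used, and the function name, window size, and example sequence are illustrative assumptions.

```python
def find_microhomologies(seq, cut, min_len=2, window=20):
    """List (motif, deletion_length) pairs for motifs found on both sides of a cut.

    Illustrative only: simple exact matching within `window` bases of the cut,
    not a trained model of repair outcomes.
    """
    seq = seq.upper()
    left = seq[max(0, cut - window):cut]   # bases upstream of the cut
    right = seq[cut:cut + window]          # bases downstream of the cut
    hits = []
    for k in range(min_len, min(len(left), len(right)) + 1):
        for i in range(len(left) - k + 1):
            motif = left[i:i + k]
            j = right.find(motif)
            while j != -1:
                # Pairing the two copies removes one copy plus everything
                # between them: the distance between the two start positions.
                deletion = (len(left) - i) + j
                hits.append((motif, deletion))
                j = right.find(motif, j + 1)
    return hits


# Hypothetical 14-base sequence cut between positions 6 and 7 (0-based).
hits = find_microhomologies("TTACGAAGGACGTT", cut=7)
```

In this toy sequence, the motif ACG occurs on both sides of the cut, and pairing those two copies would delete seven bases. A predictive model goes further by scoring which of the candidate pairings the cell is most likely to use.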
Using computer models, they tested hundreds of thousands of potential guide RNA (gRNA) sites in the human genome. The simulations showed that repeating the right µH sequence at the ends of the donor DNA increased the chance it would be used in the repair, peaking at about five repeats. Crucially, the “best” repeat sequence varied depending on the local DNA context, so each target site needed a custom design.
Armed with these predictions, they built donor cassettes flanked by five repeats of a 3-base µH tailored to the target site. In human HEK293T cells, the results matched the predictions closely: more of the inserted DNA stayed in frame, and there was less trimming of the surrounding genome. Making the donor DNA linear (rather than circular) improved integration rates, and the boundary sequences after integration matched what the algorithm had forecast.
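The donor layout described above can be sketched in a few lines. This is a simplified illustration, not the team's design pipeline: it assumes the µH on each side is simply the three bases directly adjacent to the cut, whereas the researchers chose the repeat sequence per site using model predictions. The function name and example sequences are hypothetical.

```python
def build_donor(target, cut, cargo, mh_len=3, repeats=5):
    """Flank `cargo` with tandem repeats of the bases adjacent to the cut.

    Assumption for illustration: the left and right microhomologies are the
    `mh_len` bases immediately upstream and downstream of the cut site.
    """
    target, cargo = target.upper(), cargo.upper()
    if not (mh_len <= cut <= len(target) - mh_len):
        raise ValueError("cut site is too close to the end of the target")
    left_mh = target[cut - mh_len:cut]     # bases just upstream of the cut
    right_mh = target[cut:cut + mh_len]    # bases just downstream of the cut
    left_arm = left_mh * repeats           # e.g. five tandem copies
    right_arm = right_mh * repeats
    return left_arm + cargo + right_arm


# Hypothetical target cut after position 8, with a 6-base placeholder cargo.
donor = build_donor("ACGTACCGGATTCAGT", cut=8, cargo="GGATCC")
```

One arithmetic point worth noting: with a 3-base repeat unit, each arm is a multiple of three bases long, so a junction that uses more or fewer whole repeats shifts the sequence by whole codons rather than changing the reading frame.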
They also tested the approach in a medical context by inserting a chimeric antigen receptor (CAR) sequence into the TRAC locus – a clinically relevant target for engineered T cells. Again, short µH repeats guided the integration, producing cleaner junctions than traditional methods such as homology-independent targeted insertion. When compared head-to-head, integration efficiencies were similar, but the µH method caused far fewer large deletions, preserving both genome and inserted sequence integrity.
The team dug deeper into what affects success and found that the exact DNA letters making up the µH repeats mattered. In fact, some worked better than others depending on the surrounding sequence. This confirmed that a “one size fits all” design wouldn’t work; instead, the optimal µHs must be computed for each target site.
This approach worked well in situations where homology-directed repair fails. In fast-dividing western clawed frog (Xenopus tropicalis) embryos, the method produced germline-transmissible edits, meaning the changes could be inherited. In adult mouse brains, where neurons are non-dividing, it allowed precise tagging of proteins in their natural genomic context. These are environments where precise DNA integration is usually extremely challenging.
To help others apply this strategy, they have built an online tool called Pythia, which combines deep learning predictions with rules for µH design, letting users enter a CRISPR target site and get tailored donor designs to maximize precision.
Senior author Soeren Lienkamp said, “What excites us most is not only the technology itself, but also the possibilities it opens. Pythia brings together large-scale AI prediction with real biological systems. From cultured cells to whole animals, this tight loop between modeling and experimentation is becoming increasingly useful, for example, in precise gene therapies.”
The researchers didn’t stop at large DNA cassettes. They adapted the same design principles for much smaller changes, such as swapping a single letter of DNA. Using short single-stranded DNA donors and carefully chosen µHs, they introduced single- or double-letter changes without leaving any extra sequence “scars”. This worked both in cultured cells and in living animals, showing that the method is flexible enough for a range of edit sizes.