Remove email from Git history before brokers scrape it
Learn how to remove email from Git history, clean up hosting profiles, and lower the chance of old contact data being scraped by brokers.

Why an old email stays public
A Git commit is more than a saved code change. It also stores author details, and that often includes an email address. If you used a personal, school, or old work email years ago, that address can stay attached to the commit for as long as the history exists.
That catches people off guard because changing your profile usually does not fix it. Your current account settings can be private, but old commits still keep the original metadata unless you rewrite the history itself.
Public repositories make the problem worse. Once a repo is visible, other people can fork it, clone it, mirror it, archive it, or copy part of it into another project. Even if you later delete the repo or update your profile, those copies can keep the old email alive in places you do not control.
This shows up all the time in side projects. A student pushes code with a college address, graduates, and loses access to that inbox. Someone else uses a personal Gmail account for a weekend project, forgets about it, and leaves the repo public for years. The code gets stale, but the contact data does not.
Scrapers and data brokers do not need to message you to collect that address. They can pull it straight from public commit history, profile activity, exported archives, and copied repositories. Your email can end up on people-search sites or marketing lists without any warning in your inbox.
That is why people try to clean up old commit emails after the fact. The real issue is not one bad commit. It is that public code history is easy to copy, easy to index, and hard to fully pull back once it has been out in the open for a while.
Where your contact data can still appear
If you want to clean this up, do not look only at the repository home page. Your address can show up in several layers of the same project, and some of them are easy to miss.
The first place is the commit data itself. Every commit can store both an author and a committer identity, and each one may include a name and email. If you used a personal address on an old laptop and later switched to a no-reply address, the older commits can still keep the original one.
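To see both identities side by side, a format string along these lines can help. This is a minimal sketch: `show_identities` is a hypothetical helper name, and it assumes you point it at a local clone.

```shell
# show_identities REPO [COUNT]
# Prints the short hash, author, and committer of recent commits on all refs.
# The helper name is made up for this example; the git options are standard.
show_identities() {
  repo="$1"
  count="${2:-20}"
  git -C "$repo" log --all -n "$count" \
    --format='%h  author: %an <%ae>  committer: %cn <%ce>'
}

# Usage: show_identities path/to/clone 50
```

If the two columns differ, both addresses need attention, not only the author one.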
On code hosting sites, that information often becomes easy to browse. A commit page may show the email directly or connect it to an account, avatar, or verified profile. In some cases, a verified email helps tie repository activity back to a profile page, which makes the address easier to scrape and match with other records.
The trail also spreads beyond the main repo. Forks can keep the same old commits. Mirrors can copy the full history to another host. Anyone who cloned the repo already has a local copy. Patch files may include the author line in plain text, and exported archives or backups can preserve old metadata for years.
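To check what a shared patch would reveal, you can generate one locally and look at its `From:` header. The sketch below assumes a local clone; `patch_from_line` is a hypothetical helper name.

```shell
# patch_from_line REPO [REV]
# Prints the From: header that a patch generated for REV (default HEAD) would carry.
# Hypothetical helper; it only reads the repository and writes nothing to disk.
patch_from_line() {
  git -C "$1" format-patch -1 --stdout "${2:-HEAD}" | grep -m 1 '^From: '
}

# Usage: patch_from_line path/to/clone HEAD~3
```

Anyone who saved or posted such a patch keeps that header, whatever happens to the repo later.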
That is why a cleanup can feel incomplete even after a rewrite. You may fix the main branch, but an old patch shared in a bug report or a mirror on another service can still expose the same address.
A common example: you push a side project with your personal email, make the repo public, and years later old commit pages, patch downloads, and profile signals still point to the same inbox. That gives scrapers one more place to collect it.
Check every place where commit metadata is displayed, copied, or exported. The repository page is only the obvious part.
How to see what is exposed
Before you start rewriting history, find every public place where the old address still appears. If you skip that step, you may clean one repo and miss three others you forgot about.
Start with the obvious pages. Open recent commits in the web interface of each public repo and check the author line on a few commits from different dates. Sometimes the profile looks clean, but older commits still show the raw email when you open the full commit page or patch view.
A direct search helps too. Search the repo history for the exact email string, including old work, school, or custom-domain addresses you no longer use. If you changed emails more than once, search each version. One forgotten side project is enough for a scraper to grab it.
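For a local clone, the search can be sketched as two small helpers. The names are hypothetical. The first checks the author and committer fields on every ref, the second checks commit messages, where trailers such as Signed-off-by can also carry an address.

```shell
# find_email_commits REPO EMAIL
# Prints the hash of every commit, on any ref, whose author or committer
# email is exactly EMAIL.
find_email_commits() {
  git -C "$1" log --all --format='%H %ae %ce' |
    awk -v e="$2" '$2 == e || $3 == e { print $1 }'
}

# find_email_in_messages REPO EMAIL
# Prints the hash of every commit whose message contains EMAIL as a literal string.
find_email_in_messages() {
  git -C "$1" log --all --fixed-strings --grep="$2" --format='%H'
}

# Usage: find_email_commits path/to/clone old-address@example.edu | wc -l
```

Run both once per old address. Neither helper looks inside file contents, so an address written into a README or source file needs a separate search.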
Then widen the check. Review archived repos you stopped thinking about, forks under your account, and forks other people made from your repo. Open tags, release pages, patch views, commit details, and any downloadable source snapshots tied to old commits.
Forks matter more than most people expect. If a fork still contains your original commits, rewriting only the main repo does not remove the address from the public trail. Archived repos can be just as easy to overlook. They feel tucked away, but they are often still public and searchable.
Keep one plain note as you go. Write down each repo name, where the address appears, and whether it shows up in a commit, tag, release, or fork. It is boring, but it saves time when you start the cleanup.
A simple reality check helps. Open the repo while signed out, click a few old commits, and see what jumps out first. If you can spot the address in under a minute, a scraper can too.
How to stop new commits from leaking it
Rewriting old commits helps, but it does not fix the next commit you make. If Git still has your old personal address in its settings, the leak starts again the moment you push.
Start with the repo you are using now. Check which email Git will attach to a new commit. If it is wrong, change it at the repo level first.
git config user.email "you@example.com"
git config user.name "Your Name"
That covers one project. Now check your global Git config too. Old global settings are a common reason people think they fixed the problem when they did not. This matters even more on shared machines, old laptops, and work computers you have used for years.
git config --global user.email
git config --global user.name
If those values still point to a personal inbox, update them. If you use one machine for both work and personal code, repo-level settings are usually safer than one global default.
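When it is unclear which config file is supplying the address, Git can report where each value comes from. The helper names below are hypothetical, and `--show-origin` assumes a reasonably recent Git (2.8 or later, as far as the option goes).

```shell
# email_sources [REPO]
# Lists every configured user.email together with the file that defines it.
# Git uses the last value listed.
email_sources() {
  git -C "${1:-.}" config --show-origin --get-all user.email
}

# effective_identity [REPO]
# Prints the author identity Git would stamp on a commit made right now.
effective_identity() {
  git -C "${1:-.}" var GIT_AUTHOR_IDENT
}

# Usage: email_sources path/to/clone; effective_identity path/to/clone
```

Environment variables such as GIT_AUTHOR_EMAIL override the config files, which is why the second helper is worth running as well.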
A no-reply address from your code host is often the cleanest choice. It keeps your real inbox out of commit metadata while still letting the host connect commits to your account. On GitHub, this is one of the easiest privacy fixes.
It also helps to separate identities on purpose. Use one email for work repos and another for personal projects, or keep personal work on a separate account. Mixing everything under one address is convenient for a week and annoying for years.
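One way to separate identities by folder is Git's conditional include (`includeIf`, available since Git 2.13). The sketch below is an assumption about how you might lay things out: the function name, the `.gitconfig-work` file name, and the addresses are all placeholders.

```shell
# setup_split_identity WORK_DIR WORK_EMAIL DEFAULT_EMAIL
# Uses DEFAULT_EMAIL everywhere, except in repositories under WORK_DIR,
# which pick up WORK_EMAIL from a separate config file.
# WORK_DIR should end with a slash so it matches every repo below it.
setup_split_identity() {
  work_dir="$1"
  work_email="$2"
  default_email="$3"
  work_cfg="$HOME/.gitconfig-work"   # placeholder file name

  # Default identity first, so the conditional include below can override it.
  git config --global user.email "$default_email"
  git config --file "$work_cfg" user.email "$work_email"
  git config --global "includeIf.gitdir:${work_dir}.path" "$work_cfg"
}

# Usage: setup_split_identity "$HOME/work/" you@company.example you@users.noreply.example.com
```

This edits your global Git config, so check the result with `git config user.email` inside one repo of each kind before relying on it.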
Before you push real work, make a tiny test commit. Change one line in a throwaway file, commit it, and inspect the author email locally:
git log -1 --format='%an <%ae>'
If the address looks right, you are set. If it does not, fix it before you push anything public. That quick check takes less than a minute, and it is much easier than cleaning up old commit emails later.
How to rewrite old commits step by step
If you need to remove an old email from Git history, do it in a copy of the repo first, not your only working folder. Make a full backup before you touch anything. A mirror clone is the safest option, and a second local copy is better than nothing. Pause team pushes for a bit too, because rewriting history changes commit IDs.
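A mirror-clone backup can be sketched as below. `backup_mirror` is a hypothetical helper name, and the source can be a local path or a remote URL.

```shell
# backup_mirror SRC DEST
# Makes a bare mirror clone of SRC at DEST (all branches and tags)
# and prints how many refs the backup holds.
backup_mirror() {
  src="$1"    # URL or local path of the repository to back up
  dest="$2"   # where the bare backup should live, e.g. ../myrepo-backup.git
  git clone --quiet --mirror "$src" "$dest" &&
    git -C "$dest" for-each-ref --format='%(refname)' | wc -l
}

# Usage: backup_mirror path/to/clone ../myrepo-backup.git
```

Keep the backup private and delete it once the cleanup is confirmed, because it still contains the old address.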
Next, pick the email you want to keep everywhere. Use the same replacement address for both the author and committer fields, or the cleanup can end up half finished.
A simple way to rewrite old commits is to use git filter-repo with a mailmap file. Each line gives the name and address to keep, followed by the old address to replace:
echo "Your Name <new@example.com> <old@example.com>" > mailmap.txt
git filter-repo --mailmap mailmap.txt
If you had more than one old address, add one line per address. Then work through the cleanup in this order:
- Run the rewrite on your backup copy, not on the repo you use every day.
- Replace every old address you want gone, not just the first one you remember.
- Check the result locally before you push anything.
git log --all --format='%an <%ae> | %cn <%ce>' | sort -u
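- If the old email is gone, force push carefully with lease protection. Note that git filter-repo removes the origin remote as a safeguard, so re-add it and fetch before pushing; the lease check needs up-to-date remote-tracking refs to compare against.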
git push --force-with-lease --all
git push --force-with-lease --tags
- Open the public repo again after the push and confirm that the old address no longer appears in the visible history.
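Before running the rewrite, it can help to preview what a multi-line mailmap would do. The sketch below uses Git's own mailmap support (`%aN`, `%aE`, `%cN`, `%cE`) as a dry run. It does not call git filter-repo, so treat it as an approximation of the result. `preview_mailmap` is a hypothetical helper name and the addresses are placeholders.

```shell
# Example mailmap.txt with one line per old address:
#   Your Name <new@example.com> <old-school@example.edu>
#   Your Name <new@example.com> <old-personal@example.com>

# preview_mailmap REPO MAILMAP_FILE
# Prints the unique author | committer identities as they would look
# with MAILMAP_FILE applied. Nothing in the repository is changed.
# Pass MAILMAP_FILE as an absolute path.
preview_mailmap() {
  git -C "$1" -c mailmap.file="$2" log --all \
    --format='%aN <%aE> | %cN <%cE>' | sort -u
}

# Usage: preview_mailmap path/to/backup-copy "$PWD/mailmap.txt"
```

If any old address still appears in the preview, the mailmap is missing a line for it.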
This part trips people up: your teammates still have the old history on their machines. Tell them not to merge old branches into the rewritten repo. The clean fix is to re-clone, or to reset the local copy to the new remote history if they know exactly what they are doing.
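For teammates who prefer resetting over re-cloning, the reset can be sketched like this. It is destructive: any unpushed local commits on the branch are discarded. The helper name is hypothetical, and the remote is assumed to be called origin unless you pass another name.

```shell
# resync_to_rewritten REPO BRANCH [REMOTE]
# Points the local BRANCH at the rewritten history on REMOTE (default: origin).
# WARNING: local commits and uncommitted changes on BRANCH are thrown away.
resync_to_rewritten() {
  repo="$1"
  branch="$2"
  remote="${3:-origin}"
  git -C "$repo" fetch --quiet "$remote" &&
    git -C "$repo" checkout --quiet "$branch" &&
    git -C "$repo" reset --quiet --hard "$remote/$branch"
}

# Usage: resync_to_rewritten path/to/clone main
```

The old commits remain in that clone's reflog until they expire, so re-cloning is still the cleaner option when in doubt.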
A careful rewrite takes a little time, but it is much better than leaving a personal email in public commits for scrapers to collect.
What can remain after the rewrite
Even after you clean up the history you control, old copies can hang around. A rewritten repo changes the public trail on your side, but it does not erase every copy made before the fix.
The most common problem is forks. If someone forked your repo before you changed the author email, that fork may still contain the old commits and the old address. The same goes for mirrors on other hosting sites and personal backups that were pushed somewhere and then forgotten.
Copies outside your repo
Search engines can lag behind. A cached result or old snippet may keep showing your email for days or weeks, even after the commit page is gone or updated. Usually that fades with time, but it is a real gap if you are trying to stop scraping quickly.
Downloaded archives do not update themselves either. If someone cloned the repo, grabbed a ZIP, or exported commit data before the rewrite, that copy stays as it was. Public datasets, code indexes, and research archives can hold onto old metadata for a long time.
Another easy one to miss is the repo you forgot about. A side project, an old employer test, a student repo, or a duplicate under another account can still be public somewhere. Cleaning one repo does not help much if the same email is exposed in three older ones.
And broker sites are a separate problem. If a broker scraped your email from commits before the cleanup, rewriting history will not remove it from that database. You have to deal with those listings separately.
What to check next
After the rewrite, do a quick sweep:
- Look for forks and mirrors that still hold the old commits.
- Search the old email together with your username and repo names.
- Check inactive or archived repos under every account you used.
- See whether people-search sites already list that address.
A cleanup is not really done when the main repo looks fixed. It is done when the old email stops showing up in public searches.
Mistakes that make cleanup harder
The technical part is only half the job. Cleanup usually fails for simple reasons. People fix the main repo, then forget the old test repo, a fork, or a side project that still shows the same address.
That matters because scrapers do not care which repo is active. If your old email is still public anywhere, it can still be copied, matched to your name, and passed around.
A few mistakes show up again and again. Someone rewrites one repository and forgets older personal projects, archived repos, or mirrors on another host. Someone else cleans the main branches but leaves tags untouched, so public release snapshots still point to commits that carry the old email. Another person rewrites months of history, then makes a new commit with the same old address because the global Git config still has it.
Force pushes cause their own problems when collaborators are not warned first. One teammate can later push stale refs back to the remote and bring the old commits with them. Hosts can also take time to refresh cached commit pages, search results, and profile associations, so the first clean-looking page is not always the end of the story.
Tags are easy to miss. People focus on branches because that is where daily work happens, but tags often mark releases, and those releases get copied, downloaded, and indexed. One forgotten tag can undo a lot of careful cleanup.
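Annotated tags carry their own tagger identity, separate from the commits they point to. A quick way to list it, sketched with a hypothetical helper name:

```shell
# tag_taggers REPO
# Prints each tag name followed by its tagger email.
# Lightweight tags have no tagger, so the email column is empty for them.
tag_taggers() {
  git -C "$1" for-each-ref refs/tags \
    --format='%(refname:short) %(taggeremail)'
}

# Usage: tag_taggers path/to/clone | grep -F 'old-address@example.edu'
```

A tag with an empty email column can still point at a commit that has the old address, so check tagged commits with your usual history search too.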
The global config mistake is even more common. You rewrite months of history, feel done, and then your next laptop commit uses the old email again. That single commit can reconnect the address to your public profile.
Do a second pass after things settle. Check the repo, tags, commit views, and profile pages again. It is a little tedious, but it saves more time than a rushed first pass.
A simple example
Maya used her college email in public commits in 2019. At the time, it felt harmless. She was pushing class projects, a small portfolio site, and a few practice repos. Every commit carried the same address in the author field.
A few years later, that old repo still showed up in search results for her name. The commits were public, the email was still there, and the address had started appearing in places it should not. A broker could match it with an old student profile, a resume posted elsewhere, and a people-search listing. One old commit trail turned into an easy identity link.
She fixed the problem in the right order. First, she changed the email used for all new commits so the leak stopped growing. Then she rewrote the old history to replace the college address with a safer one, such as a no-reply address from the code host.
That did not erase every trace overnight. Search results and mirrors can lag behind. But it cut off the public source, which matters if you want fewer scrapers picking it up.
After the rewrite, Maya searched the old address directly. She checked code hosting profiles, cached search snippets, and broker listings tied to the same email. That last step is easy to skip. If a broker already copied the address, fixing the repo alone will not clean up the rest.
A practical cleanup checklist
If you are doing this now, keep the process simple.
- Check your local Git config and your code host profile so new commits use the right address. Make one test commit in a throwaway repo and inspect the author email before you keep working.
- Review every public repo you control, not only the one you use most. Old demos, archived repos, forks, and organization repos often get missed.
- Look at tags, release snapshots, mirrors, and any secondary hosting account. Cleaning the main branch does not fix an old tag that still points to the bad email.
- Tell collaborators that history changed before they push again. One person pushing an old clone can bring the old commits back.
- Save proof of the cleanup. Keep a dated note with the repos checked, the old and new commit hashes, screenshots if needed, and any support messages tied to cache or mirror cleanup.
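For the old and new commit hashes, a plain ref snapshot taken before and after the rewrite is usually enough. A minimal sketch, with a hypothetical helper name and file names of your choosing:

```shell
# snapshot_refs REPO OUTFILE
# Writes one "hash refname" line per branch and tag in REPO to OUTFILE.
snapshot_refs() {
  git -C "$1" for-each-ref --format='%(objectname) %(refname)' > "$2"
}

# Usage:
#   snapshot_refs path/to/clone refs-before.txt   # before the rewrite
#   snapshot_refs path/to/clone refs-after.txt    # after the rewrite
#   diff refs-before.txt refs-after.txt
```

The diff shows which refs changed and what they pointed to before, which is the part most worth keeping in your dated note.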
A quick example shows why the extra checks matter. Say you rewrote a GitHub repo and your profile now shows a no-reply address. Good start. But if an old tag on a mirror still exposes your personal email, a scraper can still pick it up.
Keep your notes for a while. If the address later shows up on a broker site, you will have a better sense of whether it came from Git history, another public account, or an older scrape.
What to do next
Fixing the Git trail is the big step, but it should not be the last one. An old address can keep showing up in copied profiles, cached pages, and broker listings long after you clean up the repository history.
A simple follow-up routine usually works. Set a reminder every few months to review older repos, especially the ones you have not touched in years. Search for the old email in code hosting profiles, package registries, personal sites, and forum accounts where you may have reused it. If you still control those pages, replace the old contact detail there too. One reused detail can keep the whole trail alive.
This is also where the broker side becomes its own task. Git cleanup removes the public source you control, but it does not pull your information out of broker databases that already copied it. If that has happened, Remove.dev handles that separate step by finding personal data across more than 500 data brokers, sending removal requests, and monitoring for relistings while you keep the code side clean.
The goal is not perfection in one afternoon. It is stopping the obvious leak, then checking often enough that the old address stays buried. A 10-minute review every few months is much easier than cleaning up years of exposure later.