Getting off the ground in Elm: project setup

*Update*: Deprecated. These days, to create a new Elm project, I use rug, as described in this post: https://blog.jessitron.com/2016/12/using-rug-with-elm.html

If you have tried Elm and want to make a client bigger than a sample app, this post can help you get set up. Here, find what goes into each of my Elm repositories and why. This template creates a fullscreen Elm application with the potential for server calls and interactivity. This post is up-to-date as of Elm 0.16.0 on 11/21/2015, on a Mac.

TL;DR: Clone my sample repo; change CarrotPotato to your project name everywhere it appears (elm-package.json; src/CarrotPotato.elm; index.html). Replace origin with your remote repository.

Step 0: Install Elm (once)

Run the latest installer.
To check that this worked, run `elm` at a terminal prompt. You should see a long usage message, starting with `Elm Platform 0.16.0 - a way to run all Elm tools`

Bonus: getting Elm text highlighting in your favorite text editor is a good idea. That’s outside the scope of this post, because it was hard. I use Sublime 2 and this Elm plugin.

Step 1: Establish version control (every project)

Step 1A: create a directory and a repository. 

Make a directory on your computer and initialize a git repository inside it.

mkdir CarrotPotato
cd CarrotPotato
git init

Step 1B: configure version control

In every project, I use the first commit to establish which files do not belong under version control.
I’m going to have the Elm compiler write its output to a directory called target. I want to save the source code I write, not stuff that’s generated from it, so git should not save the compiler output. Git ignores any files or directories whose names are in a file called .gitignore, so I put target in there.
The Elm package manager uses a directory called elm-stuff for its work. That doesn’t belong in our repository, so put it in .gitignore too. I recommend making .gitignore the first file committed in any new repository.

echo "target" >> .gitignore
echo "elm-stuff" >> .gitignore
git add .gitignore
git commit -m "New Elm project"
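To double-check that the ignore rules do what I expect, here is a small sketch that runs in a scratch directory. The scratch path and the empty placeholder files are assumptions made up for the demonstration; `git check-ignore` prints each path that matches an ignore rule.

```shell
# Scratch directory so nothing here touches a real project (location is arbitrary).
demo=$(mktemp -d)
cd "$demo"
git init -q CarrotPotato
cd CarrotPotato

echo "target" >> .gitignore
echo "elm-stuff" >> .gitignore

# Pretend the compiler and the package manager have already written their output.
# These file names are placeholders for the demonstration.
mkdir -p target elm-stuff src
touch target/elm.js elm-stuff/placeholder src/CarrotPotato.elm

# check-ignore echoes back only the paths that git will ignore.
ignored=$(git check-ignore target/elm.js elm-stuff/placeholder src/CarrotPotato.elm)
echo "$ignored"
```

The source file is not in the output, so it is the only one of the three that git will offer to commit.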

Step 2: Bring in core dependencies

The Elm package manager will install everything you need, including the core language and the configuration it needs. To bring in any dependency, use `elm package install <package>`, where <package> is specified as github-user/repo-name. Most of the packages come from github users elm-lang or evancz (Evan Czaplicki is the author of Elm). All the packages that elm-package knows about are listed on package.elm-lang.org.

In keeping with the Elm Architecture, I use StartApp as the basis for all my projects. Bring it in:

elm package install evancz/start-app

elm-package is very polite: it looks at your project, decides what it needs to do, and asks nicely for permission before doing anything. It will add the dependency to elm-package.json (creating the file if it doesn’t exist), then install the package you requested (along with anything that package depends on) in a directory called elm-stuff.

Here’s a gotcha: the StartApp install downloads its dependencies, but you can’t use them directly until they are declared as a direct dependency of your project. And you can’t actually use StartApp without also using Effects and Html. So install them too:

elm package install evancz/elm-html
elm package install evancz/elm-effects

Note: This step won’t work without internet access. Elm’s package manager doesn’t cache things locally; everything is copied into elm-stuff within each project. On the upside, you can dig around in elm-stuff to look at the code (and embedded documentation) of any of your project’s dependencies.

Step 3: Improve project configuration

3A: Welcome to elm-package.json

You now have an elm-package.json file in your project directory. Open it in your text editor.

{
    "version": "1.0.0",
    "summary": "helpful summary of your project, less than 80 characters",
    "repository": "https://github.com/user/project.git",
    "license": "BSD3",
    "source-directories": [
        "."
    ],
    "exposed-modules": [],
    "dependencies": {
        "elm-lang/core": "3.0.0 <= v < 4.0.0",
        "evancz/start-app": "2.0.2 <= v < 3.0.0"
    },
    "elm-version": "0.16.0 <= v < 0.17.0"
}

The project version, summary, etc. become crucial when you publish a new library to the central Elm package list. Until then, you can update them if you feel like it.

Note: the project’s dependencies are specified as ranges. Elm is super specific about semantic versioning. It is impossible for one of the libraries you use to introduce a compilation-breaking change without going up a major version (the first section in the version number), so Elm knows that (for instance) any version of StartApp that’s at least as high as its current one “2.0.2” and less than the next major version “3.0.0” is acceptable. This matters if you publish your project as a library for other people to use. For now it’s just cool.

3B: Establish a source directory

With the default configuration, Elm looks for project sources in “.” (the current directory; project root). I want to put them in their own directory, so I change the entry in “source-directories” to “src”. Then I create a directory called `src` in my project root.

mkdir src
[editor] elm-package.json

and set:

    "source-directories": [
        "src"
    ],

Step 4: Create the main module

4A: Bring in “hello world” code

Create a file src/CarrotPotato.elm (if the name of your project is CarrotPotato), and open it in your text editor.

touch src/CarrotPotato.elm
[editor] src/CarrotPotato.elm

Every StartApp application starts about the same. I cut and paste most of this out of the StartApp docs, then added everything necessary to make it compile. It had to do something, so it outputs Hello World in an HTML text element.

Copy from this file, or this gist.

To understand this code, do the Elm Architecture Tutorial. (It’s a lot. But it’s the place to go to understand Elm.)

4B: compile the main module

I want this compiled into a JavaScript file in my `target` directory, so this is my build command:

elm make --output target/elm.js src/CarrotPotato.elm

When this works, a target/elm.js file should exist.

Note: by default, elm-make (v0.16) creates an index.html file instead of elm.js. That’s fine for playing around, but in any real project I want control over the surrounding HTML.
Note: I ask elm-make to build the top-level module of my project. Once I add more source files, elm-make will compile all the ones that my top-level module brings in.

To remind myself of how to do this correctly, I put it in a script:

echo "elm make --output target/elm.js src/CarrotPotato.elm" >> build
chmod u+x build

Then every time I want to compile:

./build

Step 5: Run the program in a web page

Elm runs inside a web page. Let’s call that page index.html because that’s the default name for these things. Create that file and put something like this into it:

touch index.html
[editor] index.html

put this in:


 
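<html>
 <head>
  <title>CarrotPotato</title>
  <script type="text/javascript" src="target/elm.js"></script>
 </head>
 <body>
 </body>
 <script type="text/javascript">
  // Elm 0.16 (struck through in the original post):
  //   var app = Elm.fullscreen(Elm.CarrotPotato, {});
  // Elm 0.18 (the correction):
  Elm.CarrotPotato.fullscreen();
 </script>
</html>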

The important parts here are:

  • in the header, set the page’s title
  • in the header, bring in the compiler output; this matches the file I told elm-make to write to
  • in the header, you’re free to bring in CSS
  • the body is empty
  • the script tag at the end activates my Elm module. (The strikethrough is Elm 0.16; the correction is Elm 0.18)

Save this file, and open it in your default browser:

open index.html

You should see “Hello World”. Quick, make a commit!

Note: opening index.html as a file doesn’t always work smoothly. If the browser gives you trouble, try running an http server in that directory instead. There’s a very easy one available from npm.

Step 6: Go forth and Elminate

The foundation is set for an Elm project. From here, I can start building an application. Here are some things I often do next:

  • change the view function to show something more interesting. See elm-html for what it can return.
  • make a git repository, push my project to it; update this in elm-package.json, and create a README.md
  • create a gh-pages branch to serve my project on the web (blog post on this coming soon, I hope)
  • break out my project’s functionality into more modules, by creating files like src/CarrotPotato/View.elm and importing them from my main module

You can get everything up to this point without doing it yourself by cloning my elm-sample repo.
I do this:

git clone git@github.com:jessitron/elm-sample.git carrot-potato
< create repo at github called my-user/carrot-potato; copy its git url>
cd carrot-potato
git remote set-url origin git@github.com:my-user/carrot-potato.git
 

Comments and suggestions welcome! I’m sure this isn’t the most optimal possible setup.

git: handy alias to find the repository root

To quickly move to the root of the current git repository, I set up this alias:

git config --global alias.home 'rev-parse --show-toplevel'

Now, git home prints the full path to the root directory of the current project.
To go there, type (Mac/Linux only)

cd `git home`

Notice the backticks. They’re not single quotes. This executes the command and then uses its output as the argument to cd.

This trick is particularly useful in Scala, where I have to get to the project root to run sbt compile. (Things that make me miss Clojure!)

BONUS: handy alias to find the current branch

git config --global alias.whereami "rev-parse --abbrev-ref HEAD"

As in,

git push -u origin `git whereami`
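Here is a small sketch of both aliases at work in a throwaway repository. The scratch path, the branch name feature-x, and the identity values are placeholders; the sketch uses --local instead of --global so it doesn’t change your real configuration, and it uses $(...) in place of backticks, which does the same job and nests more easily.

```shell
# Placeholder identity so the commit works on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"
git commit -q --allow-empty -m "first commit"
git checkout -q -b feature-x

# --local keeps the aliases inside this scratch repository.
git config --local alias.home 'rev-parse --show-toplevel'
git config --local alias.whereami 'rev-parse --abbrev-ref HEAD'

# Wander off into a subdirectory, then ask git where we are.
mkdir -p src/deep/inside
cd src/deep/inside
branch=$(git whereami)
cd "$(git home)"
echo "on $branch in $(pwd)"
```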

Your Code as a Crime Scene: book review

What can we learn about our projects with a little data science and a lot of version control?
Locate the most dangerous code! Find where Conway’s Law is working against us! Know who to talk to about a change you want to make! See whether our architectural principles are crumbling!

Adam Tornhill’s book shows both how and why to answer each of these questions. The code archaeology possibilities are intriguing; he shows how to get raw numbers and how to turn them into interactive graphs with a few open-source scripts. He’s pragmatic about the numbers, reminding the reader what not to use them for. For instance: “the value of code coverage is in the implicit code review when you look at uncovered lines.” Trends are emphasized over making judgements about particular values.

Even better are Adam’s expansive insights into psychology, architecture, and the consequences of our decisions as software engineers. For instance: we know about the virtues of automated tests, but what about the costs? And, what is beauty in code? (answer: lack of surprise)

There’s plenty of great material in here, especially for a developer joining an existing team or open-source project, looking to get their mind around the important bits of the source quickly. I also recommend this book for junior- to mid-level developers who want to gain new insight into both their team’s code and coding in general. If you want to accelerate your team, to contribute in ways beyond your own share of the coding, then run Adam’s analyses against your codebase.

One word of caution: it gets repetitive in the intro and conclusion to the book as a whole and each section and each chapter. Whoever keeps repeating “Tell them what you’re going to tell them, tell them, tell them what you just told them,” can we please get past that now??

A few factoids I learned today from this book:
– Distributed teams have the most positive commit messages
– Brainstorming is more effective over chat than in an in-person meeting

and when it comes to the costs of coordination among too-large teams: “The only winning strategy is not to scale.” (hence, many -independent- teams)

Gaining new superpowers

When I first understood git, after dedicating some hours to watching a video and reading long articles, it was like I finally had power over time. I can find out who changed what, and when. I can move branches to point right where I want. I can rewrite history!

Understanding a tool well enough that using it is a joy, not a pain, is like gaining a new superpower. Like I’m Batman, and I just added something new to my toolbelt. I am ready to track down latent bug-villains with git bisect! Merge problems, I will defeat you with frequent commits and regular rebasing – you are no match for me now!
What if Spiderman posted his rope spinner design online, and you downloaded the plans for your 3D printer, and suddenly you could shoot magic sticky rope at any time? You’d find a lot more uses for rope. Not like now, when it’s down in the basement and all awkward to use. Use it for everyday, not-flashy things like grabbing a pencil that’s out of reach, or rolling up your laptop power cable, or reaching for your coffee – ok not that! spilled everywhere. Live and learn.
Git was like that for me. I solve problems I didn’t know I had, like “which files in this repository haven’t been touched since our team took over maintenance?” or “when was this derelict function last used?” or “who would know why this test matters?”
Every new tool that I master is a new superpower. On the Mac or linux, command-line utilities like grep and cut and uniq give me power over file manipulation – they’re like the swingy grabby rope-shooter-outers. For more power, Roopa engages Splunk, which is like the Batmobile of log parsing: flashy and fast, doesn’t fit in small spaces. On Windows, Powershell is at your fingertips, after you’ve put some time in at the dojo. Learn what it can do, and how to look it up – superpowers expand on demand! 
Other days I’m Superman. When I grasp a new concept, or practice a new style of coding until the flow of it sinks in, then I can fly. Learning new mathy concepts, or how and when to use types or loops versus recursion or objects versus functions — these aren’t in my toolbelt. They flow from my brain to my fingertips. Like X-ray vision, I can see through this imperative task to the monad at its core.
Sometimes company policy says, “You may not download vim” or “you must use this coding style.” It’s like they handed me a piece of Kryptonite. 
For whatever problem I’m solving, I have choices. I can kick it down, punch it >POW!< and run away before it wakes up. Or, I can determine what superpower would best defeat it, acquire that superpower, and then WHAM! defeat it forever. Find its vulnerability, so that problems of its ilk will never trouble me again. Sometimes this means learning a tool or technique. Sometimes it means writing the tool. If I publish the tool and teach the technique, then everyone can gain the same superpower! for less work than it took me. Teamwork!
We have the ultimate superpower: gaining superpowers. The only hard part is, which ones to gain? and sometimes, how to explain this to mortals: no, I’m not going to kick this door down, I’m going to build a portal gun, and then we won’t even need doors anymore.
Those hours spent learning git may have been the most productive of my life. Or maybe it was learning my first functional language. Or SQL. Or regular expressions. The combination of all of them makes my unique superhero fighting style. I can do a lot more than kick.

Spring cleaning of git branches

It’s time to clean out some old branches from the team’s git repository. In memory of them, I record useful tricks here.

First, Sharon’s post talks about finding branches that are ripe for deletion, by detecting branches already merged. This post covers those, plus how to find out more about the others. This post is concerned with removing unused branches from origin, not locally.

Here’s a useful hint: start with

git fetch -p

to update your local repository with what’s in origin, including noticing which branches have been deleted from origin.
Also, don’t forget to

git checkout master
git merge --ff-only

so that you’ll be on the master branch, up-to-date with origin (and won’t accidentally create a merge commit if you have local changes).

Next, to find branches already merged to master:

git branch -a --merged

This lists branches, including remote branches (the ones on origin), but only ones already merged to the current branch. Note that the argument order is important; the reverse gives a silly error. Here’s a one-liner that lists them:

git branch -a --merged | grep -v -e HEAD -e master | grep origin | cut -d '/' -f 3-

This says, find branches already merged; exclude any references to master and HEAD; include only ones from origin (hopefully); cut out the /remotes/origin/ prefix.

The listed branches are safe to delete. If you’re brave, delete them permanently from origin by adding this to the previous command:

 | xargs git push --delete origin

This says, take all those words and put them at the end of this other command, which says “delete these references on the origin repository.”
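If you’d rather try this somewhere harmless first, here is a sketch that builds a throwaway origin and runs the same pipeline against it. The repository paths, the branch names, and the identity values are all made up for the demonstration.

```shell
# Placeholder identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q --bare "$demo/origin.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master   # name the first branch master
git remote add origin "$demo/origin.git"
git commit -q --allow-empty -m "start"

# One branch that has been merged to master...
git checkout -q -b already-merged
git commit -q --allow-empty -m "finished work"
git checkout -q master
git merge -q --ff-only already-merged

# ...and one that has not.
git checkout -q -b still-in-progress
git commit -q --allow-empty -m "unfinished work"
git checkout -q master

git push -q origin master already-merged still-in-progress
git fetch -q -p

# The one-liner from the post: merged branches that live on origin.
merged=$(git branch -a --merged | grep -v -e HEAD -e master | grep origin | cut -d '/' -f 3-)
echo "safe to delete: $merged"

# The brave part: delete them from origin.
echo "$merged" | xargs git push -q --delete origin

remaining=$(git ls-remote --heads origin | cut -f 2 | sort)
echo "$remaining"
```

Only the merged branch disappears from origin; master and the unmerged branch are left alone.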

OK, those were the easy ones. What about all the branches that haven’t been merged? Who created those things anyway, and how old are they?

git log --date=iso --pretty=format:"%an %ad %d" -1 --decorate

is a lovely command that lists the author, date in ISO format (which is good for sorting), and branches and tags of the last commit (on the current branch, by default).

Use it on all the branches on origin:

git branch -a | grep origin | grep -v HEAD | xargs -n 1 git log --date=iso --pretty=format:"%an %ad %d%n" -1 --decorate | grep -v master | sort

List remote branches; only the ones from origin; exclude the HEAD, we don’t care and that line is formatted oddly; send each one through the handy description; exclude master; sort (by name then date, since that’s the information at the beginning of the line).

This gives me a bunch of lines that look like:

Shashy 2014-08-15 11:07:37 -0400  (origin/faster-upsert)
Shashy 2014-10-23 22:11:40 -0400  (origin/fix_planners)
Shashy 2014-11-30 06:50:57 -0500  (origin/remote-upsert)
Tanya 2014-10-24 11:35:02 -0500  (origin/tanya_jess/newrelic)
Tanya 2014-11-13 10:04:48 -0600  (origin/kafka)
Yves Dorfsman 2014-04-24 14:43:04 -0600  (origin/data_service)
clinton 2014-07-31 16:26:37 -0600  (origin/warrifying)
clinton 2014-09-15 13:29:14 -0600  (origin/tomcat-treats)

Now I am equipped to email those people and ask them to please delete their stray branches, or give me permission to delete them.

Removing files from git

TL;DR – “git rm --cached <file>” means “Yo git, as far as you know, this file is gone”

In a git repo, there are three places for files to be:
1) In your working directory
2) In the staging area
3) In the most recent commit.

Getting rid of a file means moving it out of all 3 places.
1) by deleting it
   rm <file>
2) by removing it from the staging area:
   git rm <file>
3) by committing that removal.
   git commit -m "die stupid file die"

When you want the file to remain on your filesystem but NOT in the repo, then tell git to ignore it. But that isn’t enough! You also have to get it out of the repo.

1) tell git to ignore it: add the file or directory name to .gitignore[1]
2) get it out of the staging area BUT NOT the working directory:
  git rm --cached <file>
3) If the file has been committed before, commit that removal, along with your .gitignore changes:
  git add .gitignore
  git commit -m "hide stupid file hide"

Git etiquette: package the .gitignore updates along with the removal of the newly-ignored files in one commit.

Warning: if you ever check out a commit that doesn’t have that file in .gitignore, whatever’s in the commit will overwrite your current one. No warnings. I hope this was some sort of build output that you can regenerate.

Sometimes the file has never been committed, but it was accidentally added to the staging area, and now you want git to leave you alone and ignore that file already!

Delete the file and remove it from the staging area in one easy step:
1) git rm -f pee

Or keep it, and tell git to leave it the freak alone:
1) tell git to ignore it: add the file or directory name to .gitignore
2) get it out of the staging area BUT NOT the working directory:
  git rm --cached <file>

Terminology: the “staging area” is also called the “index” and the “cache,” for historical reasons.

If this seems complicated… yeah, I agree. If you know that “git rm --cached <file>” means “Yo git, take this file out of you,” that’ll get you through most of the frustration.
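Here is the keep-it-on-disk case as a sketch in a scratch repository. The file names build.log and main.txt, the scratch path, and the identity values are placeholders for the demonstration.

```shell
# Placeholder identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"

# Oops: generated output got committed along with real source.
echo "generated" > build.log
echo "source" > main.txt
git add build.log main.txt
git commit -q -m "oops, committed build output"

# 1) tell git to ignore it
echo "build.log" >> .gitignore
# 2) take it out of the staging area, but leave it in the working directory
git rm -q --cached build.log
# 3) commit the removal together with the .gitignore change
git add .gitignore
git commit -q -m "hide stupid file hide"

tracked=$(git ls-files)
echo "$tracked"
ls build.log
```

After this, build.log is still sitting on the filesystem, git no longer tracks it, and git status stays quiet about it.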

—————
[1] For more ways to ignore files, and when to use each: http://jessitron.github.io/git-happens/ignore.html

git: I want my stash back

TL;DR – it’s a good idea to throw experimental changes on a branch instead of stashing them.

Today I stashed some changes, then popped them out, then decided they were a failure and wiped them out, then (later) wanted them back.

git stash makes a commit, and commits are not deleted for thirty days, so there had to be a way to get them back.

git fsck finds lost references. But only ones that are good and thoroughly lost, not stashed or listed in a reflog. To find all the references that are not in the history of some named commit, try

git fsck --unreachable > temp_file

Stick the output into a temp file because this takes a long time.
Then to find the commit I was looking for:

cat temp_file | grep commit | cut -d ' ' -f 3 | xargs git show --name-only

where
cat temp_file reads the temp file
grep commit chooses only the commits out of all the unreachable objects found by git fsck
cut -d ' ' -f 3 prints only the third field, using space as a delimiter. This gives me the commit hash
xargs puts the input lines on the end of the argument list
git show prints information about any object in git; for commits it shows the git log output and the diff
--name-only narrows the diff output to only the changed filenames

That gave me a whole slew of commit descriptions, opened in less the same way git log works. I searched for the filename of interest

/Stuff

(forward slash is the less command for search; push n to go to the next one)
and then moved up (arrow key) to see the description of that lost stash-entry commit, along with its commit hash.

commit 2b16061dba6ba8e529e56f53544c84ba432ed7be
Author: Jessitron
Date:   Wed Nov 13 11:06:19 2013 -0600

    index on develop: 776a776 Debug logging for claiming

src/main/scala/poo/Stuff.scala
src/test/scala/poo/Stuff_UT.scala

Finally, I can check that out and create a temporary branch, which is what I should have done in the first place.

git checkout commit_hash
git branch crazy-changes

Now these experimental changes can hang out, out of my way and easily reachable, until I’m good and done with them and delete the branch.

git branch -d crazy-changes
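
The whole rescue can be rehearsed in a scratch repository. This is a sketch under a few assumptions: the file name, scratch path, and identity values are placeholders, and instead of paging through git show output it leans on the fact that a stash commit has two parents, so `git log --no-walk --merges` can pick it out of the unreachable commits. In a real repository with lots of lost commits you would still need to eyeball the candidates.

```shell
# Placeholder identity so commits and stashes work with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"
echo "v1" > Stuff.txt
git add Stuff.txt
git commit -q -m "baseline"

# Stash an experiment, then throw the stash away.
echo "experimental" > Stuff.txt
git stash -q
git stash drop -q

# Find unreachable commits; keep only the two-parent ones (the stash itself).
lost=$(git fsck --unreachable 2>/dev/null | grep commit | cut -d ' ' -f 3 |
  xargs git log --no-walk --merges --format=%H)
echo "lost stash commit: $lost"

# Park it on a branch, which is what should have happened in the first place.
git branch crazy-changes "$lost"
git checkout -q crazy-changes
cat Stuff.txt
```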

Question: if I can only find this with fsck –unreachable, that implies that this commit is still recorded somewhere. Anyone know where that is?

Git: Checkout multiple branches at the same time

With git, source code lives in a single working directory. When we switch from one branch to another, git rearranges our files for us. Usually this keeps things simple, but now and then I wish for two copies of the code.

Do you ever want to run a big fat suite of tests in one branch while working on code in another branch? I do! This requires two different versions of the code on the filesystem at the same time.

Before you go making another copy of the whole repository, consider this solution: write a version of the for-testing branch to a temporary place outside of your working directory. Run the tests over there, leaving the real local repository available for work as usual.

1. Create a temporary directory. Maybe /Users/me/somewhere
2. Go to your git repository, to the project root directory. Now run this command:

git archive branchypoo | tar -xC /Users/me/somewhere

git archive takes the whole source tree from branchypoo, tars it up, and writes the result to stdout.
tar -x extracts the files from stdin
-C means “do the extraction over in this directory.”

Now you’re clear to run tests over in /Users/me/somewhere, and your local repository is ready for real work.
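
A sketch of the trick in a scratch repository, to show that the working directory really is left alone. The file name, scratch paths, and identity values are placeholders. One small difference from the command above: the sketch passes `-f -` to tar so that reading from stdin is explicit, since not every tar defaults to stdin.

```shell
# Placeholder identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"
echo "stable" > app.txt
git add app.txt
git commit -q -m "stable version"

# The branch I want to test, with different file contents.
git checkout -q -b branchypoo
echo "for testing" > app.txt
git commit -q -am "version to test"
git checkout -q -          # back to the branch I was working on

# Write branchypoo's files somewhere else entirely.
mkdir "$demo/somewhere"
git archive branchypoo | tar -xf - -C "$demo/somewhere"

echo "working directory: $(cat app.txt)"
echo "exported copy:     $(cat "$demo/somewhere/app.txt")"
```

The exported copy has no .git directory; it is only files, which is all a test run needs.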

Thanks to JørgenE for this trick!

——–

The original post used three steps:

0. Commit or stash any changes you have lying around. (important!)

1. Create a temporary directory. Maybe /Users/me/some/where/else

2. Go to your git repository, to the project root directory. Now run this crazy command, and don’t forget the dot at the end:

git --work-tree=/Users/me/some/where/else checkout branch-of-interest -- .

Here,
--work-tree=/Users/me/some/where/else tells git to act like your working directory is in some other location.
checkout writes the specified version of the specified files into your (modified) working directory. Note that checkout behaves differently when given a filepath; if you leave that out, checkout will change your current branch and not give you all the files.
branch-of-interest is a branch name, or any commit identifier.
-- says “I’m done naming commits, here come some paths to files”
. chooses the current directory, including all subdirectories and so on.

The whole command means “Write all the files, as they currently exist on branch-of-interest, into this directory over here (and my staging area).” The staging-area bit is unfortunate. If there’s a way to tell checkout to skip that, I’d like to know it.

3. Fix your staging area. git status will tell you that you have a bunch of changes to be committed, and a bunch of other changes. Tell it to forget about that stuff in the staging area.

git reset

Now you’re clear to run tests over in /Users/me/some/where/else, and your local repository is ready for real work.

Converting from svn to git: salvaging local branches

If some team members use git-svn locally, they might have a local working branch. When the team moves to a central git repository, that work needs to come with them.

In the old git-svn repo:

1) Find the revision where the local branch of interest branched off master:

git svn find-rev `git merge-base $branch master`

Here, master corresponds to subversion’s trunk and $branch to the local branch we want to save.

merge-base says “Tell me the last commit these two branches have in common.” It gives back the label (hash) of the commit. To learn more about it, run git show <hash>

svn find-rev says “Tell me the svn revision number for this git commit.” You’ll get back a number, such as 3456.

2) Create patch files for every commit on the branch:

git format-patch master..$branch

format-patch will produce a bunch of .patch files in your local directory, one for each commit between master and the tip of your branch. They start with numbers, in the order the commits were made, so that they can be applied in that order later.

In the new git repository:

After cloning the new git repository (goodbye svn!)[1]

1) Find the commit that corresponds to the one where the branch started.

git log :/@3456

where 3456 is the revision number, @ is a symbol that happens to be before the revision in the commit log, and git log :/ means “show me the log starting from a commit with this text in its message.”

You might have to search for your revision number to find the right one, in case that number happens to appear in other commit messages.

Copy the hash of the commit you find.

2) Check out that commit

git checkout <commit-hash>

Now you’re in detached head state!

3) Create the branch here, and check it out

git checkout -b $branch

Whew, head is attached again.

4) Apply the patches

git am ../path-to-old-git-svn-repo/*.patch

Now your branch exists on the new repository, as if you’d always been working from there.
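
Here is the patch round-trip as a sketch with two scratch repositories. Big assumption: the “new” repository is just a clone standing in for the converted one, so there is no real svn involved and the revision marker @3456 is typed into a commit message by hand. The branch name, file name, and identity values are placeholders too.

```shell
# Placeholder identity; git am needs a committer as well as an author.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)

# "old" stands in for the git-svn repository.
git init -q "$demo/old"
cd "$demo/old"
git symbolic-ref HEAD refs/heads/master   # name the first branch master
echo "trunk" > file.txt
git add file.txt
git commit -q -m "trunk work @3456"

# "new" stands in for the freshly converted git repository.
git clone -q "$demo/old" "$demo/new"

# Local work that only exists in the old repository.
git checkout -q -b my-work
echo "first change" >> file.txt
git commit -q -am "first local commit"
echo "second change" >> file.txt
git commit -q -am "second local commit"
git format-patch -q master..my-work       # writes 0001-*.patch, 0002-*.patch here

# Over in the new repository: find the starting commit, branch, apply.
cd "$demo/new"
start=$(git log -1 --format=%H ':/@3456')
git checkout -q "$start"
git checkout -q -b my-work
git am -q "$demo"/old/*.patch

subjects=$(git log --format=%s master..my-work)
echo "$subjects"
```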

————-
[1] If you’re not moving the branch from one git-svn repo to another, then find the destination commit like this:

git svn find-rev r3456

where 3456 is the revision number where the branch starts.

Finding and removing large files in git history

Sometimes it behooves one to erase commits of large binaries from history. Binaries don’t play nicely in git. They slow it down. This post will help you find them and remove them like they were never there.

In svn, people only download the latest version of the repository, so if you commit a large binary, you can delete it in a future commit and the rest of your team is unharmed. In git, everyone who clones a repo gets the entire history, so they’re stuck downloading every version of every binary ever committed. Yuck!

Therefore, the svn-to-git conversion is a good time to delete all the large binaries from history. Do this before anyone has cloned the repository, before you push all the commits to a shared place like bitbucket or github.

Caution: Never alter commits that are in your repo and someone else’s, if you ever plan to talk to their repo again.

Step 1: Identify the large files. We need to search through all of the history to find the files that are good candidates for deletion. As far as I can tell, this is nontrivial, so here is a complicated command that lists the sum of the sizes of all revisions of files that are over a million bytes. Run it on a mac.

git rev-list master | while read rev; do git ls-tree -lr $rev | cut -c54- | grep -v '^ '; done | sort -u | perl -e '
  while (<>) {
    chomp;
    @stuff=split("\t");
    $sums{$stuff[1]} += $stuff[0];
  }
  print "$sums{$_} $_\n" for (keys %sums);
' | sort -rn >> large_files.txt

Please replace master with a list of all branches you care about.
This command says: List all commits in the history of these branches. For each one, list all the files; descend into directories recursively; include the size of the file. Cut out everything before the size of the file (which starts at character 54). Anything that starts with space is under a million bytes, so skip it. Now, choose only the unique lines; that’s approximately the unique large revisions. Sum the sizes for each filename, and output these biggest-first. Store the output in a file.

If this works, large_files.txt will look something like mine:

186028032 AccessibilityNative/WindowsAccessibleHandler/WindowsAccessibleHandler.sdf
94973848 quorum/installers/windows/jdk-7u21-windows-x64.exe
93300120 quorum/installers/windows/jdk-7u21-windows-i586.exe
84144520 quorum/installers/windows/jdk-7-windows-x64.exe
83345288 quorum/installers/windows/jdk-7-windows-i586.exe
57712115 quorum/Run/Default.jar

Yeah, let’s not retain multiple versions of the jdk in our repository.
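
For comparison, here is a sketch of a different way to get a similar report. It is not the command above: it swaps in `git rev-list --objects` plus `git cat-file --batch-check`, which asks git for each object’s size directly instead of cutting at a character column. The scratch repository, the file names, and the tiny 1000-byte threshold are stand-ins so the demonstration runs quickly, and the awk step assumes file paths contain no spaces.

```shell
# Placeholder identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"

# Two revisions of a "large" file and one small file.
head -c 3000 /dev/zero > big.bin
echo "small" > notes.txt
git add big.bin notes.txt
git commit -q -m "add files"
head -c 5000 /dev/zero > big.bin
git commit -q -am "bigger binary"

threshold=1000   # stand-in for "a million bytes"

# Every object in history with its path, then type and size for each one;
# keep blobs over the threshold and sum the sizes per path, biggest first.
report=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v min="$threshold" '$1 == "blob" && $2 >= min { sums[$3] += $2 }
       END { for (f in sums) print sums[f], f }' |
  sort -rn)
echo "$report"
```

The output has the same shape as large_files.txt: total bytes across all revisions, a space, then the file path.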

Step 2: Decide which large files to keep. For any file you want to keep in the history, delete its line from large_files.txt.

Step 3: Remove them like they were never there. This is the fun part. If large_files.txt is still in the same format as before, do this:

git filter-branch --tree-filter 'rm -rf `cat /full/path/to/large_files.txt | cut -d " " -f 2` ' --prune-empty <branches>

This says: Create an alternate universe with a history that looks like <branches>, except for each commit, take its files and remove everything in large_files.txt (which contains the filename in the second space-delimited field). Drop any commits which only affected files that don’t exist anymore. Point <branches> at this new version of history.

Whew. If this worked, then when you push to a brand-new repository for sharing, those binaries won’t go. Not in the current revision, not in any history. It is like they were never there.

————————-

OH GOD WHAT DID I DO: If you change your mind or mess up, you can undo this operation.
First, look at the history of where your branch has pointed recently:
git reflog <branch>

Here’s my output:

→ git reflog bbm2
e9429a7 bbm2@{0}: filter-branch: rewrite
08d7da5 bbm2@{1}: branch: Created from HEAD

The top line is the filter-branch I just did. The line before that lists the tip of the branch before that crazy filter operation.
I can do git log 08d7da5 to check on it, and git ls-tree 08d7da5 to see what’s in it. (If you want all the files to be listed, then git ls-tree -r 08d7da5.)

When I’m sure I want to undo the filter-branch, then:
git checkout <branch>
git reset @{1}

will put the branch riiiight back where it was. If you don’t like the weird @{1} notation, you can use the specific commit name instead, and tell the branch exactly where you want it to be.
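
To build some trust in that undo before you need it, here is a sketch in a scratch repository. It doesn’t run filter-branch at all; a plain `git reset --hard HEAD~1` stands in for “history rewrite I regret,” and the paths and identity values are placeholders.

```shell
# Placeholder identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"
git commit -q --allow-empty -m "good history"
git commit -q --allow-empty -m "more good history"
before=$(git rev-parse HEAD)

# Stand-in for the operation I regret: the branch now points somewhere else.
git reset -q --hard HEAD~1

# @{1} is "where the current branch pointed one move ago".
git reset -q '@{1}'
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```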

It’s important to feel safe to experiment. In git, as long as it was ever committed in the last 30 days, you won’t lose it.