Keeping Your Cache Down to Size

Guava’s CacheBuilder provides several options for keeping a cache down to a reasonable size. Entries can expire for multiple reasons, entries can be limited to a certain number, and entries can be made available for garbage collection. These options can be used individually or together. Each is simply one more method call during cache creation.

When should we use each alternative? It depends on our objective.

General Memory Conservation

If the objective is to avoid running out of memory JVM-wide, then the garbage collector is the right guy for the job. There are three ways to make our entries available for garbage collection. One of them is right, and two are wrong.

Soft Values: this is the right way.

CacheBuilder.newBuilder().softValues().build(cacheLoader)

This sets up a cache which wraps any value it stores in a SoftReference. If no other object holds a strong reference to the value, the garbage collector has the option of freeing this memory. As Brian Goetz put it, “soft references allow the application to enlist the aid of the garbage collector by designating some objects as ‘expendable.’” The key here is that the garbage collector can choose which (if any) soft-referenced objects to collect. The JVM can choose the least recently or least frequently used objects, and it can perform this ranking across all soft-referenced objects, not only the ones in your cache. This makes soft references the optimal choice for overall memory conservation.

When a value is collected, that entry is invalidated in the cache. It may still show up in the total cache.size(), but when cache.get(key) is called, a new value will be loaded and a new entry created.
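To make the lifecycle concrete, here is a toy sketch of soft-valued caching using only the JDK's SoftReference — this is not Guava's implementation, and the class and loader names are invented for illustration. A cleared reference is treated as a miss, so the value is simply reloaded:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy sketch (NOT Guava's implementation): values are held through
// SoftReferences, and a cleared reference counts as a miss and reloads.
public class SoftValueSketch {
    private final Map<String, SoftReference<String>> map = new HashMap<>();
    private final Function<String, String> loader;
    int loads = 0;

    SoftValueSketch(Function<String, String> loader) { this.loader = loader; }

    String get(String key) {
        SoftReference<String> ref = map.get(key);
        String value = (ref == null) ? null : ref.get();
        if (value == null) {          // never loaded, or collected under memory pressure
            value = loader.apply(key);
            loads++;
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftValueSketch cache = new SoftValueSketch(k -> "value-for-" + k);
        System.out.println(cache.get("a"));   // first call loads
        System.out.println(cache.get("a"));   // served from the soft reference
        System.out.println("loads=" + cache.loads);
    }
}
```

The soft reference is only cleared under memory pressure, so in a short run like this the second get() is a hit.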

Now, take my advice: if your goal is to prevent the cache from using all your memory, use softValues() and skip the rest of this blog post. If you are also concerned about frequency of garbage collection, consider maximumSize(), discussed later.

There’s one small caution with soft values: they cause values to be compared with == instead of .equals(). This only impacts the rare person who conditionally removes entries from the cache like this:

cache.asMap().remove(key, value); 
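To see why that matters, compare the two notions of equality with plain JDK classes. ConcurrentMap.remove(key, value) normally matches on .equals(), so a logically equal copy of the value is enough; under an identity (==) comparison, only the exact same object instance would match:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class IdentityVsEquals {
    public static void main(String[] args) {
        String stored = new String("report");   // the instance the cache holds
        String lookup = new String("report");   // an equal but distinct instance

        System.out.println("equals: " + stored.equals(lookup)); // true
        System.out.println("identity: " + (stored == lookup));  // false

        // A plain ConcurrentMap matches on equals(), so this removal succeeds.
        // With softValues()/weakValues(), the comparison is identity-based,
        // and an equal-but-distinct copy would fail to remove the entry.
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        map.put("k", stored);
        System.out.println("removed with equal copy: " + map.remove("k", lookup));
    }
}
```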

Weak Values: If being in the cache is no reason to keep an object around, you can request weakValues(). This wraps the values in a WeakReference. This means that the next round of garbage collection will sweep up any entries whose value is not referenced elsewhere. These entries get invalidated even if memory is not in short supply. That’s why soft references are better.

The same == comparison caution applies with weakValues as with softValues.

Weak Keys: With weakKeys(), any entry whose key is no longer referenced elsewhere becomes eligible for garbage collection, and its entry is evicted from the cache. If your keys are transient objects expected to fall out of use after a period of time, never to be used again, then weak keys could make sense. I’m not sure why you would base a cache on something like that.

There’s another caveat to weakKeys() — it changes the way the Cache retrieves values. Weak keys are compared with == instead of .equals(). That could have a huge impact on how your map works. When == is the comparison, you can never look an entry up with a newly constructed key; you need a reference to the very same key object. With the cache, if you mess this up, you won’t get an error. Instead, the value will be loaded again and a new entry created. The result could be far more entries in the cache than if you hadn’t used weakKeys()!

Note that if your keys are Strings, this == comparison can be even more insidious. Identical hard-coded Strings (like those in your tests) will all reference the same object, and == will work. Strings parsed from input text or files will not be the same reference as any other String, so == will fail and more entries will be loaded into the Cache. What’s more, any hard-coded String will be allocated in PermGen space and not subject to regular garbage collection, so why wrap it in a weak reference? Just don’t do it, people!
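The String trap above is easy to demonstrate with the JDK alone. Literals are interned, so two identical literals are the same object and == happens to work; a String built at runtime is a distinct object, so == fails even though .equals() succeeds:

```java
public class StringIdentityTrap {
    public static void main(String[] args) {
        String literal = "user-42";   // interned: every identical literal shares this instance
        String parsed =               // built at runtime: a distinct object
            new StringBuilder("user-").append(42).toString();

        System.out.println("equals: " + literal.equals(parsed));  // true
        System.out.println("identity: " + (literal == parsed));   // false

        // Under weakKeys(), looking up with `parsed` would MISS an entry
        // keyed by `literal`, silently loading a duplicate entry.
        // Explicit interning restores identity, which shows they were
        // equal all along:
        System.out.println("interned identity: " + (literal == parsed.intern()));
    }
}
```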

Unless your values are smaller than your keys, there’s one more reason to prefer softValues() over weakKeys(). When a key is garbage collected, its entry is invalidated in the cache. However, the entry isn’t fully cleared until the cache does its cleanup activities. That won’t happen until (at the earliest) some call or other is made to the cache. This could delay freeing of the larger memory consumed by the value. Of course, if you’re enough of a memory miser to try using weakKeys(), you probably used weakValues() too, so the relative size of keys and values is not a concern.

Restrict Memory Consumption by the Cache

If your goal is to keep this particular cache down to size, then limiting the maximum size of the cache could make sense. Soft values will prevent the cache from running you out of memory, but they will let your cache take up as much memory as you have, which means more frequent garbage collection and slower overall performance. It is therefore useful to limit the overall size of the cache.

CacheBuilder.newBuilder().maximumSize(100).build(cacheLoader)

That’s all it takes. Your cache.size() will never exceed 100. Entries may be evicted from the cache before the size limit is reached; the cache will start clearing out old, less-used entries as soon as it gets close to the limit.
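Guava’s actual eviction is concurrent and approximate, so treat the following only as a sketch of the underlying idea — clearing out the least-recently-used entries once a size bound is reached — using the JDK’s access-ordered LinkedHashMap:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU sketch of the policy family behind maximumSize().
// Guava's real cache is concurrent and evicts approximately; this is
// only a single-threaded demonstration of the concept.
public class BoundedLru<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    BoundedLru(int maxSize) {
        super(16, 0.75f, true);   // accessOrder = true -> LRU iteration order
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;  // evict the least-recently-used entry
    }

    public static void main(String[] args) {
        BoundedLru<String, Integer> cache = new BoundedLru<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");           // touch "a" so "b" becomes least recently used
        cache.put("c", 3);        // exceeds the bound: "b" is evicted
        System.out.println("keys=" + cache.keySet());
    }
}
```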

Get Old Data Out

If your values are perishable, then expire them. Expiration occurs with a fixed time limit, either from last write (when the value was loaded) or last access (when get(key) was called). For instance, if your reference data changes rarely but not never, you might want to fetch it every hour or so.

CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build(cacheLoader)

I’m not sure when you would prefer expireAfterAccess. Perhaps your data only changes when you’re not looking?
Remember that if your objective is to keep less-recently-used objects from taking up space, softValues() is superior.
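Expire-after-write is simple enough to sketch with JDK classes: each entry remembers when it was written, and a read past the deadline discards it and reloads. This is an invented illustration, not Guava’s implementation; the clock is injected so the demo is deterministic rather than sleeping for an hour:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Illustrative sketch of expire-after-write (NOT Guava's implementation):
// entries record their write time; stale entries are reloaded on read.
public class ExpireAfterWriteSketch {
    private record Entry(String value, long writtenAt) {}

    private final Map<String, Entry> map = new HashMap<>();
    private final Function<String, String> loader;
    private final LongSupplier clock;   // injected for a deterministic demo
    private final long ttlMillis;
    int loads = 0;

    ExpireAfterWriteSketch(Function<String, String> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    String get(String key) {
        Entry e = map.get(key);
        long now = clock.getAsLong();
        if (e == null || now - e.writtenAt() >= ttlMillis) {  // missing or stale
            e = new Entry(loader.apply(key), now);
            loads++;
            map.put(key, e);
        }
        return e.value();
    }

    public static void main(String[] args) {
        long[] now = {0};
        ExpireAfterWriteSketch cache =
            new ExpireAfterWriteSketch(k -> "rates-snapshot", () -> now[0], 3_600_000);
        cache.get("rates");                      // first read: load
        now[0] = 1_000;     cache.get("rates");  // one second later: still fresh
        now[0] = 3_600_000; cache.get("rates");  // one hour later: stale, reload
        System.out.println("loads=" + cache.loads);
    }
}
```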

To programmatically expire objects from the cache, call cache.invalidate(key) or cache.invalidateAll(). This will trigger a new load at the next cache.get(key).

When values expire or are invalidated, they aren’t necessarily removed right away. It means that any cache.get(key) will result in a miss and a reload, even though cache.size() might show those entries hanging around. The cache removes these entries for good during periodic maintenance, which can occur on any cache access. To trigger the maintenance explicitly, call cache.cleanUp().

There are a few diagnostic tools for evaluating the performance of your cache, which helps when tuning your maximum size and expiration times. Check out Cache.stats() and CacheBuilder.removalListener() if you want to test or optimize.

No Really, Just Use softValues()

CacheBuilder gives you all kinds of ways to keep your cache under control. It is smarter than your average Map.

Goodbye, MapMaker. Hello, CacheBuilder.

Google has released a new version of Guava, and it’s bad news for one of my favorite classes, MapMaker. The exciting feature of MapMaker is its ability to produce an insta-cache, a key-value mapping that calculates values on the fly, expires them, and limits the quantity stored — all while maintaining thread safety.

In release ten, Guava added CacheBuilder. CacheBuilder produces a Cache: a key-value mapping that calculates values on the fly, expires them, and limits the quantity stored — all while maintaining thread safety.

Wait, that sounds exactly like MapMaker. Poor MapMaker — its methods are now deprecated. The king is dead, long live the king!

Here’s a quick look at what CacheBuilder does:


final Cache cache =
    CacheBuilder.newBuilder()
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .maximumSize(50)
        .build(CacheLoader.from(retrieveInterestingInfoByKey));

Here, retrieveInterestingInfoByKey is a Function that goes to the database, calls a service, or performs some calculation to get the value for a key.

What does this cache do?

The first time cache.get(key) is called, the cacheLoader loads the value, stores the entry in its map, then returns the value. If two threads make this call at the same time, one of them loads the value, and the other one blocks until the value is available. Values are never loaded more times than necessary.
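That “load once, everyone else waits” behavior resembles the JDK’s ConcurrentHashMap.computeIfAbsent, which guarantees the mapping function runs at most once per absent key while concurrent callers for that key block. A sketch of the same semantics, not Guava’s internals:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates load-once semantics: two threads race to load the same key,
// but computeIfAbsent invokes the (slow) loader exactly once.
public class LoadOnce {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        AtomicInteger loaderCalls = new AtomicInteger();

        Runnable reader = () -> cache.computeIfAbsent("key", k -> {
            loaderCalls.incrementAndGet();
            try { Thread.sleep(50); } catch (InterruptedException e) { }
            return "expensive-value";            // simulate a slow load
        });

        Thread t1 = new Thread(reader);
        Thread t2 = new Thread(reader);          // waits while t1's load runs
        t1.start(); t2.start();
        t1.join();  t2.join();

        System.out.println("loader calls: " + loaderCalls.get());
        System.out.println("value: " + cache.get("key"));
    }
}
```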

If cache.get(key) is called more than ten minutes after that entry was created, the value will be loaded again. When the size of the cache gets close to the maximum, then the older, less-used entries are evicted.

What is this cache good for? Reference data, certainly. Any time you have data that’s loaded over and over again, but doesn’t change nearly as often as it’s loaded. The whole process can share a single cache. It’s a good candidate for a singleton Spring bean.

What is this cache not good for? Anything you want to cache across processes or machines. Anything you want to update. (You can evict entries from the cache, but not put them in.) Anything you want persisted.

I could write whole posts about exception handling in the load, testing and tuning hooks, and memory use optimization of the Cache. But those are decorations on the cake. The real point of CacheBuilder is how quickly and easily you can get a map that gets you the information you want, when you want it, without hitting the database every single time.

MapMaker, your name was fun to say, but we won’t miss you now that we have CacheBuilder.