Thursday, October 6, 2011

Keeping Your Cache Down to Size

Guava's CacheBuilder provides several options for keeping a cache down to a reasonable size. Entries can expire for multiple reasons, entries can be limited to a certain number, and entries can be made available for garbage collection. These options can be used individually or together. Each is simply one more method call during cache creation.

When should we use each alternative? It depends on our objective.

General Memory Conservation


If the objective is to avoid running out of memory JVM-wide, then the garbage collector is the right guy for the job. There are three ways to make our entries available for garbage collection. One of them is right, and two are wrong.

Soft Values: this is the right way.
CacheBuilder.newBuilder().softValues().build(cacheLoader)

This sets up a cache which wraps any value it stores in a SoftReference. If no other object holds a strong reference to the value, the garbage collector has the option of freeing this memory. As Brian Goetz put it, "soft references allow the application to enlist the aid of the garbage collector by designating some objects as 'expendable.'" The key here is that the garbage collector can choose which (if any) soft-referenced objects to collect. The JVM can choose the least recently or least frequently used objects, and it can perform this ranking across all soft-referenced objects, not only the ones in your cache. This makes soft references the optimal choice for overall memory conservation.

When a value is collected, that entry is invalidated in the cache. It may still show up in the total cache.size(), but when cache.get(key) is called, a new value will be loaded and a new entry created.

Now, take my advice: if your goal is to prevent the cache from using all your memory, use softValues() and skip the rest of this blog post. If you are also concerned about frequency of garbage collection, consider maximumSize(), discussed later.

There's one small caution with soft values: it causes values to be compared with == instead of .equals(). This only impacts the rare person who conditionally removes entries from the cache like this:
cache.asMap().remove(key, value); 

Weak Values: If being in the cache is no reason to keep an object around, you can request weakValues(). This wraps the values in a WeakReference. This means that the next round of garbage collection will sweep up any entries whose value is not referenced elsewhere. These entries get invalidated even if memory is not in short supply. That's why soft references are better.

The same == comparison caution applies with weakValues as with softValues.

Weak Keys: This means that any entry whose key is no longer referenced elsewhere should be garbage collected and its entry evicted from the cache. If your keys are transient objects expected to fall out of use after a period of time, never to be used again, then weak keys could make sense. I'm not sure why you would base a cache on something like that.

There's another caveat to weakKeys() -- it changes the way the Cache retrieves values. Weak keys are compared with == instead of .equals(). That could have a huge impact on how your map works. When == is the comparison, you can never construct a key; you need a reference to the same key object. With the cache, if you mess this up, you won't get an error. Instead, the value will be loaded again and a new entry created. The result could be far more entries in the cache than if you hadn't used weakKeys()!

Note that if your keys are Strings, this == comparison can be even more insidious. Identical hard-coded Strings (like those in your tests) will all reference the same object, and == will work. Strings parsed from input text or files will not be the same reference as any other String, so == will fail and more entries will be loaded into the Cache. What's more, any hard-coded String will be allocated in PermGen space and not subject to regular garbage collection, so why wrap it in a weak reference? Just don't do it, people!

Unless your values are smaller than your keys, there's one more reason to prefer softValues() over weakKeys(). When a key is garbage collected, its entry is invalidated in the cache. However, the entry isn't fully cleared until the cache does its cleanup activities. That won't happen until (at the earliest) some call or other is made to the cache. This could delay freeing of the larger memory consumed by the value. Of course, if you're enough of a memory miser to try using weakKeys(), you probably used weakValues() too, so the relative size of keys and values is not a concern.

Restrict Memory Consumption by the Cache


If your goal is to keep this particular cache down to size, then limiting the maximum size of the cache could make sense. Soft values will prevent the cache from running you out of memory, but it will let your cache take up as much memory as you have, which means more frequent garbage collection and slower overall performance. It is therefore useful to limit the overall size of the cache.

CacheBuilder.newBuilder().maximumSize(100).build(cacheLoader)


That's all it takes. Your cache.size() will never exceed 100. Entries may be evicted from the cache before the size limit is reached; the cache will start clearing out old, less-used entries as soon as it gets close to the limit.

Get Old Data Out


If your values are perishable, then expire them. Expiration occurs with a fixed time limit, either from last write (when the value was loaded) or last access (when get(key) was called). For instance, if your reference data changes rarely but not never, you might want to fetch it every hour or so.
CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build(cacheLoader)

I'm not sure when you would prefer expireAfterAccess. Perhaps your data only changes when you're not looking?
Remember that if your objective is to keep less-recently-used objects from taking up space, softValues() is superior.

To programmatically expire objects from the cache, call cache.invalidate(key) or cache.invalidateAll(). This will trigger a new load at the next cache.get(key).

When values expire or are invalidated, it doesn't mean they're immediately removed. It means that any cache.get(key) will result in a miss and a reload. However, cache.size() might show those entries hanging around. The cache will remove these entries thoroughly during periodic maintenance, which can occur at any cache access. To trigger the maintenance explicitly, call cache.cleanUp().

There are a few useful diagnostic tools to evaluate performance of your cache. This is useful for tuning your maximum size and expiration times. Check out Cache.stats() and removalListener if you want to test or optimize.

No Really, Just Use softValues()



CacheBuilder gives you all kinds of ways to keep your cache under control. It is smarter than your average Map.

2 comments:

  1. The reason for maximumSize() is that soft values can be alluring but dangerous. Unless capped by the domain, soft references can fill up the heap and trigger more frequent GC cycles. In Hotspot, this clean-up requires a more expensive full GC. This means that you can suffer performance depredations due to unexpected memory pressure. In other words, its not production friendly.

    A bounded size requires more work to monitor the statistics, but provides more deterministic behavior. You can combine it with soft values if desired, so that under load the server can degrade gracefully before throwing an OutOfMemoryError.

    Prior to a maximumSize feature, softValues was recommended because concurrency is handled by the JVM. After an algorithmic redesign [1] to support a concurrent LRU policy, its preferred to favor an explicit bounding when possible.

    [1] http://concurrentlinkedhashmap.googlecode.com/files/ConcurrentCachingAtGoogle.pdf

    ReplyDelete
  2. Ben, thanks a ton! I modified the post to incorporate your advice.

    ReplyDelete