Sunday, October 30, 2011

Setting up an Android project in IntelliJ Idea

And there was pain...

I imported an existing Android project into IntelliJ Idea Community Edition 10.5.1. Some things started complaining "Please select Android SDK."

The help pages are out of date; they say to do this in the Android facet settings. It isn't there.

Here is the secret do-things-right button: Module settings (ctrl-alt-shift-S), then pick Project, and then you need to change the Project SDK (1). To teach Idea where your Android SDK is, click New, and lo! There it is! The Android SDK option!

project settings screenshot


Choose Android SDK, and then select the home directory of your Android SDK installation. It will then ask you to choose a target API level. Now, the Android SDK is set.

The code still didn't compile and the Run task still complained "Please select Android SDK." Strangely, it seemed to have lost the SDK selection. I restarted Idea, opened the project structure options, and chose the Android SDK again. This time in "Edit Run Configurations", I was able to select the virtual device to run on.

If you know a better way to accomplish this setup, please comment.

(1) You could follow these steps at the module level, too.

Friday, October 28, 2011

Debugging Log4J: where are my log messages going?

If you're looking for interesting reading, this is not it.

Today the question emerged: where are this test's log messages going?

Sticking a debug point somewhere and doing Logger.getLogger("blah"), then digging into the resulting Logger object, revealed the answer.

The Logger object had a parent member (a RootLogger), which had a member aai (AppenderAttachableImpl), which had an appenderList, which contained the appender, which contained the filename.

From that, we were able to figure out which log4j.properties file it was using, and remove the errant one from the classpath.
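If poking through the Logger in a debugger is inconvenient, another rough option is to ask the class loader for every copy of the file it can see. This is only a sketch: the class and method names are made up for illustration, it uses nothing but the JDK's ClassLoader.getResources, and it assumes the configuration is picked up from the classpath under the default name log4j.properties.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Log4J): lists every classpath location
// that provides a resource with the given name.
class ClasspathResourceFinder {

    static List<URL> findAll(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // getResources returns every match the loader can see, not just the first.
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new IllegalStateException("Could not search the classpath", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the default Log4J 1.x configuration file name.
        for (URL url : findAll("log4j.properties")) {
            System.out.println(url);
        }
    }
}
```

More than one line of output means there are competing files on the classpath, and each URL shows which jar or directory it came from.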

Debugging Hibernate: what database am I connecting to?

If you're looking for interesting reading, skip this post.

If you're trying to find out what database, username, and password Hibernate is using to execute the query, continue. This post is a reference for me, and it might be useful to someone else someday.

Set a debug point anywhere you can get to the Hibernate session. For instance, anywhere a Query is in scope.

The Query has a session, which has a jdbcContext, which has a connectionManager, which has a factory, which has a connectionProvider. Looking at the objects in the debugger, the factory has a settings object which holds the connectionProvider. Or, get to the connectionProvider from the session by executing this (Alt-F8 for execute in Idea):
((QueryImpl) query).getSession().getJDBCContext().getConnectionManager().getFactory().getConnectionProvider()

You may put any expression that returns the session in place of "((QueryImpl) query).getSession()".

This connection provider has a datasource (ds). This is the object you need to explore. Implementations will vary. In my case, the PoolBackedDataSource has a WrapperConnectionPoolDataSource which has a nestedDataSource of type DriverManagerDataSource. Inside here is the jdbcUrl, which shows where it is connecting, and a properties member which includes "user" and "password" properties. There's the information I was looking for, with the password in plain text.

Maybe there's an easier way to do this, but this worked.
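One possibly easier route, if you can get hold of the underlying java.sql.Connection, is to ask JDBC itself. This is a sketch under assumptions: ConnectionInfo is a made-up helper, and how you obtain the Connection depends on your Hibernate version (a callback such as Session.doWork, if yours offers it, is one candidate). It reports the URL and user name only; JDBC metadata does not expose the password, so the debugger walk above is still needed for that.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.SQLException;

// Hypothetical helper: summarizes where a JDBC connection points.
class ConnectionInfo {

    static String describe(Connection connection) {
        try {
            DatabaseMetaData metaData = connection.getMetaData();
            // Both calls are standard JDBC; a driver may return null for either one.
            return metaData.getURL() + " as " + metaData.getUserName();
        } catch (SQLException e) {
            throw new IllegalStateException("Could not read connection metadata", e);
        }
    }
}
```

From a breakpoint where a Connection is in scope, evaluating ConnectionInfo.describe(connection) returns the JDBC URL and user in one string.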

Wednesday, October 26, 2011

Platforms vs Process

In the IT department where I work, our goals include clarifying and drawing out the processes used and decisions made by the business. Processes and decisions are our competitive advantage.

The strategic direction is to center our products around Business Process Management. BPM models call out the major steps in the business process into charts that people can read and discuss. Meanwhile, the applications we deliver are each called "x Manager", where x is the piece of business the application helps. The focus is workflows and work queues.

It occurs to me that we're building a product, not a platform. We're not only enabling a process: we're enforcing it. If our users worked in a warehouse, this would make sense. But our users are scientists. They have PhDs. They're capable of programming. They're capable of working in a platform.

If we gave our users a platform to support their processes while enabling autonomy and innovation, perhaps they'd be happier with the software. Perhaps they'd have more opportunity for innovation -- innovation that we could capture. When the scientist copies data out of the application and pastes it into Excel to do analysis, that knowledge remains with the scientist. When such analysis capabilities take 18 months of lead time to get programmed into the system (by us), that scientist sticks with Excel.

Is it possible to create a platform that would give them this flexibility? I don't know. It would be a complete change of direction for our department.

Or would it? We're looking at rules engines to give the user a quicker turnaround on improved decision-making. This is in the direction of a platform. However, a one-month turnaround on implementing and testing a new set of rules is not autonomy. It does not give a user the same freedom as Excel on his desktop. It still enforces uniformity company-wide.

If innovation is the competitive advantage, then the closer we get to a platform, the more advantage we have.

What is the competitive advantage in R&D? Processes and decisions, or innovation?

Friday, October 14, 2011

Profiling for Fun and Profit

This week I had a major success improving the performance of our application, thanks to some judicious use of JProfiler. The key was setting the filters correctly. This post will show you how to do that.

First, you need JProfiler. This is a commercial product. You can get a ten-day evaluation license for free, and use what you learn to convince your managers to pay for a permanent one. Download the app and unzip it. I have version 7.

Second, create a profiling session. There are several ways to do this, and the documentation is not especially helpful. If I cover my experience in detail, it'll be in a separate post. One way or another, hook up to a JVM. Get the profiling going, because you can't set up the filters properly until it's hooked up. If it asks you to choose between Instrumentation or Sampling, pick Instrumentation. As it points out (in JProfiler7), good filters are critical for instrumentation, and filters are the focus of this post.

Before you get serious with the profiling, give the filters a useful initial setting. This is key to weeding out the garbage. Visit the Session Settings (button along the top toolbar). In there, pick Filter Settings (button along the left toolbar).

filters before

Initially, the filters are set up with some default excludes. Change this -- push the green "+" button on the right, and from the popup menu pick "Add filters from package browser." (This won't work if the profiling session is not connected to an app, because it needs to query the java process for its classpath.)

Now comes the critical part: selecting the classes we care about. It's tempting to pick all kinds of interesting ones, but start small. Choose only the packages you can change. Basically, expand the com node, find your company's name, and check that. If there's another package that holds code controlled by your team or your organization, check that too. Leave everything else unchecked. Focus on what you control!

Click OK on that. On the filters page, make sure that old Excluded filter is gone and only your Included filter (or filters) remains.

filters AFTER

Once you click OK on the Session Settings and select "Apply Now", magic happens. The code in your running application is re-instrumented so that classes selected by your filters record their activities.

Here's the trick about the filters: the instrumentation will record the time spent in your methods (the ones included by the filter) and the time spent in all methods directly called by those methods. That way it knows how to divide the bar into dark red and bright red for all the methods in your code.

Proceed with profiling! You want the CPU View (left-hand toolbar). Start recording CPU if necessary. Make your app do the slow thing it does. Stop CPU profiling.

Now for the delicate bit, where I can't show you screenshots because the code is proprietary. The CPU view shows you a big tree of where your app spends its time. There are bars next to each method name. The longer the bar, the more time it's spending there. There are two components in the bar: dark red and bright red. (At the top level, it's all dark red.) Bright red represents the time spent within this method, and dark red is the time spent in methods called by this method.
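To make the two colors concrete, here is a tiny model of one node in that tree. It is purely illustrative: CallNode is a made-up class, not a JProfiler API, and the numbers you feed it are whatever you invent. The point is only the arithmetic: the bright-red part is the node's total time minus the total time of the methods it calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one node in a profiler's call tree (not a JProfiler API).
class CallNode {
    final String method;
    final long totalMillis;   // the whole bar: time in this call, callees included
    final List<CallNode> callees = new ArrayList<>();

    CallNode(String method, long totalMillis) {
        this.method = method;
        this.totalMillis = totalMillis;
    }

    // Records a direct callee and returns this node so calls can be chained.
    CallNode calls(CallNode callee) {
        callees.add(callee);
        return this;
    }

    // "Dark red": time spent in methods called by this one.
    long calleeMillis() {
        long sum = 0;
        for (CallNode callee : callees) {
            sum += callee.totalMillis;
        }
        return sum;
    }

    // "Bright red": time spent in this method's own code.
    long selfMillis() {
        return totalMillis - calleeMillis();
    }
}
```

A node with a long bar but a small selfMillis() is just a pass-through; keep expanding until selfMillis() accounts for most of the bar.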

Find the long bar at the top and expand the node. Keep expanding the long bar until you get to something with a significant amount of bright red. Here's your offender.

If the method that's bright red is something low-level like reading a stream, you're not going to make that faster. Move up the stack trace (that is, the big tree of method calls) and look at your own code. Think about how you can do less of whatever operation is taking so long.

If the method that's bright red is higher-level, such as a Spring class, then we might be able to do something about all that slowness. We can get deeper into what it's doing by refining the filters. Say, for example, a lot of time is spent in org.springframework.context.support.ClassPathXmlApplicationContext.refresh(). That's reading a configuration file, so we have control over what it's doing. That method has a whole slew of method calls inside it, any one of which could be slow. To find out more, it's time to refine the filters.

first offending method

Go back to your session settings, filter settings, and add an inclusive filter for the package containing the bright-red class, such as org.springframework.context.support. (Pitfall - if you do this by typing in the name of the package, put a period after the package name. It won't work without it.)

With the new filter settings, do the slow thing again. (You probably need to close your app and start from the beginning.) Now you can drill down further and see where the slowness happens. Repeat if necessary.

iteration 2

In my case, I modified the filters a few more times before the stack trace showed me that the time was being spent loading up files for a component scan. It loaded a few hundred files from the jar, just to choose eight based on annotations. Narrowing the component scan improved application time by 40%, and replacing it with explicit bean definitions helped even more. This was a big victory for JProfiler.

Thursday, October 13, 2011

Product vs Platform in Programming Languages

There's a big rant floating around the internet this week about Products vs Platforms. Platforms are service-based, with layer upon layer of exposed and secured and throttled services, while Products give the user everything they need in a perfect streamlined form. That is, Products give the user exactly what the builder of the Product believes they need.

Steve Yegge believes we should all be building platforms. It makes sense; Facebook and iPhone and Android all provide some core functionality and let the whole world build on top of them. They're like this Lego table:

Lego building table

So, applications can be self-contained programs or expansible platforms, or somewhere in between.

Programming languages have this spectrum as well. John Backus, in his famous Turing Award lecture (pdf), divides programming languages into framework (functionality built into the language) and changeable parts (all the reusable modules we create in the language). Backus votes strongly for languages with a small framework, instead emphasizing expressive, easily-combined changeable parts.

Functional languages target tiny reusable components that combine in many ways. Building programs with them is like playing with Legos -- the old-fashioned Legos with small rectangular blocks in bright colors. You can build anything with those.

Legos climbing out of your head

In strictly-OO languages like Java, the reusable parts are bigger and fancier. They're more like the new Legos, where whole walls come in one elaborate piece, and half the pieces are special-purpose bells and whistles.

elaborate yet inflexible Lego castle

You can build a space station or a sturdy castle quickly, but if that castle wall piece is seven lumps wide when you expected six, it's going to be one ugly castle. Combine too many of these bulky languages and specialized frameworks and...

scary alien head made of Legos

Monday, October 10, 2011

Dart. What's the point?

Dart is a nascent web development language. What's the point? I observe two.

Uniformity and familiarity: Like Node.js, Dart wants to run both in the browser and on the server. It tries to look familiar to both JavaScript and Java developers, so that either can adapt quickly. The goal is to move away from having different languages on the front end and the back end. Implement a new user story all the way through the app without switching languages at the client/server boundary.

Evolution from prototype to real app: Several features in Dart aim to smooth the transition from quick proof of concept to real, maintainable application. These include:
* turn a property from a simple field into something with defined getters and setters without changing any code outside the class. Getters and setters look identical to public fields.
* change a constructor into a factory method that may or may not return a new object, also invisibly to any code outside the class.
* start with dynamic typing (everything is declared as "var") and move to static typing.
* an interactive console in the browser makes very simple tests very easy.

One syntax feature in Dart that jibes with CoffeeScript (as presented by Mark Volkmann last week at Lambda Lounge) is single, double, or triple-quoted (multiline) strings with interpolation of variables and expressions.

In some ways Dart stays closer to our familiar OO-languages; variables are mutable by default, and classes are the way to organize code.

Does Dart hit the target? It looks interesting, but it's still very much under development. The interactive console doesn't work in my browser (IE8 - I'm at work). On the server side, Dart runs in a VM. How good is the VM? It is too early to tell - we'll have to wait and watch to see whether Dart sticks.

Thursday, October 6, 2011

Keeping Your Cache Down to Size

Guava's CacheBuilder provides several options for keeping a cache down to a reasonable size. Entries can expire for multiple reasons, entries can be limited to a certain number, and entries can be made available for garbage collection. These options can be used individually or together. Each is simply one more method call during cache creation.

When should we use each alternative? It depends on our objective.

General Memory Conservation


If the objective is to avoid running out of memory JVM-wide, then the garbage collector is the right guy for the job. There are three ways to make our entries available for garbage collection. One of them is right, and two are wrong.

Soft Values: this is the right way.
CacheBuilder.newBuilder().softValues().build(cacheLoader)

This sets up a cache which wraps any value it stores in a SoftReference. If no other object holds a strong reference to the value, the garbage collector has the option of freeing this memory. As Brian Goetz put it, "soft references allow the application to enlist the aid of the garbage collector by designating some objects as 'expendable.'" The key here is that the garbage collector can choose which (if any) soft-referenced objects to collect. The JVM can choose the least recently or least frequently used objects, and it can perform this ranking across all soft-referenced objects, not only the ones in your cache. This makes soft references the optimal choice for overall memory conservation.

When a value is collected, that entry is invalidated in the cache. It may still show up in the total cache.size(), but when cache.get(key) is called, a new value will be loaded and a new entry created.
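To see the mechanism in isolation, here is a one-value sketch using the JDK's own SoftReference. It is not how Guava implements its cache; SoftValueHolder and its loadCount are invented for illustration. It shows the same contract, though: if the collector has cleared the value, the next get() quietly loads it again.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical single-value holder illustrating the soft-value idea with the
// JDK's SoftReference; this is not Guava's implementation.
class SoftValueHolder<V> {
    private final Supplier<V> loader;
    private SoftReference<V> reference = new SoftReference<>(null);
    private int loadCount = 0;

    SoftValueHolder(Supplier<V> loader) {
        this.loader = loader;
    }

    synchronized V get() {
        V value = reference.get();   // null if never loaded, or if the GC cleared it
        if (value == null) {
            value = loader.get();    // reload, as the cache would on a miss
            loadCount++;
            reference = new SoftReference<>(value);
        }
        return value;
    }

    // How many times the loader has run so far.
    synchronized int loadCount() {
        return loadCount;
    }
}
```

As long as some other code holds a strong reference to the value, the collector cannot clear it, so repeated calls to get() return the same object without reloading.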

Now, take my advice: if your goal is to prevent the cache from using all your memory, use softValues() and skip the rest of this blog post. If you are also concerned about frequency of garbage collection, consider maximumSize(), discussed later.

There's one small caution with soft values: using them causes values to be compared with == instead of .equals(). This only impacts the rare person who conditionally removes entries from the cache like this:
cache.asMap().remove(key, value); 

Weak Values: If being in the cache is no reason to keep an object around, you can request weakValues(). This wraps the values in a WeakReference. This means that the next round of garbage collection will sweep up any entries whose value is not referenced elsewhere. These entries get invalidated even if memory is not in short supply. That's why soft references are better.

The same == comparison caution applies with weakValues as with softValues.

Weak Keys: This means that any entry whose key is no longer referenced elsewhere should be garbage collected and its entry evicted from the cache. If your keys are transient objects expected to fall out of use after a period of time, never to be used again, then weak keys could make sense. I'm not sure why you would base a cache on something like that.

There's another caveat to weakKeys() -- it changes the way the Cache retrieves values. Weak keys are compared with == instead of .equals(). That could have a huge impact on how your map works. When == is the comparison, you can never construct a key; you need a reference to the same key object. With the cache, if you mess this up, you won't get an error. Instead, the value will be loaded again and a new entry created. The result could be far more entries in the cache than if you hadn't used weakKeys()!

Note that if your keys are Strings, this == comparison can be even more insidious. Identical hard-coded Strings (like those in your tests) will all reference the same object, and == will work. Strings parsed from input text or files will not be the same reference as any other String, so == will fail and more entries will be loaded into the Cache. What's more, any hard-coded String will be allocated in PermGen space and not subject to regular garbage collection, so why wrap it in a weak reference? Just don't do it, people!
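The String trap is easy to reproduce with the JDK alone. In this sketch, IdentityHashMap stands in for a weak-keyed cache because it also compares keys with ==. The class name, the helper, and the key text are all made up for the demonstration.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityKeyDemo {

    // Builds a key the way parsing input does: equal content, but a distinct object.
    static String parsedCopyOf(String text) {
        return new String(text.toCharArray());
    }

    public static void main(String[] args) {
        Map<String, String> byEquals = new HashMap<>();
        Map<String, String> byIdentity = new IdentityHashMap<>(); // compares keys with ==

        String literalKey = "customer-42";
        byEquals.put(literalKey, "value");
        byIdentity.put(literalKey, "value");

        String parsedKey = parsedCopyOf("customer-42");

        System.out.println(byEquals.containsKey(parsedKey));       // true: equal content
        System.out.println(byIdentity.containsKey(parsedKey));     // false: different object
        System.out.println(byIdentity.containsKey("customer-42")); // true: same interned literal
    }
}
```

The last line is why tests with hard-coded keys pass while production lookups, fed by parsed input, miss every time.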

Unless your values are smaller than your keys, there's one more reason to prefer softValues() over weakKeys(). When a key is garbage collected, its entry is invalidated in the cache. However, the entry isn't fully cleared until the cache does its cleanup activities. That won't happen until (at the earliest) some call or other is made to the cache. This could delay freeing of the larger memory consumed by the value. Of course, if you're enough of a memory miser to try using weakKeys(), you probably used weakValues() too, so the relative size of keys and values is not a concern.

Restrict Memory Consumption by the Cache


If your goal is to keep this particular cache down to size, then limiting the maximum size of the cache could make sense. Soft values will prevent the cache from running you out of memory, but they will let your cache take up as much memory as you have, which means more frequent garbage collection and slower overall performance. It is therefore useful to limit the overall size of the cache.

CacheBuilder.newBuilder().maximumSize(100).build(cacheLoader)


That's all it takes. Your cache.size() will never exceed 100. Entries may be evicted from the cache before the size limit is reached; the cache will start clearing out old, less-used entries as soon as it gets close to the limit.
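For a feel of what a size limit does, here is the classic JDK trick: a LinkedHashMap in access order that drops its eldest entry. This is a stand-in for the idea, not Guava's eviction algorithm. BoundedLruMap is a made-up name, it is not thread-safe, and unlike the cache described above it only evicts once the limit is actually exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded map: evicts the least recently accessed entry
// once the limit is exceeded. Not thread-safe, and not Guava's algorithm.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maximumSize;

    BoundedLruMap(int maximumSize) {
        super(16, 0.75f, true);   // true = keep entries in access order, not insertion order
        this.maximumSize = maximumSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
        return size() > maximumSize;
    }
}
```

With a limit of two, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.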

Get Old Data Out


If your values are perishable, then expire them. Expiration occurs with a fixed time limit, either from last write (when the value was loaded) or last access (when get(key) was called). For instance, if your reference data changes rarely but not never, you might want to fetch it every hour or so.
CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build(cacheLoader)

I'm not sure when you would prefer expireAfterAccess. Perhaps your data only changes when you're not looking?
Remember that if your objective is to keep less-recently-used objects from taking up space, softValues() is superior.

To programmatically expire objects from the cache, call cache.invalidate(key) or cache.invalidateAll(). This will trigger a new load at the next cache.get(key).

When values expire or are invalidated, it doesn't mean they're immediately removed. It means that any cache.get(key) will result in a miss and a reload. However, cache.size() might show those entries hanging around. The cache will remove these entries thoroughly during periodic maintenance, which can occur at any cache access. To trigger the maintenance explicitly, call cache.cleanUp().
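The lazy removal described above is easier to picture with a toy version. This sketch is a single-threaded, JDK-only imitation of expire-after-write, not Guava's implementation; ExpiringMap, its constructor parameters, and the injectable clock are all invented for illustration. Expired entries count as misses on get(), yet they still show up in size() until cleanUp() runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical expire-after-write map (single-threaded sketch, not Guava's implementation).
class ExpiringMap<K, V> {

    private static final class Timestamped<T> {
        final T value;
        final long writtenAt;

        Timestamped(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Map<K, Timestamped<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private final long timeToLiveMillis;
    private final LongSupplier clock;   // injectable so expiry can be exercised without sleeping

    ExpiringMap(Function<K, V> loader, long timeToLiveMillis, LongSupplier clock) {
        this.loader = loader;
        this.timeToLiveMillis = timeToLiveMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Timestamped<V> entry = entries.get(key);
        if (entry == null || isExpired(entry, now)) {
            entry = new Timestamped<>(loader.apply(key), now);   // miss: load and restamp
            entries.put(key, entry);
        }
        return entry.value;
    }

    // May include entries that have expired but have not been cleaned up yet.
    int size() {
        return entries.size();
    }

    // Explicit maintenance: physically removes everything that has expired.
    void cleanUp() {
        long now = clock.getAsLong();
        entries.values().removeIf(entry -> isExpired(entry, now));
    }

    private boolean isExpired(Timestamped<V> entry, long now) {
        return now - entry.writtenAt >= timeToLiveMillis;
    }
}
```

In real use the clock would be System::currentTimeMillis; passing it in just makes the expiry behavior easy to check.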

There are a few useful diagnostic tools to evaluate the performance of your cache. These are useful for tuning your maximum size and expiration times. Check out Cache.stats() and removalListener if you want to test or optimize.

No Really, Just Use softValues()



CacheBuilder gives you all kinds of ways to keep your cache under control. It is smarter than your average Map.

Goodbye, MapMaker. Hello, CacheBuilder.

Google has released a new version of Guava, and it's bad news for one of my favorite classes, MapMaker. The exciting feature of MapMaker is its ability to produce an insta-cache, a key-value mapping that calculates values on the fly, expires them, and limits the quantity stored -- all while maintaining thread safety.

In release ten, Guava added CacheBuilder. CacheBuilder produces a Cache: a key-value mapping that calculates values on the fly, expires them, and limits the quantity stored -- all while maintaining thread safety.

Wait, that sounds exactly like MapMaker. Poor MapMaker -- its methods are now deprecated. The king is dead, long live the king!

Here's a quick look at what CacheBuilder does:

final Cache cache =
    CacheBuilder.newBuilder()
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .maximumSize(50)
        .build(CacheLoader.from(retrieveInterestingInfoByKey));

Here, retrieveInterestingInfoByKey is a Function that goes to the database, calls a service, or performs some calculation to get the value for a key.

What does this cache do?

The first time cache.get(key) is called, the cacheLoader loads the value, stores the entry in its map, then returns the value. If two threads make this call at the same time, one of them loads the value, and the other one blocks until the value is available. Values are never loaded more times than necessary.
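The load-once behavior can be imitated with nothing but the JDK, which may help in picturing it. This is a rough sketch, not Guava's implementation. LoadOnMissMap is a made-up class with no expiry and no size limit, it relies on ConcurrentHashMap.computeIfAbsent from Java 8 (newer than this post), and it assumes the loader never returns null.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical load-on-miss map built on the JDK (no expiry, no size limit).
class LoadOnMissMap<K, V> {
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    LoadOnMissMap(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // ConcurrentHashMap applies the loader at most once per absent key;
        // a second thread asking for the same key waits for the first to finish.
        return values.computeIfAbsent(key, loader);
    }
}
```

The first get for a key runs the loader and every later get for that key returns the stored value.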

If cache.get(key) is called more than ten minutes after that entry was created, the value will be loaded again. When the size of the cache gets close to the maximum, then the older, less-used entries are evicted.

What is this cache good for? Reference data, certainly. Any time you have data that's loaded over and over again, but doesn't change nearly as often as it's loaded. The whole process can share a single cache. It's a good candidate for a singleton Spring bean.

What is this cache not good for? Anything you want to cache across processes or machines. Anything you want to update. (You can evict entries from the cache, but not put them in.) Anything you want persisted.

I could write whole posts about exception handling in the load, testing and tuning hooks, and memory use optimization of the Cache. But those are decorations on the cake. The real point of CacheBuilder is how quickly and easily you can get a map that gets you the information you want, when you want it, without hitting the database every single time.

MapMaker, your name was fun to say, but we won't miss you now that we have CacheBuilder.