Friday, April 13, 2012

Inserting data into elasticsearch over HTTP: a breakdown

There are a zillion examples of what to type to insert into elasticsearch. But what does each part mean?

Shoving information into elasticsearch is pretty easy. You don't have to set anything up: the index is created automatically the first time you write to it. (So if you have a typo, good luck figuring out later where your data went. This is the price of sensible defaults.)

Here is one way to throw some data into an index - type this at a *nix prompt:

curl -XPOST "http://localhost:9200/indexname/typename/optionalUniqueId" -d '{ "field" : "value" }'

Here is what that means:
curl is a command that sends messages over HTTP.
-X is the option that tells curl which HTTP command (method) to send. GET is the default, unless you pass -d, in which case curl defaults to POST.
POST is one of the HTTP commands that you can use for this insertion. PUT will work as well, but then the optionalUniqueId is not optional.
localhost is the machine where elasticsearch is running.
9200 is the default port number for elasticsearch.
indexname is the name of your index. This must be in all lowercase. You can use different indexes to restrict your searches later. Also, indexes are associated with particular mapping configurations. The defaults are sensible, but know that you can configure stuff by index and search by index (or multiple indexes).
typename describes the type of document you're sticking into the index. You can use this later to narrow searches. Also, the ID of each document in the index should be unique per type.
optionalUniqueId is the ID of the document. If you have an intelligent ID for the document you're sticking in, then put it here. Otherwise elasticsearch will create one. When you want to update your object, you'll need this. It's also handy for retrieval of exactly one object.
-d tells curl "here comes the data!"
{ "field" : "value" } represents any valid JSON. All this stuff is stored for your object.
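
To make the pieces above concrete, here is a minimal sketch that assembles the URL from its parts. The function names es_doc_url and es_insert, and the ES_HOST and ES_PORT variables, are made up for this illustration; they are not part of elasticsearch or curl. The last two lines only print the URLs, so nothing is sent to a server.

```shell
# Assumed defaults, taken from the breakdown above: localhost and port 9200.
ES_HOST="${ES_HOST:-localhost}"
ES_PORT="${ES_PORT:-9200}"

# Hypothetical helper: build the document URL.
# $1 = index name, $2 = type name, $3 = optional unique ID
es_doc_url() {
  case "$1" in
    # Index names must be all lowercase, so refuse anything else up front.
    *[A-Z]*) echo "index name must be lowercase: $1" >&2; return 1 ;;
  esac
  url="http://${ES_HOST}:${ES_PORT}/$1/$2"
  if [ -n "${3:-}" ]; then
    url="${url}/$3"
  fi
  printf '%s\n' "$url"
}

# Hypothetical helper: POST a JSON body to that URL.
# $1 = index, $2 = type, $3 = JSON body, $4 = optional ID
# POST works with or without an ID; PUT would need the ID.
es_insert() {
  target=$(es_doc_url "$1" "$2" "${4:-}") || return 1
  curl -XPOST "$target" -d "$3"
}

# Print the URLs without contacting the server.
es_doc_url indexname typename optionalUniqueId
es_doc_url indexname typename
```

With elasticsearch running, something like es_insert indexname typename '{ "field" : "value" }' optionalUniqueId should behave like the curl command shown earlier.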

The response is an HTTP 200 if the document was updated or an HTTP 201 if a document was created. If you want curl to tell you which HTTP status code came back, add this to your command line: -w " status: %{http_code}"
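
If you'd rather have a script react to the status code than read it yourself, one possible sketch follows. The names describe_status and insert_and_report are invented for this example. The -s and -o /dev/null options are standard curl options that hide the progress meter and discard the response body, so only the status code is captured. The curl call sits inside a function that is not run here.

```shell
# Hypothetical helper: translate the two status codes mentioned above into words.
describe_status() {
  case "$1" in
    200) echo "updated" ;;
    201) echo "created" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Hypothetical helper: insert a document and report what happened.
# $1 = full document URL, $2 = JSON body
insert_and_report() {
  status=$(curl -s -o /dev/null -w "%{http_code}" -XPOST "$1" -d "$2")
  describe_status "$status"
}

# These calls need no server; they only show the mapping.
describe_status 201
describe_status 200
```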

Here are two of the easiest ways to see what you just inserted.

Retrieve by ID:

curl "http://localhost:9200/indexname/typename/optionalUniqueId?pretty=true"

This does a GET to fetch the object by ID.
?pretty=true tells elasticsearch to put newlines and indentation into the JSON so that it's easier for humans to read.

Retrieve everything in the index:

curl "http://localhost:9200/indexname/_search?pretty=true"

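_search tells elasticsearch that this is a query. Since no parameters are provided, every document matches. Note that by default only the first 10 hits come back; add a size parameter (for example &size=50) to see more.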
Notice that typename is omitted here. If you include it, then you'll get back everything of that type in the index.
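
The difference between searching the whole index and searching one type is just one extra path segment. Here is a small sketch of that; es_search_url is a made-up name for this example, and the host and port are the defaults assumed throughout this post.

```shell
# Hypothetical helper: build a search URL.
# $1 = index name, $2 = optional type name
es_search_url() {
  if [ -n "${2:-}" ]; then
    # With a type: only documents of that type in the index.
    printf 'http://localhost:9200/%s/%s/_search?pretty=true\n' "$1" "$2"
  else
    # Without a type: the whole index.
    printf 'http://localhost:9200/%s/_search?pretty=true\n' "$1"
  fi
}

# Print both forms without contacting the server.
es_search_url indexname
es_search_url indexname typename
```

To actually run the query, pass the result to curl: curl "$(es_search_url indexname typename)"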

4 comments:

  1. Thanks for taking the time to put this together. It provided me a very good starting point for feeding data into elasticsearch.

  2. Its very helpful for beginners. Thanks a ton

  3. curl -XGET "http://localhost:9200/indexname/_search?pretty=true"

    Make sure that you put -XGET for Retrieve by ID and Retrieve everything in the index.. then works perfectly

  4. Sorry, I am newbies to ElasticSearch, may I ask that where can I run the curl-XGet .... I google many place I still don't see any indicate how to run it?
