1/8/2014
What is Elasticsearch?
At its core, Elasticsearch is an open source, distributed, document-oriented full-text search engine that indexes data in near real time and is accessed through a RESTful API.
Elasticsearch and its underlying engine, Lucene, are written in Java, and Elasticsearch requires Java 6 or higher to run. You can download and install Java or, if it is already installed, check your version with the following command:
$ java -version
Java is the only prerequisite for running Elasticsearch.
Download Elasticsearch and extract it onto your computer. Then navigate to the directory where you extracted it and start it in the foreground:
$ bin/elasticsearch -f
or on a Windows machine:
bin\elasticsearch.bat
To check that Elasticsearch is up and running, open a new browser window once it has started and navigate to http://localhost:9200
If everything is working properly, you should see a JSON document describing the running instance of Elasticsearch.
cURL is a command-line tool for transferring data over many protocols, including HTTP. It is often used to interact quickly with RESTful APIs, and it appears in much of the Elasticsearch documentation.
A unit of data added to Elasticsearch is called a document. Documents are represented as JSON, and you are not required to define a schema before adding them.
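The same check can be scripted. The following is a minimal Python sketch, assuming a node on the default address used above; the helper name `node_info` is made up for this illustration. It returns the parsed JSON document, or `None` when nothing answers.

```python
import http.client
import json
import urllib.error
import urllib.request


def node_info(base_url="http://localhost:9200", timeout=2.0):
    """Return the JSON document served at the root URL, or None if unreachable."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError,
            http.client.HTTPException):
        # Connection refused, timeout, malformed reply, or non-JSON body.
        return None


if __name__ == "__main__":
    info = node_info()
    print("Elasticsearch is up" if info is not None else "No node answered")
```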
{
  "id": "1",
  "title": "Ulysses",
  "author": "James Joyce",
  "publish_date": "1922-02-02",
  "description": "Ulysses is a modernist novel by James Joyce."
}
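As a rough sketch of how such a document could be sent to Elasticsearch, the Python below builds (but does not send) a PUT request to a URL of the form /index/type/id. The index name `library`, the type name `book`, and the helper `build_index_request` are assumptions chosen for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed: a local node on the default port


def build_index_request(index, doc_type, doc_id, document):
    """Build a PUT request that would store one JSON document."""
    url = "{}/{}/{}/{}".format(BASE_URL, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


book = {
    "id": "1",
    "title": "Ulysses",
    "author": "James Joyce",
    "publish_date": "1922-02-02",
    "description": "Ulysses is a modernist novel by James Joyce.",
}
request = build_index_request("library", "book", "1", book)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a server.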
Elasticsearch stores data in an index. A cluster can hold multiple indexes to keep data separate, and a single index can be split into shards that are spread across the nodes of the cluster. A search can span multiple indexes.
Indexes can be created and deleted within Elasticsearch.
$ curl -XPUT http://localhost:9200/library/
$ curl -XDELETE http://localhost:9200/library/
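Sharding is configured when an index is created. The sketch below builds the equivalent requests from Python, passing `number_of_shards` and `number_of_replicas` in the index settings; the values 3 and 1 and the helper names are arbitrary choices for this example, and nothing is sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed: a local node on the default port


def build_create_index_request(name, shards, replicas):
    """Build a PUT request that would create an index with explicit sharding."""
    settings = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        "{}/{}/".format(BASE_URL, name),
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_delete_index_request(name):
    """Build a DELETE request that would remove the whole index."""
    return urllib.request.Request("{}/{}/".format(BASE_URL, name), method="DELETE")


create = build_create_index_request("library", shards=3, replicas=1)
delete = build_delete_index_request("library")
print(create.get_method(), create.full_url, create.data.decode("utf-8"))
print(delete.get_method(), delete.full_url)
```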
A type keeps documents with different schemas separate within an index. For example, an ecommerce site might have one index containing documents of a Product type and documents of a Review type.
Each type would have its own data structure, but both could be included in a single search.
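In the type-based URL scheme described here (later Elasticsearch releases removed types), a search is scoped by listing indexes and types, comma-separated, in the request path. The helper below is a small sketch of that convention; the index name `store` and the function `search_path` are hypothetical.

```python
def search_path(indexes=(), types=()):
    """Return the _search path for the given indexes and types.

    With no indexes, "_all" stands in for the index part when types
    are given; with neither, the search covers everything.
    """
    parts = []
    if indexes:
        parts.append(",".join(indexes))
    elif types:
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


# Products and reviews from one index, in a single search.
print(search_path(["store"], ["product", "review"]))
# One search spanning two indexes.
print(search_path(["library", "store"]))
```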
Documents must be analyzed as they are added so that they can be searched. This work is done by an analyzer, which can be configured per field in the document.
Several analyzers are built in, each suited to different circumstances. It is also possible to turn off analysis for a field so that its value is indexed as a single exact term, or to exclude a field from the index altogether if it should not be searchable.
You can create, read, update, and delete documents within an index. In addition, you can bulk load data into an index.
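Per-field analysis is declared in a mapping. The sketch below builds a mapping body for a hypothetical `book` type, using the string-field syntax of the Elasticsearch releases current when this was written (newer releases use different field types). It assumes the field names from the sample document: `title` uses the built-in `standard` analyzer, `description` the built-in `english` analyzer, `id` is indexed without analysis via `"index": "not_analyzed"`, and `publish_date` is a date.

```python
import json


def build_book_mapping():
    """Return a mapping body that sets analysis per field for a 'book' type."""
    return {
        "book": {
            "properties": {
                # Analyzed with the default, general-purpose analyzer.
                "title": {"type": "string", "analyzer": "standard"},
                # Analyzed with English-specific rules such as stemming.
                "description": {"type": "string", "analyzer": "english"},
                # Indexed as one exact term, with no analysis applied.
                "id": {"type": "string", "index": "not_analyzed"},
                "publish_date": {"type": "date"},
            }
        }
    }


mapping = build_book_mapping()
print(json.dumps(mapping, indent=2))
```

Under these assumptions the body would be sent with a PUT to /library/book/_mapping. Setting `"index": "no"` instead of `"not_analyzed"` would leave a field out of the index entirely.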
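Bulk loading uses the `_bulk` endpoint, whose body is newline-delimited JSON: one action line followed by one document line per item, with a trailing newline at the end. The following is a minimal sketch of assembling such a body; the `library` index, `book` type, second sample book, and helper name are assumptions for illustration.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Return a newline-delimited body of index actions for the _bulk endpoint."""
    lines = []
    for doc in documents:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["id"]}}
        lines.append(json.dumps(action))  # what to do and where
        lines.append(json.dumps(doc))     # the document itself
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


books = [
    {"id": "1", "title": "Ulysses", "author": "James Joyce"},
    {"id": "2", "title": "The Great Gatsby", "author": "F. Scott Fitzgerald"},
]
bulk_body = build_bulk_body("library", "book", books)
print(bulk_body, end="")
```

The resulting string would be POSTed to http://localhost:9200/_bulk.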
By default, Elasticsearch will store the entire source of a document in a special field named _source. This is required for certain operations, such as an update.
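To see why an update depends on `_source`: a partial update sends only the changed fields, and Elasticsearch has to merge them into the stored source before reindexing the whole document. The sketch below models that merge locally. It is an illustration of the idea, not Elasticsearch's own code, and the function names are made up. The request body follows the update API's `{"doc": ...}` form, sent with POST to a URL such as /library/book/1/_update.

```python
import json


def build_update_body(partial_document):
    """Return the body of a partial-document update request."""
    return json.dumps({"doc": partial_document})


def merge_into_source(source, partial):
    """Local model of the merge: nested objects are merged, other values replaced."""
    merged = dict(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_into_source(merged[key], value)
        else:
            merged[key] = value
    return merged


stored_source = {"id": "1", "title": "Ulysses", "author": "James Joyce"}
change = {"publish_date": "1922-02-02"}

print(build_update_body(change))
print(merge_into_source(stored_source, change))
```

Without the stored source there would be nothing to merge the change into, so the client would have to resend the full document.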
Once data is added to an index, it can be searched. A search can be expressed either through URL query-string parameters or via a GET request with a JSON body.
The main way to query Elasticsearch is through its Query DSL (domain-specific language). A DSL query is a JSON document that describes how the search should be put together.
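For the URL form, the query goes in the `q` parameter of the `_search` endpoint. The sketch below builds such a URL, assuming the `library` index and a local node; the helper name is hypothetical, and the query string is URL-encoded by the standard library.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumed: a local node on the default port


def build_uri_search_url(index, query_string):
    """Return a _search URL carrying the query in the 'q' parameter."""
    return "{}/{}/_search?{}".format(BASE_URL, index, urlencode({"q": query_string}))


url = build_uri_search_url("library", "title:ulysses")
print(url)
```

Opening the printed URL in a browser, or fetching it with cURL, would run the search against a live node.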
{
  "query": {
    "term": {
      "title": "gatsby"
    }
  }
}
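A query like the one above is sent as the request body of the `_search` endpoint. The sketch below builds that request in Python under a few assumptions: the `library` index, a local node, and POST rather than GET (Elasticsearch accepts both for searches, and some HTTP clients do not send a body with GET). The term is lowercase because a term query is matched against the indexed tokens as-is, and the default analyzer lowercases text when indexing.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed: a local node on the default port


def build_term_search_request(index, field, term):
    """Build a search request whose body is a Query DSL term query."""
    body = {"query": {"term": {field: term}}}
    return urllib.request.Request(
        "{}/{}/_search".format(BASE_URL, index),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


search = build_term_search_request("library", "title", "gatsby")
print(search.full_url)
print(search.data.decode("utf-8"))
```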
Filters are a way to select a subset of data as part of a query. They can be used to include or exclude data from the query. They do not affect the scoring of the results (how relevant each result is to the query), so they should be used instead of a query whenever a criterion is not important to the score of the result.
{
  "filter": {
    "bool": {
      "must": {
        "term": { "user": "pzerkel" }
      }
    }
  }
}
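To combine a scoring query with a non-scoring filter, releases of this period offered the `filtered` query, which wraps both; newer releases express the same idea with a `filter` clause inside a `bool` query. The sketch below builds such a body from the two examples in this section. The helper name is hypothetical, and pairing these particular fields is only for illustration.

```python
import json


def build_filtered_query(query, filter_clause):
    """Wrap a scoring query and a non-scoring filter in one 'filtered' query."""
    return {
        "query": {
            "filtered": {
                "query": query,           # contributes to the relevance score
                "filter": filter_clause,  # only narrows the candidate set
            }
        }
    }


body = build_filtered_query(
    {"term": {"title": "gatsby"}},
    {"bool": {"must": {"term": {"user": "pzerkel"}}}},
)
print(json.dumps(body, indent=2))
```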