Solr vs. Elasticsearch: who is the king of open source search engines?
Cloud computing and data volumes are growing rapidly. Today's applications produce data at petabyte (PB) and zettabyte (ZB) scale, yet people still demand ever higher performance. As data accumulates, searching it quickly and effectively has become a challenge for back-end services. In this article, we compare the two most popular open source search engines in the industry, Solr and Elasticsearch. Both are built on the open source Apache Lucene library. Their main functions are very similar, but they differ considerably in ease of deployment, scalability and other areas.
About Apache Solr
Apache Solr is based on Lucene, the well-known open source Java search library. Lucene is more a software library than a search engine, while Solr wraps Lucene into a complete search engine framework. Solr has matured over the past ten years and has a broad user base. It provides distributed indexing, sharding, replication, load balancing, and automatic failover and recovery. If properly deployed and well managed, Solr can be a highly reliable, scalable and fault-tolerant search engine. Many Internet giants, such as Netflix, eBay, Instagram and Amazon (CloudSearch), use Solr.
Main features of Solr:
About Elasticsearch
Like Solr, Elasticsearch is an open source search engine built on the Apache Lucene library. It was launched a few years after Solr. It provides a distributed, multi-tenant full-text search engine with a REST interface and schema-free JSON documents (the schema does not need to be defined in advance, whereas Solr expects one to be defined). Official clients are available for Java, Groovy, PHP, Ruby, Perl, Python, .NET and JavaScript.
A distributed Elasticsearch deployment contains indexes that can be divided into shards, and each shard can have multiple replicas. Each Elasticsearch node can hold one or more shards, and the node that receives a request acts as a coordinator, forwarding each operation to the correct shard.
Elasticsearch scales out as a near real-time search engine. One of its key features is multi-tenancy: indexes can be divided according to their purpose, and multiple indexes can be operated on at the same time.
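The coordinator role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pick_shard` and `route` are hypothetical, and MD5 is used as a stand-in for the engine's internal hash function. The general idea is that a stable hash of the routing key (typically the document id), taken modulo the number of primary shards, decides where a document lives.

```python
import hashlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (typically the document id) to a primary shard number."""
    # MD5 is a stand-in here; any hash that is stable across nodes would do.
    digest = hashlib.md5(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_primary_shards


def route(doc_ids, num_primary_shards):
    """Group document ids by the shard a coordinating node would forward them to."""
    plan = {shard: [] for shard in range(num_primary_shards)}
    for doc_id in doc_ids:
        plan[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return plan


if __name__ == "__main__":
    # Five primary shards, as in the default mentioned later in this article.
    print(route(["user-1", "user-2", "user-3", "user-4"], 5))
```

Because the shard number depends on the shard count, changing the number of primary shards would send existing ids to different shards, which is one way to see why that number is normally fixed when an index is created.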
Elasticsearch features:
Popularity comparison
Before starting the comparison, we can check the search popularity of the two on Google. Elasticsearch has clearly attracted more interest than Solr since 2013, but this does not mean that Apache Solr is dead. Although not everyone would agree, Solr is still one of the most popular search engines and has strong open source community support.
Installation and configuration
Elasticsearch is relatively easy to install and very lightweight compared to Solr. The distribution package of the current Solr version (6.4.2) is about 150 MB, while that of the current Elasticsearch version (5.2.2) is only 32.2 MB.
However, this ease of deployment and use can become a problem if Elasticsearch is not well managed. JSON-based configuration is convenient, but it is not for you if you want to annotate each setting in the file with comments, because JSON has no comment syntax. Solr also provides a REST API: through the Collections API you can create collections with custom shards, choose the document routing algorithm and perform custom sharding.
In general, if your application already uses JSON, Elasticsearch is the better choice. Otherwise, use Solr, because its schema.xml and solrconfig.xml are well documented.
Indexing and searching
Data sources
Solr accepts data from different sources, including XML files, comma-separated values (CSV) files and data extracted from database tables, as well as common file formats (such as Microsoft Word and PDF).
Elasticsearch also supports data from other sources, such as ActiveMQ, AWS SQS, DynamoDB (Amazon NoSQL), the file system, Git, JDBC, JMS, Kafka, LDAP, MongoDB, Neo4j, RabbitMQ, Redis, Solr and Twitter. Various plug-ins are available for this.
Search
Solr focuses on text search, while Elasticsearch is often used for querying, filtering and grouped analytics. The team behind Elasticsearch is also working to make these queries more efficient. Therefore, when comparing the two, Elasticsearch is the better choice for applications that need not only text search but also complex time-series search and aggregation.
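As an illustration of the XML and CSV inputs mentioned above, the sketch below turns CSV rows into an XML update message of the `<add><doc><field name="...">` form that Solr's XML update handler accepts. The helper names `csv_to_docs` and `to_solr_add_xml` and the sample fields `id` and `title` are hypothetical; actually sending the message to a Solr server is left out.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_docs(csv_text):
    """Parse CSV text (first row is the header) into a list of field dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_solr_add_xml(docs):
    """Build an <add> update message with one <doc> per input dict."""
    root = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(root, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add_xml(csv_to_docs("id,title\n1,Hello\n")))
```

For the single sample row this prints `<add><doc><field name="id">1</field><field name="title">Hello</field></doc></add>`.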
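To make "text search plus time-series aggregation" concrete, here is a hedged sketch of a request body that combines a full-text `match` query with a `date_histogram` aggregation. The function name, the aggregation label `per_interval` and the field names are hypothetical. The `interval` parameter is the name used in the 5.x releases discussed in this article; newer releases use `calendar_interval` or `fixed_interval` instead, so check the documentation for your version.

```python
import json


def text_plus_histogram_body(text_field, text, date_field, interval="day"):
    """Build a search body: full-text match plus a per-interval document count."""
    return {
        "size": 0,  # return only the aggregation buckets, not the matching hits
        "query": {"match": {text_field: text}},
        "aggs": {
            "per_interval": {
                # "interval" is the 5.x-era parameter name (see the note above).
                "date_histogram": {"field": date_field, "interval": interval}
            }
        },
    }


if __name__ == "__main__":
    body = text_plus_histogram_body("message", "timeout error", "timestamp")
    print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request; the response then contains one bucket per day with the number of matching documents.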
Indexes
Both support stop words and synonyms when matching documents.
In Solr, a join between indexes requires one of them to consist of a single shard whose replicas are present on the other nodes, so that relationships between documents can be searched (similar to a SQL join). Elasticsearch provides the more efficient has_child and top_children queries to retrieve such related documents.
Scalability and distribution
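The effect of stop words and synonyms on matching can be shown with a toy analyzer. This is not how either engine is implemented (both delegate analysis to Lucene); the word lists and the names `analyze` and `matches` are made up for the illustration.

```python
# Hypothetical word lists; real deployments configure these per field.
STOP_WORDS = {"the", "a", "of", "is"}
SYNONYMS = {"laptop": "notebook", "tv": "television"}


def analyze(text):
    """Lowercase, strip punctuation, drop stop words and normalize synonyms."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(".,!?;:")
        if not token or token in STOP_WORDS:
            continue
        tokens.append(SYNONYMS.get(token, token))
    return tokens


def matches(query, document):
    """A document matches if it contains every analyzed query term."""
    return set(analyze(query)) <= set(analyze(document))


if __name__ == "__main__":
    print(analyze("The price of a laptop"))
    print(matches("notebook price", "The price of a laptop is high."))
```

Here the query "notebook price" matches a document that only mentions "laptop", because both sides are reduced to the same terms before comparison.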
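For readers unfamiliar with the parent/child query in Elasticsearch, the sketch below builds a request body for it. In the query DSL the query is named `has_child`: it returns parent documents that have at least one child matching the inner query. The function name and the sample type and field names are hypothetical, and the index mapping must already declare the parent/child relationship.

```python
import json


def has_child_body(child_type, field, text):
    """Find parent documents with at least one matching child document."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # the child document type
                "query": {"match": {field: text}},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical example: blog posts that have a comment mentioning "lucene".
    print(json.dumps(has_child_body("comment", "body", "lucene"), indent=2))
```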
Search engines need to process millions of documents. Based on this, search engines should be replicable, modular and scalable, and support cluster and distributed architecture.
Designed for the cloud
Elasticsearch is very easy to scale, and there are plenty of use cases that require large clusters.
Solr also implements a distributed deployment mode similar to that of Elasticsearch, based on Apache ZooKeeper. ZooKeeper is a mature and widely used stand-alone application.
In contrast, Elasticsearch has a built-in component named Zen that plays a role similar to ZooKeeper and maintains the cluster state through an internal coordination mechanism.
It can be said that Elasticsearch is designed for the cloud and is the first choice for distributed deployments.
Shard splitting and rebalancing
A shard is the partitioning unit of a Lucene index and is used by both Solr and Elasticsearch. You can distribute an index by running its shards on different machines in the cluster. With the introduction of SolrCloud, Solr began to support shard splitting, which lets you add more shards by splitting existing ones. In contrast, Elasticsearch still does not support this and in fact actively prevents it. With Elasticsearch you instead rely on automatic shard rebalancing by adding more machines to the cluster.
By comparison, Solr allows you to add shards (when implicit routing is used) or split them (when the compositeId router is used), but you cannot delete shards. It also allows you to add replicas. In Elasticsearch, each index has five shards by default. You cannot change the number of primary shards, but you can increase the number of replicas. Shard rebalancing is very useful for horizontal scaling: when a new machine is added, the available shards are automatically rebalanced across the machines.
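The rebalancing behaviour described above can be sketched as a simple greedy loop: keep moving one shard from the most loaded node to the least loaded node until the counts differ by at most one. This is only an illustration of the idea, not the allocation algorithm of either engine. The function name `rebalance` and the node names are hypothetical, and the sketch assumes every node that currently holds a shard is still in the node list.

```python
def rebalance(assignment, nodes):
    """Spread shards evenly over nodes.

    assignment maps shard id -> node name; nodes lists all nodes that are
    available now, including newly added ones. Returns a new mapping.
    """
    by_node = {node: [] for node in nodes}
    for shard, node in sorted(assignment.items()):
        by_node[node].append(shard)

    while True:
        busiest = max(nodes, key=lambda n: len(by_node[n]))
        idlest = min(nodes, key=lambda n: len(by_node[n]))
        if len(by_node[busiest]) - len(by_node[idlest]) <= 1:
            break  # already as even as possible
        # Move a single shard per step, so few shards change nodes overall.
        by_node[idlest].append(by_node[busiest].pop())

    return {shard: node for node, shards in by_node.items() for shard in shards}


if __name__ == "__main__":
    # Five shards on two nodes; a third, empty node joins the cluster.
    current = {0: "node1", 1: "node1", 2: "node1", 3: "node2", 4: "node2"}
    print(rebalance(current, ["node1", "node2", "node3"]))
```

In this example only one shard has to move to the new node, and the number of shards (five) stays fixed, which matches the point that scaling out changes where shards live rather than how many primary shards exist.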
Community
Solr has an extensive open source community. Anyone can contribute to Solr, and new Solr developers or committers are chosen purely on merit. Elasticsearch is technically open source, but not completely so in spirit. All contributors have access to the source code, and users can make changes and contribute them. However, the final changes are reviewed and accepted by employees of Elastic (the company behind Elasticsearch and related software). Elasticsearch is therefore driven more by a single company than by the community as a whole.
Solr contributors and committers span multiple organizations, while Elasticsearch committers come only from Elastic. It has also been pointed out that Solr's strong community has a healthy project pipeline and the participation of many well-known companies. These members also invest in the platform by contributing throughout the development and engineering process.
Both have a good user base and a rich developer community, but Elasticsearch is newer than Solr. Solr has existed for longer, so its ecosystem is more developed and its user base is larger.
Documentation
Solr scores high here. It is a very well documented product with clear examples and API use case scenarios. Elasticsearch's documentation is well organized, but it lacks good examples and clear configuration instructions.
Solr or Elasticsearch?
Through the above comparison, it is difficult to determine who is the ultimate winner. In fact, whether you choose Solr or elasticsearch, you first need to understand your user scenarios and future needs. Let's summarize:
Remember:
In short, both are feature-rich search engines and deliver more or less the same performance as long as they are designed and implemented well.
This article is mainly a translation of http://logz.io/blog/solr-vs-elasticsearch/ . Thanks to the author, and thanks to Google Translate!