Archive for Solr:
In another blog post, I wrote about Java Heap Space OutOfMemoryError and how to resolve it. Then I came to the conclusion that allocating more memory to Java in Apache Tomcat would resolve the issue. It did - but only for a while.
It turns out that if you run the 32 bit version of Apache Tomcat on Windows, you can only assign about 1.3 GB of memory to it. In some cases that's enough, in other cases it's not.
This is where you really should know the server Tomcat is installed on. If it's a 64 bit machine running a 64 bit OS, you're probably safe, because then you'll be able to allocate more memory to Java in Tomcat.
You just need to install Java 64 bit and Tomcat 64 bit. (I couldn't find Java 64 bit on the Java website so I used FileHippo instead.)
When the 64 bit versions of Java and Tomcat are installed, you'll be able to allocate a lot more memory.
In our specific project where we use Solr (Java based) for search, Tomcat 64 bit in fact used less memory for Java than Tomcat in the 32 bit version. Plus indexing was faster.
Conclusion: if your server runs a 64 bit OS, then you should install the 64 bit Java and Tomcat versions to be able to address Java Heap Space issues.
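To check which JVM a server actually runs, one option is to look at the output of `java -version`, where 64 bit HotSpot builds typically include "64-Bit" in the VM line. Here is a minimal Python sketch under that assumption; the helper names and the sample strings are illustrative and not taken from the original setup:

```python
import subprocess


def is_64bit_jvm(version_output: str) -> bool:
    """Return True if `java -version` output mentions a 64-bit VM."""
    return "64-bit" in version_output.lower()


def installed_jvm_is_64bit(java_cmd: str = "java") -> bool:
    """Run `java -version` and inspect its output (normally printed to stderr)."""
    result = subprocess.run([java_cmd, "-version"], capture_output=True, text=True)
    return is_64bit_jvm(result.stderr + result.stdout)


# Illustrative VM lines in the style HotSpot prints (assumed sample values).
sample_64 = "Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)"
sample_32 = "Java HotSpot(TM) Client VM (build 17.0-b16, mixed mode, sharing)"
print(is_64bit_jvm(sample_64), is_64bit_jvm(sample_32))
```

Other JVM vendors may word the line differently, so treat a negative answer as a hint rather than proof.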
If you simply want to delete documents from your Solr index by using the web interface, here's a code snippet that lets you do so:
This lets you delete documents where the id field matches 298253.
If you want to delete items that match more than one field, just add another query:
If you want to delete all items in the index, just use this query:
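As a rough sketch of the three delete cases above, here is one way such requests could be built. It assumes Solr's XML update handler is reachable at `/solr/update` on `localhost:8983`, that it accepts the message through the `stream.body` parameter, and that your Solr version accepts several `<query>` elements in one `<delete>`. The helper names and the second query (`type:news`) are made up for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed location of the Solr update handler; adjust host, port and path.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def delete_body(*queries: str) -> str:
    """Build a <delete> message with one <query> element per query string."""
    if not queries:
        raise ValueError("at least one query is required")
    parts = "".join(f"<query>{escape(q)}</query>" for q in queries)
    return f"<delete>{parts}</delete>"


def delete_url(*queries: str) -> str:
    """Build a URL that sends the delete message and commits in one request."""
    params = {"stream.body": delete_body(*queries), "commit": "true"}
    return f"{SOLR_UPDATE_URL}?{urlencode(params)}"


print(delete_url("id:298253"))               # documents whose id is 298253
print(delete_url("id:298253", "type:news"))  # hypothetical second query
print(delete_url("*:*"))                     # every document in the index
```

Pasting one of the printed URLs into the browser sends the delete and commits it. A single query can also combine fields, for example `id:298253 AND type:news`, if you want documents that match both conditions.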
If you get the Java Heap Space OutOfMemoryError in Apache Tomcat, then there's quite an easy fix *.
In Apache Tomcat, you can customize the memory settings and thereby give more memory to Java applications like Solr.
Just fire up "Configure Tomcat", click on the "Java" tab and allocate enough memory in the "Initial memory pool" and "Maximum memory pool" fields:
What memory size should you allocate here? Well, it depends. The default initial memory pool is 64 MB. If you've got a pretty large search index with many searches, you'll surely need more memory allocated.
In a project here, where we've got approximately 20 million indexed items being searched, 256 MB wasn't enough as the initial memory pool. 512 MB seems to work so far. I guess you just have to try out some different values and monitor Tomcat's performance while fine-tuning.
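If Tomcat is started from a script rather than through the Windows service dialog, the same two values are usually passed as the JVM's standard `-Xms` (initial heap) and `-Xmx` (maximum heap) options. A small sketch of that mapping follows; the helper name is made up, and the 1024 MB maximum is only an example value, not a figure from the project above:

```python
def heap_options(initial_mb: int, maximum_mb: int) -> list:
    """Translate initial/maximum pool sizes in MB into JVM heap options."""
    if initial_mb <= 0 or maximum_mb <= 0:
        raise ValueError("pool sizes must be positive")
    if initial_mb > maximum_mb:
        raise ValueError("initial pool cannot exceed maximum pool")
    return [f"-Xms{initial_mb}m", f"-Xmx{maximum_mb}m"]


# 512 MB initial pool as in the text; 1024 MB maximum is an assumed example.
print(" ".join(heap_options(512, 1024)))
```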
* Also take a look at my other blog post on Java heap space issues and 32 / 64 bit versions of Apache Tomcat and Java.
If you encounter problems with Solr indexing and inconsistencies in the actual indexing - you might have a problem with the Solr unique key field.
If you index items that may have the same id, and this id is used as the unique key in Solr, then you have a problem. Two items with the same id can't coexist in the Solr index, so the latter item overwrites the former when indexing. Of course, this is the default and correct behaviour from Solr.
However - this is easy to solve. You just have to use the UUID (Universally Unique Identifier) field type in Solr.
In schema.xml in your Solr conf folder (normally C:/solr/conf or similar), add the UUID type, like this:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
Then, add a field of this type (still in schema.xml):
<field name="uid" type="uuid" indexed="true" stored="true" default="NEW" />
Finally, make sure to point out this field as the unique key, in schema.xml:
<uniqueKey>uid</uniqueKey>
All done! Now all your indexed items will have a unique ID.
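To see why the overwrite happens and why a generated key avoids it, here is a toy model in Python. It treats the index as a dictionary keyed by the unique key. This only illustrates the idea, not how Solr stores documents; the sample documents and helper names are made up.

```python
import uuid


def index_documents(docs, key_field):
    """Toy index: a dict keyed by the unique key, so equal keys overwrite."""
    index = {}
    for doc in docs:
        index[doc[key_field]] = doc
    return index


def with_uuid(doc):
    """Copy a document and attach a freshly generated 'uid' value."""
    return {**doc, "uid": str(uuid.uuid4())}


docs = [
    {"id": 1, "title": "first"},
    {"id": 1, "title": "second"},  # same id as the document above
]

by_id = index_documents(docs, "id")                            # second replaces first
by_uid = index_documents([with_uuid(d) for d in docs], "uid")  # both are kept
print(len(by_id), len(by_uid))
```

Note the trade-off: with a generated key, re-indexing the same item adds a new copy instead of replacing the old one, so stale copies have to be removed some other way.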
Here's more documentation on Solr and the UniqueKey field.
If you use Solr and have a search query that includes a non-ASCII character, like Swedish åäö, you have to turn on the correct encoding for Apache Tomcat. You do this by simply adding
URIEncoding="UTF-8" to the appropriate connector in Tomcat's server.xml file (normally located in the Tomcat conf folder).
Here's the complete code snippet to get UTF-8 characters working with Solr search:
<Connector port="8983" protocol="HTTP/1.1" URIEncoding="UTF-8" />
You can also read about URI Charset Config in the Solr documentation.
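The reason the setting matters: clients normally percent-encode the query as UTF-8 bytes, while Tomcat versions of that era decode URIs as ISO-8859-1 unless told otherwise. The sketch below reproduces that mismatch with Python's standard library only, so it illustrates the encoding problem rather than testing Tomcat itself.

```python
from urllib.parse import quote, unquote

query = "åäö"

# Clients typically percent-encode the query string as UTF-8 bytes.
encoded = quote(query, encoding="utf-8")

# Decoding those bytes as ISO-8859-1 yields mojibake instead of the original text.
misread = unquote(encoded, encoding="iso-8859-1")

# Decoding as UTF-8 (what URIEncoding="UTF-8" asks Tomcat to do) round-trips.
correct = unquote(encoded, encoding="utf-8")

print(encoded, misread, correct)
```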
Related post: Get started using Solr for search in ASP.NET
Solr is an advanced search server from Apache's Lucene project. Thanks to SolrNet, a .NET library for Solr, it is quite convenient to use Solr for search in ASP.NET. I'll show you how. (You can also download the sample app right away, if you'd like to.)
Install Apache Tomcat and Solr
First of all, make sure you get the latest version of Apache Tomcat and Solr. (I installed Tomcat 7 and Solr 1.4.1 (zip version) as of September 2010.)
When installing Tomcat, make sure to remember the port you specify (8983 is the usual port for Solr). After installation, the Apache Tomcat Properties window should pop up. If not, find Configure Tomcat in the start menu and make sure the web server is started. Once it's started, you should see the default Tomcat start page if you browse to http://localhost:8983.
Before you install Solr, stop the Tomcat web server (through the Configure Tomcat window).
When you've downloaded the Solr zip file (make sure it's the zip version!), unzip the archive and find the dist folder. In the dist folder, find the apache-solr-1.4.1.war file and copy it to C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps, renaming it to solr.war.
Now, we also need to create the Solr folder, which will host our Solr configuration files, indexes and so on. I created C:\Solr. You'll also need to copy the contents of the apache-solr-1.4.1\example\solr folder to your newly created Solr folder. When you're done, you should at least have the bin and conf folders.
Finally, we need to tell Tomcat where our Solr folder is located. Open up the Configure Tomcat window, navigate to the Java tab and add this row to Java Options:
-Dsolr.solr.home=C:\Solr
Now start the web server again, and you should be able to navigate to http://localhost:8983/solr. Installation and basic configuration done!
Quick look at the configuration files
The Solr configuration files are important - you will use them to tell Solr what should be indexed and what should not. The most important config files are schema.xml and solrconfig.xml. These are located in the C:\Solr\conf folder.
Easier use of Solr with the SolrNet library
SolrNet is a great .NET library for Solr, making it all easier. Download assemblies (and samples) on the SolrNet Google Code page.
Sample ASP.NET app with SolrNet for download
I've developed a sample web application for you, using SolrNet for search in ASP.NET.
Basically, you've got some data in the SQL Server database and use SolrNet to find items in the search index and present all you want from the database.
Here's what you should know:
- Map fields to Solr using attributes (Player.cs in the classes folder)
- This is sample code, not everything might be suitable for a production environment
- You should use LINQ to SQL, NHibernate or similar for better scaling and easier data access
Download the sample Solr app
I'd like to thank A. Friedman for his contribution to the Solr and ASP.NET world. Here's his great blog post on Solr and SolrNet.
Code snippets using SolrNet
Here are some code snippets from my Solr app. You can find them in the source code, but I thought it was a good idea to post code for a couple of common actions using Solr and SolrNet for search.
Search the index and bind to Repeater:
var search = new DefaultSearcher()
.Search(query, 10, 1);
rptResults.DataSource = search.Result;
rptResults.DataBind(); // bind the results so the Repeater renders them
Re-index all data:
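var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Player>>();
var players = new PlayerRepository().GetPlayers();
solr.Add(players); // Add(IEnumerable<T>) in SolrNet at the time; later releases name it AddRange
solr.Commit(); // make the changes visible to searches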
Remove from index:
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Player>>();
var specificPlayer = new PlayerRepository().GetPlayer(id);
solr.Delete(specificPlayer); // remove this document from the index
solr.Commit();
Adding multiple fields to the search index
By default, a query without an explicit field name searches one field only: the defaultSearchField in schema.xml. It's easy to make several fields searchable at once though, using copyField and an additional field which takes multiple values.
What you have to do is to edit schema.xml a bit:
- Set up the fields you want to get indexed, using field.
- Create an additional field called "text", setting its multiValued property to true.
- Use copyField to copy data to this additional field.
- Use this additional field, "text", as the defaultSearchField.
<field name="id" type="int" indexed="true" stored="true" required="true" />
<field name="firstname" type="text" indexed="true" stored="false" required="false" />
<field name="lastname" type="text" indexed="true" stored="false" required="false" />
<field name="text" type="text" indexed="true" stored="false" multiValued="true" />
<copyField source="firstname" dest="text" />
<copyField source="lastname" dest="text" />
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="AND" />
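What copyField achieves can be pictured with a toy model: at index time each source value is also appended to the multi-valued "text" field, and with AND as the default operator every search term has to be found there. The Python sketch below mimics that behaviour with plain dictionaries. Real Solr analysis (tokenizing, lowercasing, stemming) is far richer, and the sample player and helper names are made up.

```python
# Mirrors the two copyField rules in the schema above.
COPY_FIELDS = [("firstname", "text"), ("lastname", "text")]


def apply_copy_fields(doc, copy_fields):
    """Return a copy of doc with each source value appended to its dest field."""
    result = dict(doc)
    for source, dest in copy_fields:
        if source in doc:
            result[dest] = list(result.get(dest, [])) + [doc[source]]
    return result


def matches_all(doc, terms, field="text"):
    """AND semantics: every term must occur among the field's words."""
    words = {w.lower() for value in doc.get(field, []) for w in value.split()}
    return all(term.lower() in words for term in terms)


player = {"id": 7, "firstname": "Anna", "lastname": "Svensson"}  # sample data
indexed = apply_copy_fields(player, COPY_FIELDS)

print(indexed["text"])
print(matches_all(indexed, ["anna", "svensson"]))
print(matches_all(indexed, ["anna", "larsson"]))
```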
If you encounter any problems with Solr, try this to get it working:
- Turn off the elevate.xml handler (comment out the appropriate lines in solrconfig.xml).
- Configuration files are case sensitive - make sure you spell copyField, multiValued etc. correctly.
- In schema.xml, make sure you use data types that match those you've defined in your ASP.NET app.
Solr is really powerful and gives you a lot of options. I recommend the Solr Wiki for more information on what actually is possible.