If you’ve followed my blog in the past, you know that Neo4j lagged behind a bit in replacing the old manual indexes with corresponding schema indexes.

But now, with Neo4j 3.x on the horizon, this will be amended. Today I want to present some of the capabilities that will be available with version 3.0.

  • index-supported STARTS WITH, CONTAINS, ENDS WITH text search

  • index-supported range searches for strings and numbers

  • combining indexes for multi-faceted lookups

  • spatial point and distance functions

Some of these capabilities don’t cover all the use-cases, but already provide great improvements to the existing functionality.

Today we’ll use the Factual dataset, which is also nicely demoed in the popoto.js structural graph search (example app).

TODO update all examples as soon as I have the factual dataset.

CREATE INDEX ON :Restaurant(name);
CREATE INDEX ON :Restaurant(zip);

LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/jexp/02c29202537e893a99f0/raw/factual_ca.csv" AS row
CREATE (r:Restaurant {name:row.name, latitude:toFloat(row.latitude), longitude:toFloat(row.longitude), zip:toInt(row.zip)});

Schema Indexes

Schema indexes and constraints are defined like this:

CREATE CONSTRAINT ON (r:Restaurant) ASSERT r.id IS UNIQUE;
CREATE INDEX ON :Restaurant(name);
CREATE INDEX ON :Restaurant(opened);

If you prefix your queries with EXPLAIN or PROFILE, you should see the index being used with one of these operators:

  • NodeIndexSeek, NodeUniqueIndexSeek, NodeIndexSeekByRange, NodeUniqueIndexSeekByRange
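As a minimal sketch, a lookup on the indexed name property could be checked like this. The restaurant name is a made-up value, not taken from the dataset. With the index in place, the plan should contain a NodeIndexSeek rather than a label scan.

```cypher
// Hypothetical lookup value; any exact match on the indexed property will do.
EXPLAIN
MATCH (r:Restaurant)
WHERE r.name = "Taco Village"
RETURN r;
```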

The textual query plan of a query looks like this:

And the visual query plan looks like this:

image::[]

Since version 2.3, Cypher has supported schema-indexed text search, starting with STARTS WITH. In Neo4j 3.x, CONTAINS and ENDS WITH will also use existing indexes.

Note
Currently all three operators are case-sensitive. You might need to use a separate, indexed, lower-cased property for text search.

CREATE INDEX ON :Restaurant(name);

MATCH (r:Restaurant) WHERE r.name STARTS WITH "Taco" RETURN count(*);
MATCH (r:Restaurant) WHERE r.name CONTAINS "Village" RETURN count(*);
MATCH (r:Restaurant) WHERE r.name ENDS WITH "House" RETURN count(*);
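One way to work around the case sensitivity is sketched below. It assumes an additional property, here called name_lc (a hypothetical name, not part of the dataset), that holds the lower-cased restaurant name and gets its own index. The search term then has to be lower-cased as well.

```cypher
// Assumption: name_lc is a hypothetical extra property, not part of the dataset.
CREATE INDEX ON :Restaurant(name_lc);

// Populate it once; lower() is named toLower() in later Neo4j versions.
MATCH (r:Restaurant) SET r.name_lc = lower(r.name);

// Search with an already lower-cased term.
MATCH (r:Restaurant) WHERE r.name_lc STARTS WITH "taco" RETURN count(*);
```

The price is a second property and index to keep in sync whenever a name changes.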

Range Searches

Version 2.3 also made it possible to use range searches on indexed properties. This works for numeric fields like timestamps, years, and prices, but also for textual properties like date strings.

All of the following predicate expressions work and use an index if available:

restaurant.rating > 4
restaurant.guests > 200
90 < restaurant.price < 130
"2010-01-01" < restaurant.opened < "2010-09-01"
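Put into a full query, the date-string predicate could look like the sketch below. It assumes that opened is stored as an ISO-formatted date string (so that string order matches date order) and that the index on :Restaurant(opened) from above exists. With PROFILE, the plan should show a NodeIndexSeekByRange.

```cypher
// Assumption: opened holds ISO date strings such as "2010-03-15".
PROFILE
MATCH (restaurant:Restaurant)
WHERE "2010-01-01" < restaurant.opened < "2010-09-01"
RETURN restaurant.name, restaurant.opened
ORDER BY restaurant.opened;
```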

Multiple Facets

If a query could use more than one index, Cypher’s query planner chooses the most selective one. Post-filtering the result on the other properties is then usually the fastest way to get your data.

If need be, you can also force it to use an index with an index hint: MATCH (n:Label) USING INDEX n:Label(prop) WHERE n.prop = "foo".
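Applied to the restaurant data, such a hint might look like the following sketch. The name and zip values are placeholders, not entries from the dataset. The hint pins the lookup to the name index, and the zip predicate is then checked as a post-filter.

```cypher
// Hypothetical values; the hint forces the index on :Restaurant(name).
MATCH (r:Restaurant)
USING INDEX r:Restaurant(name)
WHERE r.name = "Taco Village" AND r.zip = 94110
RETURN r;
```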

Sometimes you have really large datasets (more than ten million entries) to query. Then running two index lookups and joining their results might be the better choice.

You can achieve this with the following (not so pretty) variant:

MATCH (r:Restaurant)
WHERE r.name CONTAINS "Village"
MATCH (o:Restaurant)
WHERE o.year > 2010 AND r = o
RETURN r;

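The idea is that each MATCH can be answered from its own index (assuming year is indexed too), and the r = o predicate joins the two result sets on node identity. Check with PROFILE whether the planner actually chooses that plan for your data.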

Geospatial Functions

The point function creates a geo-point from a node, relationship or map that has a latitude and a longitude property with floating-point values. The distance function computes the spherical distance between two of these points in meters.

So you can filter pairs of locations by distance or order a group of locations by distance. The two locations can both be nodes, or one node and one point generated from user input, which would be passed in as a geo parameter with the two properties.

MATCH (r:Restaurant)
WHERE r.name CONTAINS "Sushi"
AND distance(point(r),point({latitude:52.0, longitude:11.0})) < 20*1000
RETURN r;
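Ordering by distance with a user-supplied location could be sketched as follows. It assumes a parameter named geo, passed in as a map with latitude and longitude keys. The {geo} syntax is the parameter style of this Neo4j generation; later versions write $geo.

```cypher
// Assumption: {geo} is a map parameter like {latitude: 52.0, longitude: 11.0}.
MATCH (r:Restaurant)
WITH r, distance(point(r), point({geo})) AS dist
WHERE dist < 20*1000
RETURN r.name, dist
ORDER BY dist
LIMIT 10;
```

Computing the distance once in the WITH clause lets the same value serve for both filtering and sorting.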