How to Filter HBase Scan Based on Column Value in Java
How can we filter a scan of an HBase table based on some column value in Java?
Suppose we have an HBase table with the column
greeting (a column qualifier).
We want to filter the scan results to only the rows whose
greeting contains a given substring, such as hello.
1. Filter cell value using SingleColumnValueFilter
We can use a
SingleColumnValueFilter to filter rows based on the value of a single column.
byte[] CF = Bytes.toBytes("column_family");
byte[] CQ = Bytes.toBytes("greeting");
// comparator is defined in step 2
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    CF, CQ, CompareOp.EQUAL, comparator
);
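SingleColumnValueFilter takes a column family and a column qualifier as its first two arguments.
For the third and fourth arguments, we’ll want to use the
EQUAL compare operator along with a comparator such as
SubstringComparator or RegexStringComparator, where we’ll define our filter condition.
Note that, by default, rows that do not contain the column at all also pass the filter; calling filter.setFilterIfMissing(true) excludes them.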
2. Set filter conditions with a comparator
SubstringComparator will match a cell if the supplied substring appears anywhere in the cell value in the column. The comparison is case-insensitive.
SubstringComparator comparator = new SubstringComparator("hello");
RegexStringComparator will match a cell if the supplied regular expression matches the cell value in the column.
Regular expressions support more complex conditions than a simple substring comparator, but the filter will be slower.
RegexStringComparator comparator = new RegexStringComparator(".*hello.*");
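Before running a filter against a cluster, it can help to check locally which values a given substring or pattern would accept. The following is a rough sketch in plain Java that does not call HBase at all. It assumes that SubstringComparator does a case-insensitive containment check and that RegexStringComparator accepts a value when the pattern is found anywhere in it. The class and method names (GreetingMatchSketch, substringMatches, regexMatches) are made up for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Local approximation of the two match modes; it does not use the HBase client.
final class GreetingMatchSketch {

    // Assumption: mirrors SubstringComparator (case-insensitive containment).
    static boolean substringMatches(String cellValue, String substring) {
        return cellValue.toLowerCase(Locale.ROOT)
                .contains(substring.toLowerCase(Locale.ROOT));
    }

    // Assumption: mirrors RegexStringComparator (pattern found anywhere in the value).
    static boolean regexMatches(String cellValue, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(cellValue).find();
    }

    public static void main(String[] args) {
        String[] greetings = {"hello world", "Hello there", "goodbye"};
        for (String greeting : greetings) {
            System.out.println(greeting
                    + " substring=" + substringMatches(greeting, "hello")
                    + " regex=" + regexMatches(greeting, ".*hello.*"));
        }
    }
}
```

Under these assumptions, "Hello there" passes the substring check but not the regular expression, because Java regular expressions are case-sensitive unless a flag such as (?i) is added to the pattern.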
3. Apply filter to the scan
After defining the comparator and creating the filter, we can apply the filter to a scan.
Scan scan = new Scan();
scan.setFilter(filter);
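Putting the three steps together, a minimal end-to-end sketch might look like the following. It assumes the HBase client library is on the classpath and that a cluster is reachable through an hbase-site.xml on the classpath. The table name greetings and the class name GreetingScan are hypothetical. The calls to setFilterIfMissing(true) and addColumn are additions that the steps above do not include.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class GreetingScan {
    private static final byte[] CF = Bytes.toBytes("column_family");
    private static final byte[] CQ = Bytes.toBytes("greeting");

    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath (assumed to point at a live cluster).
        Configuration conf = HBaseConfiguration.create();

        // CompareOp matches the snippets above; on HBase 2.x and later,
        // org.apache.hadoop.hbase.CompareOperator.EQUAL is the non-deprecated equivalent.
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                CF, CQ, CompareOp.EQUAL, new SubstringComparator("hello"));
        // Skip rows that have no greeting column at all (they pass by default).
        filter.setFilterIfMissing(true);

        Scan scan = new Scan();
        // Only fetch the greeting column; the filtered column must be part of the scan.
        scan.addColumn(CF, CQ);
        scan.setFilter(filter);

        // "greetings" is a hypothetical table name.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("greetings"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                String rowKey = Bytes.toString(result.getRow());
                String greeting = Bytes.toString(result.getValue(CF, CQ));
                System.out.println(rowKey + " -> " + greeting);
            }
        }
    }
}
```

The try-with-resources block closes the scanner, table, and connection in reverse order, which releases the server-side scanner even if iteration fails partway through.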