一、介绍
1.hbase运算符
2.Hbase 过滤器的比较器
二、代码
1.hbase建表
2.创建数据
3.导入依赖
4.列值过滤器
5.单列值过滤器
6.单列值排除过滤器
7.rowkey过滤器
8.rowkey前缀过滤器:PrefixFilter
9. 列簇过滤器
10.列过滤器
11.综合过滤器
一、介绍
- 列值过滤器
- SingleColumnValueFilter 单列值过滤器
- SingleColumnValueExcludeFilter 单列值排除过滤器
- rowkey过滤器
- rowkey前缀过滤器:PrefixFilter
- 列簇过滤器
- 列过滤器
- PageFilter 分页过滤器
- 分页过滤器 改进版
- 综合使用多过滤器 之前Hbase查询表中的数据是通过的 get 和 scan ,但是get只能查询一行数 据,scan虽然可以查询范围内的数据,但这个范围的划分只取决于行键的范围,不能像 mysql或者hive对列值进行筛选查询等操作。
过滤器可以根据列族、列、版等更多条件过滤数据, 基于 HBase 三维有序(行键、列、版本有序),这些过滤器可以有效地完成查询和过滤的任务 过滤条件 RPC 查询请求将过滤器分发给每个人 RegionServer(这是服务器过滤器) 降低网络传输压力。 使用过滤器至少需要两种参数: 一种是抽象的比较运算符,另一种是比较器
1.hbase运算符
// CompareFilter.CompareOp.LESS_OR_EQUAL LESS < LESS_OR_EQUAL <= EQUAL = NOT_EQUAL <> GREATER_OR_EQUAL >= GREATER > NO_OP 排除所有
2.Hbase 过滤器的比较器
BinaryComparator 按字节索引顺序比较指定字节数组,采用Bytes.compareTo(byte[]) BinaryPrefixComparator 和前面一样,只是比较左端的数据是否相同 NullComparator 判断给定的是否为空 BitComparator 按位比较 a BitwiseOp class 做异或,和,并操作 RegexStringComparator 提供正则比较器,仅支持 EQUAL 和非EQUAL SubstringComparator 判断提供的子串是否出现在判断中table的value中。
比较过滤器:可用于比较过滤器:rowkey、列簇、列、列值过滤器
列值过滤器:ValueFilter
列过滤器:QualifierFilter
列簇过滤器:FamilyFilter
rowKey过滤器:RowFilter 专用过滤器:只适用于特定过滤器
单列值过滤器:SingleColumnValueFilter
列值排除过滤器:SingleColumnValueExcludeFilter
rowkey前缀过滤器:PrefixFilter
分页过滤器PageFilter
二、代码
1.hbase建表
hbase(main):002:0> create 'emp2','info' 0 row(s) in 1.7460 seconds => Hbase::Table - emp2
2.创建数据
put 'emp二、info:name','zhangsan' put 'emp二、info:job','preader' put 'emp二、info:salary','35000' put 'emp2','1001','info:deptName','TP' put 'emp十二、info:name','lisi' put 'emp十二、info:job','preader' put 'emp十二、info:salary','35000' put 'emp2','1002','info:deptName','AC' put 'empsinfo:name','gopal' put 'empsinfo:job','manager' put 'empsinfo:salary','50000' put 'empsinfo:deptName','TP' put 'emp1202info:name','manisha' put 'emp1202info:job','preader' put 'emp2','1202','info:salary','50000' put 'emp1202info:deptName','TP' put 'emp1203info:name','kalil' put 'emp1203info:job','phpdev' put 'emp2','1203','info:salary','30000' put 'emp1203info:deptName','AC' put 'emp1204info:name','prasanth' put 'emp1204info:job','phpdev' put 'emp2','1204','info:salary','30000' put 'emp2','1204','info:deptName','AC' put 'emp1205info:name','kranthi' put 'emp2','1205','info:job','admin' put 'emp1205info:salary','20000' put 'emp1205info:deptName','TP' put 'emp1206,info:name','satishp' put 'emp2','1206','info:job,'grpdes'
put 'emp2','1206','info:salary','20000'
put 'emp2','1206','info:deptName','GR'
3.导入依赖
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.3</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.4.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.4.13</version>
</dependency>
</dependencies>
4.列值过滤器
列值过滤器仅仅针对单元格中的值进行过滤,满足 比较运算符加上比较器 构成的过滤条件,则留下,否则为 null 虽然这里给出的过滤条件是 salary>30000 ,但是发现在代码里面根本没有指定salary这一列,因此实际上,列值过 滤器是与所有列的所有单元格进行比较。如果满足条件则保留数据,如果不满足则过滤掉该数据,查询时不满足 过滤条件的单元格都为null 这里的 id 和 name 等列的值也能显示出现,是因为这里的比较器是按照字节数组进行比较,id和name里面 的值都满足过滤条件,所以没有 变成null
package com.lenovo.Filter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
public class ValueFilterDemo {
//成员变量自动初始化
Connection connection;
Admin admin;
TableName tableName;
Table table;
/**
* @Date 2022.04.26
* @Description 获取连接以及表对象
*/
@Before
public void createConnection(){
//局部变量手动初始化
//获取配置对象
Configuration configuration = new Configuration();
configuration.set("hbase.zookeeper.quorum", "IP地址");
//获取连接
try {
connection = ConnectionFactory.createConnection(configuration);
//获取管理员对象
admin = connection.getAdmin();
//获取表名
tableName = tableName.valueOf("emp2");
//获取表的对象
table = connection.getTable(tableName);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* @Date 2022.04.26
* @Description 列值过滤器
* salary>30000
*/
@Test
public void test(){
//创建比较运算符以及比较运算器
BinaryComparator binaryComparator = new BinaryComparator("30000".getBytes());
//创建过滤器
ValueFilter valueFilter = new ValueFilter(CompareFilter.CompareOp.GREATER, binaryComparator);
//调用print方法
print(valueFilter);
}
//打印过滤后数据
private void print(Filter filter){
//创建Scan对象
Scan scan = new Scan();
//将过滤器放到scan对象
scan.setFilter(filter);
//调用scan
try {
ResultScanner scanner = table.getScanner(scan);
//解析
for (Result result : scanner) {
//scan扫描返回的时多行数据,遍历循环每一行的数据
// 利用getrow方法取出rowkey
// 利用getValue方法取出这一行的value值,根据列簇和列确定一个单元格的值
//拿到rowKey
String rowKey = Bytes.toString(result.getRow());//byte数组转字符串
//拿到其他列
String name = Bytes.toString(result.getValue("info".getBytes(), "name".getBytes()));
String job = Bytes.toString(result.getValue("info".getBytes(), "job".getBytes()));
String salary = Bytes.toString(result.getValue("info".getBytes(), "salary".getBytes()));
String deptName = Bytes.toString(result.getValue("info".getBytes(), "deptName".getBytes()));
System.out.println("id:"+rowKey+",name:"+name+",job:"+job+",salary:"+salary+",deptName:"+deptName);
}
} catch (IOException e) {
e.printStackTrace();
}
}
@After
public void close(){
try {
admin.close();
} catch (IOException e) {
e.printStackTrace();
}
try {
connection.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
运行结果:
5.单列值过滤器
package com.lenovo.Filter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
public class ValueFilterDemo {
//成员变量自动初始化
Connection connection;
Admin admin;
TableName tableName;
Table table;
/**
* @Date 2022.04.26
* @Description 获取连接以及表对象
*/
@Before
public void createConnection(){
//局部变量手动初始化
//获取配置对象
Configuration configuration = new Configuration();
configuration.set("hbase.zookeeper.quorum", "IP地址");
//获取连接
try {
connection = ConnectionFactory.createConnection(configuration);
//获取管理员对象
admin = connection.getAdmin();
//获取表名
tableName = tableName.valueOf("emp2");
//获取表的对象
table = connection.getTable(tableName);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* @Date 2022.04.26
* @Description 单列值过滤器
* 可以指定一个列进行过滤
* 该过滤器会将符合过滤条件的列对应的cell所在的整行数据进行返回
* 如果某条数据的列不符合条件,则会将整条数据进行过滤
* 如果数据中不存在指定的列,则默认会直接返回,并且该列全为null
* salary>30000
*/
@Test
public void SingleColumnValueFilter() throws IOException {
SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
"info".getBytes(),
"salary".getBytes(),
CompareFilter.CompareOp.GREATER,
"30000".getBytes()
);
print(singleColumnValueFilter);
}
//打印过滤后数据
private void print(Filter filter){
//创建Scan对象
Scan scan = new Scan();
//将过滤器放到scan对象
scan.setFilter(filter);
//调用scan
try {
ResultScanner scanner = table.getScanner(scan);
//解析
for (Result result : scanner) {
//scan扫描返回的时多行数据,遍历循环每一行的数据
// 利用getrow方法取出rowkey
// 利用getValue方法取出这一行的value值,根据列簇和列确定一个单元格的值
//拿到rowKey
String rowKey = Bytes.toString(result.getRow());//byte数组转字符串
//拿到其他列
String name = Bytes.toString(result.getValue("info".getBytes(), "name".getBytes()));
String job = Bytes.toString(result.getValue("info".getBytes(), "job".getBytes()));
String salary = Bytes.toString(result.getValue("info".getBytes(), "salary".getBytes()));
String deptName = Bytes.toString(result.getValue("info".getBytes(), "deptName".getBytes()));
System.out.println("id:"+rowKey+",name:"+name+",job:"+job+",salary:"+salary+",deptName:"+deptName);
}
} catch (IOException e) {
e.printStackTrace();
}
}
@After
public void close(){
try {
admin.close();
} catch (IOException e) {
e.printStackTrace();
}
try {
connection.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
运行结果:
6.单列值排除过滤器
package com.lenovo.Filter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
public class ValueFilterDemo {
//成员变量自动初始化
Connection connection;
Admin admin;
TableName tableName;
Table table;
/**
* @Date 2022.04.26
* @Description 获取连接以及表对象
*/
@Before
public void createConnection(){
//局部变量手动初始化
//获取配置对象
Configuration configuration = new Configuration();
configuration.set("hbase.zookeeper.quorum", "IP地址");
//获取连接
try {
connection = ConnectionFactory.createConnection(configuration);
//获取管理员对象
admin = connection.getAdmin();
//获取表名
tableName = tableName.valueOf("emp2");
//获取表的对象
table = connection.getTable(tableName);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* @Date 2022.04.26
* @Description 单列值排除过滤器
* 列值过滤器返回的是全部的行,而单列值过滤器返回的是满足过滤条件的行
* salary=30000
*/
@Test
public void SingleColumnValueExcludeFilter(){
BinaryPrefixComparator binaryPrefixComparator = new BinaryPrefixComparator("30000".getBytes());
SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(
"info".getBytes(),
"salary".getBytes(),
CompareFilter.CompareOp.EQUAL,
binaryPrefixComparator
);
print(singleColumnValueFilter);
}
//打印过滤后数据
private void print(Filter filter){
//创建Scan对象
Scan scan = new Scan();
//将过滤器放到scan对象
scan.setFilter(filter);
//调用scan
try {
ResultScanner scanner = table.getScanner(scan);
//解析
for (Result result : scanner) {
//scan扫描返回的时多行数据,遍历循环每一行的数据
// 利用getrow方法取出rowkey
// 利用getValue方法取出这一行的value值,根据列簇和列确定一个单元格的值
//拿到rowKey
String rowKey = Bytes.toString(result.getRow());//byte数组转字符串
//拿到其他列
String name = Bytes.toString(result.getValue("info".getBytes(), "name".getBytes()));
String job = Bytes.toString(result.getValue("info".getBytes(), "job".getBytes()));
String salary = Bytes.toString(result.getValue("info".getBytes(), "salary".getBytes()));
String deptName = Bytes.toString(result.getValue("info".getBytes(), "deptName".getBytes()));
System.out.println("id:"+rowKey+",name:"+name+",job:"+job+",salary:"+salary+",deptName:"+deptName);
}
} catch (IOException e) {
e.printStackTrace();
}
}
@After
public void close(){
try {
admin.close();
} catch (IOException e) {
e.printStackTrace();
}
try {
connection.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
运行结果:
7.rowkey过滤器
package com.lenovo.Filter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
public class ValueFilterDemo {
//成员变量自动初始化
Connection connection;
Admin admin;
TableName tableName;
Table table;
/**
* @Date 2022.04.26
* @Description 获取连接以及表对象
*/
@Before
public void createConnection(){
//局部变量手动初始化
//获取配置对象
Configuration configuration = new Configuration();
configuration.set("hbase.zookeeper.quorum", "IP地址");
//获取连接
try {
connection = ConnectionFactory.createConnection(configuration);
//获取管理员对象
admin = connection.getAdmin();
//获取表名
tableName = tableName.valueOf("emp2");
//获取表的对象
table = connection.getTable(tableName);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* @Date 2022.04.26
* @Description rowkey过滤器
* rowkey过滤器加上前缀比较器
* 过滤出rowkey(id)以100开头的
*/
@Test
public void rowKeyFilter(){
BinaryPrefixComparator binaryPrefixComparator = new BinaryPrefixComparator("100".getBytes());
RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, binaryPrefixComparator);
print(rowFilter);
}
//打印过滤后数据
private void print(Filter filter){
//创建Scan对象
Scan scan = new Scan();
//将过滤器放到scan对象
scan.setFilter(filter);
//调用scan
try {
ResultScanner scanner = table.getScanner(scan);
//解析
for (Result result : scanner) {
//scan扫描返回的时多行数据,遍历循环每一行的数据
// 利用getrow方法取出rowkey
// 利用getValue方法取出这一行的value值,根据列簇和列确定一个单元格的值
//拿到rowKey
String rowKey = Bytes.toString(result.getRow());//byte数组转字符串
//拿到其他列
String name = Bytes.toString(result.getValue("info".getBytes(), "name".getBytes()));
String job = Bytes.toString(result.getValue("info".getBytes(), "job".getBytes()));
String salary = Bytes.toString(result.getValue("info".getBytes(), "salary".getBytes()));
String deptName = Bytes.toString(result.getValue("info".getBytes(), "deptName".getBytes()));
System.out.println("id:"+rowKey+",name:"+name+",job:"+job+",salary:"+salary+",deptName:"+deptName);
}
} catch (IOException e) {
e.printStackTrace();
}
}
@After
public void close(){
try {
admin.close();
} catch (IOException e) {
e.printStackTrace();
}
try {
connection.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
运行结果:
8.rowkey前缀过滤器:PrefixFilter
package com.lenovo.Filter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
public class ValueFilterDemo {
//成员变量自动初始化
Connection connection;
Admin admin;
TableName tableName;
Table table;
/**
* @Date 2022.04.26
* @Description 获取连接以及表对象
*/
@Before
public void createConnection(){
//局部变量手动初始化
//获取配置对象
Configuration configuration = new Configuration();
configuration.set("hbase.zookeeper.quorum", "IP地址");
//获取连接
try {
connection = ConnectionFactory.createConnection(configuration);
//获取管理员对象
admin = connection.getAdmin();
//获取表名
tableName = tableName.valueOf("emp2");
//获取表的对象
table = connection.getTable(tableName);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* @Date 2022.04.26
* @Description rowkey前缀过滤器
* rowkey过滤器加上前缀比较器
* 过滤出rowkey(id)以100开头的
* rowkey过滤器加上前缀比较器后,与rowkey前缀过滤器的效果相同
*/
@Test
public void PrefixFilter(){
PrefixFilter prefixFilter = new PrefixFilter("100".getBytes());
print(prefixFilter);
}
//打印过滤后数据
private void print(Filter filter){
//创建Scan对象
Scan scan = new Scan();
//将过滤器放到scan对象
scan.setFilter(filter);
//调用scan
try {
ResultScanner scanner = table.getScanner(scan);
//解析
for (Result result : scanner) {
//scan扫描返回的时多行数据,遍历循环每一行的数据
// 利用getrow方法取出rowkey
// 利用getValue方法取出这一行的value值,根据列簇和列确定一个单元格的值
//拿到rowKey
String rowKey = Bytes.toString(result.getRow());//byte数组转字符串
//拿到其他列
String name = Bytes.toString(result.getValue("info".getBytes(), "name".getBytes()));
String job = Bytes.toString(result.getValue("info".getBytes(), "job".getBytes()));
String salary = Bytes.toString(result.getValue("info".getBytes(), "salary".getBytes()));
String deptName = Bytes.toString(result.getValue("info".getBytes(), "deptName".getBytes()));
System.out.println("id:"+rowKey+",name:"+name+",job:"+job+",salary:"+salary+",deptName:"+deptName);
}
} catch (IOException e) {
e.printStackTrace();
}
}
@After
public void close(){
try {
admin.close();
} catch (IOException e) {
e.printStackTrace();
}
try {
connection.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
运行结果:
9. 列簇过滤器
/**
* @Date 2022.04.26
* @Description 列簇过滤器
* 匹配列簇
*/
@Test
public void familyFilterTest(){
RegexStringComparator regexStringComparator = new RegexStringComparator("i[a-zA-Z]");
FamilyFilter familyFilter = new FamilyFilter(CompareFilter.CompareOp.EQUAL, regexStringComparator);
print(familyFilter);
}
运行结果:
10.列过滤器
/**
* @Date 2022.04.26
* @Description 列过滤器
* 匹配子字符串
* 其他列为null
*/
@Test
public void substringFilterTest(){
SubstringComparator substringComparator = new SubstringComparator("me");
QualifierFilter qualifierFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, substringComparator);
print(qualifierFilter);
}
运行结果:
11.综合过滤器
@Test
public void manyColumnFilter(){
//查询列
SubstringComparator substringComparator = new SubstringComparator("a");
QualifierFilter qualifierFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, substringComparator);
//查询salary>30000
BinaryComparator binaryComparator = new BinaryComparator("30000".getBytes());
SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter("info".getBytes(), "salary".getBytes(), CompareFilter.CompareOp.GREATER, binaryComparator);
//利用FilterList,将多个过滤器放在一起,一起过滤
FilterList filterList = new FilterList();
filterList.addFilter(qualifierFilter);
filterList.addFilter(singleColumnValueFilter);
print(filterList);
}
运行结果: