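About the author
The author is a third-year university student from Heyuan. These notes record some modest lessons from the author's self-study. If you find mistakes, corrections are welcome. The author will keep improving the notes to help more Java enthusiasts get started.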
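Table of contents
- About the author
- Distributed Search Engine: Elasticsearch (Part 1)
  - What is Elasticsearch
  - Elasticsearch concepts
  - Elasticsearch's underlying index
  - Elasticsearch and relational databases (MySQL)
  - Things to watch out for with Elasticsearch
    - Cross-origin (CORS) issues
    - Sluggishness caused by excessive memory use
    - Elasticsearch and Kibana version compatibility
  - The IK analyzer
    - Using the IK analyzer
    - Extending the IK dictionary
  - Elasticsearch operations (REST style)
    - Create an index
    - Delete an index
    - Insert data into an index (document)
    - Delete a specific document from an index (by id)
    - Update a specific document in an index
    - Delete a specific document from an index
    - Create mapped fields
      - An index's mappings can only be specified once, at creation
      - Add fields to an index with _mapping
      - Migrate data with _reindex
    - Get index information
    - Get all records in an index (_search)
    - Get a specific document from an index
    - Get all data in an index (match_all: {})
    - match query (only one field per query)
      - Adding a second condition
    - Difference between exact (term) and full-text (match) queries
    - Baidu-style search with multi_match
    - Phrase (exact) search (match_phrase)
    - Choose which fields to return (_source)
    - Sorting (sort)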
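Distributed Search Engine: Elasticsearch (Part 1)
What is Elasticsearch
Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Elasticsearch is written in Java and was originally released as open source under the Apache License; it is a popular enterprise search engine. It is designed for cloud environments and offers real-time search, stability, reliability, speed, and easy installation and use.
We build a website or application and want to add search, but getting search to work well is hard. We want a search solution that is fast, that needs zero configuration and is completely free, that lets us index data simply by sending JSON over HTTP, that is always available, that can start on one machine and scale to hundreds, that searches in real time, that makes multi-tenancy simple, and that works as a cloud solution. Elasticsearch aims to solve these problems and others that may come up.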
Elasticsearch concepts
Elasticsearch's underlying index
[Figure: sample table with the columns documentid, age, name]
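Elasticsearch, through Lucene, answers full-text queries from an inverted index: a map from each term to the ids of the documents that contain it. The following is a minimal Python sketch of that idea, assuming whitespace tokenization and made-up sample documents; the real index also stores positions, frequencies, and more.

```python
from collections import defaultdict

# Hypothetical sample documents; the ids, names, and ages are made up.
docs = {
    1: {"name": "li si", "age": 3},
    2: {"name": "li xiaolong", "age": 45},
    3: {"name": "zhang san", "age": 18},
}


def build_inverted_index(documents, field):
    """Map each whitespace-separated term of `field` to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for term in str(doc[field]).lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


name_index = build_inverted_index(docs, "name")
print(name_index["li"])   # ids of the documents whose name contains "li"
print(name_index["san"])  # ids of the documents whose name contains "san"
```

A lookup is then a dictionary access on the term rather than a scan over every document, which is why term lookups stay fast as the number of documents grows.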
Elasticsearch and relational databases (MySQL)
MySQL database ========== Elasticsearch index
MySQL table ========== Elasticsearch type (deprecated in 7.x and removed in 8.0)
MySQL row ========== Elasticsearch document
MySQL column ========== Elasticsearch field
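Things to watch out for with Elasticsearch
Cross-origin (CORS) issues
To allow cross-origin requests, add the following to config/elasticsearch.yml:
http.cors.enabled: true
http.cors.allow-origin: "*"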
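Sluggishness caused by excessive memory use
The default heap settings in config/jvm.options are:
# minimum (initial) heap size
-Xms1g
# maximum heap size
-Xmx1g
On a machine with little memory they can be lowered, for example to:
-Xms256m
-Xmx512m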
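Elasticsearch and Kibana version compatibility
The IK analyzer
Using the IK analyzer
IK is an analyzer for Chinese text. Words that are missing from its dictionary (personal names, for example) are not kept as single tokens, so the dictionary can be extended.
ik_smart: coarsest segmentation (as few tokens as possible)
ik_max_word: finest segmentation (as many tokens as possible)
ik_smart:
# _analyze is a fixed endpoint name
GET _analyze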
{
"text": ["分布式搜索"],
"analyzer": "ik_smart"
}
ik_max_word:
GET _analyze
{
"text": ["分布式搜索"],
"analyzer": "ik_max_word"
}
Extending the IK dictionary
GET _analyze
{
"text": ["我是张三,very nice"],
"analyzer": "ik_max_word"
}
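<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionary here -->
    <!-- If you created your own .dic file, put its name inside the entry, e.g. xxx.dic -->
    <entry key="ext_dict"></entry>
    <!-- Configure your own extension stop-word dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- Configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Configure a remote extension stop-word dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
For example, to have "张三" kept as a single token, add it on its own line in my.dic, reference my.dic in the ext_dict entry, and restart Elasticsearch so the dictionary is loaded.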
GET _analyze
{
"text": ["我是张三,very nice"],
"analyzer": "ik_max_word"
}
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "张三",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "very",
"start_offset" : 6,
"end_offset" : 10,
"type" : "ENGLISH",
"position" : 3
},
{
"token" : "nice",
"start_offset" : 11,
"end_offset" : 15,
"type" : "ENGLISH",
"position" : 4
}
]
}
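Why adding a word to the dictionary changes the tokens can be illustrated with a toy dictionary-based segmenter. The sketch below uses forward maximum matching, a simple textbook strategy; it is not IK's actual algorithm, and the function name `segment` is made up for illustration.

```python
import re


def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word, else one character."""
    tokens = []
    max_len = max((len(word) for word in dictionary), default=1)
    i = 0
    while i < len(text):
        # A run of ASCII letters or digits becomes one lowercase token.
        ascii_run = re.match(r"[A-Za-z0-9]+", text[i:])
        if ascii_run:
            tokens.append(ascii_run.group(0).lower())
            i += ascii_run.end()
            continue
        # Skip whitespace and punctuation (anything that is not a CJK ideograph).
        if not "\u4e00" <= text[i] <= "\u9fff":
            i += 1
            continue
        # Try the longest candidate first and fall back to a single character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


sentence = "我是张三,very nice"
print(segment(sentence, set()))     # the name is split into single characters
print(segment(sentence, {"张三"}))  # the name is kept as one token
```

With an empty dictionary the name falls apart into single characters; once "张三" is in the dictionary it comes out as one token, which is the effect the extension dictionary has on the response above.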
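Elasticsearch operations (REST style)
Create an index
# The request body is empty, so the index is created with default settings and holds no data
PUT /hello03
{
}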
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "hello03"
}
Delete an index
DELETE hello01
{
}
Insert data into an index (document)
PUT /hello03/_doc/1
{
"name": "yzj",
"age" : 18
}
{
"_index" : "hello03",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
{
"state": "open",
"settings": {
"index": {
"creation_date": "1618408917052",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "OEVNL7cCQgG74KMPG5LjLA",
"version": {
"created": "7060199"
},
"provided_name": "hello03"
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword" // name is mapped as text; by default it also gets a keyword sub-field, which is not analyzed
}
}
},
"age": {
"type": "long" // age is mapped as long
}
}
}
},
"aliases": [ ],
"primary_terms": {
"0": 1
},
"in_sync_allocations": {
"0": [
"17d4jyS9RgGEVid4rIANQA"
]
}
}
Delete a specific document from an index (by id)
DELETE hello01/_doc/004
{
}
Update a specific document in an index
POST hello02/_update/001
{
"doc": {
"d2":"Java"
}
}
Delete a specific document from an index
DELETE hello02/_doc/001
{
}
Create mapped fields
PUT /hello05
{
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "ik_max_word"
},
"say":{
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
{
"state": "open",
"settings": {
"index": {
"creation_date": "1618410744334",
"number_of_shards": "1",
"number_of_replicas": "1",
"uuid": "isCuH2wTQ8S3Yw2MSspvGA",
"version": {
"created": "7060199"
},
"provided_name": "hello05"
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"analyzer": "ik_max_word", // the analyzer specified in the mapping was applied
"type": "text"
},
"say": {
"analyzer": "ik_max_word",
"type": "text"
}
}
}
},
"aliases": [ ],
"primary_terms": {
"0": 1
},
"in_sync_allocations": {
"0": [
"lh6O9N8KQNKtLqD3PSU-Fg"
]
}
}
An index's mappings can only be specified once, at creation (repeating PUT /hello05 on the existing index fails)
PUT /hello05
{
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "ik_max_word"
},
"say":{
"type": "text",
"analyzer": "ik_max_word"
},
"age":{
"type": "integer"
}
}
}
}
{
"error" : {
"root_cause" : [
{
"type" : "resource_already_exists_exception",
"reason" : "index [hello05/isCuH2wTQ8S3Yw2MSspvGA] already exists",
"index_uuid" : "isCuH2wTQ8S3Yw2MSspvGA",
"index" : "hello05"
}
],
"type" : "resource_already_exists_exception",
"reason" : "index [hello05/isCuH2wTQ8S3Yw2MSspvGA] already exists",
"index_uuid" : "isCuH2wTQ8S3Yw2MSspvGA",
"index" : "hello05"
},
"status" : 400
}
Add fields to an index with _mapping
PUT hello05/_mapping
{
"properties": {
"ls":{
"type": "keyword"
}
}
}
Migrate data with _reindex
POST _reindex
{
"source": {
"index": "hello05",
"type": "_doc"
},
"dest": {
"index": "hello06"
}
}
#! Deprecation: [types removal] Specifying types in reindex requests is deprecated.
{
"took" : 36,
"timed_out" : false,
"total" : 5,
"updated" : 0,
"created" : 5,
"deleted" : 0,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
Get index information
GET hello05
{
}
Get all records in an index (_search)
GET hello05/_search
{
"query": {
"match_all": {}
}
}
Get a specific document from an index
GET hello05/_doc/1
{
}
Get all data in an index (match_all: {})
GET hello05/_search
{
}
GET hello05/_search
{
"query": {
"match_all": {}
}
}
match query (only one field per query)
# "李" is the text to search for in the name field
GET hello05/_search
{
  "query": {
    "match": {
      "name": "李"
    }
  }
}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.9395274,
"hits" : [
{
"_index" : "hello05",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9395274,
"_source" : {
"name" : "李四",
"age" : 3
}
},
{
"_index" : "hello05",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.79423964,
"_source" : {
"name" : "李小龙",
"age" : 45
}
}
]
}
}
Adding a second condition to the same match query fails:
GET hello05/_search
{
"query": {
"match": {
"name": "李"
, "age": 45
}
}
}
{
"error" : {
"root_cause" : [
{
"type" : "parsing_exception",
"reason" : "[match] query doesn't support multiple fields, found [name] and [age]",
"line" : 6,
"col" : 18
}
],
"type" : "parsing_exception",
"reason" : "[match] query doesn't support multiple fields, found [name] and [age]",
"line" : 6,
"col" : 18
},
"status" : 400
}
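The error above means a match clause takes exactly one field. One common way to combine conditions is a bool query whose must list holds one match clause per field. The sketch below builds such a request body in Python and prints it as JSON; the helper name `match_all_of` is made up for illustration, and the printed body is meant to be sent to hello05/_search.

```python
import json


def match_all_of(conditions):
    """Build a bool query whose `must` list holds one single-field match clause per condition."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: value}} for field, value in conditions.items()]
            }
        }
    }


# Every clause in `must` has to match, so this asks for name matching "李" AND age matching 45.
body = match_all_of({"name": "李", "age": 45})
print(json.dumps(body, ensure_ascii=False, indent=2))
```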
Difference between exact (term) and full-text (match) queries
GET hello05/_search
{
"query": {
"match": {
"name": "李龙"
}
}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.0519087,
"hits" : [
{
"_index" : "hello05",
"_type" : "_doc",
"_id" : "4",
"_score" : 2.0519087,
"_source" : {
"name" : "李小龙",
"age" : 45
}
},
{
"_index" : "hello05",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9395274,
"_source" : {
"name" : "李四",
"age" : 3
}
}
]
}
}
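The same input as a term query returns no hits: term does not analyze the query text, and no single indexed token equals "李龙".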
GET hello05/_search
{
"query": {
"term": {
"name": "李龙"
}
}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
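The two results can be reproduced with a toy index in Python. This is only a sketch of the idea: it assumes an analyzer that emits one token per character, which is simpler than what ik_max_word really does, and the function names are made up for illustration.

```python
def analyze(text):
    """Toy analyzer (assumption): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def build_index(docs):
    """Inverted index: token -> set of ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def match_query(index, text):
    """Analyze the query text, then return the documents containing any resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


def term_query(index, text):
    """Look the query text up as one token, without analyzing it."""
    return sorted(index.get(text, set()))


names = {"2": "李四", "4": "李小龙"}
index = build_index(names)
print(match_query(index, "李龙"))  # both documents share at least one token with the query
print(term_query(index, "李龙"))   # no indexed token is exactly "李龙"
```

This is also why term is usually paired with keyword fields, where the whole value is stored as a single token.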
Baidu-style search with multi_match
PUT /goods
{
"mappings": {
"properties": {
"title":{
"analyzer": "standard",
"type" : "text"
},
"content":{
"analyzer": "standard",
"type": "text"
}
}
}
}
# The query text "华为" is analyzed, then searched for in both the title and content fields
GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为",
      "fields": ["title", "content"]
    }
  }
}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.1568705,
"hits" : [
{
"_index" : "goods",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1568705,
"_source" : {
"title" : "华为Mate30",
"content" : "华为Mate30 8+128G,麒麟990Soc",
"price" : "3998"
}
},
{
"_index" : "goods",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0173018,
"_source" : {
"title" : "华为P40",
"content" : "华为P40 8+256G,麒麟990Soc,贼牛逼",
"price" : "4999"
}
}
]
}
}
Phrase (exact) search (match_phrase)
GET goods/_search
{
"query": {
"match_phrase": {
"content": "华为P40手机"
}
}
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
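The query above finds nothing because match_phrase needs every query token to appear in the field, adjacent and in the same order, and "手机" does not occur in any content value. A toy Python sketch of that check follows; the `analyze` function is only a rough stand-in for the standard analyzer (one token per Chinese character, lowercase alphanumeric runs), and both function names are made up for illustration.

```python
import re


def analyze(text):
    """Toy stand-in for the standard analyzer: one token per CJK character, lowercase alphanumeric runs."""
    return [token.lower() for token in re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)]


def phrase_match(doc_text, query_text):
    """True if the query tokens occur in the document as one contiguous, ordered run."""
    doc_tokens = analyze(doc_text)
    query_tokens = analyze(query_text)
    n = len(query_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == query_tokens for i in range(len(doc_tokens) - n + 1))


content = "华为P40 8+256G,麒麟990Soc,贼牛逼"
print(phrase_match(content, "华为P40手机"))  # "手机" is missing from the content
print(phrase_match(content, "华为P40"))      # all tokens present, adjacent, in order
```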
Choose which fields to return (_source)
This works like select id, name from xxx in SQL.
# _source limits each hit to the title and content fields
GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为",
      "fields": ["title", "content"]
    }
  },
  "_source": ["title", "content"]
}
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.1568705,
"hits" : [
{
"_index" : "goods",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.1568705,
"_source" : {
"title" : "华为Mate30",
"content" : "华为Mate30 8+128G,麒麟990Soc"
}
},
{
"_index" : "goods",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0173018,
"_source" : {
"title" : "华为P40",
"content" : "华为P40 8+256G,麒麟990Soc,贼牛逼"
}
}
]
}
}
Sorting (sort)
POST goods/_update/1
{
"doc": {
"od":1
}
}
# "asc" sorts ascending, "desc" sorts descending
GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为",
      "fields": ["title", "content"]
    }
  },
  "sort": [
    {
      "od": {
        "order": "desc"
      }
    }
  ]
}
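Only document 1 was given an od value above. By default Elasticsearch places documents that lack the sort field last, whichever order is used. The following toy Python sketch shows that ordering with hits reduced to plain dicts; `sort_hits` is a hypothetical helper, not an Elasticsearch API.

```python
def sort_hits(hits, field, order="desc"):
    """Sort hits by `field`; hits without the field go last in either order."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    present = [hit for hit in hits if field in hit]
    missing = [hit for hit in hits if field not in hit]
    present.sort(key=lambda hit: hit[field], reverse=(order == "desc"))
    return present + missing


hits = [
    {"title": "华为Mate30"},        # document 2: no od field
    {"title": "华为P40", "od": 1},  # document 1: od was set by the update above
]
for hit in sort_hits(hits, "od", "desc"):
    print(hit["title"])
```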