🌭MongoDB 聚合操作和优化
1️⃣MongoDB 聚合操作
MongoDB聚合框架是一个计算框架作用在一个或几个集合对集合中的数据进行一系列运算,将这些数据转化为期望的形式。
插入案例数据
> db.orders.insertMany(
[
{
"zip": "000001",
"phone": "13101010101",
"name": "LiuBei",
"status": "created",
"shippingFee": 10,
"orderLines": [
{
"product": "Huawei Meta30 Pro",
"sku": "2002",
"qty": 100,
"price": 6000,
"cost": 5599
},
{
"product": "Huawei Meta40 Pro",
"sku": "2003",
"qty": 10,
"price": 7000,
"cost": 6599
},
{
"product": "Huawei Meta40 5G",
"sku": "2004",
"qty": 80,
"price": 4000,
"cost": 3700
}
]
},
{
"zip": "000001",
"phone": "13101010101",
"name": "LiuBei",
"status": "created",
"shippingFee": 10,
"orderLines": [
{
"product": "Huawei Meta30 Pro",
"sku": "2002",
"qty": 100,
"price": 6000,
"cost": 5599
},
{
"product": "Huawei Meta40 Pro",
"sku": "2003",
"qty": 10,
"price": 7000,
"cost": 6599
},
{
"product": "Huawei Meta40 5G",
"sku": "2004",
"qty": 80,
"price": 4000,
"cost": 3700
}
]
},
{
"zip": "000001",
"phone": "13101010101",
"name": "LiuBei",
"status": "created",
"shippingFee": 10,
"orderLines": [
{
"product": "Huawei Meta30 Pro",
"sku": "2002",
"qty": 100,
"price": 6000,
"cost": 5599
},
{
"product": "Huawei Meta40 Pro",
"sku": "2003",
"qty": 10,
"price": 7000,
"cost": 6599
},
{
"product": "Huawei Meta40 5G",
"sku": "2004",
"qty": 80,
"price": 4000,
"cost": 3700
}
]
}
])
原价总额,订单总额 添加两个字段,值为每个订单的原价总价和 订单总额
> db.orders.aggregate([{$addFields: { totalPrice:{ $sum: "$orderLines.price"}, totalCost: { $sum: "$orderLines.cost"}, }}]).pretty()
#展示部分数据
......
"totalPrice" : 17000,
"totalCost" : 15898
......
mongoDB的聚合操作其实可以做多步骤的操作,如上面的操作现实针对整个集合添加两个字段:totalPrice:订单原价总额
和 totalCost订单实际总额
,然后我们还可以吧对应的结果传递给第二个步骤,比如后续给订单的总价格进行排序等
[{
$addFields: {
totalPrice: {
$sum: "$orderLines.price"
},
totalCost: {
$sum: "$orderLines.cost"
},
}
}
, // stage 1
{
$sort: {
totalPrice: -1
}
} // stage 2
]
聚合表达式
# 获取字段信息
$<field> : 用 $ 指示字段路径
$<field>.<sub field> : 使用 $ 和 . 来指示内嵌文档的路径
# 常量表达式
$literal :<value> : 指示常量 <value>
# 系统变量表达式
$$<variable> 使用 $$ 指示系统变量
$$CURRENT 指示管道中当前操作的文档
聚合管道阶段
-
$project
对输入文档进行再次投影投影操作, 将原始字段投影成指定名称, 如将 集合中的 nickName 投影成 name
> db.userInfo.find() { "_id" : ObjectId("627639b1d3204ee658022f56"), "nickName" : "zhangsan", "age" : 18 } { "_id" : ObjectId("627639b1d3204ee658022f57"), "nickName" : "lisi", "age" : 20 } > db.userInfo.aggregate({$project:{ name:"$nickName"}}); { "_id" : ObjectId("627639b1d3204ee658022f56"), "name" : "zhangsan" } { "_id" : ObjectId("627639b1d3204ee658022f57"), "name" : "lisi" } #可以灵活控制输出文档的格式,也可以剔除不需要的字段,如不显示索引id > db.userInfo.aggregate({$project:{ username:"$nickName",_id:0,age:1}}).pretty() { "age" : 18, "username" : "zhangsan" } { "age" : 20, "username" : "lisi" }
-
$match 对输入文档进行筛选
进行文档筛选 也可以理解成对数据的匹配,如获取nickname为list的文档
> db.userInfo.aggregate({$match:{ nickName:"lisi"}}); { "_id" : ObjectId("627639b1d3204ee658022f57"), "nickName" : "lisi", "age" : 20 }
筛选管道操作和其他管道操作配合时候时,尽量放到开始阶段,这样可以减少后续管道操作符要操作的文档数,提升效率,先做大量的过滤,然后在做其他的操作,如:
>db.userInfo.aggregate( [ { $match: { nickName: "lisi"}}, { $project: { _id:0, name: "$nickName", age:1}} ])
-
$limit 筛选出管道内前 N 篇文档
db.userInfo.aggregate({$limit:1});
-
$skip 跳过管道内前N篇文档
db.userInfo.aggregate({$skip:1});
-
$unwind 展开输入文档中的数组字段
# 如现有一条测试数据 > db.userInfo.find().skip(2) { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : [ "80", "IT", "BeiJing" ] } # 可以发现把测试数据中的tags等分了 > db.userInfo.aggregate({$unwind:{path:"$tags"}}) { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : "80" } { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : "IT" } { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : "BeiJing" } # 还可以显示步长的索引值 includeArrayIndex:"arrIndex" > db.userInfo.aggregate({$unwind: {path: "$tags",includeArrayIndex:"arrIndex"}}); { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : "80", "arrIndex" : NumberLong(0) } { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : "IT", "arrIndex" : NumberLong(1) } { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : "BeiJing", "arrIndex" : NumberLong(2) } # 将不匹配的数据也展示出来 preserveNullAndEmptyArrays:true > db.userInfo.aggregate({$unwind: {path: "$tags",includeArrayIndex:"arrIndex",preserveNullAndEmptyArrays:true}});
-
$sort 对输入文档进行排序
#对文档进行排序: 1 正序, -1 倒序 > db.userInfo.aggregate({$sort:{age:-1}}); { "_id" : ObjectId("62763c90d3204ee658022f58"), "nickName" : "xixi", "age" : 35, "tags" : [ "80", "IT", "BeiJing" ] } { "_id" : ObjectId("627639b1d3204ee658022f57"), "nickName" : "lisi", "age" : 20 } { "_id" : ObjectId("627639b1d3204ee658022f56"), "nickName" : "zhangsan", "age" : 18 }
-
$lookup 对输入文档进行查询操作,使用单一字段值进行查询 类似mysql的联合查询
from
: 需要关联的文档localField
: 本地字段foreignField
: 外部文档关联字段as
作为新的字段,添加到文档中
# 创建新的数据 > db.account.insertMany([{_id:1,name:"zhangsan",age:19},{_id:2,name:"lisi",age:20}]); > db.account.find() { "_id" : 1, "name" : "zhangsan", "age" : 19 } { "_id" : 2, "name" : "lisi", "age" : 20 } # aid 为account的外键 > db.accountDetail.insertMany([{aid:1,address:["address1","address2"]}]); > db.accountDetail.find() { "_id" : ObjectId("62764012d3204ee658022f59"), "aid" : 1, "address" : [ "address1", "address2" ] } # 关联account表,表中_id的值和外键aid一样的数据,并把新字段命名为field1 db.accountDetail.aggregate( { $lookup: { from:"account", # 关联account的集合 localField:"aid", # 外键名字 foreignField:"_id", # 关联表中具体的字段 as: "field1" # 关联后新生成的字段名 } } ); # OUTPUT: { "_id" : ObjectId("62764012d3204ee658022f59"), "aid" : 1, "address" : [ "address1", "address2" ], "field1" : [ { "_id" : 1, "name" : "zhangsan", "age" : 19 } ] }
-
$group 对输入文档进行分组 类似mysql 的分组
# 构造数据展示 > db.sales.find() { "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") } { "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") } { "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") } { "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") } { "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:12:00Z") } { "_id" : 6, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:12:00Z") }
-
$addToSet
将分组中的元素添加到一个数组中,并且自动去重# 查看每天,卖出哪几种商品项目,按每天分组, 将商品加入到去重数组中 > db.sales.aggregate( ... [ ... { ... $group: ... { ... _id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } }, ... itemsSold: { $addToSet: "$item" } ... } ... } ... ] ... ) { "_id" : { "day" : 34, "year" : 2014 }, "itemsSold" : [ "jkl", "xyz" ] } { "_id" : { "day" : 1, "year" : 2014 }, "itemsSold" : [ "abc" ] } { "_id" : { "day" : 46, "year" : 2014 }, "itemsSold" : [ "abc", "xyz" ] }
-
$avg
返回分组中的平均值, 非数值直接忽略#根据产品来进行分组 > db.sales.aggregate( ... [ ... { ... $group: ... { ... _id: "$item", ... avgAmount: { $avg: { $multiply: [ "$price", "$quantity" ] } }, ... avgQuantity: { $avg: "$quantity" } ... } ... } ... ] ... ) { "_id" : "abc", "avgAmount" : 60, "avgQuantity" : 6 } { "_id" : "jkl", "avgAmount" : 20, "avgQuantity" : 1 } { "_id" : "xyz", "avgAmount" : 41.666666666666664, "avgQuantity" : 8.333333333333334 }
-
$first
返回分组中的第一个元素 -
$last
返回分组中的最后一个元素 -
$max
返回分组中的最大元素 -
$min
回分组中的最小元素 -
$push
创建新的数组,将值添加进去#按照条件出售了那些物品并不去重 > db.sales.aggregate( ... [ ... { ... $group: ... { ... _id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } }, ... itemsSold: { $push: { item: "$item", quantity: "$quantity" } } ... } ... } ... ] ... ) { "_id" : { "day" : 34, "year" : 2014 }, "itemsSold" : [ { "item" : "jkl", "quantity" : 1 }, { "item" : "xyz", "quantity" : 5 } ] } { "_id" : { "day" : 1, "year" : 2014 }, "itemsSold" : [ { "item" : "abc", "quantity" : 2 } ] } { "_id" : { "day" : 46, "year" : 2014 }, "itemsSold" : [ { "item" : "abc", "quantity" : 10 }, { "item" : "xyz", "quantity" : 10 }, { "item" : "xyz", "quantity" : 10 } ] }
-
$sum
求分组数值元素和
group 阶段有
100m
内存的使用限制, 默认情况下,如果超过这个限制会直接返回 error,可以通过设置allowDiskUse
为 true 来避免异常,allowDiskUse
为 true 将利用临时文件来辅助实现group操作。 -
-
$out 对管道中的文档输出/将聚合结果写入另一个文档
> db.sales.aggregate(
... [
... {
... $group:
... {
... _id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } },
... itemsSold: { $push: { item: "$item", quantity: "$quantity" } }
... }
... },
... { $out:"output"}
... ]
... )
> db.output.find()
{ "_id" : { "day" : 34, "year" : 2014 }, "itemsSold" : [ { "item" : "jkl", "quantity" : 1 }, { "item" : "xyz", "quantity" : 5 } ] }
{ "_id" : { "day" : 1, "year" : 2014 }, "itemsSold" : [ { "item" : "abc", "quantity" : 2 } ] }
{ "_id" : { "day" : 46, "year" : 2014 }, "itemsSold" : [ { "item" : "abc", "quantity" : 10 }, { "item" : "xyz", "quantity" : 10 }, { "item" : "xyz", "quantity" : 10 } ] }
2️⃣ MongoDB聚合优化
一下的执行计划的具体优化可以采用explain()
来查看输出内容中的winningPlan
> db.orders.explain().aggregate({ $match: { year: 2014 } }, { $match: { status: "A" } })
.....
"winningPlan" :
.....
1.投影优化
聚合管道可以确定它是否仅需要文档中的字段的子集来获得结果。如果是这样,管道将只使用那些必需的字段,减少通过管道的数据量。 就类似mysql 需要什么字段就输出什么字段
2.管道符号执行顺序优化
对于包含投影阶段($project或$unset或$addFields或$set
)后跟$match
阶段的聚合管道,MongoDB 将match阶段
3.$sort + $match
如果序列中带有$sort
后跟$match
,则$match
会移动到$sort
之前,以最大程度的减少要排序的对象的数量
4.$project/ $unset + $skip序列优化
当有一个unset之后跟有skip 会移至$project之前。
5.$limit+ $limit合并
当$limit紧接着另一个时 $limit,两个阶段可以合并为一个阶段 $limit,其中限制量为两个初始限制量中的较小者。
6.skip+ $skip 合并
当skip,这两个阶段可合并成一个单一的$skip,其中跳过量为总和的两个初始跳过量。
7.$match+ $match合并
当一个$match紧随另一个紧随其后时 $match,这两个阶段可以合并为一个单独 $match的条件 $and
{ $match: { year: 2014 } }, { $match: { status: "A" } }
优化后
{ $match: { $and: [ { "year" : 2014 }, { "status" : "A" } ] } }
评论区