macvlan
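macvlan is a Linux kernel module that lets you configure multiple MAC addresses, i.e. multiple interfaces, on a single physical NIC; each interface can be given its own IP address.
macvlan's biggest advantage is performance: unlike other solutions, it does not need a Linux bridge. Instead, containers connect to the physical network directly through an Ethernet interface.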
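To make the kernel side concrete, here is a minimal sketch of creating one macvlan interface by hand with iproute2, without Docker. The parent eth0, the child name mv0 and the address 10.211.55.60/24 are hypothetical. Bridge mode is used because it is also the default mode of Docker's macvlan driver. The helper run prints the commands instead of executing them when DRY_RUN is set, so the sketch can be inspected without root privileges.

```shell
#!/bin/sh
# Minimal sketch: create a macvlan interface with iproute2 (no Docker involved).
# Hypothetical names: parent eth0, child mv0, address 10.211.55.60/24.

# run CMD...: print CMD when DRY_RUN is non-empty, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_macvlan PARENT CHILD ADDR/PREFIX
create_macvlan() {
    parent=$1 child=$2 addr=$3
    # New link with its own (randomly generated) MAC address, stacked on the parent NIC;
    # bridge mode lets macvlan interfaces on the same parent reach each other.
    run ip link add link "$parent" name "$child" type macvlan mode bridge
    run ip address add "$addr" dev "$child"
    run ip link set "$child" up
}
```

For example, DRY_RUN=1 create_macvlan eth0 mv0 10.211.55.60/24 prints the three commands; without DRY_RUN they must be run as root.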
Setup
1. First, enable promiscuous mode on the NIC that the macvlan network will use:
```shell
~]# ip link set eth0 promisc on
~]# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 00:1c:42:a9:3a:a6 brd ff:ff:ff:ff:ff:ff
```
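The PROMISC flag above is bit 0x100 (IFF_PROMISC in linux/if.h) of the interface flags. ifconfig prints these flags in decimal (flags=4419 for eth0 later in this article), while /sys/class/net/IFACE/flags exposes them in hex. A small sketch for checking the flag from a script, assuming a Linux host with sysfs (the helper names are hypothetical):

```shell
#!/bin/sh
# flags_have_promisc FLAGS : succeed if IFF_PROMISC (0x100, linux/if.h) is set.
# FLAGS may be decimal (ifconfig's "flags=4419") or hex (sysfs "0x1143").
flags_have_promisc() {
    [ $(( $1 & 0x100 )) -ne 0 ]
}

# iface_is_promisc IFACE : read the live flags of IFACE from sysfs (Linux only).
iface_is_promisc() {
    flags_have_promisc "$(cat "/sys/class/net/$1/flags")"
}
```

Usage: iface_is_promisc eth0 && echo "eth0 is promiscuous". For comparison, docker0's flags=4099 (UP, BROADCAST, MULTICAST) do not include the bit.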
2. Create the macvlan network
host1:
```shell
~]# docker network create -d macvlan --subnet 10.211.55.0/24 \
        --gateway=10.211.55.1 \
        -o parent=eth0 mac_net1
```
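Note: run the same command on host2. Both hosts are on the same layer-2 segment and the containers below share 10.211.55.0/24, so keep the same subnet and gateway; what must not overlap between the hosts are the container addresses (assign them with --ip as in step 3, or give each host a different --ip-range).
- -d macvlan selects the macvlan driver.
- A macvlan network is a local-scope network: Docker on each host manages it independently, so for cross-host communication you must plan the IP subnet and addresses yourself.
- Unlike other network drivers, Docker does not create a gateway for macvlan. The gateway given here must actually exist on the physical network, otherwise traffic cannot be routed.
- -o parent specifies the host interface the network is attached to.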
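Since the subnet has to be planned by hand on every host, a quick check that the gateway and the addresses you intend to pass to --ip fall inside --subnet can catch typos before running docker. A minimal POSIX sh sketch; the helper names ip_to_int and ip_in_cidr are hypothetical, and dotted-decimal input without leading zeros is assumed:

```shell
#!/bin/sh
# ip_to_int A.B.C.D : print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# ip_in_cidr IP NET/LEN : succeed if IP lies inside NET/LEN.
ip_in_cidr() {
    net=${2%/*}
    len=${2#*/}
    mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, ip_in_cidr 10.211.55.1 10.211.55.0/24 succeeds, while ip_in_cidr 192.168.1.1 10.211.55.0/24 fails.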
3. Create the containers
host1:
```shell
~]# docker run -itd --name bbox1 --ip=10.211.55.50 --network mac_net1 busybox
```
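Docker allocates addresses independently on each host, so both hosts could hand out the same address from the shared subnet. To avoid IP conflicts, it is best to assign addresses explicitly with --ip.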
host2:
```shell
~]# docker run -itd --name bbox2 --ip=10.211.55.51 --network mac_net1 busybox
```
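An alternative to choosing every --ip by hand is to give each host a disjoint slice of the shared subnet with docker network create --ip-range, so that each host's automatic allocation can only use its own slice. The sketch below computes such slices; the scheme (host k gets /LEN block number FIRST_BLOCK + k) and the helper names are hypothetical, and it does not check that the block stays inside the subnet.

```shell
#!/bin/sh
# ip_to_int A.B.C.D : print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# int_to_ip N : inverse of ip_to_int.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# ip_range_for_host NET_ADDR HOST_INDEX PREFIX_LEN FIRST_BLOCK
# Hypothetical scheme: host k gets the (FIRST_BLOCK + k)-th block of size 2^(32 - PREFIX_LEN).
ip_range_for_host() {
    size=$(( 1 << (32 - $3) ))
    start=$(( $(ip_to_int "$1") + ($4 + $2) * size ))
    echo "$(int_to_ip "$start")/$3"
}
```

For instance, host1 (index 0) could create mac_net1 with --ip-range "$(ip_range_for_host 10.211.55.0 0 27 1)", i.e. 10.211.55.32/27, and host2 (index 1) with 10.211.55.64/27. Block 0 (10.211.55.0/27) is skipped here because it holds the gateway (.1) and the hosts' own addresses (.17, .18).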
4. Verify connectivity
```shell
~]# docker exec bbox1 ping 10.211.55.51
PING 10.211.55.51 (10.211.55.51): 56 data bytes
64 bytes from 10.211.55.51: seq=0 ttl=64 time=0.503 ms
64 bytes from 10.211.55.51: seq=1 ttl=64 time=0.454 ms
64 bytes from 10.211.55.51: seq=2 ttl=64 time=0.466 ms
64 bytes from 10.211.55.51: seq=3 ttl=64 time=0.406 ms
...
```
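Docker provides no DNS service for these macvlan containers across hosts, so bbox1 cannot reach bbox2 by its name; the containers have to communicate by IP address.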
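One way to get name-based access without DNS is the docker run --add-host NAME:IP option, which writes a static entry into the container's /etc/hosts. Since the addresses are fixed with --ip anyway, a small sketch that turns NAME=IP pairs into those flags (the helper name add_host_flags is hypothetical):

```shell
#!/bin/sh
# add_host_flags NAME=IP... : print "--add-host NAME:IP" for each pair, one per line.
add_host_flags() {
    for pair in "$@"; do
        printf '%s %s:%s\n' --add-host "${pair%%=*}" "${pair#*=}"
    done
}
```

For example, on host1: docker run -itd --name bbox3 --network mac_net1 $(add_host_flags bbox2=10.211.55.51) busybox, where bbox3 is a hypothetical new container. The unquoted $(...) is deliberately split into the two words --add-host and bbox2:10.211.55.51.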
Network structure
macvlan does not depend on a Linux bridge. Let's look at the container's network devices:
host1:
```shell
~]# docker exec bbox1 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
34: eth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:0a:d3:37:32 brd ff:ff:ff:ff:ff:ff
    inet 10.211.55.50/24 brd 10.211.55.255 scope global eth0
       valid_lft forever preferred_lft forever
```
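Besides lo, the container has only one interface, eth0. Note the @if2 after eth0: it indicates that this interface has a counterpart interface whose global interface index is 2. Given how macvlan works, we can reasonably guess that this counterpart is the host's eth0. Let's confirm: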
```shell
~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1c:42:a9:3a:a6 brd ff:ff:ff:ff:ff:ff
    inet 10.211.55.17/24 brd 10.211.55.255 scope global dynamic eth0
       valid_lft 1387sec preferred_lft 1387sec
    inet6 fdb2:2c26:f4e4:0:3b2b:6db8:fa6e:5c8/64 scope global noprefixroute dynamic
       valid_lft 2591707sec preferred_lft 604507sec
    inet6 fe80::6499:5c43:d4fa:d8d/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:cc:94:b0:7e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
```
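So the container's eth0 is a virtual interface that macvlan creates on top of the host's eth0 (index 2). The container's interface is connected directly to the host NIC, so the container can talk to the outside network without NAT or port mapping (as long as a gateway exists); on the network it is indistinguishable from any other independent host.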
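The same check can be scripted. A sketch, assuming Linux sysfs on the host: peer_ifindex pulls N out of an ip a header such as 34: eth0@if2, and ifname_by_index looks up which host interface has that index. Both helper names are hypothetical.

```shell
#!/bin/sh
# peer_ifindex LINE : print N from an "ip a" header like "34: eth0@if2: <...>"
# (prints nothing when the interface name has no @ifN suffix).
peer_ifindex() {
    printf '%s\n' "$1" | sed -n 's/^[0-9]*: [^@:]*@if\([0-9]*\):.*/\1/p'
}

# ifname_by_index N : print the host interface whose ifindex is N (Linux sysfs).
ifname_by_index() {
    for f in /sys/class/net/*/ifindex; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$1" ]; then
            basename "$(dirname "$f")"
        fi
    done
}
```

On host1, ifname_by_index "$(docker exec bbox1 cat /sys/class/net/eth0/iflink)" should print eth0, since a macvlan interface's iflink is the index of its parent interface.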
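Multiple macvlan networks with sub-interfaces
A macvlan network takes the host NIC for itself: Docker allows only one macvlan network per parent interface. Fortunately, macvlan can attach not only to an interface (eth0) but also to a sub-interface, such as the 802.1Q VLAN sub-interface eth0.10, so one NIC can carry several macvlan networks.
Add eth0.10 and eth0.20 on both hosts. Note that ip address add ... label only attaches an extra address carrying that label to eth0; it does not create a separate device, which is why in the output below eth0.10 and eth0.20 share eth0's MAC address and have a /32 netmask. When a macvlan network is later created with -o parent=eth0.10 and no link of that name exists, Docker's macvlan driver creates the 802.1Q sub-interface itself (VLAN ID 10 on eth0).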
host1:
```shell
~]# ip address add 10.211.55.53 dev eth0 label eth0.10
~]# ip address add 10.211.55.55 dev eth0 label eth0.20
~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether 02:42:cc:94:b0:7e txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 10.211.55.17 netmask 255.255.255.0 broadcast 10.211.55.255
        inet6 fdb2:2c26:f4e4:0:3b2b:6db8:fa6e:5c8 prefixlen 64 scopeid 0x0<global>
        inet6 fe80::6499:5c43:d4fa:d8d prefixlen 64 scopeid 0x20<link>
        ether 00:1c:42:a9:3a:a6 txqueuelen 1000 (Ethernet)
        RX packets 198313 bytes 103000165 (98.2 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 128036 bytes 17296662 (16.4 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0.10: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 10.211.55.53 netmask 255.255.255.255 broadcast 0.0.0.0
        ether 00:1c:42:a9:3a:a6 txqueuelen 1000 (Ethernet)

eth0.20: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 10.211.55.55 netmask 255.255.255.255 broadcast 0.0.0.0
        ether 00:1c:42:a9:3a:a6 txqueuelen 1000 (Ethernet)
```
host2:
```shell
~]# ip address add 10.211.55.54 dev eth0 label eth0.10
~]# ip address add 10.211.55.56 dev eth0 label eth0.20
~]# ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
        ether 02:42:ee:64:71:0c txqueuelen 0 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 10.211.55.18 netmask 255.255.255.0 broadcast 10.211.55.255
        inet6 fdb2:2c26:f4e4:0:d8f5:c8d9:47f4:6fbf prefixlen 64 scopeid 0x0<global>
        inet6 fe80::a66a:4224:d693:4ecd prefixlen 64 scopeid 0x20<link>
        ether 00:1c:42:7d:73:b1 txqueuelen 1000 (Ethernet)
        RX packets 198801 bytes 103116292 (98.3 MiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 128105 bytes 16995328 (16.2 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0.10: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 10.211.55.54 netmask 255.255.255.255 broadcast 0.0.0.0
        ether 00:1c:42:7d:73:b1 txqueuelen 1000 (Ethernet)

eth0.20: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 1500
        inet 10.211.55.56 netmask 255.255.255.255 broadcast 0.0.0.0
        ether 00:1c:42:7d:73:b1 txqueuelen 1000 (Ethernet)
```
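To create 802.1Q sub-interfaces such as eth0.10 explicitly (rather than leaving it to Docker's macvlan driver, which creates a missing dotted parent itself), iproute2 can do it directly. The sketch below takes the VLAN ID from the number after the dot, the same naming convention used for -o parent; the helper names are hypothetical and, as before, DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# run CMD...: print CMD when DRY_RUN is non-empty, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# vlan_id_of NAME : "eth0.10" -> "10"; prints nothing if NAME has no dot.
vlan_id_of() {
    case $1 in
        *.*) echo "${1##*.}" ;;
    esac
}

# create_vlan_subif NAME : create and bring up the 802.1Q sub-interface NAME (e.g. eth0.10).
create_vlan_subif() {
    dev=${1%.*}
    id=$(vlan_id_of "$1")
    [ -n "$id" ] || { echo "not a VLAN interface name: $1" >&2; return 1; }
    run ip link add link "$dev" name "$1" type vlan id "$id"
    run ip link set "$1" up
}
```

Running create_vlan_subif eth0.10 and create_vlan_subif eth0.20 as root would give each host real VLAN 10 and VLAN 20 devices on eth0.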
Create the macvlan networks:
host1:
```shell
~]# sysctl -w net.ipv4.ip_forward=1
~]# docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=eth0.10 mac_net10
~]# docker network create -d macvlan --subnet=192.168.2.0/24 --gateway=192.168.2.1 -o parent=eth0.20 mac_net20
~]# docker rm -f bbox1    # free the name used by the container from the previous section
~]# docker run -itd --name bbox1 --ip=192.168.1.2 --network mac_net10 busybox
~]# docker run -itd --name bbox2 --ip=192.168.2.2 --network mac_net20 busybox
```
host2:
```shell
~]# sysctl -w net.ipv4.ip_forward=1
~]# docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=eth0.10 mac_net10
~]# docker network create -d macvlan --subnet=192.168.2.0/24 --gateway=192.168.2.1 -o parent=eth0.20 mac_net20
~]# docker rm -f bbox2    # free the name used by the container from the previous section
~]# docker run -itd --name bbox1 --ip=192.168.1.3 --network mac_net10 busybox
~]# docker run -itd --name bbox2 --ip=192.168.2.3 --network mac_net20 busybox
```
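host1 and host2 run the same two docker network create commands, and they follow a regular pattern, so they can be generated from the list of VLAN IDs. A sketch under a hypothetical convention that matches the commands above (VLAN N uses parent eth0.N, subnet 192.168.(N/10).0/24, gateway .1, network name mac_netN); DRY_RUN=1 prints instead of executing.

```shell
#!/bin/sh
# run CMD...: print CMD when DRY_RUN is non-empty, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_vlan_networks VLAN_ID... : one macvlan network per VLAN (hypothetical naming scheme).
create_vlan_networks() {
    for vid in "$@"; do
        n=$(( vid / 10 ))   # VLAN 10 -> 192.168.1.0/24, VLAN 20 -> 192.168.2.0/24
        run docker network create -d macvlan \
            --subnet="192.168.$n.0/24" --gateway="192.168.$n.1" \
            -o parent="eth0.$vid" "mac_net$vid"
    done
}
```

Running create_vlan_networks 10 20 on each host issues exactly the two docker network create commands shown above.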
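macvlan networks on different sub-interfaces are isolated from each other; the architecture is shown in the figure below:
Different macvlan networks cannot communicate at layer 2, but they can be connected at layer 3 through a gateway. Here a server (host3) is configured as a virtual router: it provides the gateways 192.168.1.1 and 192.168.2.1 and forwards traffic between VLAN 10 and VLAN 20.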
host3:
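```shell
## An address label is not a device, so iptables -i/-o eth0.10 would never match it;
## create real 802.1Q sub-interfaces for VLAN 10 and VLAN 20 instead
~]# ip link add link eth0 name eth0.10 type vlan id 10
~]# ip link add link eth0 name eth0.20 type vlan id 20
~]# ip address add 192.168.1.1/24 dev eth0.10
~]# ip address add 192.168.2.1/24 dev eth0.20
~]# ip link set eth0.10 up
~]# ip link set eth0.20 up
~]# sysctl -w net.ipv4.ip_forward=1
## Configure iptables rules to forward packets between the VLANs
~]# iptables -t nat -A POSTROUTING -o eth0.10 -j MASQUERADE
~]# iptables -t nat -A POSTROUTING -o eth0.20 -j MASQUERADE
~]# iptables -A FORWARD -i eth0.10 -o eth0.20 -m state --state RELATED,ESTABLISHED -j ACCEPT
~]# iptables -A FORWARD -i eth0.20 -o eth0.10 -m state --state RELATED,ESTABLISHED -j ACCEPT
~]# iptables -A FORWARD -i eth0.10 -o eth0.20 -j ACCEPT
~]# iptables -A FORWARD -i eth0.20 -o eth0.10 -j ACCEPT
```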
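The layer-2/layer-3 split described above comes down to a subnet comparison: a container sends directly to destinations inside its own subnet (the same macvlan network) and everything else to its gateway, which here is host3. A sketch of that decision, assuming, as in this setup, that each subnet maps to exactly one VLAN; the helper name next_hop is hypothetical.

```shell
#!/bin/sh
# ip_to_int A.B.C.D : print the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# next_hop SRC_IP/LEN DST_IP : print "direct" if DST is in SRC's subnet
# (reachable at layer 2 on the same macvlan network), otherwise "gateway".
next_hop() {
    src=${1%/*}
    len=${1#*/}
    mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    if [ $(( $(ip_to_int "$src") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]; then
        echo direct
    else
        echo gateway
    fi
}
```

For example, bbox1 on host1 (192.168.1.2/24) reaches bbox1 on host2 (192.168.1.3) directly, while traffic to bbox2 on host2 (192.168.2.3) has to go through 192.168.1.1 on host3.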