Ansible: CoreOS to CentOS, an 18-month journey

There was a custom configuration management solution.

I would like to share the story of a project that used to run on a custom configuration management solution. The migration took 18 months. You may ask: why so long? The answers below are about changing processes, agreements and workflows.

Day № -XXX: Before the beginning

The infrastructure was a bunch of standalone Hyper-V servers. To create a VM we had to perform several actions:

  • Put the VM hard drives in a dedicated location.
  • Create a DNS record.
  • Create a DHCP reservation.
  • Save the VM configuration to a git repo.

It was a partially automated process. Unfortunately, we had to track used resources & VM locations manually. Fortunately, developers were able to change a VM's configuration in the git repo, reboot the VM and, as a result, get a VM with the desired configuration.

Custom Configuration Management Solution

I guess the original idea was to have IaC: a bunch of stateless VMs that reset their state after every reboot. What did it look like?

  1. We created a MAC address reservation.
  2. We attached an ISO & a special bootable hard drive to the VM.
  3. CoreOS started OS customization: it downloaded the appropriate script (chosen by IP) from a web server.
  4. The script downloaded the rest of the configuration via SCP.
  5. The rest of the configuration was a bunch of systemd units, compose files & bash scripts.
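Step 3 above picks a bootstrap script based on the VM's IP address. A minimal sketch of that lookup, assuming a plain dictionary from IP to script name with a fallback; the addresses, the script names and the `script_for_ip` helper are hypothetical and not taken from the original setup:

```python
import ipaddress

# Hypothetical mapping from VM IP address to its bootstrap script.
SCRIPTS_BY_IP = {
    "10.0.0.11": "jenkins.sh",
    "10.0.0.12": "registry.sh",
}


def script_for_ip(ip, scripts=SCRIPTS_BY_IP, default="default.sh"):
    """Return the bootstrap script name for a VM, keyed by its IP address."""
    # Validate the address first; malformed input raises ValueError.
    normalized = str(ipaddress.ip_address(ip))
    return scripts.get(normalized, default)
```

A mapping like this is easy to start with, but it is one more table to keep in sync by hand with the DHCP reservations.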

There were some flaws:

  1. ISO was a deprecated way of booting CoreOS.
  2. Too many manual actions.
  3. Hard to update, tricky to maintain.
  4. Installing specific kernel modules was a nightmare.
  5. Stateful VMs instead of the original approach with stateless VMs.
  6. From time to time people created broken dependencies across systemd unit files; as a result, CoreOS could not reboot without magic SysRq.
  7. Secret management.
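The broken dependencies from item 6 are typically ordering cycles between units. A minimal sketch of how such a cycle could be caught before a reboot, assuming the dependencies have already been extracted into a dictionary; real systemd semantics (`Requires=`, `Wants=`, `After=`) are richer than this, the unit names are hypothetical, and `graphlib` needs Python 3.9 or newer:

```python
from graphlib import CycleError, TopologicalSorter


def find_dependency_cycle(units):
    """Return one dependency cycle as a list of unit names, or None.

    `units` maps a unit name to the set of units it must start after.
    """
    sorter = TopologicalSorter(units)
    try:
        sorter.prepare()  # raises CycleError if the graph contains a cycle
    except CycleError as err:
        return list(err.args[1])  # args[1] holds the nodes forming the cycle
    return None


# Hypothetical units: app waits for db, db for network, network for app.
broken = {
    "app.service": {"db.service"},
    "db.service": {"network.service"},
    "network.service": {"app.service"},
}
ok = {"app.service": {"db.service"}, "db.service": set()}
```

Calling `find_dependency_cycle(broken)` returns the units on the cycle, with the first unit repeated at the end, while `find_dependency_cycle(ok)` returns `None`.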

You could say there was no CM at all, just a pile of organized bash scripts & systemd unit files.

Day №0: Ok. We have a problem

It was a standard environment for developing and testing: Jenkins, test environments, monitoring, registry, etc. CoreOS developers created it as an underlying OS for k8s or Rancher. So our problem was that we used a good tool in the wrong way. The first step was to determine the desired technology stack. Our idea was:

  1. CentOS as the base OS, because it was close enough to the production environments.
  2. Ansible for configuration management, because we had enough expertise.
  3. Jenkins as a framework for automating our workflows, agreements and processes. We chose it because we already used it as part of the release workflow.
  4. Hyper-V as the virtualization platform. The reasons are out of scope. To make a long story short: we were not allowed to use public clouds & we had to use MS in our infrastructure.

Day №30: Agreements as Code

The next step was to set requirements and establish contracts, in other words to have Agreements as Code. The path had to be: manual actions -> mechanization -> automatization.

There were several processes. Let us go through them separately.

1. Configure VMs

  1. Created a git repository.
  2. Put the VM list into the inventory and the configuration into playbooks & roles.
  3. Configured a dedicated Jenkins slave for running Ansible playbooks.
  4. Created & configured a Jenkins pipeline.
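The article does not say whether the inventory from step 2 was a static file or generated. As one possible shape, here is a minimal sketch of a dynamic inventory script that prints the JSON structure Ansible reads from such scripts; the host names, groups and addresses are hypothetical:

```python
import json

# Hypothetical VM list: host name -> (group, IP address).
VMS = {
    "jenkins01": ("ci", "10.0.0.11"),
    "registry01": ("registry", "10.0.0.12"),
}


def build_inventory(vms):
    """Build the JSON structure Ansible expects from a dynamic inventory script."""
    inventory = {"_meta": {"hostvars": {}}}
    for host, (group, ip) in sorted(vms.items()):
        # Each group is a dict with a "hosts" list.
        inventory.setdefault(group, {"hosts": []})["hosts"].append(host)
        # Per-host variables live under _meta.hostvars.
        inventory["_meta"]["hostvars"][host] = {"ansible_host": ip}
    return inventory


if __name__ == "__main__":
    # Ansible calls an inventory script with --list and reads JSON from stdout.
    print(json.dumps(build_inventory(VMS), indent=2))
```

Keeping the VM list in one data structure means the same source can feed both the inventory and any bookkeeping about where VMs live.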

2. Create new VM

It was not a picnic. It was a bit tricky to create a VM on Hyper-V from Linux:

  1. Ansible connected via WinRM to a Windows host.
  2. Ansible ran a PowerShell script.
  3. The PowerShell script created a new VM.
  4. Hyper-V/SCVMM customized the VM, e.g. the hostname.
  5. The VM sent its hostname in the DHCP request.
  6. The DDNS & DHCP integration on the Domain Controller side configured the DNS record.
  7. We added the VM to the Ansible inventory & applied the configuration.
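Steps 2 and 3 boil down to sending a PowerShell command to the Windows host. A minimal sketch of building such a command, assuming the stock Hyper-V `New-VM` cmdlet; the original script is not shown in the article and may well have used SCVMM cmdlets instead. The VM name, memory size and switch name are hypothetical:

```python
def ps_quote(value):
    """Quote a string for PowerShell: single quotes, embedded quotes doubled."""
    return "'" + str(value).replace("'", "''") + "'"


def new_vm_command(name, memory_bytes, switch, generation=2):
    """Build a Hyper-V New-VM command line for a (hypothetical) VM."""
    return (
        f"New-VM -Name {ps_quote(name)} "
        f"-MemoryStartupBytes {int(memory_bytes)} "
        f"-Generation {int(generation)} "
        f"-SwitchName {ps_quote(switch)}"
    )


# Example: a 2 GiB generation 2 VM attached to a switch called "LAN".
print(new_vm_command("test01", 2 * 1024**3, "LAN"))
```

Quoting every string argument in one helper keeps VM names with spaces or apostrophes from breaking the command.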

3. Create VM template

We decided not to reinvent the wheel and to use Packer:

  1. Put the Packer config & kickstart file into the git repository.
  2. Configured a Jenkins slave with Hyper-V & Packer.
  3. Created & configured a Jenkins pipeline.

It worked quite simply:

  1. Packer created an empty VM & mounted an ISO.
  2. The VM booted and Packer typed a boot command into GRUB.
  3. GRUB booted the installer with a kickstart file served from the Packer web server or from a floppy drive provided by Packer.
  4. Anaconda started with the kickstart file & installed the base OS.
  5. Packer waited for an SSH connection to the VM to become available.
  6. Packer ran Ansible in local mode inside the VM.
  7. Ansible used exactly the same roles as in use case №1.
  8. Packer exported the template.
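The steps above map onto a Packer template with one `hyperv-iso` builder and one `ansible-local` provisioner. A minimal sketch that generates such a template as JSON; every value here (ISO URL, checksum, credentials, directory and playbook names) is a placeholder, the keystrokes in `boot_command` are an assumption that depends on the ISO's boot loader, and a real template would need more settings than shown:

```python
import json


def packer_template(iso_url, iso_checksum, playbook):
    """Return a Packer template (as a dict) for a CentOS image on Hyper-V."""
    return {
        "builders": [{
            "type": "hyperv-iso",
            "iso_url": iso_url,
            "iso_checksum": iso_checksum,
            # Directory served by Packer's built-in web server; holds ks.cfg.
            "http_directory": "http",
            # Typed at the boot prompt; points the installer at the kickstart file.
            "boot_command": [
                "<tab> inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ks.cfg<enter>"
            ],
            "communicator": "ssh",
            "ssh_username": "root",
            "ssh_password": "changeme",  # placeholder, not a real secret
            "shutdown_command": "shutdown -P now",
        }],
        "provisioners": [{
            # Runs Ansible inside the VM, i.e. in local mode.
            "type": "ansible-local",
            "playbook_file": playbook,
        }],
    }


if __name__ == "__main__":
    template = packer_template(
        "http://example.invalid/centos.iso", "sha256:0000", "site.yml"
    )
    print(json.dumps(template, indent=2))
```

Note that `ansible-local` expects Ansible to be present inside the VM, so the kickstart file or an earlier provisioner would have to install it.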

Day №75: Refactor agreements & break nothing = Test Ansible roles

Agreements as Code was not enough for us. The amount of IaC was increasing and agreements were changing. We faced the problem of how to sync our knowledge about the infrastructure across the team. The solution was to test Ansible roles. You can read about that process in "Test me if you can. Do YML developers Dream of testing ansible?" or, more generally, in "How to test Ansible and don't go nuts".
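The article does not describe the tests themselves. One common check for a role is idempotence: apply it twice and require that the second run changes nothing. A minimal sketch of that check, assuming the output of the second `ansible-playbook` run has been captured as text; the sample output and host name are hypothetical:

```python
import re

RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>.*)$")

# Hypothetical tail of an ansible-playbook run.
SAMPLE = """\
PLAY RECAP *********************************************************************
web01                      : ok=5    changed=0    unreachable=0    failed=0
"""


def parse_recap(output):
    """Parse the PLAY RECAP section of ansible-playbook output into dicts."""
    recap = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        match = RECAP_LINE.match(line.strip()) if in_recap else None
        if match:
            # "ok=5 changed=0 ..." becomes {"ok": 5, "changed": 0, ...}
            counters = dict(pair.split("=") for pair in match["counters"].split())
            recap[match["host"]] = {k: int(v) for k, v in counters.items()}
    return recap


def is_idempotent(second_run_output):
    """True if the second run changed nothing and failed nowhere."""
    recap = parse_recap(second_run_output)
    return bool(recap) and all(
        c.get("changed", 0) == 0 and c.get("failed", 0) == 0
        for c in recap.values()
    )
```

A check like this turns "the role should be safe to re-run" from a verbal agreement into something a pipeline can enforce.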

Day №130: What about Openshift? Is it better than Ansible + CentOS?

As I mentioned, our infrastructure was like a living creature: it was growing and changing. As part of that process & the development process, we had to research whether it was possible to run our application inside Openshift/k8s. For details, see "Let us deploy to openshift". Unfortunately, we were not able to re-use Openshift inside the development infrastructure.

Day №170: Let us try Windows Azure Pack

Hyper-V & SCVMM were not user-friendly for us. There was a much more interesting thing: Windows Azure Pack. It was an SCVMM extension that looked like Windows Azure and provided an HTTP REST API. Unfortunately, in reality it was an abandoned project. Still, we spent time researching it.

Day №250: Windows Azure Pack is so-so. SCVMM is our choice

Windows Azure Pack looked interesting, but we decided it was too risky to use. We stayed with SCVMM.

Day №360: Yak shaving

As you can see, a year later we had the foundation for starting the migration. The migration had to be S.M.A.R.T. We created the list of VMs & started yak shaving: we dealt with the old VMs one by one, creating Ansible roles & covering them with tests.

Day №450: Migration

Migration was a fairly deterministic process. It followed the Pareto principle:

  • 80% of the time was spent on preparation & 20% on migration.
  • Rewriting 80% of the VM configurations took 20% of our time.
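As a rough check of the first bullet against the day numbers in the headings, assume preparation ran from Day №0 to Day №450 and migration from Day №450 to Day №540. That split is a reading of the timeline, not a figure stated in the article:

```python
# Day numbers taken from the section headings of this article.
MIGRATION_START_DAY = 450
PROJECT_END_DAY = 540


def split_percent(start_day, end_day):
    """Return (preparation %, migration %) of the total project duration."""
    preparation = 100 * start_day / end_day
    return round(preparation), round(100 - preparation)


print(split_percent(MIGRATION_START_DAY, PROJECT_END_DAY))  # (83, 17)
```

Under that assumption the split is about 83% to 17%, which is close to the 80/20 rule of thumb.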

Day №540: Lessons learned

  1. Agreements as Code.
  2. Manual actions -> mechanization -> automatization.

Links

This is the text version of my talk (with slides) at DevopsConf 2019-10-01 and SPbLUG 2019-09-25.

Translated from: https://habr.com/en/post/500350/
