Surviving a Political Crisis Using Continuous Delivery

Abstract


Introduction

Many companies think of Continuous Delivery (CD) in terms of its operational aspects and the competitive benefits that come with it. For us, it was much more: it was a survival technique.

We worked on a three-year project for the Brazilian government. The project started in a presidential election year, and everyone involved was under pressure from the presidential re-election campaign to show results. At that time, the Brazilian Federal Police started an investigation named Operation Car Wash, which revealed a huge corruption scheme, and a political crisis followed. As a result, leadership in government agencies could change suddenly. That was reflected in the project’s requirements: each new leader wanted to fulfill their own political agenda, of which, unfortunately, our project was not part. In this scenario, delivery delays could have sunk the project into oblivion.

Here, we describe how we used CD to rapidly deliver software according to those new agendas and to place the project on each of them. Among CD’s benefits, the most important to us were improved customer satisfaction and reliable releases. Both kept the project alive for another two years during the crisis.


The Context

From 2014 to 2016, our team developed the new platform for the Brazilian Public Software Portal (PSPB, its Portuguese acronym). The PSPB has evolved into a Collaborative Development Environment (CDE), and this evolution has brought important benefits not just to the government but also to society as a whole. For the government, it reduces the bureaucracy of using the same software across agencies, as well as duplicated work and costs. Society gains a mechanism for transparency and collaboration, since anyone can check government spending on software and contribute to the software communities. To achieve these goals, we chose to integrate many open source tools (e.g., GitLab, Mailman, Noosfero, and Colab) rather than write everything from scratch.

During the entire project, we had to handle three distinct issues that are common in software engineering: reaching the goals that guided the platform’s development, managing the diversity of the project team, and communicating effectively with clients (in this case, government agents). Managing the interaction of these elements was not easy, and the unstable Brazilian political scenario only made things worse.

To reach the SPB project goals, we had to overcome strong political bias tied to complicated technical issues and a relatively low budget. Because the platform is open to the public, government representatives saw it as a marketing opportunity and often ignored technical advice in favor of political decisions. Furthermore, integrating a number of distinct systems so that they worked together seamlessly was not an easy job. We had to learn how each system worked and come up with ways to integrate them as fast as possible, with a team of mostly inexperienced developers.

We also had to manage the diversity of the SPB team members. The team was composed of approximately 50 undergraduate students (not all simultaneously), 3 professors, 2 master’s students, 2 professional designers, and 6 senior developers from the Free and Open Source Software (FOSS) community. The undergraduate students received fellowships and, for most of them, this R&D project was their first professional experience. The senior developers and master’s students made two important contributions to the project: transferring knowledge to the undergraduate students and addressing the hard tasks. Finally, the professors were responsible for interacting with the Brazilian government and controlling the political pressures applied to the project.

Our third issue was communication with the group of government representatives: requirements analysts and deployment technicians. The requirements analysts usually tested new features, provided feedback, and reported to directors. The deployment technicians had access to the host machines on which the SPB platform was running. They were also responsible for deploying the project, even though they never performed a deployment. They tried their best to avoid the task because they would not have our support at the end of the project and would have to take care of the entire infrastructure.

Nevertheless, besides those three elements, a new factor abruptly emerged nine months into the project: the Brazilian political and economic crisis that started in 2014 and culminated in the president’s impeachment in 2016.

As a result, our meetings with government representatives became tense, since the leadership constantly changed its view of the project. The developers were also worried about the project’s instability, and many of them started looking for other jobs. Finally, government agents were under political pressure and became resistant to our work, since any mistake could affect their careers.

In that context, we realized that we needed to take control of the deployment process. We would use CD as a means to keep the government satisfied and to respond quickly to its requests. We believed that would keep the project alive in this delicate scenario. However, this task was not simple, since we did not have any DevOps strategy or CD pipeline. To solve this problem, we spent one month automating the entire deployment. At the same time, we worked politically to convince the government to follow our advice and give us partial access to the infrastructure. That motivated us to create a team dedicated to the deployment process, with the mission of developing a CD pipeline that would give us the confidence to meet the government’s requirements faster.


Our Continuous Delivery Pipeline

set of features → automated tests → tag creation → packaging → deployment to validation → acceptance tests → deployment to production

Figure X represents our CD pipeline. After each release, we defined the set of features we would develop to comply with the government’s requirements, and then we followed this pipeline.
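
A minimal sketch of this flow, assuming each stage can be modeled as a function that reports success or failure. The names `STAGES` and `run_pipeline` and the stage callables are hypothetical illustrations, not our actual scripts:

```python
from typing import Callable, Dict, List, Optional

# Stage names mirror the pipeline described in the text; the order is what matters.
STAGES: List[str] = [
    "automated tests",
    "tag creation",
    "packaging",
    "deployment to validation",
    "acceptance tests",
    "deployment to production",
]


def run_pipeline(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run every stage in order and stop at the first one that fails.

    `checks` maps a stage name to a hypothetical callable that returns True
    on success. Returns the name of the failing stage, or None if all passed.
    """
    for stage in STAGES:
        if not checks[stage]():
            return stage  # later stages never run; the pipeline restarts after a fix
    return None
```

The point of the sketch is the ordering: no stage runs unless every earlier stage succeeded, so nothing reaches production without passing through validation first.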

Automated tests

The SPB portal is composed of a set of FOSS systems, each with its own automated test suite, so we encouraged the teams to keep unit test coverage high for each system. We implemented the integration of the systems using a plugin architecture and wrote integration tests to check that they worked together properly. The plugin architecture kept the PSPB’s components decoupled. That gave us confidence that changes in one system would not affect the others, and it allowed us to run the tests of only the changed component instead of the entire system.
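
To illustrate how decoupling lets a change trigger only one test suite, here is a small sketch. It assumes a hypothetical layout in which each component lives in a top-level directory named after it; the function `suites_to_run` and that layout are assumptions made for the example, not a description of our repositories:

```python
from typing import Iterable, List, Set

# Components integrated in the portal, as named in the text.
COMPONENTS: List[str] = ["gitlab", "mailman", "noosfero", "colab"]


def suites_to_run(changed_paths: Iterable[str]) -> List[str]:
    """Return the components whose test suites must run for a set of changed files.

    Assumes (hypothetically) that a path such as "noosfero/app/models/user.rb"
    belongs to the component named by its first directory.
    """
    touched: Set[str] = set()
    for path in changed_paths:
        top_level = path.split("/", 1)[0]
        if top_level in COMPONENTS:
            touched.add(top_level)
    # Keep a stable order so runs are reproducible.
    return [name for name in COMPONENTS if name in touched]
```

With this rule, a change confined to one component selects a single suite, and files outside every component select none.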

The first step of our CD pipeline was to execute each system’s automated test suite. If any error was found, the pipeline stopped and the developers were notified. Only after all tests passed did we move on to preparing the release.

Preparing a new release

Our release scheme had two levels: the application and the PSPB. An application tag refers to a specific feature or bug fix and is monotonically increasing. A new tag on any system yielded a new PSPB tag.
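
The relationship between the two kinds of tags can be sketched as follows. Modeling tags as plain integers is an assumption made for the example (the text only states that application tags increase monotonically), and the `ReleaseTags` class is hypothetical:

```python
from typing import Dict


class ReleaseTags:
    """Tracks per-application tags and the portal-wide tag (integers by assumption)."""

    def __init__(self) -> None:
        self.portal: int = 0
        self.applications: Dict[str, int] = {}

    def tag_application(self, name: str) -> int:
        """Create the next tag for one application and bump the portal tag."""
        self.applications[name] = self.applications.get(name, 0) + 1
        self.portal += 1  # any new application tag yields a new portal tag
        return self.applications[name]
```

Each application keeps its own counter, while the portal counter advances on every application tag, so the portal tag always identifies one exact combination of component versions.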

When all tests passed for a given component, we manually created a new application tag for it. As a consequence, a new tag for the PSPB was created automatically. Notice that we maintained forks of the original software and, as a consequence, our tag values differed from those of the original projects.

Packaging

The PSPB platform runs on the CentOS GNU/Linux distribution. Packaging software for that distribution has three steps: (1) write the packaging script for the specific environment (RPM), (2) build the package, and (3) upload it to a package repository. We chose to package our components for several reasons:

  • Not all of the software was packaged by the community;
  • The packages that did exist were outdated;
  • Packaging makes it easy to manage the software on a given distribution;
  • It simplifies the deployment;
  • It follows the distribution’s best practices; and
  • It allows control over configurations and permissions.

After a new tag was created for a component, the DevOps team was notified and the packaging process began. In the normal case, the three packaging steps mentioned above were fully automated by a set of scripts. However, if the developers reported a dependency change to the DevOps team, the first step had to be done manually. For instance, one system could start requiring another system to be initialized first, which made it necessary for someone to update the packaging script by hand.

After all these scripts ran successfully, the new packages were ready to be used by our subsequent deployment scripts.
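
The packaging rule described above can be sketched as a small planning function. The step labels and the `plan_packaging` function are hypothetical; the sketch only encodes which steps are automated and when the first one becomes manual, and it does not call any real packaging tool:

```python
from typing import List, Tuple

# The three packaging steps listed in the text, in order.
PACKAGING_STEPS: List[str] = [
    "write packaging script",
    "build package",
    "upload to repository",
]


def plan_packaging(dependencies_changed: bool) -> List[Tuple[str, str]]:
    """Return each packaging step together with how it is carried out.

    Every step is automated unless the component's dependencies changed, in
    which case the packaging script (the first step) is updated by hand.
    """
    plan: List[Tuple[str, str]] = []
    for index, step in enumerate(PACKAGING_STEPS):
        manual = dependencies_changed and index == 0
        plan.append((step, "manual" if manual else "automated"))
    return plan
```

Keeping the manual case limited to the first step means a dependency change costs one human intervention, after which the build and upload proceed as usual.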

Validation Environment

The Validation Environment (VE) was a replica of the Production Environment (PE), with two exceptions: only government officers and our team had access to it, and all of its data was anonymized. To configure the environment, we used a configuration management tool. That maintained environment consistency, which made the deployment process simpler. Additionally, the packages built in the previous step were readily available to the management tool.

The VE was used by the government agents to validate new features and required changes. Also, the VE was useful to verify the integrity of the entire portal as part of the next step in the pipeline.

Acceptance Tests

After we completely deployed a new PSPB version in the VE, the government agents were responsible for checking the features and bug fixes they had required. If they identified a problem, they notified the developers, the problem was fixed, and the pipeline restarted from scratch. If everything was validated, we moved on to production deployment.

Production Deployment

After the government approved the version in the VE, we could finally begin the production deployment. We used the same configuration management tool with the same scripts. After the deployment was completed, both the VE and the PE were in the same state. The new features and bug fixes were finally available to end users.
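
As a sketch of this promotion rule, the code below reuses one deployment routine for both environments and touches production only after acceptance. The `deploy` and `release` functions, the dictionary-based environment state, and the integer tags are all assumptions made for illustration; `deploy` merely stands in for a configuration management run:

```python
from typing import Callable, Dict

Packages = Dict[str, int]            # component name -> tag (integers by assumption)
Environments = Dict[str, Packages]   # environment name -> deployed packages


def deploy(environment: str, packages: Packages, state: Environments) -> None:
    """Stand-in for a configuration management run; the same routine serves both environments."""
    state[environment] = dict(packages)


def release(packages: Packages,
            accepted: Callable[[Packages], bool],
            state: Environments) -> bool:
    """Deploy to validation, then to production only if the acceptance check passes."""
    deploy("validation", packages, state)
    if not accepted(state["validation"]):
        return False  # production stays untouched; the pipeline restarts after a fix
    deploy("production", packages, state)
    return True
```

Because the same routine and the same package set are applied to both environments, a successful release leaves validation and production in the same state.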


Benefits

We had to handle many tensions between development and political issues. Our CD pipeline gave us strong mechanisms to tackle most of these problems. Our decision to adopt CD brought the following benefits.

Response to tensions

The direct benefit of the CD pipeline was the fast response to changes required by the government. That was vital for the project’s renewal over the years, and it let us better manage the tension between the government and the development team. Every meeting with government leaders was delicate and resulted in many new requirements, most of them motivated by political needs. For example, a complete layout change was once demanded because one director suddenly decided to run a marketing campaign about the portal. Leaders would use undelivered requirements as a means to suggest the project’s cancellation. We believed that if we took too long to meet their demands, the project would end. CD helped us deploy to production quickly, even when only small parts of the requirements were ready. That way, we always had something to show at the meetings, which reduced their eagerness to end the project. For our team, it made the developers more confident that the project would last a little longer, so they would not go looking for other jobs.

Building the client’s trust

After we established the CD pipeline, the government agents became more confident in our work. First, they noticed that each new deployment we made in the VE was stable and reliable. Second, they could see new features quickly, since we constantly updated the VE based on their feedback. This strengthened our relationship, and in moments that needed quick action they preferred to give us access to production.


Challenges

We successfully built a functional CD pipeline and, in the end, took over the deployment process from the government. That allowed us to survive in an unstable political scenario. However, we recognize that many challenges still need to be addressed by industry and academia together.

Building CD from scratch

Taking on CD responsibilities had a significant impact on the team. We did not have the know-how and had little time to come up with a working pipeline. To make things worse, we were not aware of how companies normally organized their teams to make CD feasible.

The senior developers were crucial at this point. They came up with an initial solution to get us started. That already enabled us to automate the deployment, even though the process was still rudimentary. We had to evolve our solution on the fly, and we dedicated a few developers to this task.

Handling inexperienced teams

After the developers learned how CD worked, it was difficult to pass the knowledge along to other teammates. We tried to mitigate this by encouraging members to migrate to the DevOps team. Further research is needed on how to spread knowledge effectively among inexperienced developers in a scenario with high turnover.

Building trust

In the first half of the project, we struggled with deployment-related problems in the government’s infrastructure. We were in a paradoxical situation: the government demanded speedy deliveries but would not give us access to its production infrastructure. For example, the government allowed us to access the PE only in very specific situations. After some interactions, we convinced the government to create the VE as an isolated replica of the PE in its own infrastructure. The government agents then realized that granting us access to part of the infrastructure could be good for the project, since we could deliver new features to them faster. We believe more research is required on development protocols and policies to improve the relationship between industry and government, especially regarding CD.


About the call

Release Engineering 3.0 – Call for Papers: https://www.computer.org/software-magazine/2016/12/14/release-engineering-3-0-call-for-papers/ Guidelines: https://www.computer.org/web/peer-review/magazines

TODO

  • [ ] Keep away from any political analysis or opinion. Report only facts that can be easily verified
  • [x] Remember that the SPB also had a social side, since it funded fellowships for more than 50 students. We could have accepted the end of the project, but we resisted for the students’ sake and because we believed our work was useful to society. Dropped because it did not make sense
  • [ ] Find a reasonably strong reference in English that attests to Brazil’s political instability
  • [ ] Remember that the introduction has to be a well-told, connected story that shows how CD helped us
  • [x] We need to show that it took time to adapt and that we also put a lot of work into passing knowledge along. Done
  • [x] Does it make sense to talk about the validation and production infrastructure? YES, we do
  • [ ] Does it make sense to talk about the difficulties of testing the deployment? (My computer did not have enough capacity to run the full deployment, for example)
  • [ ] Does it make sense to talk about shak even without a paper to cite?
  • [ ] The argument for taking over the deployment is too centered on politics. Did we have no motivation related to code quality?
  • [ ] When revising, check that verb tenses are used consistently
  • https://www.forbes.com/search/?q=car%20wash%20brazil#10d6d8bd279f
  • Maybe add this excerpt to the challenges section
    • Establishing a DevOps culture: the habit of changing things in the VE by hand
  • [ ] The benefits are weak. We need to improve them. Here are some leftover ideas.
    1. We adapted the project scope following MPOG’s demands
    2. We adapted the way we interacted with the ministry
    3. We achieved good rapport within the team
  • Good link: https://www.linux.com/blog/learn/chapter/dev-ops/2017/7/devops-fundamentals-part-3-continuous-delivery-and-deployment