The final step in the SDLC, and arguably the most crucial, is the testing, deployment, and maintenance of development environments and applications. DZone's category for these SDLC stages serves as the pinnacle of application planning, design, and coding. The Zones in this category offer invaluable insights to help developers test, observe, deliver, deploy, and maintain their development and production environments.
In the SDLC, deployment is the final lever that must be pulled to make an application or system ready for use. Whether it's a bug fix or new release, the deployment phase is the culminating event to see how something works in production. This Zone covers resources on all developers’ deployment necessities, including configuration management, pull requests, version control, package managers, and more.
The cultural movement that is DevOps — which, in short, encourages close collaboration among developers, IT operations, and system admins — also encompasses a set of tools, techniques, and practices. As part of DevOps, the CI/CD process incorporates automation into the SDLC, allowing teams to integrate and deliver incremental changes iteratively and at a quicker pace. Together, these human- and technology-oriented elements enable smooth, fast, and quality software releases. This Zone is your go-to source on all things DevOps and CI/CD (end to end!).
A developer's work is never truly finished once a feature or change is deployed. There is always a need for constant maintenance to ensure that a product or application continues to run as it should and is configured to scale. This Zone focuses on all your maintenance must-haves — from ensuring that your infrastructure is set up to manage various loads and improving software and data quality to tackling incident management, quality assurance, and more.
Modern systems span numerous architectures and technologies and are becoming exponentially more modular, dynamic, and distributed in nature. These complexities also pose new challenges for developers and SRE teams that are charged with ensuring the availability, reliability, and successful performance of their systems and infrastructure. Here, you will find resources about the tools, skills, and practices to implement for a strategic, holistic approach to system-wide observability and application monitoring.
The Testing, Tools, and Frameworks Zone encapsulates one of the final stages of the SDLC as it ensures that your application and/or environment is ready for deployment. From walking you through the tools and frameworks tailored to your specific development needs to leveraging testing practices to evaluate and verify that your product or application does what it is required to do, this Zone covers everything you need to set yourself up for success.
Kubernetes in the Enterprise
In 2022, Kubernetes has become a central component for containerized applications. And it is nowhere near its peak. In fact, based on our research, 94 percent of survey respondents believe that Kubernetes will be a bigger part of their system design over the next two to three years. With the expectation of Kubernetes becoming more entrenched in systems, what do the adoption and deployment methods look like compared to previous years?

DZone's Kubernetes in the Enterprise Trend Report provides insights into how developers are leveraging Kubernetes in their organizations. It focuses on the evolution of Kubernetes beyond container orchestration, advancements in Kubernetes observability, Kubernetes in AI and ML, and more. Our goal for this Trend Report is to help inspire developers to leverage Kubernetes in their own organizations.
Scaling Up With Kubernetes
Getting Started With OpenTelemetry
Docker is a compelling platform to package and run web applications, especially when paired with one of the many Platform-as-a-Service (PaaS) offerings provided by cloud platforms. NGINX has long provided DevOps teams with the ability to host web applications on Linux and also provides an official Docker image to use as the base for custom web applications. In this post, I explain how DevOps teams can use the NGINX Docker image to build and run web applications on Docker.

Getting Started With the Base Image

NGINX is a versatile tool with many uses, including a load balancer, reverse proxy, and network cache. However, when running NGINX in a Docker container, most of these high-level functions are delegated to other specialized platforms or other instances of NGINX. Typically, NGINX fulfills the function of a web server when running in a Docker container.

To create an NGINX container with the default website, run the following command:

docker run -p 8080:80 nginx

This command will download the nginx image (if it hasn't already been downloaded) and create a container exposing port 80 in the container to port 8080 on the host machine. You can then open http://localhost:8080/index.html to view the default "Welcome to nginx!" website.

To allow the NGINX container to expose custom web assets, you can mount a local directory inside the Docker container. Save the following HTML code to a file called index.html:

<html>
<body>
Hello from Octopus!
</body>
</html>

Next, run the following command to mount the current directory under /usr/share/nginx/html inside the NGINX container with read-only access:

docker run -v $(pwd):/usr/share/nginx/html:ro -p 8080:80 nginx

Open http://localhost:8080/index.html again and you see the custom HTML page displayed.

One of the benefits of Docker images is the ability to bundle all related files into a single distributable artifact. To realize this benefit, you must create a new Docker image based on the NGINX image.

Creating Custom Images Based on NGINX

To create your own Docker image, save the following text to a file called Dockerfile:

FROM nginx
COPY index.html /usr/share/nginx/html/index.html

Dockerfile contains instructions for building a custom Docker image. Here you use the FROM command to base your image on the NGINX one, and then use the COPY command to copy your index.html file into the new image under the /usr/share/nginx/html directory.

Build the new image with the command:

docker build . -t mynginx

This builds a new image called mynginx. Run the new image with the command:

docker run -p 8080:80 mynginx

Note that you didn't mount any directories this time. However, when you open http://localhost:8080/index.html your custom HTML page is displayed because it was embedded in your custom image.

NGINX is capable of much more than hosting static files. To unlock this functionality, you must use custom NGINX configuration files.

Advanced NGINX Configuration

NGINX exposes its functionality via configuration files. The default NGINX image comes with a simple default configuration file designed to host static web content.
This file is located at /etc/nginx/nginx.conf in the default image, and has the following contents:

user nginx;
worker_processes auto;

error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    #tcp_nopush on;

    keepalive_timeout 65;

    #gzip on;

    include /etc/nginx/conf.d/*.conf;
}

There's no need to understand this configuration file in detail, but there is one line of interest that instructs NGINX to load additional configuration files from the /etc/nginx/conf.d directory:

include /etc/nginx/conf.d/*.conf;

The default /etc/nginx/conf.d/default.conf file configures NGINX to function as a web server. Specifically, the location / block loading files from /usr/share/nginx/html is why you mounted your HTML files to that directory previously:

server {
    listen       80;
    server_name  localhost;

    #access_log  /var/log/nginx/host.access.log  main;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    #error_page  404              /404.html;

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
    #
    #location ~ \.php$ {
    #    proxy_pass   http://127.0.0.1;
    #}

    # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
    #
    #location ~ \.php$ {
    #    root           html;
    #    fastcgi_pass   127.0.0.1:9000;
    #    fastcgi_index  index.php;
    #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
    #    include        fastcgi_params;
    #}

    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #
    #location ~ /\.ht {
    #    deny  all;
    #}
}

You can take advantage of this instruction to load any *.conf configuration files placed in /etc/nginx/conf.d to customize NGINX. In this example, you add a health check via a custom location listening on port 90 that responds to requests to the /nginx-health path with an HTTP 200 OK. Save the following text to a file called health-check.conf:

server {
    listen       90;
    server_name  localhost;

    location /nginx-health {
        return 200 "healthy\n";
        add_header Content-Type text/plain;
    }
}

Modify the Dockerfile to copy the configuration file to /etc/nginx/conf.d:

FROM nginx
COPY index.html /usr/share/nginx/html/index.html
COPY health-check.conf /etc/nginx/conf.d/health-check.conf

Build the image with the command:

docker build . -t mynginx

Run the new image with the command. Note the new port exposed on 9090:

docker run -p 8080:80 -p 9090:90 mynginx

Now open http://localhost:9090/nginx-health. The health check response is returned to indicate that the web server is up and running.

The examples above base your custom images on the default nginx image. However, there are other variants that provide much smaller image sizes without sacrificing any functionality.

Choosing NGINX Variants

The default nginx image is based on Debian. However, NGINX also provides images based on Alpine. Alpine is frequently used as a lightweight base for Docker images.
To view the sizes of Docker images, they must first be pulled down to your local workstation:

docker pull nginx
docker pull nginx:stable-alpine

You can then find the image sizes with the command:

docker image ls

From this, you can see the Debian image weighs around 140 MB while the Alpine image weighs around 24 MB. This is quite a saving in image size.

To base your images on the Alpine variant, you need to update the Dockerfile:

FROM nginx:stable-alpine
COPY index.html /usr/share/nginx/html/index.html
COPY health-check.conf /etc/nginx/conf.d/health-check.conf

Build and run the image with the commands:

docker build . -t mynginx
docker run -p 8080:80 -p 9090:90 mynginx

Once again, open http://localhost:9090/nginx-health or http://localhost:8080/index.html to view the web pages. Everything continues to work as it did previously, but your custom image is now much smaller.

Conclusion

NGINX is a powerful web server, and the official NGINX Docker image allows DevOps teams to host custom web applications in Docker. NGINX also supports advanced scenarios thanks to its ability to read configuration files copied into a custom Docker image.

In this post, you learned how to create a custom Docker image hosting a static web application, added advanced NGINX configuration files to provide a health check endpoint, and compared the sizes of Debian and Alpine NGINX images.

Resources

NGINX Docker image source code
Dockerfile reference

Happy deployments!
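As a small addendum to the article above: if you run this image with Docker alone rather than behind an orchestrator's probes, the /nginx-health endpoint can also drive Docker's own HEALTHCHECK instruction. The Dockerfile below is a minimal sketch, not part of the original article; it assumes the BusyBox wget bundled with the Alpine variant is available in the image.

FROM nginx:stable-alpine
COPY index.html /usr/share/nginx/html/index.html
COPY health-check.conf /etc/nginx/conf.d/health-check.conf
# Poll the health endpoint defined in health-check.conf and mark the container
# unhealthy if it stops answering (wget ships with the Alpine base image).
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -q --spider http://localhost:90/nginx-health || exit 1

With this in place, docker ps reports the container as healthy or unhealthy based on the probe, so the same endpoint serves both external monitoring and Docker itself.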
Delivering new features and updates to users without causing disruptions or downtime is a crucial challenge in the fast-paced world of software development. Rolling out changes is never risk-free: there is always a chance of introducing bugs or causing downtime. The blue-green deployment strategy, popular in the DevOps movement, answers this problem. By utilizing parallel environments and careful traffic routing, it lets organizations roll out new versions of their software in a secure and effective way, with little interruption for users. In this article, we will explore the principles, benefits, and best practices of blue-green deployment, shedding light on how it can empower organizations to release software with confidence.

Understanding Blue-Green Deployment

Blue-green deployment is a software deployment strategy that reduces risk and downtime when releasing new versions or updates of an application. It entails running two parallel instances of the same production environment, with the "blue" environment representing the current stable version and the "green" environment hosting the new version. With this configuration, switching between the two environments can be done without disrupting end users.

The fundamental idea behind blue-green deployment is to keep user traffic routed to the blue environment, protecting the production system's stability and dependability, while developers and QA teams set up, thoroughly test, and validate the new version in the green environment before it is made available to end users.

The deployment process typically involves the following steps:

Initial Deployment: The blue environment is the initial production environment running the stable version of the application. Users access the application through this environment, and it serves as the baseline for comparison with the updated version.

Update Deployment: The updated version of the application is deployed to the green environment, which mirrors the blue environment in terms of infrastructure, configuration, and data. The green environment remains isolated from user traffic initially.

Testing and Validation: The green environment is thoroughly tested to ensure that the updated version functions correctly and meets the desired quality standards. This includes running automated tests, performing integration tests, and potentially conducting user acceptance testing or canary releases.

Traffic Switching: Once the green environment passes all the necessary tests and validations, the traffic routing mechanism is adjusted to start directing user traffic from the blue environment to the green environment. This switch can be accomplished using various techniques such as DNS changes, load balancer configuration updates, or reverse proxy settings (a minimal reverse-proxy sketch appears at the end of this article).

Monitoring and Verification: Throughout the deployment process, both the blue and green environments are monitored to detect any issues or anomalies.
Monitoring tools and observability practices help identify performance problems, errors, or inconsistencies in real-time. This ensures the health and stability of the application in a green environment. Rollback and Cleanup: In the event of unexpected issues or unsatisfactory results, a rollback strategy can be employed to switch the traffic back to the blue environment, reverting to the stable version. Additionally, any resources or changes made in the green environment during the deployment process may need to be cleaned up or reverted. The advantages of blue-green deployment are numerous. By maintaining parallel environments, organizations can significantly reduce downtime during deployments. They can also mitigate risks by thoroughly testing the updated version before exposing it to users, allowing for quick rollbacks if issues arise. Blue-green deployment also supports scalability testing, continuous delivery practices, and experimentation with new features. Overall, blue-green deployment is a valuable approach for organizations seeking to achieve seamless software updates, minimize user disruption, and ensure a reliable and efficient deployment process. Benefits of Blue-Green Deployment Blue-green deployment offers several significant benefits for organizations looking to deploy software updates with confidence and minimize the impact on users. Here are the key benefits of implementing blue-green deployment: Minimized Downtime: Blue-green deployment significantly reduces downtime during the deployment process. By maintaining parallel environments, organizations can prepare and test the updated version (green environment) alongside the existing stable version (blue environment). Once the green environment is deemed stable and ready, the switch from blue to green can be accomplished seamlessly, resulting in minimal or no downtime for end-users. Rollback Capability: Blue-green deployment provides the ability to roll back quickly to the previous version (blue environment) if issues arise after the deployment. In the event of unforeseen problems or performance degradation in the green environment, organizations can redirect traffic back to the blue environment, ensuring a swift return to a stable state without impacting users. Risk Mitigation: With blue-green deployment, organizations can mitigate the risk of introducing bugs, errors, or performance issues to end-users. By maintaining two identical environments, the green environment can undergo thorough testing, validation, and user acceptance testing before directing live traffic to it. This mitigates the risk of impacting users with faulty or unstable software and increases overall confidence in the deployment process. Scalability and Load Testing: Blue-green deployment facilitates load testing and scalability validation in the green environment without affecting production users. Organizations can simulate real-world traffic and user loads in the green environment to evaluate the performance, scalability, and capacity of the updated version. This helps identify potential bottlenecks or scalability issues before exposing them to the entire user base, ensuring a smoother user experience. Continuous Delivery and Continuous Integration: Blue-green deployment aligns well with continuous delivery and continuous integration (CI/CD) practices. By automating the deployment pipeline and integrating it with version control and automated testing, organizations can achieve a seamless and streamlined delivery process. 
CI/CD practices enable faster and more frequent releases, reducing time-to-market for new features and updates. Flexibility for Testing and Experimentation: Blue-green deployment provides a controlled environment for testing and experimentation. Organizations can use the green environment to test new features, conduct A/B testing, or gather user feedback before fully rolling out changes. This allows for data-driven decision-making and the ability to iterate and improve software based on user input. Improved Reliability and Fault Tolerance: By maintaining two separate environments, blue-green deployment enhances reliability and fault tolerance. In the event of infrastructure or environment failures in one of the environments, the other environment can continue to handle user traffic seamlessly. This redundancy ensures that the overall system remains available and minimizes the impact of failures on users. Implementing Blue-Green Deployment To successfully implement blue-green deployment, organizations need to follow a series of steps and considerations. The process involves setting up parallel environments, managing infrastructure, automating deployment pipelines, and establishing efficient traffic routing mechanisms. Here is a step-by-step guide on how to implement blue-green deployment effectively: Duplicate Infrastructure: Duplicate the infrastructure required to support the application in both the blue and green environments. This includes servers, databases, storage, and any other components necessary for the application’s functionality. Ensure that the environments are identical to minimize compatibility issues. Automate Deployment: Implement automated deployment pipelines to ensure consistent and repeatable deployments. Automation tools such as Jenkins, Travis CI, or GitLab CI/CD can help automate the deployment process. Create a pipeline that includes steps for building, testing, and deploying the application to both the blue and green environments. Version Control and Tagging: Adopt proper version control practices to manage different releases effectively. Use a version control system like Git to track changes and create clear tags or branches for each environment. This helps in identifying and managing the blue and green versions of the software. Automated Testing: Implement comprehensive automated testing to validate the functionality and stability of the green environment before routing traffic to it. Include unit tests, integration tests, and end-to-end tests in your testing suite. Automated tests help catch issues early in the deployment process and ensure a higher level of confidence in the green environment. Traffic Routing Mechanisms: Choose appropriate traffic routing mechanisms to direct user traffic between the blue and green environments. Popular options include DNS switching, reverse proxies, or load balancers. Configure the routing mechanism to gradually shift traffic from the blue environment to the green environment, allowing for a controlled transition. Monitoring and Observability: Implement robust monitoring and observability practices to gain visibility into the performance and health of both environments. Monitor key metrics, logs, and user feedback to detect any anomalies or issues. Utilize monitoring tools like Prometheus, Grafana, or ELK Stack to ensure real-time visibility into the system. Incremental Rollout: Adopt an incremental rollout approach to minimize risks and ensure a smoother transition. 
Gradually increase the percentage of traffic routed to the green environment while monitoring the impact and collecting feedback. This allows for early detection of issues and quick response before affecting the entire user base. Rollback Strategy: Have a well-defined rollback strategy in place to revert back to the stable blue environment if issues arise in the green environment. This includes updating the traffic routing mechanism to redirect traffic back to the blue environment. Ensure that the rollback process is well-documented and can be executed quickly to minimize downtime. Continuous Improvement: Regularly review and improve your blue-green deployment process. Collect feedback from the deployment team, users, and stakeholders to identify areas for enhancement. Analyze metrics and data to optimize the deployment pipeline, automate more processes, and enhance the overall efficiency and reliability of the blue-green deployment strategy. By following these implementation steps and considering key aspects such as infrastructure duplication, automation, version control, testing, traffic routing, monitoring, and continuous improvement, organizations can successfully implement blue-green deployment. This approach allows for seamless software updates, minimized downtime, and the ability to roll back if necessary, providing a robust and efficient deployment strategy. Best Practices for Blue-Green Deployment Blue-green deployment is a powerful strategy for seamless software delivery and minimizing risks during the deployment process. To make the most of this approach, consider the following best practices: Version Control and Tagging: Implement proper version control practices to manage different releases effectively. Clearly label and tag the blue and green environments to ensure easy identification and tracking of each version. This helps in maintaining a clear distinction between the stable and updated versions of the software. Automated Deployment and Testing: Leverage automation for deployment pipelines to ensure consistent and repeatable deployments. Automation helps streamline the process and reduces the chances of human error. Implement automated testing at different levels, including unit tests, integration tests, and end-to-end tests. Automated testing helps verify the functionality and stability of the green environment before routing traffic to it. Infrastructure Duplication: Duplicate the infrastructure and set up identical environments for blue and green. This includes replicating servers, databases, and any other dependencies required for the application. Keeping the environments as similar as possible ensures a smooth transition without compatibility issues. Traffic Routing Mechanisms: Choose appropriate traffic routing mechanisms to direct user traffic from the blue environment to the green environment seamlessly. Popular techniques include DNS switching, reverse proxies, or load balancers. Carefully configure and test these mechanisms to ensure they handle traffic routing accurately and efficiently. Incremental Rollout: Consider adopting an incremental rollout approach rather than switching all traffic from blue to green at once. Gradually increase the percentage of traffic routed to the green environment while closely monitoring the impact. This allows for real-time feedback and rapid response to any issues that may arise, minimizing the impact on users. 
Canary Releases: Implement canary releases by deploying the new version to a subset of users or a specific geographic region before rolling it out to the entire user base. Canary releases allow you to collect valuable feedback and perform additional validation in a controlled environment. This approach helps mitigate risks and ensures a smoother transition to the updated version.

Rollback Strategy: Always have a well-defined rollback strategy in place. Despite thorough testing and validation, issues may still occur after the deployment. Having a rollback plan ready allows you to quickly revert to the stable blue environment if necessary. This ensures minimal disruption to users and maintains the continuity of service.

Monitoring and Observability: Implement comprehensive monitoring and observability practices to gain visibility into the performance and health of both the blue and green environments. Monitor key metrics, logs, and user feedback to identify any anomalies or issues. This allows for proactive detection and resolution of problems, enhancing the overall reliability of the deployment process.

By following these best practices, organizations can effectively leverage blue-green deployment to achieve rapid and reliable software delivery. The careful implementation of version control, automation, traffic routing, and monitoring ensures a seamless transition between different versions while minimizing the impact on users and mitigating risks.

Conclusion

Blue-green deployment is a potent method for ensuring smooth and dependable releases. By maintaining two parallel environments and shifting user traffic gradually, organizations can minimize risks, cut down on downtime, and boost confidence in their new releases. The approach enables thorough testing, validation, and scalability evaluation, and it aligns naturally with continuous delivery principles.

With the appropriate infrastructure in place, automated deployment pipelines, effective traffic routing mechanisms, and the best practices discussed in this article, organizations can deliver software updates with confidence. The payoff is decreased downtime, rollback capability, risk reduction, scalability testing, alignment with CI/CD practices, flexibility for testing and experimentation, and improved reliability: in short, seamless software delivery and a better experience for users throughout the deployment process.
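To make the traffic-switching step concrete, here is the minimal reverse-proxy sketch referred to earlier, using NGINX (the same server covered earlier in this report). The upstream names and hostnames are illustrative assumptions rather than part of the article: the proxy_pass line is the single switch point that moves users from blue to green, and pointing it back is the rollback.

upstream app_blue  { server blue.internal:8080; }
upstream app_green { server green.internal:8080; }

server {
    listen 80;

    location / {
        # Serving from the blue (stable) environment today.
        # Once the green environment passes testing and validation,
        # change this line to http://app_green and reload NGINX;
        # pointing it back at http://app_blue is the rollback.
        proxy_pass http://app_blue;
    }
}

With a load balancer, DNS record, or service mesh, the same idea applies: the routing rule, not the application, decides which environment receives live traffic.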
Continuous integration and continuous delivery's recommendation of deployment automation is hugely important for large organizations with complex software. That's especially true for multi-tenancy software delivered with tenanted deployments. But what if we didn't have deployment automation? How would tenanted deployments even work? In this article, we ask what tenanted deployments to physical branches look like using manual processes. A Common Tenanted Deployment Scenario for Physical Locations You work for a large financial institution in the United Kingdom that has over 200 branches all over the country. Each branch has its own server and several computers that tellers use to help customers. You must regularly deploy updates to local servers at all branches for the company's bespoke finance software. How the Deployments Would Work Without Automation There are a couple of options for delivering software like this without automation. Travel to Each Location As with our global deployments scenario, traveling to each branch to do updates is the easiest way to ensure things go smoothly. Here, where our fictional company operates only in the UK, you wouldn't need to pay airfare, and your team could drive or take the train to branches instead. Similar organizations in larger countries like the US or Australia wouldn't have that luxury, however. Even if your team drives to your branches, travel would still cost time and money. Most organizations set up like this have the financial burden of fleet cars or fuel and travel reimbursement for staff. As for time, teams would waste days and months moving around the country to do updates. For an added wrinkle, if your business went global, you'd end up with the combined pain points of 2 of our 3 scenarios. Software updates could become "Planes, Trains, and Automobiles"-style ordeals but without Steve Martin and John Candy's comedic timing. Have Branch Staff Run the Updates Again, a bit like our global deployments scenario, but here you'd have a branch staff member run the update for you. It might sound strange to let an on-site, non-technical colleague run updates, but I worked at 2 organizations that did this. One of those only a few years ago. And yes, we'd mail out a disc rather than use better, more secure options. The problem is that things can and will go wrong no matter how simple you make the install or how you get files to branches. Updating software like this introduces a bunch of risky variables and room for error. Maybe there's a problem with the update itself. Maybe the instructions aren't clear for the non-technical. Maybe the last manual update missed some important prerequisites for new releases. When there are problems, it's not a branch staff member's job to fix them. That means some of the time saved by not traveling may get spent troubleshooting over the phone or, much worse, moving the time cost to your business's service desk. Whichever way you cut it, if there's a problem, someone's time is getting wasted. Remote Installation In this scenario, you're deploying updates to your own organization's hardware, so remote access is the easiest and most sensible way to go. However, that might not be an option if you're delivering software to another organization's branches instead. Remote updates would solve many deployment problems when dealing with brick-and-mortar locations, though. You can oversee the updates to ensure every step happens, that hardware's configured right, and that installs complete successfully. 
You can even run tests. Though gaining access becomes quicker and easier, the process may not. Without deployment automation, you still have the problem of manual processes at scale. We have over 200 branches to serve in the scenario we set at the top. That's a lot of tiring, repetitive work for a manual process, even if you divide the number of installs among a group of people. Okay, But What About Scripts? As with the other posts in this series, I think scripts count as deployment automation. We'll cover them anyway because there are potential and avoidable risks, even with scripting. That's largely because scripts can change depending on a given update and might not behave the same for every target. Like the other manual deployment methods, you must think about scale when scripting, too. Experts could manage all branch deployments in one script, which is less work but can cause problems if any locations hit setbacks. Equally, a script per branch is a safer route and makes it easier to track failures, but that's more time-consuming — it's a lot of scripts to copy or write. Writing a new script for every deployment means a huge chance of human error. You could make typos, get your syntax wrong, or forget to add a key step. Reusing older scripts may feel safer, but you risk carrying unknown errors forward — errors that could cause problems way down the line. The Result Without deployment automation, whatever update strategy you pick will result in: Much slower software delivery: Bad for bug or vulnerability fixing and delivering new features A higher risk of technical problems: Bad for customer relations And in this scenario, where updates can impact your frontline colleagues and your organization's customers, you can't really afford either. Thankfully, deployment automation does exist. Happy deployments!
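To make the scripting risk described above concrete, here is a rough sketch of the kind of "one script for every branch" rollout the article warns about. Everything in it — the branch list, paths, and installer name — is hypothetical; the point is how much can silently go wrong per branch when this runs without deployment automation around it.

#!/usr/bin/env bash
# Hypothetical manual rollout: copy the new release to every branch server
# and run the installer over SSH. One flaky network link, changed credential,
# or missed prerequisite quietly leaves that branch on the old version.
set -euo pipefail

RELEASE="finance-app-4.2.1.tar.gz"

while read -r branch; do
  echo "Deploying to ${branch}..."
  scp "dist/${RELEASE}" "deploy@${branch}:/opt/finance-app/releases/" \
    || { echo "copy failed: ${branch}"; continue; }
  ssh -n "deploy@${branch}" "cd /opt/finance-app && ./install.sh releases/${RELEASE}" \
    || echo "install failed: ${branch}"   # someone still has to notice and follow up
done < branch-servers.txt

Multiply this by 200 branches, typos, and reused copies of older scripts, and the failure tracking alone becomes a job of its own.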
Unused code adds time and burden to maintaining the codebase, and removing it is the only cure for this side of "more cowbell." Unfortunately, it's not always obvious whether developers can remove certain code without breaking the application. As the codebase becomes cluttered and unwieldy, development teams can become mired in mystery code that slows development and lowers morale.

Do you remember the first time you walked into your garage, empty and sparkling, yawning with the promise of protecting your vehicles and power tools? How did it look the last time you walked in? If you're like many of us, the clutter of long-closed boxes taunts you every time you walk around them, losing precious minutes before you can get to the objects you need while your car sits in the driveway. Sadly, development teams have a similar problem with their source code, which has grown into a cluttered mess.

Over the last few months, I've been working on a way to help development teams maintain less code. Everything we normally read is about working with new frameworks, new tools, and new techniques — but one thing many of us ignore is improving velocity by simply getting rid of things we no longer need. Essentially, as it runs, the JVM streams off its first-call method invocation log to a central location to track "have we used this method recently." When the method appears in the code inventory, the answer is yes — if the method does not appear, then it becomes a candidate for removal as unused code.

Dead Code Removal

If you're a senior developer helping new teammates, consider the work it takes to onboard new members and for them to learn your codebase. Each time they change something, they scroll past methods. Although our IDEs and analyzers can identify fully dead code, the frustration point is code that looks alive but just isn't used. Often, these are public methods or classes that just aren't called or have commented-out or modified annotations. As I've talked to teams about the idea that we hoard unused code, I've heard comments like these:

"I don't know what this code does, so I don't want to get rid of it, but I would love to."

"I could clean that up, but I have other priority issues and don't have time for that."

"We never prioritize clean up. We just do new features."

What if Java developers had an easier way to identify dead code for removal — a way where we could prioritize code cleanup during our sprints to reduce technical debt without taking time away from business needs to add features?

Code removal is complex and generally takes a back seat to new features. Over time, code becomes unused as teams refactor without removal: commenting out an annotation, changing a path, or moving functionality. Most senior engineers would have to allocate time in their sprints to find what to remove: evaluating missing log statements or reviewing code with static analyzers. Both are problematic from a time perspective, so many teams just leave it in the code repository, active but dead: a problem for a future team lead or delayed until the next big rewrite.

The JVM, however, has an overlooked capability to identify dead code and simplify the prioritization problem. By re-purposing the bytecode interpreter, the JVM can identify when a method is first called per execution. When tracked in a central location, these logs produce a treasure map you can follow to remove dead code, reducing the overall cognitive burden and improving team velocity. If a method hasn't run in a year, you can probably remove it.
Team leads can then take classes and methods that haven't been executed and remove that code either at one time or throughout several sprints.

Why remove unused code at all? For many groups, updating libraries and major Java versions requires touching a lot of code. Between Java 8 and Java 17, the XML libraries were deprecated and removed — as you port your application, do you still use all that XML processing? Instead of touching the code and all associated unit tests, what if you could get rid of that code and remove the test? If the code doesn't run, team members shouldn't spend hours changing the code and updating tests to pass: removing the dead code is faster and reduces the mental complexity of figuring that code out. Similar situations arise from updates to major frameworks like Spring, iText, and so on.

Imagine you paid your neighbor's kids to mow your lawn with your mower, and it was hidden behind a wall of boxes, expired batteries, old clothes, and old electronics. How hard do you think they would try to navigate around your junk before they gave up and went home? Senior engineers are doing the same thing. What should be an hour's work of mowing becomes two hours.

The problem of cluttered and unused code also affects teams working on decomposing a monolith or re-architecting for the cloud. Without a full measurement of what code is still used, teams end up breaking out huge microservices that are difficult to manage because they include many unnecessary pieces brought out of the monolith. Instead of producing the desired streamlined suite of microservices, these re-architecture projects take longer, cost more, and feel like they need to be rewritten right away because the clutter the team was trying to avoid was never removed. Difficulties stick with the project until teams can decrease the maintenance burden: removing unnecessary code is a rapid way to decrease that burden. Instead of clamoring for a rewrite, reduce the maintenance burden to tidy up what you have.

The Benefits of Tracking Used/Unused Code

The distinguishing benefit of tracking live vs. unused code from the JVM is that teams can gather data from production applications without impacting performance. The JVM knows when a method is first called, and logging it doesn't add any measurable overhead. This way, teams that aren't sure about the robustness of their test environments can rely on the result.

A similar experience exists for projects that have had different levels of test-driven development over their lifetime. Changing a tiny amount of code could result in several hours of test refactoring to make tests pass and get that green bar. I've seen many projects where the unit tests were the only thing that used the code. Removing the code and the unnecessary tests was more satisfying than updating all the code to the newer library just to get a green bar.

The best way of identifying unused code for removal is to passively track what code runs. Instead of figuring it out manually or taking time from sprints, tune your JVM to record the first invocation of each method. It's like a map of your unused boxes next to your automatic garage door opener. Later on, during sprints or standard work, run a script to compare your code against the list to see what classes and methods never ran. While the team works to build new features and handle normal development, start removing code that never ran.
Perform your standard tests – if tests fail, look into removing or changing the test as well because it was just testing unused code. By removing this unused code over time, teams will have less baggage, less clutter, and less mental complexity to sift through as they work on code. If you've been working on a project for a long time or just joined a team and your business is pressuring you to go faster, consider finally letting go of unnecessary code.

Track Code Within the JVM

The JVM provides plenty of capabilities that help development teams create fast-running applications. It already knows when a method will be first called, so unlike profilers, there's no performance impact on tracking when this occurs. By consolidating this first-call information, teams can identify unused code and finally tidy up that ever-growing codebase.
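As a rough illustration of the comparison step described above — not tied to any particular tracking tool — suppose the JVM-side tracking produces a plain-text list of methods that actually ran (invoked-methods.txt), and you have generated a list of all declared methods from your build output (declared-methods.txt), for example with javap -p over the compiled classes, normalized to the same one-method-per-line form. The file names and format here are assumptions for the sketch; standard Unix tooling then surfaces the removal candidates.

# Assumes both files list one fully qualified method per line in the same
# normalized form, e.g. com.example.InvoiceService#calculateTax(double)
sort -u declared-methods.txt -o declared-methods.txt
sort -u invoked-methods.txt  -o invoked-methods.txt

# Lines present in declared-methods.txt but absent from invoked-methods.txt:
# methods that never ran during the tracking window.
comm -23 declared-methods.txt invoked-methods.txt > removal-candidates.txt

The resulting removal-candidates.txt is the "treasure map" the article describes: a prioritized starting point for cleanup during normal sprint work rather than a dedicated archaeology project.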
Our industry is in the early days of an explosion in software using LLMs, as well as (separately, but relatedly) a revolution in how engineers write and run code, thanks to generative AI. Many software engineers are encountering LLMs for the very first time, while many ML engineers are being exposed directly to production systems for the very first time. Both types of engineers are finding themselves plunged into a disorienting new world — one where a particular flavor of production problem they may have encountered occasionally in their careers is now front and center. Namely, that LLMs are black boxes that produce nondeterministic outputs and cannot be debugged or tested using traditional software engineering techniques. Hooking these black boxes up to production introduces reliability and predictability problems that can be terrifying. It’s important to understand this, and why. 100% Debuggable? Maybe Not Software is traditionally assumed to be testable, debuggable, and reproducible, depending on the flexibility and maturity of your tooling and the complexity of your code. The original genius of computing was one of constraint; that by radically constraining language and mathematics to a defined set, we could create algorithms that would run over and over and always return the same result. In theory, all software is debuggable. However, there are lots of things that can chip away at that beauteous goal and make your software mathematically less than 100% debuggable, like: Adding concurrency and parallelism. Certain types of bugs. Stacking multiple layers of abstractions (e.g., containers). Randomness. Using JavaScript (HA HA). There is a much longer list of things that make software less than 100% debuggable in practice. Some of these things are related to cost/benefit tradeoffs, but most are about weak telemetry, instrumentation, and tooling. If you have only instrumented your software with metrics, for example, you have no way of verifying that a spike in api_requests and an identical spike in 503 errors are for the same events (i.e., you are getting a lot of api_requests returning 503) or for a disjoint set of events (the spike in api_requests is causing general congestion causing a spike in 503s across ALL events). It is mathematically impossible; all you can do is guess. But if you have a log line that emits both the request_path and the error_code, and a tool that lets you break down and group by arbitrary dimensions, this would be extremely easy to answer. Or if you emit a lot of events or wide log lines but cannot trace them, or determine what order things executed in, there will be lots of other questions you won’t be able to answer. There is another category of software errors that are logically possible to debug, but prohibitively expensive in practice. Any time you see a report from a big company that tracked down some obscure error in a kernel or an ethernet device, you’re looking at one of the rare entities with 1) enough traffic for these one in a billion errors to be meaningful, and 2) enough raw engineering power to dedicate to something most of us just have to live with. But software is typically understandable because we have given it structure and constraints. IF (); THEN (); ELSE () is testable and reproducible. Natural languages, on the other hand, are infinitely more expressive than programming languages, query languages, or even a UI that users interact with. 
The most common and repeated patterns may be fairly predictable, but the long tail your users will create is very long, and they expect meaningful results there, as well. For complex reasons that we won’t get into here, LLMs tend to have a lot of randomness in the long tail of possible results. So with software, if you ask the exact same question, you will always get the exact same answer. With LLMs, you might not. LLMs Are Their Own Beast Unit testing involves asserting predictable outputs for defined inputs, but this obviously cannot be done with LLMs. Instead, ML teams typically build evaluation systems to evaluate the effectiveness of the model or prompt. However, to get an effective evaluation system bootstrapped in the first place, you need quality data based on real use of an ML model. With software, you typically start with tests and graduate to production. With ML, you have to start with production to generate your tests. Even bootstrapping with early access programs or limited user testing can be problematic. It might be ok for launching a brand new feature, but it’s not good enough for a real production use case. Early access programs and user testing often fail to capture the full range of user behavior and potential edge cases that may arise in real-world usage when there are a wide range of users. All these programs do is delay the inevitable failures you’ll encounter when an uncontrolled and unprompted group of end users does things you never expected them to do. Instead of relying on an elaborate test harness to give you confidence in your software a priori, it’s a better idea to embrace a “ship to learn” mentality and release features earlier, then systematically learn from what is shipped and wrap that back into your evaluation system. And once you have a working evaluation set, you also need to figure out how quickly the result set is changing. Phillip gives this list of things to be aware of when building with LLMs: Failure will happen — it’s a question of when, not if. Users will do things you can’t possibly predict. You will ship a “bug fix” that breaks something else. You can’t really write unit tests for this (nor practice TDD). Latency is often unpredictable. Early access programs won’t help you. Sound at all familiar? Observability-Driven Development Is Necessary With LLMs Over the past decade or so, teams have increasingly come to grips with the reality that the only way to write good software at scale is by looping in production via observability — not by test-driven development, but observability-driven development. This means shipping sooner, observing the results, and wrapping your observations back into the development process. Modern applications are dramatically more complex than they were a decade ago. As systems get increasingly more complex, and nondeterministic outputs and emergent properties become the norm, the only way to understand them is by instrumenting the code and observing it in production. LLMs are simply on the far end of a spectrum that has become ever more unpredictable and unknowable. Observability — both as a practice and a set of tools — tames that complexity and allows you to understand and improve your applications. We have written a lot about what differentiates observability from monitoring and logging, but the most important bits are 1) the ability to gather and store telemetry as very wide events, ordered in time as traces, and 2) the ability to break down and group by any arbitrary, high-cardinality dimension. 
This allows you to explore your data and group by frequency, input, or result. In the past, we used to warn developers that their software usage patterns were likely to be unpredictable and change over time; now we inform you that if you use LLMs, your data set is going to be unpredictable, and it will absolutely change over time, and you must have a way of gathering, aggregating, and exploring that data without locking it into predefined data structures.

With good observability data, you can use that same data to feed back into your evaluation system and iterate on it in production. The first step is to use this data to evaluate the representativity of your production data set, which you can derive from the quantity and diversity of use cases. You can make a surprising amount of improvements to an LLM-based product without even touching any prompt engineering, simply by examining user interactions, scoring the quality of the response, and acting on the correctable errors (mainly data model mismatches and parsing/validation checks). You can fix or handle these manually in the code, which will also give you a bunch of test cases showing that your corrections actually work. These tests will not verify that a particular input always yields a correct final output, but they will verify that a correctable LLM output can indeed be corrected. You can go a long way in the realm of pure software, without reaching for prompt engineering.

But ultimately, the only way to improve LLM-based software is by adjusting the prompt, scoring the quality of the responses (or relying on scores provided by end users), and readjusting accordingly. In other words, improving software that uses LLMs can only be done by observability and experimentation. Tweak the inputs, evaluate the outputs, and every now and again, consider your dataset for representativity drift.

Software engineers who are used to boolean/discrete math and TDD now need to concern themselves with data quality, representativity, and probabilistic systems. ML engineers need to spend more time learning how to develop products and concern themselves with user interactions and business use cases. Everyone needs to think more holistically about business goals and product use cases. There's no such thing as an LLM that gives good answers that don't serve the business reason it exists, after all.

So, What Do You Need to Get Started With LLMs?

Do you need to hire a bunch of ML experts in order to start shipping LLM software? Not necessarily. You cannot (there aren't enough of them), you should not (this is something everyone needs to learn), and you don't want to (these are changes that will make software engineers categorically more effective at their jobs). Obviously, you will need ML expertise if your goal is to build something complex or ambitious, but entry-level LLM usage is well within the purview of most software engineers. It is definitely easier for software engineers to dabble in using LLMs than it is for ML engineers to dabble in writing production applications. But learning to write and maintain software in the manner of LLMs is going to transform your engineers and your engineering organizations. And not a minute too soon.

The hardest part of software has always been running it, maintaining it, and understanding it — in other words, operating it. But this reality has been obscured for many years by the difficulty and complexity of writing software.
We can’t help but notice the upfront cost of writing software, while the cost of operating it gets amortized over many years, people, and teams, which is why we have historically paid and valued software engineers who write code more than those who own and operate it. When people talk about the 10x engineer, everyone automatically assumes it means someone who churns out 10x as many lines of code, not someone who can operate 10x as much software. But generative AI is about to turn all of these assumptions upside down. All of a sudden, writing software is as easy as sneezing. Anyone can use ChatGPT or other tools to generate reams of code in seconds. But understanding it, owning it, operating it, extending and maintaining it... all of these are more challenging than ever, because in the past, the way most of us learned to understand software was by writing it. What can we possibly do to make sure our code makes sense and works, and is extendable and maintainable (and our code base is consistent and comprehensible) when we didn’t go through the process of writing it? Well, we are in the early days of figuring that out, too. If you’re an engineer who cares about your craft: Do code reviews. Follow coding standards and conventions. Write (or generate) tests for it. But ultimately, the only way you can know for sure whether or not it works is to ship it to production and watch what happens. This has always been true, by the way. It’s just more true now. If you’re an engineer adjusting to the brave new era: Take some of that time you used to spend writing lines of code and reinvest it back into understanding, shipping under controlled circumstances, and observing. This means instrumenting your code with intention, and inspecting its output. This means shipping as soon as possible into the production environment. This means using feature flags to decouple deploys from releases and gradually roll new functionality out in a controlled fashion. Invest in these — and other — guardrails to make the process of shipping software more safe, fine-grained, and controlled. Most of all, it means developing the habit of looking at your code in production, through the lens of your telemetry, and asking yourself: Does this do what I expected it to do? Does anything else look weird? Or maybe I should say “looking at your systems” instead of “looking at your code,” since people might confuse the latter with an admonition to “read the code.” The days when you could predict how your system would behave simply by reading lines of code are long, long gone. Software behaves in unpredictable, emergent ways, and the important part is observing your code as it’s running in production, while users are using it. Code in a buffer can tell you very little. This Future Is a Breath of Fresh Air This, for once, is not a future I am afraid of. It’s a future I cannot wait to see manifest. For years now, I’ve been giving talks on modern best practices for software engineering — developers owning their code in production, testing in production, observability-driven development, continuous delivery in a tight feedback loop, separating deploys from releases using feature flags. No one really disputes that life is better, code is better, and customers are happier when teams adopt these practices. Yet, only 11% of teams can deploy their code in less than a day, according to the DORA report. Only a tiny fraction of teams are operating in the way everybody agrees we all should! Why? 
The answers often boil down to organizational roadblocks, absurd security/compliance policies, or lack of buy-in/prioritizing. Saddest of all are the ones who say something like, “our team just isn’t that good” or “our people just aren’t that smart” or “that only works for world-class teams like the Googles of the world.” Completely false. Do you know what’s hard? Trying to build, run, and maintain software on a two month delivery cycle. Running with a tight feedback loop is so much easier. Just Do the Thing So how do teams get over this hump and prove to themselves that they can have nice things? In my experience, only one thing works: When someone joins the team who has seen it work before, has confidence in the team’s abilities, and is empowered to start making progress against those metrics (which they tend to try to do, because people who have tried writing code the modern way become extremely unwilling to go back to the bad old ways). And why is this relevant? I hypothesize that over the course of the next decade, developing with LLMs will stop being anything special, and will simply be one skill set of many, alongside mobile development, web development, etc. I bet most engineers will be writing code that interacts with an LLM. I bet it will become not quite as common as databases, but up there. And while they’re doing that, they will have to learn how to develop using short feedback loops, testing in production, observability-driven development, etc. And once they’ve tried it, they too may become extremely unwilling to go back. In other words, LLMs might ultimately be the Trojan Horse that drags software engineering teams into the modern era of development best practices. (We can hope.) In short, LLMs demand we modify our behavior and tooling in ways that will benefit even ordinary, deterministic software development. Ultimately, these changes are a gift to us all, and the sooner we embrace them, the better off we will be.
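As a small illustration of the "wide events" idea from the article above: instead of emitting one counter for api_requests and another for errors, each LLM request can emit a single structured event carrying every dimension you might later want to group by. The field names below are hypothetical, not a prescribed schema; the point is that prompt version, model, latency, user, and outcome all live on the same event, so questions like "which prompt version produces the most correctable schema mismatches for this endpoint" can be answered after the fact.

{
  "timestamp": "2023-08-14T17:03:21Z",
  "trace_id": "9f2c1e7a",
  "request_path": "/api/summarize",
  "user_id": "u_4821",
  "model": "gpt-4",
  "prompt_template_version": "v7",
  "input_tokens": 512,
  "output_tokens": 187,
  "latency_ms": 2140,
  "validation_passed": false,
  "error_code": "schema_mismatch",
  "correctable": true
}

Events like this can feed both your observability tooling and, over time, the evaluation data set the article argues you can only build from production.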
Microservices architecture has become extremely popular in recent years because it allows complex applications to be built as a collection of discrete, independent services, improving scalability, flexibility, and resilience. The same distributed nature that makes microservices attractive, however, presents special difficulties for testing and quality control, so comprehensive testing is essential to guarantee the reliability and scalability of the software. In this guide, we delve into the world of microservices testing and examine its significance, methodologies, and best practices to guarantee the smooth operation of these interconnected parts.

Understanding Microservices

In a microservices architecture, an application's functionality is provided by a collection of independent, loosely coupled services. Each microservice runs independently, owns its database, and encapsulates its own business logic, which supports continuous delivery, scalability, and flexibility. Microservices are small, independent services that together make up a complete software product. Each service carries out a particular business function and communicates with other services through well-defined APIs. This modular approach lets organizations develop, deploy, and scale applications more effectively; as the number of services grows, however, thorough testing becomes essential to find and fix potential problems.

Challenges in Microservices Testing

Testing microservices introduces several unique challenges, including:

Distributed nature: Microservices are distributed across different servers, networks, and even geographical locations. Testing must account for network latency, service discovery, and inter-service communication.
Dependency management: Microservices often rely on external dependencies such as databases, third-party APIs, and message queues. Testing must consider these dependencies and ensure their availability during testing.
Data consistency: Maintaining data consistency across multiple microservices is a critical challenge. Changes made in one service should not negatively impact the functionality of other services.
Deployment complexity: Microservices are typically deployed independently, and coordinating testing across multiple services can be challenging. Versioning, rollbacks, and compatibility testing become vital considerations.
Integration testing: Microservices architecture demands extensive integration testing to ensure seamless communication and proper behavior among services.

Importance of Microservices Testing

Microservices testing plays a vital role in guaranteeing the overall quality, reliability, and performance of the system. The following points highlight its significance:

Isolation and Independence: Testing each microservice individually ensures that any issues or bugs within a specific service can be isolated, minimizing the impact on other services.
Continuous Integration and Delivery (CI/CD): Microservices heavily rely on CI/CD pipelines to enable frequent deployments. Effective testing enables faster feedback loops, ensuring that changes and updates can be delivered reliably without causing disruptions.
Fault Isolation and Resilience: By testing the interactions between microservices, organizations can identify potential points of failure and design resilient strategies to handle failures gracefully. Scalability and Performance: Testing enables organizations to simulate high loads and stress scenarios to identify bottlenecks, optimize performance, and ensure that microservices can scale seamlessly. Types of Microservices Testing Microservices testing involves various types of testing to ensure the quality, functionality, and performance of individual microservices and the system as a whole. Here are some important types of testing commonly performed in microservices architecture: Unit Testing Unit testing focuses on testing individual microservices in isolation. It verifies the functionality of each microservice at a granular level, typically at the code level. Unit tests ensure that individual components or modules of microservices behave as expected and meet the defined requirements. Mocking frameworks are often used to isolate dependencies and simulate interactions for effective unit testing. Integration Testing Integration testing verifies the interaction and integration between multiple microservices. It ensures that microservices can communicate correctly and exchange data according to the defined contracts or APIs. Integration tests validate the interoperability and compatibility of microservices, identifying any issues related to data consistency, message passing, or service coordination. Contract Testing Contract testing validates the contracts or APIs exposed by microservices. It focuses on ensuring that the contracts between services are compatible and adhere to the agreed-upon specifications. Contract testing verifies the request and response formats, data structures, and behavior of the services involved. This type of testing is essential for maintaining the integrity and compatibility of microservices during development and evolution. End-to-End Testing End-to-end (E2E) testing evaluates the functionality and behavior of the entire system, including multiple interconnected microservices, databases, and external dependencies. It tests the complete flow of a user request through various microservices and validates the expected outcomes. E2E tests help identify issues related to data consistency, communication, error handling, and overall system behavior. Performance Testing Performance testing assesses the performance and scalability of microservices. It involves testing the system under different loads, stress conditions, or peak usage scenarios. Performance tests measure response times, throughput, resource utilization, and other performance metrics to identify bottlenecks, optimize performance, and ensure that the microservices can handle expected loads without degradation. Security Testing Security testing is crucial in microservices architecture due to the distributed nature and potential exposure of sensitive data. It involves assessing the security of microservices against various vulnerabilities, attacks, and unauthorized access. Security testing encompasses techniques such as penetration testing, vulnerability scanning, authentication, authorization, and data protection measures. Chaos Engineering Chaos engineering is a proactive testing approach where deliberate failures or disturbances are injected into the system to evaluate its resilience and fault tolerance. 
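As a toy, code-level illustration of that fault-injection idea (dedicated tools usually inject faults at the infrastructure level instead), the sketch below randomly delays or fails calls to a downstream dependency; the failure rate and latency bound are arbitrary choices for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

// Toy fault injector: wraps a call to a downstream dependency and randomly
// adds latency or throws, so resilience logic (timeouts, retries, fallbacks)
// can be exercised in a test environment.
public final class FaultInjector {

    private final double failureRate;       // e.g. 0.1 = fail 10% of calls
    private final long maxExtraLatencyMs;   // upper bound on injected delay

    public FaultInjector(double failureRate, long maxExtraLatencyMs) {
        this.failureRate = failureRate;
        this.maxExtraLatencyMs = maxExtraLatencyMs;
    }

    public <T> T call(Supplier<T> downstreamCall) {
        try {
            // Inject a random delay between 0 and maxExtraLatencyMs milliseconds.
            Thread.sleep(ThreadLocalRandom.current().nextLong(maxExtraLatencyMs + 1));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (ThreadLocalRandom.current().nextDouble() < failureRate) {
            throw new IllegalStateException("Injected failure for resilience testing");
        }
        return downstreamCall.get();
    }
}
```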
By simulating failures or stress scenarios, chaos engineering validates the system’s ability to handle failures, recover gracefully, and maintain overall stability. It helps identify weaknesses and ensures that microservices can handle unexpected conditions without causing a system-wide outage. Data Testing Data testing focuses on validating the accuracy, integrity, and consistency of data stored and processed by microservices. It involves verifying data transformations, data flows, data quality, and data integration between microservices and external systems. Data testing ensures that data is correctly processed, stored, and retrieved, minimizing the risk of data corruption or inconsistency. These are some of the key types of testing performed in microservices architecture. The selection and combination of testing types depend on the specific requirements, complexity, and characteristics of the microservices system being tested. A comprehensive testing strategy covering these types of testing helps ensure the reliability, functionality, and performance of microservices-based applications. Best Practices for Microservices Testing Microservices testing presents unique challenges due to the distributed nature of the architecture. To ensure comprehensive testing and maintain the quality and reliability of microservices, it’s essential to follow best practices. Here are some key best practices for microservices testing: Test at Different Levels Microservices testing should be performed at multiple levels, including unit testing, integration testing, contract testing, end-to-end testing, performance testing, and security testing. Each level of testing verifies specific aspects of the microservices and their interactions. Comprehensive testing at various levels helps uncover issues early and ensures the overall functionality and integrity of the system. Prioritize Test Isolation Microservices are designed to be independent and loosely coupled. It’s crucial to test each microservice in isolation to identify and resolve issues specific to that service without impacting other services. Isolating tests ensures that failures or changes in one microservice do not cascade to other parts of the system, enhancing fault tolerance and maintainability. Use Mocking and Service Virtualization Microservices often depend on external services or APIs. Mocking and service virtualization techniques allow for testing microservices independently of their dependencies. By replacing dependencies with mocks or virtualized versions of the services, you can control the behavior and responses during testing, making it easier to simulate different scenarios, ensure test repeatability, and avoid testing delays caused by external service availability. Implement Contract Testing Microservices rely on well-defined contracts or APIs for communication. Contract testing verifies the compatibility and compliance of these contracts between services. By testing contracts, you ensure that services can communicate effectively, preventing integration issues and reducing the risk of breaking changes. Contract testing tools like Pact or Spring Cloud Contract can assist in defining and validating contracts. Automate Testing Automation is crucial for effective microservices testing. Implementing a robust test automation framework and CI/CD pipeline allows for frequent and efficient testing throughout the development lifecycle. 
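As a concrete illustration of the isolation, mocking, and automation practices above, here is a minimal JUnit 5 and Mockito sketch; the OrderService and InventoryClient names and methods are hypothetical examples rather than part of any specific framework.

```java
import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

// Unit test for a single microservice component, with its external
// dependency replaced by a mock so the test runs in full isolation.
class OrderServiceTest {

    // Hypothetical collaborator that normally calls another microservice.
    interface InventoryClient {
        boolean reserve(String sku, int quantity);
    }

    // Hypothetical class under test.
    static class OrderService {
        private final InventoryClient inventory;
        OrderService(InventoryClient inventory) { this.inventory = inventory; }
        boolean placeOrder(String sku, int quantity) {
            return inventory.reserve(sku, quantity);
        }
    }

    @Test
    void placesOrderWhenInventoryCanReserveStock() {
        InventoryClient inventory = mock(InventoryClient.class);
        when(inventory.reserve("SKU-1", 2)).thenReturn(true);

        OrderService service = new OrderService(inventory);

        assertTrue(service.placeOrder("SKU-1", 2));
        verify(inventory).reserve("SKU-1", 2); // the dependency was exercised as expected
    }
}
```

Tests like this run in milliseconds inside a CI pipeline and fail independently of whether the real inventory service is available.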
Automated testing enables faster feedback, reduces human error, and facilitates the continuous delivery of microservices. Tools like Cucumber, Postman, or JUnit can be leveraged for automated testing at different levels. Emphasize Performance Testing Scalability and performance are vital aspects of microservices architecture. Conduct performance testing to ensure that microservices can handle expected loads and perform optimally under various conditions. Load testing, stress testing, and performance profiling tools like Gatling, Apache JMeter, or Locust can help assess the system’s behavior, identify bottlenecks, and optimize performance. Implement Chaos Engineering Chaos engineering is a proactive testing methodology that involves intentionally injecting failures or disturbances into a microservices environment to evaluate its resilience. By simulating failures and stress scenarios, you can identify weaknesses, validate fault tolerance mechanisms, and improve the overall robustness and reliability of the system. Tools like Chaos Monkey, Gremlin, or Pumba can be employed for chaos engineering experiments. Include Security Testing Microservices often interact with sensitive data and external systems, making security testing crucial. Perform security testing to identify vulnerabilities, ensure data protection, and prevent unauthorized access. Techniques such as penetration testing, vulnerability scanning, and adherence to security best practices should be incorporated into the testing process to mitigate security risks effectively. Monitor and Analyze System Behavior Monitoring and observability are essential during microservices testing. Implement monitoring tools and techniques to gain insights into the behavior, performance, and health of microservices. Collect and analyze metrics, logs, and distributed traces to identify issues, debug problems, and optimize the system’s performance. Tools like Prometheus, Grafana, ELK stack, or distributed tracing systems aid in monitoring and analyzing microservices. Test Data Management Managing test data in microservices testing can be complex. Ensure proper test data management by using techniques like data virtualization or synthetic data generation. These approaches allow for realistic and consistent test scenarios, minimizing dependencies on production data and external systems. By following these best practices, organizations can establish a robust testing process for microservices, ensuring quality, reliability, and performance in distributed systems. Adapting these practices to specific project requirements, technologies, and organizational needs is important to achieve optimal results. Test Environment and Infrastructure Creating an effective test environment and infrastructure is crucial for successful microservices testing. A well-designed test environment ensures that the testing process is reliable and efficient and replicates the production environment as closely as possible. Here are some key considerations for setting up a robust microservices test environment and infrastructure: Containerization and Orchestration Containerization platforms like Docker and orchestration tools such as Kubernetes provide a flexible and scalable infrastructure for deploying and managing microservices. By containerizing microservices, you can encapsulate each service and its dependencies, ensuring consistent environments across testing and production. 
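One way to bring that container-level consistency directly into a JVM test suite is a library such as Testcontainers, which starts throwaway Docker containers from test code. The sketch below is illustrative only: it assumes Docker is available on the test host and that the Testcontainers PostgreSQL module plus a PostgreSQL JDBC driver are on the test classpath; the image tag and query are arbitrary examples.

```java
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import static org.junit.jupiter.api.Assertions.assertTrue;

// Spins up a disposable PostgreSQL container so the test uses the same
// database engine as production instead of an in-memory substitute.
class RepositoryContainerTest {

    @Test
    void canTalkToARealDatabase() throws Exception {
        try (PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:15-alpine")) {
            postgres.start();

            try (Connection conn = DriverManager.getConnection(
                    postgres.getJdbcUrl(), postgres.getUsername(), postgres.getPassword());
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                assertTrue(rs.next()); // the container is reachable and answers queries
            }
        } // the container (and its data) is discarded when the try block exits
    }
}
```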
Container orchestration tools enable efficient deployment, scaling, and management of microservices, making it easier to replicate the production environment for testing purposes. Environment Configuration Management Maintaining consistent configurations across different testing environments is crucial. Configuration management tools like Ansible, Chef, or Puppet help automate the setup and configuration of test environments. They allow you to define and manage environment-specific configurations, such as database connections, service endpoints, and third-party integrations, ensuring consistency and reproducibility in testing. Test Data Management Microservices often interact with databases and external systems, making test data management complex. Proper test data management ensures that test scenarios are realistic and cover different data scenarios. Techniques such as data virtualization, where virtual test data is generated on the fly, or synthetic data generation, where realistic but non-sensitive data is created, can be employed. Additionally, tools like Flyway or Liquibase help manage database schema migrations during testing. Service Virtualization Service virtualization allows you to simulate or virtualize the behavior of dependent microservices that are not fully developed or available during testing. It helps decouple testing from external service dependencies, enabling continuous testing even when certain services are unavailable or undergoing changes. Tools like WireMock, Mountebank, or Hoverfly provide capabilities for creating virtualized versions of dependent services, allowing you to define custom responses and simulate various scenarios. Continuous Integration and Delivery (CI/CD) Pipeline A robust CI/CD pipeline is essential for continuous testing and seamless delivery of microservices. The CI/CD pipeline automates the build, testing, and deployment processes, ensuring that changes to microservices are thoroughly tested before being promoted to higher environments. Tools like Jenkins, GitLab CI/CD, or CircleCI enable the automation of test execution, test result reporting, and integration with version control systems and artifact repositories. Test Environment Provisioning Automated provisioning of test environments helps in reducing manual effort and ensures consistency across environments. Infrastructure-as-Code (IaC) tools like Terraform or AWS CloudFormation enable the provisioning and management of infrastructure resources, including virtual machines, containers, networking, and storage, in a programmatic and reproducible manner. This allows for quick and reliable setup of test environments with the desired configurations. Monitoring and Log Aggregation Monitoring and log aggregation are essential for gaining insights into the behavior and health of microservices during testing. Tools like Prometheus, Grafana, or ELK (Elasticsearch, Logstash, Kibana) stack can be used for collecting and analyzing metrics, logs, and traces. Monitoring helps identify performance bottlenecks, errors, and abnormal behavior, allowing you to optimize and debug microservices effectively. Test Environment Isolation Isolating test environments from production environments is crucial to prevent any unintended impact on the live system. Test environments should have separate infrastructure, networking, and data resources to ensure the integrity of production data. 
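Returning for a moment to the service-virtualization point above, here is a minimal WireMock sketch that stands in for an unavailable downstream service during a test; the port, endpoint path, and JSON body are invented for illustration.

```java
import static com.github.tomakehurst.wiremock.client.WireMock.aResponse;
import static com.github.tomakehurst.wiremock.client.WireMock.get;
import static com.github.tomakehurst.wiremock.client.WireMock.urlEqualTo;

import com.github.tomakehurst.wiremock.WireMockServer;

// Starts an in-process HTTP stub that imitates a dependent microservice,
// so tests can run even when the real service is unavailable or unfinished.
public class PaymentServiceStub {

    public static void main(String[] args) {
        WireMockServer server = new WireMockServer(8089); // fixed port for the example
        server.start();

        server.stubFor(get(urlEqualTo("/payments/42"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"id\":42,\"status\":\"SETTLED\"}")));

        // Point the service under test at http://localhost:8089 instead of the
        // real payment service, exercise it, then call server.stop().
    }
}
```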
Techniques like containerization, virtualization, or cloud-based environments provide effective isolation and sandboxing of test environments. Scalability and Performance Testing Infrastructure Microservices architecture emphasizes scalability and performance. To validate these aspects, it is essential to have a dedicated infrastructure for load testing and performance testing. This infrastructure should include tools like Gatling, Apache JMeter, or Locust, which allow simulating high loads, measuring response times, and analyzing system behavior under stress conditions. By focusing on these considerations, organizations can establish a robust microservices test environment and infrastructure that closely mirrors the production environment. This ensures accurate testing, faster feedback cycles, and reliable software delivery while minimizing risks and ensuring the overall quality and reliability of microservices-based applications. Test Automation Tools and Frameworks Microservices testing can be significantly enhanced by utilizing various test automation tools and frameworks. These tools help streamline the testing process, improve efficiency, and ensure comprehensive test coverage. In this section, we will explore some popular microservices test automation tools and frameworks: Cucumber Cucumber is a widely used tool for behavior-driven development (BDD) testing. It enables collaboration between stakeholders, developers, and testers by using a plain-text format for test scenarios. With Cucumber, test scenarios are written in a Given-When-Then format, making it easier to understand and maintain test cases. It supports multiple programming languages and integrates well with other testing frameworks and tools. Postman Postman is a powerful API testing tool that allows developers and testers to create and automate tests for microservices APIs. It provides a user-friendly interface for sending HTTP requests, validating responses, and performing functional testing. Postman supports scripting and offers features like test assertions, test data management, and integration with CI/CD pipelines. Rest-Assured Rest-Assured is a Java-based testing framework specifically designed for testing RESTful APIs. It provides a rich set of methods and utilities to simplify API testing, including support for request and response specification, authentication, data validation, and response parsing. Rest-Assured integrates well with popular Java testing frameworks like JUnit and TestNG. WireMock WireMock is a flexible and easy-to-use tool for creating HTTP-based mock services. It allows you to simulate the behavior of external dependencies or unavailable services during testing. WireMock enables developers and testers to stub out dependencies, define custom responses, and verify requests made to the mock server. It supports features like request matching, response templating, and record/playback of requests. Pact Pact is a contract testing framework that focuses on ensuring compatibility and contract compliance between microservices. It enables teams to define and verify contracts, which are a set of expectations for the interactions between services. Pact supports various programming languages and allows for generating consumer-driven contracts that can be used for testing both the provider and consumer sides of microservices. Karate Karate is an open-source API testing framework that combines API testing, test data preparation, and assertions in a single tool. 
It uses a simple and expressive syntax for writing tests and supports features like request chaining, dynamic payloads, and parallel test execution. Karate also provides capabilities for testing microservices built on other protocols like SOAP and GraphQL. Gatling Gatling is a popular open-source tool for load and performance testing. It allows you to simulate high user loads, measure response times, and analyze system behavior under stress conditions. Gatling provides a domain-specific language (DSL) for creating test scenarios and supports distributed load generation for scalability. It integrates well with CI/CD pipelines and offers detailed performance reports. Selenium Selenium is a widely used web application testing framework that can also be leveraged for testing microservices with web interfaces. It provides a range of tools and APIs for automating browser interactions and performing UI-based tests. Selenium supports various programming languages and offers capabilities for cross-browser testing, test parallelization, and integration with test frameworks like TestNG and JUnit. These are just a few examples of the many tools and frameworks available for microservices test automation. The choice of tool depends on factors such as project requirements, programming languages, team expertise, and integration capabilities with the existing toolchain. It’s essential to evaluate the features, community support, and documentation of each tool to select the most suitable one for your specific testing needs. Monitoring and Observability Monitoring and observability are essential for gaining insights into the health, performance, and behavior of microservices. Key monitoring aspects include: Log Aggregation and Analysis: Collecting and analyzing log data from microservices helps in identifying errors, diagnosing issues, and understanding the system’s behavior. Metrics and Tracing: Collecting and analyzing performance metrics and distributed traces provides visibility into the end-to-end flow of requests and highlights bottlenecks or performance degradation. Alerting and Incident Management: Establishing effective alerting mechanisms enables organizations to proactively respond to issues and incidents. Integrated incident management workflows ensure timely resolution and minimize disruptions. Distributed Tracing: Distributed tracing techniques allow for tracking and visualizing requests as they traverse multiple microservices, providing insights into latency, dependencies, and potential bottlenecks. Conclusion The performance, scalability, and reliability of complex distributed systems depend on the reliability of microservices. Organizations can lessen the difficulties brought about by microservices architecture by adopting a thorough testing strategy that includes unit testing, integration testing, contract testing, performance testing, security testing, chaos testing, and end-to-end testing. The overall quality and resilience of microservices-based applications are improved by incorporating best practices like test automation, containerization, CI/CD, service virtualization, scalability testing, and efficient monitoring, which results in better user experiences and successful deployments. The performance, dependability, and quality of distributed software systems are all dependent on the results of microservices testing. Organizations can find and fix problems at different levels, from specific microservices to end-to-end scenarios, by implementing a thorough testing strategy. 
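The metrics-and-tracing practices listed above can start very small. As an aside before the conclusion, the sketch below uses Micrometer, a common JVM metrics facade, to record a counter and a timer that a backend such as Prometheus or Grafana could consume; the meter names and the use of SimpleMeterRegistry are illustrative choices only.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Minimal example of instrumenting a request handler with Micrometer.
// In a real service the registry would be backed by a monitoring system
// rather than SimpleMeterRegistry.
public class CheckoutMetrics {

    private final Counter failedCheckouts;
    private final Timer checkoutLatency;

    public CheckoutMetrics(MeterRegistry registry) {
        this.failedCheckouts = Counter.builder("checkout.failed")
                .description("Number of failed checkout attempts")
                .register(registry);
        this.checkoutLatency = Timer.builder("checkout.latency")
                .description("End-to-end checkout handling time")
                .register(registry);
    }

    public void handleCheckout(Runnable businessLogic) {
        checkoutLatency.record(() -> {
            try {
                businessLogic.run();        // timed by the surrounding Timer
            } catch (RuntimeException e) {
                failedCheckouts.increment(); // counted, then rethrown
                throw e;
            }
        });
    }

    public static void main(String[] args) {
        CheckoutMetrics metrics = new CheckoutMetrics(new SimpleMeterRegistry());
        metrics.handleCheckout(() -> { /* pretend to process a checkout */ });
    }
}
```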
Teams can successfully validate microservices throughout their lifecycle with the right test environment, infrastructure, and monitoring tools, facilitating quicker and more dependable software delivery. In today’s fast-paced technological environment, adopting best practices and using the appropriate testing tools and frameworks will enable organizations to create robust, scalable, and resilient microservices architectures, ultimately improving customer satisfaction and business success.
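To round out the testing discussion with something executable, here is a small Rest-Assured sketch of the kind of API-level check described in the tools section above; the base URI, endpoint, and JSON field are placeholders for a real service under test.

```java
import static io.restassured.RestAssured.given;
import static org.hamcrest.Matchers.equalTo;

import io.restassured.RestAssured;
import org.junit.jupiter.api.Test;

// Functional API test against a running service instance (for example one
// started in a containerized test environment as described earlier).
class OrderApiTest {

    @Test
    void returnsAnExistingOrder() {
        RestAssured.baseURI = "http://localhost:8080"; // placeholder test endpoint

        given()
            .accept("application/json")
        .when()
            .get("/orders/42")
        .then()
            .statusCode(200)
            .body("status", equalTo("PAID")); // JSONPath assertion on the response body
    }
}
```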
Abstract

Trust is crucial for developers working at different sites: it facilitates team collaboration in distributed software development. Existing research has focused on how to build and spread trust in the absence of direct, face-to-face communication, but has largely overlooked the effect of trust propensity, the personality trait representing an individual's disposition to perceive others as trustworthy. This study presents a preliminary quantitative analysis of how trust propensity affects successful collaboration in distributed software engineering projects, where success is represented by pull requests whose code contributions are merged into the project repository.

1. Introduction

In global software engineering, trust is considered a critical factor for success. Decreased trust has been reported to:

Aggravate the feeling of separation between teams by creating conflicting goals
Reduce the willingness to cooperate and to share the information needed to resolve issues
Undermine goodwill toward others in the case of disagreements and objections [1]

Face-to-face (F2F) interaction helps trust grow among team members because they gain awareness of one another in both personal and technical terms [2]. F2F interaction, however, is reduced in distributed software projects [3]. Previous empirical research has shown that online interaction over chat or email builds trust among members of open-source software (OSS) projects, who typically have no chance to meet in person [4]. A further aspect that is necessary for understanding how trust and cooperation develop within a team is the propensity to trust [5]: the personal disposition to trust others and to take risks based on the belief that the trustee will behave as expected [6]. The propensity to trust, in other words, characterizes individuals who generally tend to perceive others as trustworthy [7]. Accordingly, we formulated the following research question for this study:

Research Question (RQ): How does an individual's propensity to trust facilitate successful collaboration in globally distributed software projects?

A common limitation of the existing empirical research on trust [8] is that it provides no explicit measure of the extent to which individual developers' trust contributes directly to project performance [9]; performance is instead approximated by overall measures such as duration, productivity, and completion of requirements. In this study, the team intended to overcome that limitation by taking the successful collaboration as the unit of analysis: a situation in which two developers work together and cooperate successfully, yielding project advancement through the addition of new features or the fixing of bugs.
Through this more nuanced unit of analysis, the research aims to measure more directly how trust facilitates cooperation in distributed software projects. Modern distributed software projects support coordination and remote workflows with distributed version control systems, and the pull request has become the most popular way to submit contributions to projects that use Git. In the pull-based development model [10], the project's central repository is not shared among developers; contributors instead fork (clone) the repository and make their changes independently of one another. When a set of changes is ready to be submitted to the central repository, the potential contributor creates a pull request. An integration manager, also known as a core developer, is then responsible for inspecting the proposed change and integrating it into the project's main development line. In distributed software projects, the integration manager's main role is to safeguard project quality. After review, the pull request is closed: it is either accepted, meaning the changes are merged into the main project repository, or declined, meaning the changes are rejected. Whether accepted or declined, closing a pull request requires that consensus be reached through discussion. Collaborative development platforms such as Bitbucket and GitHub make it easier for developers to collaborate through pull requests [11] by providing a user-friendly web interface for discussing proposed changes before integrating them into the project's source code.

Accordingly, the researchers represent successful collaboration between individual developers as the acceptance of a pull request and refine the research question as follows:

Research Question (RQ): How does an individual's propensity to trust facilitate successful collaboration in globally distributed software projects?

The researchers investigated the refined question by analyzing the history of pull request contributions from developers of the Apache Groovy project, which provides an archived history of email-based communication. They analyzed the interactions traced over these channels to assess each developer's propensity to trust. The remainder of this paper is organized as follows: the next section discusses the challenge of quantifying trust propensity and a supporting solution; Sections 3 and 4 describe the empirical study and its results; Section 5 discusses the findings and limitations; finally, we draw conclusions and outline future work.
2. Background

Measuring the Term "Propensity To Trust"

The five-factor model, or Big Five personality model, is a general taxonomy used to evaluate personality traits [12], as shown in Figure 1. Its higher-level dimensions are openness, extraversion, conscientiousness, neuroticism, and agreeableness, and each dimension comprises six subdimensions that are evaluated in turn [13]. Previous research has confirmed that personality traits can be derived from the analysis of emails or other written text [14]. According to Tausczik and Pennebaker [15], each of the Big Five traits is significantly and strongly associated with theoretically appropriate word-usage patterns, indicating a substantial connection between personality and language use [16].

Figure 1: Big Five personality traits model (Source: Developed by learner)

Existing research on trust has relied largely on data self-reported through survey questionnaires, which measure an individual's trust on a given scale [17], [18], [19]. One notable exception is the work of Wang and Redmiles [7], who studied how trust spreads in OSS projects using LIWC, a psycholinguistic dictionary for linguistic inquiry and word count, to analyze word use in writing [15], [20]. In the present study, the quantitative measure of trust is obtained with Tone Analyzer, an IBM Watson service leveraging LIWC. It applies linguistic analysis to detect three types of tone from written text (emotional, social, and writing style), and its social tone measure is designed to capture the social tendencies in people's writing, that is, the Big Five personality traits. Specifically, the researchers focus on the recognition of agreeableness, the personality trait that indicates a person's tendency to be cooperative and compassionate toward others. One facet of agreeableness is trust, that is, the tendency to trust others rather than be suspicious of them [21]. Accordingly, the researchers use the agreeableness personality trait as a proxy measure of an individual's propensity to trust.

Factors Influencing the Acceptance of Pull Requests

According to [22] and [23], the factors influencing the acceptance of pull request contributions are both technical and social. On the technical side, existing research on patch acceptance [24], code review, and bug triaging [25], [26] has shown that the decision to merge a contribution is affected by project size (for example, team size and KLOC) as well as by the patch itself. Similarly, [10] found that approximately 13% of reviewed pull requests were closed without merging for purely technical reasons, and that the merge decision was mainly affected by whether the changes involved code areas under active development and whether test cases were attached.
With the rise of social and transparent coding platforms such as GitHub and Bitbucket, integrators infer contribution quality by looking both at technical quality and at the submitter's track record of previously accepted contributions [27]; the number of followers and stars on GitHub serve as auxiliary reputation indicators [28]. Other findings question whether pull requests are really "treated equally" with respect to the submitter's "social status," that is, whether the contributor is an external developer or a member of the core development team. Ducheneaut [30] observed that contributions coming from submitters recognized as part of the core development team have a higher chance of acceptance, and that integrators use the record of previous interactions as a signal for judging the quality of proposed changes. Taken together, these findings provide compelling motivation for looking at non-technical factors that can influence the decision to merge a pull request.

3. Empirical Study

The researchers designed the study to quantitatively assess the impact of propensity to trust on pull request (PR) acceptance. They used simple logistic regression to build a model estimating the probability that a pull request is merged, given the integrator's propensity to trust, that is, agreeableness as measured through IBM Watson Tone Analyzer. In this framework, pull request acceptance is the dependent variable, and the integrator's agreeableness is the independent variable (the predictor). Two primary sources were used to collect the data: pull requests retrieved from GitHub and emails retrieved from the Apache Groovy project. Groovy is an object-oriented programming and scripting language for the Java platform, and it is among the projects supported by the Apache Software Foundation. The researchers opportunistically chose Groovy because:

Its mailing list archives are freely accessible.
It follows a pull-request-based development model.

Dataset

The researchers used the GHTorrent database [31] to collect the chronological list of pull requests opened on GitHub. For each pull request they stored:

The contributor
The date when it was opened
The merged status
The integrator
The date when it was merged or closed

Not every merged pull request is reported as such by GitHub, so the researchers inspected pull request comments to identify contributions that had been closed and merged outside of GitHub. Specifically, they searched for the presence of:

Commits to the main branch that closed the pull request
Comments from the integration manager acknowledging a successful merge

The researchers reviewed the project's pull requests one by one and manually annotated their status, albeit following a procedure similar to the automated heuristics described in [10].
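The manual annotation step just described lends itself to a simple heuristic pre-pass. The sketch below is purely illustrative: the PullRequest record, the comment phrasing, and the classification rules are hypothetical and only mirror the two signals mentioned above (closing commits on the main branch and integrator comments acknowledging a merge).

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical heuristic for flagging pull requests that were merged
// outside of GitHub's own merge mechanism, mirroring the two signals
// described in the study: closing commits and acknowledgement comments.
public class MergeStatusHeuristic {

    public record PullRequest(int number,
                              boolean markedMergedByGitHub,
                              boolean closedByMainBranchCommit,
                              List<String> comments) {}

    private static final Pattern MERGE_ACK =
            Pattern.compile("\\b(merged|landed|applied)\\b", Pattern.CASE_INSENSITIVE);

    public static String classify(PullRequest pr) {
        if (pr.markedMergedByGitHub()) {
            return "MERGED";
        }
        boolean ackComment = pr.comments().stream().anyMatch(c -> MERGE_ACK.matcher(c).find());
        if (pr.closedByMainBranchCommit() || ackComment) {
            return "LIKELY_MERGED_OUTSIDE_GITHUB"; // candidate for manual confirmation
        }
        return "CLOSED_UNMERGED";
    }

    public static void main(String[] args) {
        PullRequest pr = new PullRequest(
                42, false, false, List.of("Thanks, merged via the mailing list patch."));
        System.out.println(pr.number() + " -> " + classify(pr)); // 42 -> LIKELY_MERGED_OUTSIDE_GITHUB
    }
}
```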
Table 1 describes the Apache Groovy project.

Table 1: Apache Groovy project description (Source: Developed by learner)
Description | Object-oriented programming and scripting language for the Java platform
Language | Java
Number of project committers | 12
PRs on GitHub | 476
Emails archived in the mailing list | 4,948
Unique email senders | 367

As Table 1 shows, the researchers retrieved almost 5,000 messages from the project mailing list, which is available on the Groovy project website, and mined the list and its senders with a mailing list statistics tool. They first retrieved the committers' identities (the core team members with write access to the repository) from the Groovy project web pages hosted at Apache and on GitHub. They then compared the names and user IDs of those who integrated pull requests with the names appearing on the mailing list, and in this way identified the messages sent by ten integrators. Finally, they filtered out the developers who had exchanged no more than 20 emails over the period considered.

Integrators' Propensity To Trust

Once the mapping between the core team members and their communication records was obtained, the researchers computed propensity-to-trust scores from the content of the entire email corpus. The email content was processed with Tone Analyzer to obtain an agreeableness score defined on the interval between 0 and 1. Values smaller than 0.5 are associated with lower agreeableness, and therefore with a tendency to be less cooperative and compassionate toward others; values equal to or greater than 0.5 are associated with higher agreeableness. The high and low agreeableness scores are used to derive the integrators' level of propensity to trust, reported in Table 2.

4. Results

In this section, the researchers present the results of the regression model built to understand propensity to trust as a predictor of pull request acceptance. The simple logistic regression was performed with the R statistical packages. The results of the analysis are reported in Table 3; the significant, positive effects of the control variables (number of emails sent and number of PRs reviewed) are omitted because of space constraints. For the propensity-to-trust predictor, the coefficient estimate is +1.49 with an odds ratio (OR) of 4.46, and the effect is statistically significant (p = 0.009; see Table 3). The sign of the coefficient estimate indicates whether the predictor is positively or negatively associated with pull request success, while the OR weighs the effect size: the closer its value is to one, the more negligible the impact of the parameter on the chance of success [32]. The results indicate that propensity to trust is significantly and positively associated with the probability of a pull request being merged. Furthermore, the estimated coefficients of the model, reported in Table 3, can be plugged into the equation below.
This makes it possible to estimate directly the probability that a pull request submitted to the Groovy project will be merged. The estimated probability of acceptance of a pull request is:

Estimated probability of PR acceptance = 1 / (1 + e^(-(0.77 + 1.49 × propensity to trust))) .......... (i)

From the equation above, for example, the probability that a pull request k is accepted by an integrator i with a low propensity to trust is 0.68; the corresponding probability increases for integrators with a higher propensity to trust.

5. Discussion

To the researchers' knowledge, this is a first attempt at quantifying the effect of developers' personality traits on trust in distributed software projects that follow a pull-request-based development model. The practical result of this study is initial evidence that the chances of a code contribution being merged are correlated with the personality traits of the integrator who performs the code review. This novel finding underlines the role played by the propensity to trust, as expressed through personality, in the execution of code review tasks. The results are in line with prior work such as [22] and [30], which observed that the social distance between integrator and contributor influences the acceptance of changes submitted through pull requests.

Table 2: Propensity-to-trust scores and reviewed pull requests per integrator (Source: Developed by learner)
Integrator | Reviewed PRs: merged | Reviewed PRs: closed | Propensity-to-trust score
Developer 1 | 14 | 0 | High
Developer 2 | 57 | 6 |
Developer 3 | 99 | 7 | High
Developer 4 | 12 | 4 | Low
Developer 5 | 10 | 1 | Low
Developer 6 | 8 | 0 | Low
Total | 200 | 18 | -

Table 3: Simple logistic regression model results (Source: Developed by learner)
Predictor | Coefficient estimate | Odds ratio | p-value
Intercept | +0.77 | - | 0.117
Trust propensity | +1.49 | 4.46 | 0.009

From the analysis of Tables 2 and 3, the recommendation to developers is to get to know the community before contributing. Users mentioned in pull request comments can follow this recommendation explicitly by requesting reviews from integrators [33] who show a willingness to help and cooperate with others, that is, a higher propensity to trust, given that the estimated probability of acceptance is 0.68 even for integrators with a low propensity to trust. Moreover, the broader framework of socio-technical congruence offers a critical point for further investigation: whether personality traits also match the coordination needs established within the technical domain, that is, the source code areas concerned by the proposed changes. Finally, because of its preliminary nature, this study also suffers from some limitations regarding the generalizability of its results. The researchers acknowledge that the preliminary analysis is based on a small number of tools and involves developers from a single project; only through replications, with different datasets and settings, will it be possible to build solid empirical evidence. A further limitation of this study concerns the validity of the propensity-to-trust construct.
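As an aside, the estimate in Equation (i) is easy to reproduce. The short sketch below plugs the coefficients reported in Table 3 into the logistic function; coding low and high propensity as 0 and 1 is an assumption made only for this illustration.

```java
// Reproduces the estimated acceptance probability from Equation (i),
// using the coefficients reported in Table 3 (intercept 0.77, slope 1.49).
public class AcceptanceProbability {

    static double estimatedProbability(double propensityToTrust) {
        double linearPredictor = 0.77 + 1.49 * propensityToTrust;
        return 1.0 / (1.0 + Math.exp(-linearPredictor)); // logistic function
    }

    public static void main(String[] args) {
        // Assumed coding for illustration: 0 = low propensity, 1 = high propensity.
        System.out.printf("Low propensity:  %.2f%n", estimatedProbability(0.0)); // ~0.68
        System.out.printf("High propensity: %.2f%n", estimatedProbability(1.0)); // ~0.91
    }
}
```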
As for the validity of the propensity-to-trust construct: for practical reasons, the researchers decided not to rely solely on the traditional, self-reported psychometric instruments used to measure trust, such as surveys [12]. In future replications, they will also investigate the reliability of tone analyzer services when applied to the technical domain of software engineering, since such linguistic resources are usually trained and evaluated on non-technical content.

6. Conclusion and Recommendation

Conclusion

This research comprises six major sections. The first gives an overview of the research and its motivating context. The second covers personality traits and the five-factor model used to reason about trust in teamwork. The third presents the empirical study, which draws on a real project, Apache Groovy. The fourth focuses on the analysis and results. The fifth discusses the results of the empirical study, and the sixth concludes the research with recommendations for further study.

This study represents an initial step in a broader research effort to collect quantitative evidence that well-established trust among team members and developers contributes to increased performance in distributed software engineering projects. In the Big Five personality model, the propensity to trust, that is, the disposition to perceive others as trustworthy, is a stable personality trait that varies from one person to another. Leveraging prior evidence that personality traits emerge unconsciously from the lexicon used in written communication, the researchers used the IBM Watson Tone Analyzer service to measure the propensity to trust by analyzing the written emails archived by the Apache Groovy project. They found initial evidence that integrators with a higher propensity to trust are more likely to accept external contributions submitted through pull requests.

Recommendation

For future work, the researchers recommend replicating the experience to gather more solid evidence. They intend to compare Tone Analyzer with other tools in order to better assess its reliability in extracting personality traits from text that contains technical content, and to enlarge the dataset in terms of both projects and pull requests. They also suggest investigating whether a developer's disposition changes depending on the project participants involved, with mutual trust developing between pairs of developers who interact in dyadic cooperation. Finally, for future study, it is suggested that researchers studying distributed software engineering projects use a network-centric approach for estimating trust between open-source software developers.
Developers will follow the study's three stages of the network-centric approach to build trust while working from different physical locations globally on the distributed software engineering project. The first stage will be CDN (community-wide network developers) of this approach, which means community-wide network developers to construct better the information connecting to projects and developers. The second stage will compare the supporting trust between the supporting pairs directly linked to developers with the CDN. The last stage is computed with the trust between the developer pairs indirectly linked in the CDN. Figure 3: Future suggestion for a network-centric approach for estimating the trust (Source: Developed by learner) From Figure 3 above, CDN is the main stage connected with the other two stages to build trust and trustworthiness to drive potential contributions to an OSS project. The developers or researchers can construct the network-centric approach by focusing on CDN that provides supporting and valuable information about the effective collaboration between the OSS community and developers. The role of CDN in Figure 3 represents the community of developers driven by the OSS multiple projects that share some most common characteristics, including the same case and programming languages. Furthermore, developers or researchers can follow the main four features to label the supporting data to the regression train models for feature extractions. Word embedding can be used by developers such as Google Word2Vec to analyze and vectorize every comment to avoid the pre-trained model. The developers can train their own Word2Vec model related to data of software engineering to drive domain-specific and supporting models to better semantic representation analysis. It can be compared with the pre-trained and generic models. The developers can also use the 300 vector dimensional models to get comments and evaluate the vector representation. In addition, social can also be considered a strength for developers to build a connection between two or more developers to influence trust. Researchers and developers can assign the supporting integer values, especially for every role, to build the pull request and analyze the comment to build trust among individuals. References [1] B. Al-Ani, H. Wilensky, D. Redmiles, and E. Simmons, “An Understanding of the Role of Trust in Knowledge Seeking and Acceptance Practices in Distributed Development Teams,” in 2011 IEEE Sixth International Conference on Global Software Engineering, 2011. [2] F. Abbattista, F. Calefato, D. Gendarmi, and F. Lanubile, “Incorporating Social Software into Agile Distributed Development Environments.” Proc. 1st ASE Workshop on Social Sofware Engineering and Applications (SOSEA’08), 2008. [3] F. Calefato, F. Lanubile, N. Sanitate, and G. Santoro, “Augmenting social awareness in a collaborative development environment,” in Proceedings of the 4th international workshop on Social software engineering - SSE ’11, 2011. [4] F. Lanubile, F. Calefato, and C. Ebert, “Group Awareness in Global Software Engineering,” IEEE Softw., vol. 30, no. 2, pp. 18–23. [5] A. Guzzi, A. Bacchelli, M. Lanza, M. Pinzger, and A. van Deursen, “Communication in open source software development mailing lists,” in 2013 10th Working Conference on Mining Software Repositories (MSR), 2013. [6] Y. Wang and D. Redmiles, “Cheap talk, cooperation, and trust in global software engineering,” Empirical Software Engineering, 2015. [7] Y. Wang and D. 
Redmiles, “The Diffusion of Trust and Cooperation in Teams with Individuals’ Variations on Baseline Trust,” in Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, 2016, pp. 303–318. [8] S. L. Jarvenpaa, K. Knoll, and D. E. Leidner, “Is Anybody out There? Antecedents of Trust in Global Virtual Teams,” Journal of Management Information Systems, vol. 14, no. 4, pp. 29–64, 1998. [9] J. W. Driscoll, “Trust and Participation in Organizational Decision Making as Predictors of Satisfaction,” Acad. Manage. J., vol. 21, no. 1, pp. 44–56, 1978. [10] G. Gousios, M. Pinzger, and A. van Deursen, “An exploratory study of the pull-based software development model,” in Proceedings of the 36th International Conference on Software Engineering - ICSE 2014, 2014. [11] F. Lanubile, C. Ebert, R. Prikladnicki, and A. Vizcaino, “Collaboration Tools for Global Software Engineering,” IEEE Softw., vol. 27, no. 2, pp. 52–55, 2010. [12] P. T. Costa and R. R. McCrae, “The Five-Factor Model, Five-Factor Theory, and Interpersonal Psychology,” in Handbook of Interpersonal Psychology, 2012, pp. 91–104. [13] J. B. Hirsh and J. B. Peterson, "Personality and language use in self-narratives," J. Res. Pers., vol. 43, no. 3, pp. 524–527, 2009. [14] J. Shen, O. Brdiczka, J.J. Liu, Understanding email writers: personality prediction from email messages. Proc. of 21st Int’l Conf. on User Modeling, Adaptation and Personalization (UMAP Conf), 2013. [15] Y. R. Tausczik and J. W. Pennebaker, "The Psychological Meaning of Words: LIWC and Computerised Text Analysis Methods," J. Lang. Soc. Psychol., vol. 29, no. 1, pp. 24–54, Mar. 2010. [16] B. Al-Ani and D. Redmiles, “In Strangers We Trust? Findings of an Empirical Study of Distributed Teams,” in 2009 Fourth IEEE International Conference on Global Software Engineering, Limerick, Ireland, pp. 121–130. [17] J. Schumann, P. C. Shih, D. F. Redmiles, and G. Horton, “Supporting initial trust in distributed idea generation and idea evaluation,” in Proceedings of the 17th ACM international conference on Supporting group work - GROUP ’12, 2012. [18] F. Calefato, F. Lanubile, and N. Novielli, “The role of social media in affective trust building in customer–supplier relationships,” Electr. Commerce Res., vol. 15, no. 4, pp. 453–482, Dec. 2015. [19] J. Delhey, K. Newton, and C. Welzel, "How General Is Trust in 'Most People'? Solving the Radius of Trust Problem," Am. Social. Rev., vol. 76, no. 5, pp. 786–807, 2011. [20] Pennebaker, James W., Cindy K. Chung, Molly Ireland, Amy Gonzales, and Roger J. Booth, "The Development and Psychometric Properties of LIWC2007," LIWC2007 Manual, 2007. [21] P. T. Costa and R. R. MacCrae, Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO FFI): Professional Manual. 1992. [22] J. Tsay, L. Dabbish, and J. Herbsleb. Influence of social and technical factors for evaluating contribution in GitHub. In Proc. of 36th Int’l Conf. on Software Engineering (ICSE’14), 2014. [23] G. Gousios, M.-A. Storey, and A. Bacchelli, “Work practices and challenges in pull-based development: The contributor’s perspective,” in Proceedings of the 38th International Conference on Software Engineering - ICSE ’16, 2016. [24] C. Bird, A. Gourley, and P. Devanbu, “Detecting Patch Submission and Acceptance in OSS Projects,” in Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), 2007. [25] P. C. Rigby, D. M. German, L. Cowen, and M.-A. 
Storey, “Peer Review on Open-Source Software Projects,” ACM Trans. Softw. Eng. Methodol., vol. 23, no. 4, pp. 1–33, 2014. [26] J. Anvik, L. Hiew, and G. C. Murphy, "Who should fix this bug?," in Proceeding of the 28th international conference on Software Engineering - ICSE '06, 2006. [27] L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, “Social coding in GitHub: Transparency and Collaboration in an Open Source Repository,” in Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work - CSCW ’12, 2012. [28] J. Marlow, L. Dabbish, and J. Herbsleb, “Impression formation in online peer production: Activity traces and personal profiles in GitHub,” in Proceedings of the 2013 conference on Computer supported cooperative work - CSCW ’13, 2013. [29] G. Gousios, A. Zaidman, M.-A. Storey, and A. van Deursen, “Work practices and challenges in pull-based development: the integrator’s perspective,” in Proceedings of the 37th International Conference on Software Engineering - Volume 1, 2015, pp. 358–368. [30] N. Ducheneaut, "Socialisation in an Open Source Software Community: A Socio-Technical Analysis," Comput. Support. Coop. Work, vol. 14, no. 4, pp. 323–368, 2005. [31] G. Gousios, “The GHTorrent dataset and tool suite,” in 2013 10th Working Conference on Mining Software Repositories (MSR), 2013. [32] J. W. Osborne, “Bringing Balance and Technical Accuracy to Reporting Odds Ratios and the Results of Logistic Regression Analyses,” in Best Practices in Quantitative Methods, pp. 385–389. [33] O. Baysal, R. Holmes, and M. W. Godfrey. Mining usage data and development artefacts. In Proceedings of MSR '12, 2012. [34] Sapkota, H., Murukannaiah, P. K., & Wang, Y. (2019). A network-centric approach for estimating trust between open source software developers. Plos one, 14(12), e0226281. [35] Wang, Y. (2019). A network-centric approach for estimating trust between open source software developers.
Isn't it magic when everyone on your DevOps team, including new members, can access the company's repository quickly and securely by logging in just once? It isn't a dream! You can easily arrange it using SAML single sign-on (SSO).

What You Should Know About SAML

Before jumping into the technical details of SAML, let's try to understand what SAML is in simple terms. For that, let's look at an example. Imagine you are managing the activity of a big DevOps team, all of whom use a bunch of apps to make their work easier: GitHub, Jira, AWS, and many more – all the apps your organization uses on a daily basis. Then the moment comes to expand your DevOps team. After a thorough search, your headhunter's work is done: your company hires a new developer and gives them a work email address and access to a dashboard. As soon as they sign in to the dashboard, they gain access to all the apps there. All they need to do is click the app they want to use, without entering any other credentials. Not bad, eh? That is how Security Assertion Markup Language (SAML) works.

In more technical terms, SAML is an open standard used to authenticate users. Web applications use this XML-based standard to transfer authentication data between two parties:
- The identity provider (IdP), which performs authentication and passes the user's identity information and level of authorization to the service provider
- The service provider (SP), which authorizes the user and grants them access to the requested resources

Benefits of SAML Authentication

SAML brings a lot of benefits to a company. Among them are improved user experience, an increased level of security, reduced costs across different service providers, and no need to fill in the same information repeatedly, since it is synchronized between directories. Specific benefits of SAML and what they bring to the team:

- Boosts user experience: Your team members need to sign in just once to access multiple apps. This makes authentication much faster and reduces your team's stress, as your DevOps engineers don't need to remember a pile of login credentials for each application they use.
- Improves security: SAML brings forth a single point of authentication – a secure identity provider – so credentials are sent only to the IdP directly. You eliminate the risk of employees using weak or reused passwords.
- Reduces responsibility: You don't need to maintain account details across multiple services. With SAML turned on, the identity provider takes on this responsibility.
- Empowers single identity: As a user, you don't need to maintain and synchronize your information between directories. SAML makes a user's life easier.

GitHub and SAML SSO: All You Need To Know

If your company uses GitHub Enterprise Cloud, SAML SSO will be a great benefit for you. It permits organization owners to control and secure access to the company's GitHub data, including repositories, pull requests, and issues.

How Does SAML SSO Work on GitHub?

Once you configure SAML SSO, members of your organization still need to log in to their personal accounts on GitHub. In this case, though, the process looks like this: as soon as a member of your DevOps team tries to access a resource within your organization, GitHub redirects them to your company's IdP for authentication.
Once authentication is successful, your IdP redirects the member of your DevOps team back to the Git hosting service. However, it's worth remembering that SAML SSO "doesn't replace the normal sign-in process for GitHub," as mentioned in the GitHub Docs: you don't need IdP authentication to view a public repository's overview page and file contents, or to fork or clone the repo. However, if you plan to view issues, projects, pull requests, or releases, authentication is a must-have.

What Enforcement of SAML SSO for Your GitHub Brings to Your DevOps Team

If you enforce SAML SSO for your GitHub organization, all members of your DevOps team will have to authenticate through your IdP before they can access your GitHub repositories and metadata and start building their code. This definitely helps to eliminate unauthorized access. When enforcement is turned on, any member (or administrator) who has failed to authenticate with the IdP is removed from the organization, and the removed user gets an email notification from GitHub. Is it possible to rejoin the GitHub organization? Sure... All the removed user needs to do is authenticate with SAML SSO again within three months of being removed. Within this time frame, all access privileges and settings are preserved.

Bots and service accounts are a special case: to retain their access when SAML SSO is enforced for your GitHub organization, the organization administrator needs to create an external identity with the IdP for each bot or service account. Otherwise, those bots and service accounts will be removed from your organization.

You can test your SAML SSO implementation during the setup phase. All you need to do is leave "Require SAML SSO authentication for all members of the [name of the organization]" unchecked. In this case, testing won't affect your organization's members.

Are There Any Risks?

What will you do if an attacker cracks your password and encrypts all your data? Yep, it's possible to lose all your data and resources, since all your web apps and accounts are linked to the same credentials. You can also face a problem if there is any failure or breakdown in the SSO service: until it recovers, none of your applications can be logged into. For example, in March 2022, due to a hacker attack on Okta, an identity and access management company, around 360 of its customers were potentially impacted, and the incident lasted for "25 consecutive minutes."

Is it possible to avoid situations like these entirely? Unfortunately, no. Yet you can work to reduce the risks connected with them: downtime, data loss, and financial losses. Back up your GitHub repositories and metadata. Then, even if your GitHub environment is encrypted (with ransomware, for example), you will be able to restore from your backup copy on another storage instance.

GitHub Backup: What To Keep In Mind

Let's look at what your GitHub backup should include. If you build your backup strategy in advance and in an appropriate way, you can save time and money in the event of a threat. So, what features can bring your company reliability and any-time data access?
They are:
- Meeting the 3-2-1 backup rule: at least 3 copies, on 2 different storage instances, one of which is outside your own infrastructure
- The possibility of infinite retention, so you can recover your data from any point in time
- Encryption: AES, in flight and at rest, with your own encryption key
- Ransomware protection, including immutable storage
- Restore and disaster recovery technology that permits granular recovery, restore to the same or a new repo, organization account, or local device, or cross-over recovery to another Git hosting platform – from GitHub to Bitbucket or GitLab, for example

Takeaway

SAML single sign-on (SSO) is a very useful capability: it brings your team convenience, spares them from remembering multiple passwords, and lets them log in just once to reach all their web apps. Despite being a reliable and secure way of accessing your apps, it still needs an additional layer of protection – backup – which can help to eliminate downtime and data loss. You can arrange your backup yourself or use third-party backup tools.
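For readers who want to see what wiring an SP to an IdP involves in practice, the sketch below lists the pieces of information the two sides typically exchange. The file name, keys, and URLs are hypothetical placeholders, not GitHub's or any particular IdP's real configuration schema.

YAML
# saml-sp-settings.yml – hypothetical service-provider configuration (illustrative only)
saml:
  service_provider:
    entity_id: "https://sp.example.com/metadata"                        # how the SP identifies itself to the IdP
    assertion_consumer_service_url: "https://sp.example.com/saml/acs"   # where the IdP posts signed SAML assertions
  identity_provider:
    sso_url: "https://idp.example.com/sso/saml"                         # users are redirected here to authenticate
    issuer: "https://idp.example.com"                                   # must match the Issuer element in the SAML assertion
    signing_certificate_file: "idp-signing-cert.pem"                    # used by the SP to verify the assertion signature
  enforcement:
    require_sso_for_all_members: false   # mirrors the "test before enforcing" advice above

The exact field names vary per product, but every SAML integration boils down to exchanging these identifiers, URLs, and the IdP's signing certificate.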
This is an article from DZone's 2023 Data Pipelines Trend Report. For more: Read the Report

Organizations today rely on data to make decisions, innovate, and stay competitive. That data must be reliable and trustworthy to be useful. Many organizations are adopting a data observability culture that safeguards their data accuracy and health throughout its lifecycle. This culture involves putting in motion a series of practices that enable you and your organization to proactively identify and address issues, prevent potential disruptions, and optimize your data ecosystems. When you embrace data observability, you protect your valuable data assets and maximize their effectiveness.

Understanding Data Observability

"In a world deluged by irrelevant information, clarity is power." – Yuval Noah Harari, 21 Lessons for the 21st Century, 2018

As Yuval Noah Harari puts it, data is an incredibly valuable asset today. As such, organizations must ensure that their data is accurate and dependable. This is where data observability comes in, but what is data observability exactly?

Data observability is the means of ensuring our data's health and accuracy: understanding how data is collected, stored, processed, and used, plus being able to discover and fix issues in real time. By doing so, we can optimize our systems' effectiveness and reliability by identifying and addressing discrepancies while ensuring compliance with regulations like GDPR or CCPA. By taking such proactive measures, we can gather valuable insights that prevent errors from recurring in the future.

Why Is Data Observability Critical?

Data reliability is vital. We live in an era where data underpins crucial decision-making processes, so we must safeguard it against inaccuracies and inconsistencies to ensure our information is trustworthy and precise. Data observability allows organizations to proactively identify and address issues before they spread downstream, preventing potential disruptions and costly errors.

One of the advantages of practicing data observability is that it ensures your data is reliable and trustworthy. Continuously monitoring your data means you avoid making decisions based on incomplete or incorrect information, which gives you more confidence.

Figure 1: The benefits of companies using analytics
Data source: The Global State of Enterprise Analytics, 2020, MicroStrategy

Analyzing your technology stack can also help you find inefficiencies and areas where resources are underutilized, saving you money. Incorporating automation tools into your data observability process is the cherry on top of the proverbial cake, making everything more efficient and streamlined. Data observability is a long-term approach to safeguarding the integrity of your data so that you can confidently harness its power, whether for informed decision-making, regulatory compliance, or operational efficiency.

Advantages and Disadvantages of Data Observability

When making decisions based on data, it's essential to be quick. But what if the data isn't dependable? That's where data observability comes in. However, like any tool, it has its advantages and disadvantages.

IMPLEMENTING DATA OBSERVABILITY: ADVANTAGES AND DISADVANTAGES

- Advantage – Trustworthy insights for intelligent decisions: Data observability provides decision-makers with reliable insights, ensuring well-informed choices in business strategy, product development, and resource allocation.
- Disadvantage – Resource-intensive setup: Implementing data observability demands time and resources to set up tools and processes, but the long-term benefits justify the initial costs.
- Advantage – Real-time issue prevention: Data observability acts as a vigilant guardian for your data, instantly detecting issues and averting potential emergencies, thus saving time and resources while maintaining data reliability.
- Disadvantage – Computational overhead from continuous monitoring: Balancing real-time monitoring with computational resources is essential to optimize observability.
- Advantage – Enhanced team alignment through shared insights: Data observability fosters collaboration by offering a unified platform for teams to gather, analyze, and act on data insights, facilitating effective communication and problem-solving.
- Disadvantage – Training requirements for effective tool usage: Data observability tools require skill, necessitating ongoing training investments to harness their full potential.
- Advantage – Accurate data for sustainable planning: Data observability establishes the foundation for sustainable growth by providing dependable data that's essential for long-term planning, including forecasting and risk assessment.
- Disadvantage – Privacy compliance challenges: Maintaining data observability while adhering to strict privacy regulations like GDPR and CCPA can be intricate, requiring a delicate balance between data visibility and privacy compliance.
- Advantage – Resource savings: Data observability allows you to improve how resources are allocated by identifying areas where your technology stack is inefficient or underutilized. As a result, you can save costs and prevent over-provisioning resources, leading to a more efficient and cost-effective data ecosystem.
- Disadvantage – Integration complexities: Integrating data observability into existing data infrastructure may pose challenges due to compatibility issues and legacy systems, potentially necessitating investments in specific technologies and external expertise for seamless integration.

Table 1

To sum up, data observability has both advantages and disadvantages: it provides reliable data, detects problems in real time, and enhances teamwork, but it requires significant time, resources, and training while respecting data privacy. Despite these challenges, organizations that adopt data observability are better prepared to succeed in today's data-driven world and beyond.

Cultivating a Data-First Culture

Data plays a crucial role in today's fast-paced and competitive business environment. It enables informed decision-making and drives innovation. To achieve this, it's essential to cultivate an environment that values data. This culture should prioritize accuracy, dependability, and consistent monitoring throughout the data's lifecycle.

To ensure effective data observability, strong leadership is essential. Leaders should prioritize data from the top down, allocate the necessary resources, and set a clear vision for a data-driven culture. This leadership fosters team collaboration and alignment, encouraging teams to work together towards the same objectives. When teams collaborate in a supportive work environment, critical data is properly managed and utilized for the organization's benefit.

Technical teams and business users must work together to create a culture that values data. Technical teams build the foundation of the data infrastructure, while business users access data to make decisions. Collaboration between these teams leads to valuable insights that drive business growth.
Figure 2: Data generated, gathered, copied, and consumed
Data source: Data and Analytics Leadership Annual Executive Survey 2023, NewVantage Partners

By leveraging data observability, organizations can make informed decisions, address issues quickly, and optimize their data ecosystem for the benefit of all stakeholders.

Nurturing Data Literacy and Accountability

Promoting data literacy and accountability is not only about improving efficiency but also an ethical consideration. Assigning both ownership of and accountability for data management empowers people to make informed decisions based on data insights, strengthens transparency, and upholds principles of responsibility and integrity, ensuring accuracy, security, and compliance with privacy regulations. A data-literate workforce is a safeguard, identifying instances where data may be misused or manipulated for unethical purposes.

Figure 3: The state of data responsibility and data ethics
Data source: Amount of data created, consumed, and stored 2010-2020, with forecasts to 2025, 2023, Statista

Overcoming Resistance To Change

Incorporating observability practices is often a considerable challenge, and facing resistance from team members is not uncommon. You should confront these concerns and communicate clearly to promote a smooth transition. You can encourage the adoption of data-driven practices by highlighting the long-term advantages of better data quality and observability, which might inspire your coworkers to welcome the changes. Showcasing real-life cases with positive outcomes, like higher revenue and customer satisfaction, can also help make the case.

Implementing Data Observability Techniques

You can keep your data pipelines reliable and of high quality by implementing data observability. This implementation involves different techniques and features that allow you to monitor and analyze your data, including data profiling, anomaly detection, lineage, and quality checks. These tools give you a holistic view of your data pipelines, allowing you to monitor their health and quickly identify any issues or inconsistencies that could affect performance.

Essential Techniques for Successful Implementation

To ensure the smooth operation of pipelines, you must establish a proper system for monitoring, troubleshooting, and maintaining data. Employing effective strategies can help achieve this goal. Let's review some key techniques to consider.

Connectivity and Integration

For optimal data observability, your tools must integrate smoothly with your existing data stack. This integration should not require major modifications to your pipelines, data warehouses, or processing frameworks. This approach allows easy deployment of the tools without disrupting your current workflows.

Data Monitoring at Rest

Observability tools should be able to monitor data while it's at rest, without needing to extract it from its current storage location. This ensures that the monitoring process doesn't affect the speed of your data pipelines and is cost-effective. Moreover, this approach makes your data safer, as it doesn't require extraction.

Automated Anomaly Detection

Automated anomaly detection is an important component of data observability.
Through machine learning models, patterns and behaviors in data are identified; this enables alerts to be sent when unexpected deviations occur, reducing the number of false positives and alleviating the workload of data engineers who would otherwise have to manage complex monitoring rules.

Dynamic Resource Identification

Data observability tools give you complete visibility into your data ecosystem. These tools should automatically detect important resources, dependencies, and invariants. They should be flexible enough to adapt to changes in your data environment, giving you insights into vital components without constant manual updates and making data observability extensive and easy to configure.

Comprehensive Contextual Information

For effective troubleshooting and communication, data observability needs to provide comprehensive contextual information. This information should cover data assets, dependencies, and the reasons behind any data gaps or issues. Having the full context allows data teams to identify and resolve any reliability concerns quickly.

Preventative Measures

Beyond monitoring data assets, data observability offers preventive measures to avoid potential issues. With insight into your data and suggestions for responsible alterations or revisions, you can proactively address problems before they affect data pipelines. This approach leads to greater efficiency and time savings in the long run.

If you need to keep tabs on data, it can be tough to ensure everything is covered, and relying only on batch and stream processing frameworks isn't enough. That's why it's often best to use a tool made specifically for this purpose. You could use a data platform, add observability to your existing data warehouse, or opt for open-source tools. Each of these options has its own advantages and disadvantages:

- Use a data platform – Data platforms are designed to manage all of your organization's data in one place and grant access to that data through APIs instead of via the platform itself. There are many benefits to using a data platform, including speed, easy access to all your organization's information, flexible deployment options, and increased security. Additionally, many platforms include built-in capabilities for data observability, so you can ensure your databases perform well without having to implement an additional solution.
- Build data observability into your existing platform – If your organization only uses one application or tool to manage its data, this approach is probably the best for you, provided it includes an observability function. Incorporating data observability into your current setup is a must if you manage complex data stored in multiple sources, as it improves the reliability of your data flow cycle.

Balancing Automation and Human Oversight

Figure 4: Balancing automation and human oversight

While automation is a key component of data observability, it's important to strike a balance between automation and human oversight. Automation can handle routine tasks, but human expertise is necessary for critical decisions and for ensuring data quality. Implementing data observability techniques involves seamless integration, automated anomaly detection, dynamic resource identification, and comprehensive contextual information. Balancing automation and human oversight is important for efficient and effective data observability, resulting in more reliable data pipelines and improved decision-making capabilities.
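As a concrete, if simplified, illustration of the techniques described above (monitoring data at rest, automated anomaly detection, and quality checks), the sketch below shows what a declarative set of data checks might look like. The file name, keys, dataset, and thresholds are hypothetical; real observability platforms each have their own configuration format.

YAML
# checks.yml – hypothetical data observability rules (illustrative only)
dataset: warehouse.orders
checks:
  - type: freshness            # monitoring at rest: alert if the table stops receiving new rows
    column: created_at
    max_lag_hours: 6
  - type: volume_anomaly       # automated anomaly detection: compare today's row count to a learned baseline
    sensitivity: medium
  - type: schema               # fail fast when an upstream change breaks the expected structure
    expect_columns: [order_id, customer_id, amount, created_at]
  - type: null_rate            # simple quality check with an explicit threshold
    column: customer_id
    max_fraction: 0.01
alerting:
  channel: "#data-observability"   # hypothetical routing target for alerts

The point of such a declarative file is that the checks live alongside the pipeline code, can be reviewed like any other change, and run continuously without a human watching dashboards.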
Conclusion

Data observability empowers organizations to thrive in a world where data fuels decision-making by ensuring data's accuracy, reliability, and trustworthiness. We can start by cultivating a culture that values data integrity, collaboration between technical and business teams, and a commitment to nurturing data literacy and accountability. You will also need a strong data observability framework to monitor your data pipelines effectively, including a set of techniques that help identify issues early and optimize your data ecosystems. Automated processes alone aren't enough, though: we must balance our reliance on automation with human oversight, recognizing that while automation streamlines routine tasks, human expertise remains invaluable for critical decisions and maintaining data quality. With data observability, data integrity is safeguarded and its full potential is unlocked — leading to innovation, efficiency, and success.
In my previous posting, I explained how to run Ansible scripts using a Linux virtual machine on Windows Hyper-V. This article aims to ease novices into Ansible IaC by way of an example: booting one's own out-of-cloud Kubernetes cluster. As such, the intricacies of the steps required to boot a local K8s cluster are beyond the scope of this article. The steps can, however, be studied at the GitHub repo where the Ansible scripts are checked in.

The scripts were tested on Ubuntu 20.04, running virtually on Windows Hyper-V. Network connectivity was established via an external virtual network switch on an ethernet adaptor shared between virtual machines but not with Windows. Dynamic memory was switched off from the Hyper-V UI. An SSH service daemon was pre-installed to give Ansible a tty terminal to run commands from.

Bootstrapping the Ansible User

Repeatability through automation is a large part of DevOps; it cuts down on human error, after all. Ansible, therefore, requires a standard way to establish a terminal on the various machines under its control. This can be achieved using a public/private key pair for SSH authentication. The keys can be generated for an elliptic curve algorithm as follows:

ssh-keygen -f ansible -t ecdsa -b 521

The Ansible script to create an account and match it to the keys is:

YAML
---
- name: Bootstrap ansible
  hosts: all
  become: true
  tasks:
    - name: Add ansible user
      ansible.builtin.user:
        name: ansible
        shell: /bin/bash
      become: true

    - name: Add SSH key for ansible
      ansible.posix.authorized_key:
        user: ansible
        key: "{{ lookup('file', 'ansible.pub') }}"
        state: present
        exclusive: true # to allow revocation
        # Join the key options with comma (no space) to lock down the account:
        key_options: "{{ ','.join([ 'no-agent-forwarding', 'no-port-forwarding', 'no-user-rc', 'no-x11-forwarding' ]) }}" # noqa jinja[spacing]
      become: true

    - name: Configure sudoers
      community.general.sudoers:
        name: ansible
        user: ansible
        state: present
        commands: ALL
        nopassword: true
        runas: ALL # ansible user should be able to impersonate someone else
      become: true

Ansible is declarative, and this snippet depicts a series of tasks that ensure that:
- the ansible user exists,
- the keys are added for SSH authentication, and
- the ansible user can execute with elevated privilege using sudo.

Towards the top is something very important that might go unnoticed under a cursory gaze:

hosts: all

What does this mean? The answer to this puzzle is easily explained by looking at the Ansible inventory file:

YAML
masters:
  hosts:
    host1:
      ansible_host: "192.168.68.116"
      ansible_connection: ssh
      ansible_user: atmin
      ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none"
      ansible_ssh_private_key_file: ./bootstrap/ansible
comasters:
  hosts:
    co-master_vivobook:
      ansible_connection: ssh
      ansible_host: "192.168.68.109"
      ansible_user: atmin
      ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none"
      ansible_ssh_private_key_file: ./bootstrap/ansible
workers:
  hosts:
    client1:
      ansible_connection: ssh
      ansible_host: "192.168.68.115"
      ansible_user: atmin
      ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none"
      ansible_ssh_private_key_file: ./bootstrap/ansible
    client2:
      ansible_connection: ssh
      ansible_host: "192.168.68.130"
      ansible_user: atmin
      ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none"
      ansible_ssh_private_key_file: ./bootstrap/ansible

It is the register of all machines the Ansible project is responsible for.
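As an aside, the connection settings repeated for every host above could be factored out into a group variables file next to the inventory. This is standard Ansible behaviour, but the refactor below is a sketch and is not part of the checked-in scripts:

YAML
# group_vars/all.yml – optional refactor (not in the original repo)
# Variables declared here apply to every host in the inventory,
# so each per-host entry can shrink to little more than its ansible_host.
ansible_connection: ssh
ansible_user: atmin
ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none"
ansible_ssh_private_key_file: ./bootstrap/ansible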
Since our example project concerns a high-availability K8s cluster, it consists of sections for the master, co-masters, and workers. Each section can contain more than one machine. The root-enabled account atmin on display here was created by Ubuntu during installation. The answer to the question should now be clear — the hosts: all key above specifies that every machine in the inventory will have an account called ansible created according to the specification in the YAML.

The command to run the script is:

ansible-playbook --ask-pass bootstrap/bootstrap.yml -i atomika/atomika_inventory.yml -K

The locations of the user-bootstrapping YAML and the inventory file are specified. The command, furthermore, requests password authentication for the user from the inventory file. The -K switch, in its turn, asks for the superuser password to be prompted; it is required by tasks that are specified to run as root and can be omitted should the script be run as root.

Upon successful completion, one should be able to log in to the machines using the private key of the ansible user:

ssh ansible@172.28.110.233 -i ansible

Note that since this account is not for human use, the bash shell is not enabled. Nevertheless, one can access the home of root (/root) using 'sudo ls /root'.

The user account can now be changed to ansible and the location of the private key added for each machine in the inventory file:

YAML
host1:
  ansible_host: "192.168.68.116"
  ansible_connection: ssh
  ansible_user: ansible
  ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none"
  ansible_ssh_private_key_file: ./bootstrap/ansible

One Master To Rule Them All

We are now ready to boot the K8s master:

ansible-playbook atomika/k8s_master_init.yml -i atomika/atomika_inventory.yml --extra-vars='kubectl_user=atmin' --extra-vars='control_plane_ep=192.168.68.119'

The content of atomika/k8s_master_init.yml is:

YAML
# k8s_master_init.yml
- hosts: masters
  become: yes
  become_method: sudo
  become_user: root
  gather_facts: yes
  connection: ssh

  roles:
    - atomika_base

  vars_prompt:
    - name: "control_plane_ep"
      prompt: "Enter the DNS name of the control plane load balancer?"
      private: no
    - name: "kubectl_user"
      prompt: "Enter the name of the existing user that will execute kubectl commands?"
      private: no

  tasks:
    - name: Initializing Kubernetes Cluster
      become: yes
      # command: kubeadm init --pod-network-cidr 10.244.0.0/16 --control-plane-endpoint "{{ ansible_eno1.ipv4.address }}:6443" --upload-certs
      command: kubeadm init --pod-network-cidr 10.244.0.0/16 --control-plane-endpoint "{{ control_plane_ep }}:6443" --upload-certs
      # command: kubeadm init --pod-network-cidr 10.244.0.0/16 --upload-certs
      run_once: true
      # delegate_to: "{{ k8s_master_ip }}"

    - pause: seconds=30

    - name: Create directory for kube config of {{ ansible_user }}.
      become: yes
      file:
        path: /home/{{ ansible_user }}/.kube
        state: directory
        owner: "{{ ansible_user }}"
        group: "{{ ansible_user }}"
        mode: 0755

    - name: Copy /etc/kubernetes/admin.conf to user home directory /home/{{ ansible_user }}/.kube/config.
      copy:
        src: /etc/kubernetes/admin.conf
        dest: /home/{{ ansible_user }}/.kube/config
        remote_src: yes
        owner: "{{ ansible_user }}"
        group: "{{ ansible_user }}"
        mode: '0640'

    - pause: seconds=30

    - name: Remove the cache directory.
      file:
        path: /home/{{ ansible_user }}/.kube/cache
        state: absent

    - name: Create directory for kube config of {{ kubectl_user }}.
      become: yes
      file:
        path: /home/{{ kubectl_user }}/.kube
        state: directory
        owner: "{{ kubectl_user }}"
        group: "{{ kubectl_user }}"
        mode: 0755

    - name: Copy /etc/kubernetes/admin.conf to user home directory /home/{{ kubectl_user }}/.kube/config.
      copy:
        src: /etc/kubernetes/admin.conf
        dest: /home/{{ kubectl_user }}/.kube/config
        remote_src: yes
        owner: "{{ kubectl_user }}"
        group: "{{ kubectl_user }}"
        mode: '0640'

    - pause: seconds=30

    - name: Remove the cache directory.
      file:
        path: /home/{{ kubectl_user }}/.kube/cache
        state: absent

    - name: Create Pod Network & RBAC.
      become_user: "{{ ansible_user }}"
      become_method: sudo
      become: yes
      command: "{{ item }}"
      with_items: kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

    - pause: seconds=30

    - name: Configure kubectl command auto-completion for {{ ansible_user }}.
      lineinfile:
        dest: /home/{{ ansible_user }}/.bashrc
        line: 'source <(kubectl completion bash)'
        insertafter: EOF

    - name: Configure kubectl command auto-completion for {{ kubectl_user }}.
      lineinfile:
        dest: /home/{{ kubectl_user }}/.bashrc
        line: 'source <(kubectl completion bash)'
        insertafter: EOF
...

From the hosts keyword, one can see that these tasks are only enforced on the master node. However, two things are worth explaining.

The Way Ansible Roles

The first is the inclusion of the atomika_base role towards the top:

YAML
  roles:
    - atomika_base

The official Ansible documentation states that "Roles let you automatically load related vars, files, tasks, handlers, and other Ansible artifacts based on a known file structure."

The atomika_base role is included in all three of the Ansible YAML scripts that maintain the master, co-masters, and workers of the cluster. Its purpose is to lay the base by making sure that tasks common to all three member types have been executed. As stated above, an Ansible role follows a specific directory structure that can contain file templates, tasks, and variable declarations, amongst other things. The Kubernetes and containerd versions are, for example, declared in the role's variables YAML:

YAML
k8s_version: 1.28.2-00
containerd_version: 1.6.24-1

In short, development can be fast-tracked through the use of roles developed and open-sourced by the Ansible community at Ansible Galaxy.

Dealing the Difference

The second thing of interest is that although variables can be passed in from the command line using the --extra-vars switch, as seen higher up, Ansible can also be programmed to prompt when a value is not set:

YAML
  vars_prompt:
    - name: "control_plane_ep"
      prompt: "Enter the DNS name of the control plane load balancer?"
      private: no
    - name: "kubectl_user"
      prompt: "Enter the name of the existing user that will execute kubectl commands?"
      private: no

Here, prompts are specified to ask for the user that should have kubectl access and the IP address of the control plane.
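To make the role mechanism more tangible, the sketch below shows the conventional layout of an Ansible role and the kind of variables file that would hold the version pins quoted above. The layout is the standard convention Ansible expects; the exact contents of atomika_base in the repository may differ, so treat the specifics as assumptions.

YAML
# Conventional Ansible role layout (assumed for atomika_base; the checked-in role may differ):
#   roles/atomika_base/
#   ├── defaults/main.yml   # lowest-precedence variables (or vars/main.yml for higher precedence)
#   ├── tasks/main.yml      # tasks common to masters, co-masters, and workers
#   ├── handlers/main.yml   # handlers, e.g. restarting containerd when its config changes
#   └── templates/          # Jinja2 templates rendered onto the target machines
#
# roles/atomika_base/defaults/main.yml – the version pins referenced in the text
k8s_version: 1.28.2-00
containerd_version: 1.6.24-1

Because every play that lists the role under roles: pulls in this whole structure, pinned versions and common tasks are defined once and applied identically to all member types.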
Should the script execute without error, the state of the cluster should be:

atmin@kxsmaster2:~$ kubectl get pods -o wide -A
NAMESPACE      NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
kube-flannel   kube-flannel-ds-mg8mr                1/1     Running   0          114s    192.168.68.111   kxsmaster2   <none>           <none>
kube-system    coredns-5dd5756b68-bkzgd             1/1     Running   0          3m31s   10.244.0.6       kxsmaster2   <none>           <none>
kube-system    coredns-5dd5756b68-vzkw2             1/1     Running   0          3m31s   10.244.0.7       kxsmaster2   <none>           <none>
kube-system    etcd-kxsmaster2                      1/1     Running   0          3m45s   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-apiserver-kxsmaster2            1/1     Running   0          3m45s   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-controller-manager-kxsmaster2   1/1     Running   7          3m45s   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-proxy-69cqq                     1/1     Running   0          3m32s   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-scheduler-kxsmaster2            1/1     Running   7          3m45s   192.168.68.111   kxsmaster2   <none>           <none>

All the pods required to make up the control plane run on the one master node. Should you wish to run a single-node cluster for development purposes, do not forget to remove the taint that prevents scheduling on the master node(s):

kubectl taint node --all node-role.kubernetes.io/control-plane:NoSchedule-

However, a cluster consisting of one machine is not a true cluster. This will be addressed next.

Kubelets of the Cluster, Unite!

Kubernetes, as an orchestration automaton, needs to be resilient by definition. Consequently, developers and a buggy CI/CD pipeline should not touch the master nodes by scheduling load on them. Kubernetes therefore increases resilience by expecting multiple worker nodes to join the cluster and carry the load:

ansible-playbook atomika/k8s_workers.yml -i atomika/atomika_inventory.yml

The content of k8s_workers.yml is:

YAML
# k8s_workers.yml
---
- hosts: workers, vmworkers
  remote_user: "{{ ansible_user }}"
  become: yes
  become_method: sudo
  gather_facts: yes
  connection: ssh

  roles:
    - atomika_base

- hosts: masters
  tasks:
    - name: Get the token for joining the nodes with the Kubernetes master.
      become_user: "{{ ansible_user }}"
      shell: kubeadm token create --print-join-command
      register: kubernetes_join_command

    - name: Generate the secret for joining the nodes with the Kubernetes master.
      become: yes
      shell: kubeadm init phase upload-certs --upload-certs
      register: kubernetes_join_secret

    - name: Copy join command to local file.
      become: false
      local_action: copy content="{{ kubernetes_join_command.stdout_lines[0] }} --certificate-key {{ kubernetes_join_secret.stdout_lines[2] }}" dest="/tmp/kubernetes_join_command" mode=0700

- hosts: workers, vmworkers
  # remote_user: k8s5gc
  # become: yes
  # become_method: sudo
  become_user: root
  gather_facts: yes
  connection: ssh
  tasks:
    - name: Copy join command to worker nodes.
      become: yes
      become_method: sudo
      become_user: root
      copy:
        src: /tmp/kubernetes_join_command
        dest: /tmp/kubernetes_join_command
        mode: 0700

    - name: Join the worker nodes with the master.
      become: yes
      become_method: sudo
      become_user: root
      command: sh /tmp/kubernetes_join_command
      register: joined_or_not

    - debug:
        msg: "{{ joined_or_not.stdout }}"
...

There are two blocks of tasks — one with tasks to be executed on the master and one with tasks for the workers. This ability of Ansible to direct blocks of tasks to different member types is vital for cluster formation. The first block extracts and augments the join command from the master, while the second block executes it on the worker nodes.
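The playbook above shares the join command between hosts via a temporary file on the control machine. A common alternative, sketched below using the same group names, is to register the command on the master and read it from the workers through hostvars; this is an illustrative variation on standard Ansible behaviour, not what the checked-in script does (and it omits the certificate-key augmentation used for co-masters).

YAML
# Sketch: sharing the join command via hostvars instead of a temp file (illustrative only)
- hosts: masters
  tasks:
    - name: Create the kubeadm join command on the master.
      become: yes
      shell: kubeadm token create --print-join-command
      register: kubernetes_join_command

- hosts: workers
  tasks:
    - name: Join the worker using the command registered on the first master.
      become: yes
      command: "{{ hostvars[groups['masters'][0]].kubernetes_join_command.stdout }}"

Registered variables are stored per host for the duration of the playbook run, so a later play can look them up on another host through hostvars without writing anything to disk.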
The top and bottom portions of the console output can be seen here:

janrb@dquick:~/atomika$ ansible-playbook atomika/k8s_workers.yml -i atomika/atomika_inventory.yml
[WARNING]: Could not match supplied host pattern, ignoring: vmworkers

PLAY [workers, vmworkers] *********************************************************************

TASK [Gathering Facts] ************************************************************************
ok: [client1]
ok: [client2]
...........................................................................
TASK [debug] **********************************************************************************
ok: [client1] => {
    "msg": "[preflight] Running pre-flight checks\n[preflight] Reading configuration from the cluster...\n[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Starting the kubelet\n[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...\n\nThis node has joined the cluster:\n* Certificate signing request was sent to apiserver and a response was received.\n* The Kubelet was informed of the new secure connection details.\n\nRun 'kubectl get nodes' on the control-plane to see this node join the cluster."
}
ok: [client2] => {
    "msg": "[preflight] Running pre-flight checks\n[preflight] Reading configuration from the cluster...\n[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Starting the kubelet\n[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...\n\nThis node has joined the cluster:\n* Certificate signing request was sent to apiserver and a response was received.\n* The Kubelet was informed of the new secure connection details.\n\nRun 'kubectl get nodes' on the control-plane to see this node join the cluster."
}

PLAY RECAP ************************************************************************************
client1   : ok=3    changed=1   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0
client1   : ok=23   changed=6   unreachable=0   failed=1   skipped=0   rescued=0   ignored=0
client2   : ok=23   changed=6   unreachable=0   failed=1   skipped=0   rescued=0   ignored=0
host1     : ok=4    changed=3   unreachable=0   failed=0   skipped=0   rescued=0   ignored=0

Four tasks were executed on the master node to determine the join command, while 23 tasks ran on each of the two clients to ensure they were joined to the cluster. The tasks from the atomika_base role account for most of the worker tasks.
The cluster now consists of the following nodes, with the master hosting the pods that make up the control plane:

atmin@kxsmaster2:~$ kubectl get nodes -o wide
NAME         STATUS   ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8xclient1   Ready    <none>          23m   v1.28.2   192.168.68.116   <none>        Ubuntu 20.04.6 LTS   5.4.0-163-generic   containerd://1.6.24
kxsclient2   Ready    <none>          23m   v1.28.2   192.168.68.113   <none>        Ubuntu 20.04.6 LTS   5.4.0-163-generic   containerd://1.6.24
kxsmaster2   Ready    control-plane   34m   v1.28.2   192.168.68.111   <none>        Ubuntu 20.04.6 LTS   5.4.0-163-generic   containerd://1.6.24

With Nginx deployed, the following pods will be running on the various members of the cluster:

atmin@kxsmaster2:~$ kubectl get pods -A -o wide
NAMESPACE      NAME                                 READY   STATUS    RESTARTS        AGE   IP               NODE         NOMINATED NODE   READINESS GATES
default        nginx-7854ff8877-g8lvh               1/1     Running   0               20s   10.244.1.2       kxsclient2   <none>           <none>
kube-flannel   kube-flannel-ds-4dgs5                1/1     Running   1 (8m58s ago)   26m   192.168.68.116   k8xclient1   <none>           <none>
kube-flannel   kube-flannel-ds-c7vlb                1/1     Running   1 (8m59s ago)   26m   192.168.68.113   kxsclient2   <none>           <none>
kube-flannel   kube-flannel-ds-qrwnk                1/1     Running   0               35m   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    coredns-5dd5756b68-pqp2s             1/1     Running   0               37m   10.244.0.9       kxsmaster2   <none>           <none>
kube-system    coredns-5dd5756b68-rh577             1/1     Running   0               37m   10.244.0.8       kxsmaster2   <none>           <none>
kube-system    etcd-kxsmaster2                      1/1     Running   1               37m   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-apiserver-kxsmaster2            1/1     Running   1               37m   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-controller-manager-kxsmaster2   1/1     Running   8               37m   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-proxy-bdzlv                     1/1     Running   1 (8m58s ago)   26m   192.168.68.116   k8xclient1   <none>           <none>
kube-system    kube-proxy-ln4fx                     1/1     Running   1 (8m59s ago)   26m   192.168.68.113   kxsclient2   <none>           <none>
kube-system    kube-proxy-ndj7w                     1/1     Running   0               37m   192.168.68.111   kxsmaster2   <none>           <none>
kube-system    kube-scheduler-kxsmaster2            1/1     Running   8               37m   192.168.68.111   kxsmaster2   <none>           <none>

All that remains is to expose the Nginx pod to the outside world using an instance of NodePort, LoadBalancer, or Ingress. Maybe more on that in another article, although a minimal NodePort sketch follows the conclusion below.

Conclusion

This posting explained the basic concepts of Ansible by way of scripts that boot up a K8s cluster. The reader should now grasp enough concepts to understand tutorials and search engine results, and to make a start at using Ansible to set up infrastructure as code.
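As promised above, here is a minimal sketch of the NodePort option for exposing Nginx. It assumes the Nginx pods carry the label app: nginx and picks an arbitrary node port; neither assumption is taken from the article's actual deployment.

YAML
# nginx-nodeport.yml – hedged sketch; assumes the Nginx pods are labelled app: nginx
apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - port: 80          # service port inside the cluster
      targetPort: 80    # container port served by Nginx
      nodePort: 30080   # reachable on every node's IP at this port

Applied with kubectl apply -f nginx-nodeport.yml, the service would make Nginx answer on port 30080 of any node's IP address, such as 192.168.68.113 from the listing above.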