Development and programming tools are used to build frameworks and to create, debug, and maintain programs, among many other tasks. The resources in this Zone cover topics such as compilers, database management systems, code editors, and other software tools, and they can help ensure engineers are writing clean code.
Docker is a compelling platform to package and run web applications, especially when paired with one of the many Platform-as-a-Service (PaaS) offerings provided by cloud platforms. NGINX has long provided DevOps teams with the ability to host web applications on Linux and also provides an official Docker image to use as the base for custom web applications. In this post, I explain how DevOps teams can use the NGINX Docker image to build and run web applications on Docker. Getting Started With the Base Image NGINX is a versatile tool with many uses, including a load balancer, reverse proxy, and network cache. However, when running NGINX in a Docker container, most of these high-level functions are delegated to other specialized platforms or other instances of NGINX. Typically, NGINX fulfills the function of a web server when running in a Docker container. To create an NGINX container with the default website, run the following command: docker run -p 8080:80 nginx This command will download the nginx image (if it hasn't already been downloaded) and create a container exposing port 80 in the container to port 8080 on the host machine. You can then open http://localhost:8080/index.html to view the default "Welcome to nginx!" website. To allow the NGINX container to expose custom web assets, you can mount a local directory inside the Docker container. Save the following HTML code to a file called index.html: <html> <body> Hello from Octopus! </body> </html> Next, run the following command to mount the current directory under /usr/share/nginx/html inside the NGINX container with read-only access: docker run -v $(pwd):/usr/share/nginx/html:ro -p 8080:80 nginx Open http://localhost:8080/index.html again and you see the custom HTML page displayed. One of the benefits of Docker images is the ability to bundle all related files into a single distributable artifact. To realize this benefit, you must create a new Docker image based on the NGINX image. Creating Custom Images Based on NGINX To create your own Docker image, save the following text to a file called Dockerfile: FROM nginx COPY index.html /usr/share/nginx/html/index.html Dockerfile contains instructions for building a custom Docker image. Here you use the FROM command to base your image on the NGINX one, and then use the COPY command to copy your index.html file into the new image under the /usr/share/nginx/html directory. Build the new image with the command: docker build . -t mynginx This builds a new image called mynginx. Run the new image with the command: docker run -p 8080:80 mynginx Note that you didn't mount any directories this time. However, when you open http://localhost:8080/index.html your custom HTML page is displayed because it was embedded in your custom image. NGINX is capable of much more than hosting static files. To unlock this functionality, you must use custom NGINX configuration files. Advanced NGINX Configuration NGINX exposes its functionality via configuration files. The default NGINX image comes with a simple default configuration file designed to host static web content. 
This file is located at /etc/nginx/nginx.conf in the default image, and has the following contents: user nginx; worker_processes auto; error_log /var/log/nginx/error.log notice; pid /var/run/nginx.pid; events { worker_connections 1024; } http { include /etc/nginx/mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log /var/log/nginx/access.log main; sendfile on; #tcp_nopush on; keepalive_timeout 65; #gzip on; include /etc/nginx/conf.d/*.conf; } There's no need to understand this configuration file in detail, but there is one line of interest that instructs NGINX to load additional configuration files from the /etc/nginx/conf.d directory: include /etc/nginx/conf.d/*.conf; The default /etc/nginx/conf.d file configures NGINX to function as a web server. Specifically the location / block-loading files from /usr/share/nginx/html is why you mounted your HTML files to that directory previously: server { listen 80; server_name localhost; #access_log /var/log/nginx/host.access.log main; location / { root /usr/share/nginx/html; index index.html index.htm; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root /usr/share/nginx/html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # #location ~ \.php$ { # root html; # fastcgi_pass 127.0.0.1:9000; # fastcgi_index index.php; # fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name; # include fastcgi_params; #} # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # #location ~ /\.ht { # deny all; #} } You can take advantage of the instructions to load any *.conf configuration files in /etc/nginx to customize NGINX. In this example, you add a health check via a custom location listening on port 90 that responds to requests to the /nginx-health path with an HTTP 200 OK. Save the following text to a file called health-check.conf: server { listen 90; server_name localhost; location /nginx-health { return 200 "healthy\n"; add_header Content-Type text/plain; } } Modify the Dockerfile to copy the configuration file to /etc/nginx/conf.d: FROM nginx COPY index.html /usr/share/nginx/html/index.html COPY health-check.conf /etc/nginx/conf.d/health-check.conf Build the image with the command: docker build . -t mynginx Run the new image with the command. Note the new port exposed on 9090: docker run -p 8080:80 -p 9090:90 mynginx Now open http://localhost:9090/nginx-health. The health check response is returned to indicate that the web server is up and running. The examples above base your custom images on the default nginx image. However, there are other variants that provide much smaller image sizes without sacrificing any functionality. Choosing NGINX Variants The default nginx image is based on Debian. However, NGINX also provides images based on Alpine. Alpine is frequently used as a lightweight base for Docker images. 
To view the sizes of the Docker images, first pull them down to your local workstation: docker pull nginx docker pull nginx:stable-alpine You can then find the image sizes with the command: docker image ls From this, you can see the Debian image weighs around 140 MB while the Alpine image weighs around 24 MB, which is a significant saving in image size. To base your images on the Alpine variant, you need to update the Dockerfile: FROM nginx:stable-alpine COPY index.html /usr/share/nginx/html/index.html COPY health-check.conf /etc/nginx/conf.d/health-check.conf Build and run the image with the commands: docker build . -t mynginx docker run -p 8080:80 -p 9090:90 mynginx Once again, open http://localhost:9090/nginx-health or http://localhost:8080/index.html to view the web pages. Everything continues to work as it did previously, but your custom image is now much smaller. Conclusion NGINX is a powerful web server, and the official NGINX Docker image allows DevOps teams to host custom web applications in Docker. NGINX also supports advanced scenarios thanks to its ability to read configuration files copied into a custom Docker image. In this post, you learned how to create a custom Docker image hosting a static web application, added advanced NGINX configuration files to provide a health check endpoint, and compared the sizes of the Debian and Alpine NGINX images. Resources NGINX Docker image source code Dockerfile reference Happy deployments!
What Is Kubernetes RBAC? Often, when organizations start their Kubernetes journey, they look up to implementing least privilege roles and proper authorization to secure their infrastructure. That’s where Kubernetes RBAC is implemented to secure Kubernetes resources such as sensitive data, including deployment details, persistent storage settings, and secrets. Kubernetes RBAC provides the ability to control who can access each API resource with what kind of access. You can use RBAC for both human (individual or group) and non-human users (service accounts) to define their types of access to various Kubernetes resources. For example, there are three different environments, Dev, Staging, and Production, which have to be given access to the team, such as developers, DevOps, SREs, App owners, and product managers. Before we get started, we would like to stress that we will treat users and service accounts as the same, from a level of abstraction- every request, either from a user or a service account, is finally an HTTP request. Yes, we understand users and service accounts (for non-human users) are different in nature in Kubernetes. How To Enable Kubernetes RBAC One can enable RBAC in Kubernetes by starting the API server with an authorization-mode flag on. Kubernetes resources used to apply RBAC on users are: Role, ClusterRole, RoleBinding, ClusterRoleBinding Service Account To manage users, Kubernetes provides an authentication mechanism, but it is usually advisable to integrate Kubernetes with your enterprise identity management for users such as Active Directory or LDAP. When it comes to non-human users (or machines or services) in a Kubernetes cluster, the concept of a Service Account comes into the picture. For example, The Kubernetes resources need to be accessed by a CD application such as Spinnaker or Argo to deploy applications, or one pod of service A needs to talk to another pod of service B. In such cases, a Service Account is used to create an account of a non-human user and specify the required authorization (using RoleBinding or ClusterRoleBinding). You can create a Service Account by creating a yaml like the below: YAML apiVersion: v1 kind: ServiceAccount metadata: name: nginx-sa spec: automountServiceAccountToken: false And then apply it. Shell $ kubectl apply -f nginx-sa.yaml serviceaccount/nginx-sa created And now you have to ServiceAccount for pods in the Deployments resource. YAML kind: Deployment metadata: name: nginx1 labels: app: nginx1 spec: replicas: 2 selector: matchLabels: app: nginx1 template: metadata: labels: app: nginx1 spec: serviceAccountName: nginx-sa containers: - name: nginx1 image: nginx ports: - containerPort: 80 In case you don’t specify about serviceAccountName in the Deployment resources, then the pods will belong to the default Service Account. Note there is a default Service Account for each namespace and one for clusters. All the default authorization policies as per the default Service Account will be applied to the pods where Service Account info is not mentioned. In the next section, we will see how to assign various permissions to a Service Account using RoleBinding and ClusterRoleBinding. Role and ClusterRole Role and ClusterRole are the Kubernetes resources used to define the list of actions a user can perform within a namespace or a cluster, respectively. In Kubernetes, the actors, such as users, groups, or ServiceAccount, are called subjects. A subject's actions, such as create, read, write, update, and delete, are called verbs. 
YAML apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: read-only namespace: dev-namespace rules: - apiGroups: - "" resources: ["*"] verbs: - get - list - watch In the above Role resource, we have specified that the read-only role is only applicable to the deb-ns namespace and to all the resources inside the namespace. Any ServiceAccount or users that would be bound to the read-only role can take these actions- get, list, and watch. Similarly, the ClusterRole resource will allow you to create roles pertinent to clusters. An example is given below: YAML apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: chief-role rules: - apiGroups: - "" resources: ["*"] verbs: - get - list - watch - create - update - patch - delete Any user/group/ServiceAccount bound to the chief-role will be able to take any action in the cluster. In the next section, we will see how to grant roles to subjects using RoleBinding and ClusterRoleBinding. Also, note Kubernetes allows you to configure custom roles using Role resources or use default user-facing roles such as the following: Cluster-admin: For cluster administrators, Kubernetes provides a superuser Role. The Cluster admin can perform any action on any resource in a cluster. One can use a superuser in a ClusterRoleBinding to grant full control over every resource in the cluster (and in all namespaces) or in a RoleBinding to grant full control over every resource in the respective namespace. Admin: Kubernetes provides an admin Role to permit unlimited read/write access to resources within a namespace. admin role can create roles and role bindings within a particular namespace. It does not permit write access to the namespace itself. This can be used in the RoleBinding resource. Edit: edit role grants read/write access within a given Kubernetes namespace. It cannot view or modify roles or role bindings. View: view role allows read-only access within a given namespace. It does not allow viewing or modifying of roles or role bindings. RoleBinding and ClusterRoleBinding To apply the Role to a subject (user/group/ServiceAccount), you must define a RoleBinding. This will give the user the least privileged access to required resources within the namespace with the permissions defined in the Role configuration. YAML apiVersion: rbac.authorization.k8s.io/v1beta1 kind: RoleBinding metadata: name: Role-binding-dev roleRef: kind: Role name: read-only #The role name you defined in the Role configuration apiGroup: rbac.authorization.k8s.io subjects: - kind: User name: Roy #The name of the user to give the role to apiGroup: rbac.authorization.k8s.io - kind: ServiceAccount name: nginx-sa#The name of the ServiceAccount to give the role to apiGroup: rbac.authorization.k8s.io Similarly, ClusterRoleBinding resources can be created to define the Role of users. Note we have used the default superuser ClusterRole reference provided by Kubernetes instead of using our custom role. This can be applied to cluster administrators. YAML apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRoleBinding metadata: name: superuser-binding roleRef: kind: ClusterRole name: superuser apiGroup: rbac.authorization.k8s.io subjects: - kind: User name: Aditi apiGroup: rbac.authorization.k8s.io Benefits of Kubernetes RBAC The advantage of Kubernetes RBAC is it allows you to “natively” implement the least privileges to various users and machines in your cluster. 
The key benefits are: Proper Authorization By granting least-privilege access to Kubernetes resources for users and Service Accounts, DevOps engineers and architects can implement one of the main pillars of zero trust. Organizations can reduce the risk of data breaches and data leakage and also avoid internal employees accidentally deleting or manipulating critical resources. Separation of Duties Applying RBAC to Kubernetes resources facilitates separation of duties among users such as developers, DevOps engineers, testers, and SREs in an organization. For example, developers should not depend on an admin to create or delete a new resource in a dev environment. Similarly, deploying new applications onto test servers and deleting the pods after testing should not be a bottleneck for DevOps engineers or testers. Applying authorization and permissions for users such as developers and CI/CD deployment agents in their respective workspaces (say, namespaces or clusters) reduces dependencies and removes bottlenecks. Adherence to Compliance Many industry regulations, such as HIPAA, GDPR, and SOX, demand tight authentication and authorization mechanisms. Using Kubernetes RBAC, DevOps engineers and architects can quickly implement authorization in their Kubernetes cluster and improve their posture toward those standards. Disadvantages of Kubernetes RBAC For small and medium enterprises, native Kubernetes RBAC is usually sufficient, but for larger organizations it can fall short for the reasons below: There can be many users and machines, and Kubernetes RBAC can become cumbersome to implement and maintain at that scale. Granular visibility of who performed what operation is difficult. For example, large enterprises would require information such as violations or malicious attempts against RBAC permissions. Before moving on, the sketch below shows a quick way to check what a given subject is actually allowed to do.
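This is a minimal sketch using kubectl's built-in authorization check, assuming the dev-namespace, read-only Role, and nginx-sa ServiceAccount from the earlier examples, and assuming your own credentials permit impersonation; the result depends on the bindings actually in place in your cluster:
Shell
# Can the nginx-sa ServiceAccount list pods in dev-namespace? Prints "yes" or "no".
$ kubectl auth can-i list pods \
    --as=system:serviceaccount:dev-namespace:nginx-sa \
    --namespace dev-namespace

# Show everything that subject is allowed to do in the namespace.
$ kubectl auth can-i --list \
    --as=system:serviceaccount:dev-namespace:nginx-sa \
    --namespace dev-namespace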
In this article, we delve into the exciting realm of containerizing Helidon applications, followed by deploying them effortlessly to a Kubernetes environment. To achieve this, we'll harness the power of JKube’s Kubernetes Maven Plugin, a versatile tool for Java applications for Kubernetes deployments that has recently been updated to version 1.14.0. What's exciting about this release is that it now supports the Helidon framework, a Java Microservices gem open-sourced by Oracle in 2018. If you're curious about Helidon, we've got some blog posts to get you up to speed: Building Microservices With Oracle Helidon Ultra-Fast Microservices: When MicroStream Meets Helidon Helidon: 2x Productivity With Microprofile REST Client In this article, we will closely examine the integration between JKube’s Kubernetes Maven Plugin and Helidon. Here's a sneak peek of the exciting journey we'll embark on: We'll kick things off by generating a Maven application from Helidon Starter Transform your Helidon application into a nifty Docker image. Craft Kubernetes YAML manifests tailored for your Helidon application. Apply those manifests to your Kubernetes cluster. We'll bundle those Kubernetes YAML manifests into a Helm Chart. We'll top it off by pushing that Helm Chart to a Helm registry. Finally, we'll deploy our Helidon application to Red Hat OpenShift. An exciting aspect worth noting is that JKube’s Kubernetes Maven Plugin can be employed with previous versions of Helidon projects as well. The only requirement is to provide your custom image configuration. With this latest release, Helidon users can now easily generate opinionated container images. Furthermore, the plugin intelligently detects project dependencies and seamlessly incorporates Kubernetes health checks into the generated manifests, streamlining the deployment process. Setting up the Project You can either use an existing Helidon project or create a new one from Helidon Starter. If you’re on JDK 17 use 3.x version of Helidon. Otherwise, you can stick to Helidon 2.6.x which works with older versions of Java. In the starter form, you can choose either Helidon SE or Helidon Microprofile, choose application type, and fill out basic details like project groupId, version, and artifactId. Once you’ve set your project, you can add JKube’s Kubernetes Maven Plugin to your pom.xml: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>1.14.0</version> </plugin> Also, the plugin version is set to 1.14.0, which is the latest version at the time of writing. You can check for the latest version on the Eclipse JKube releases page. It’s not really required to add the plugin if you want to execute it directly from some CI pipeline. You can just provide a fully qualified name of JKube’s Kubernetes Maven Plugin while issuing some goals like this: Shell $ mvn org.eclipse.jkube:kubernetes-maven-plugin:1.14.0:resource Now that we’ve added the plugin to the project, we can start using it. Creating Container Image (JVM Mode) In order to build a container image, you do not need to provide any sort of configuration. First, you need to build your project. Shell $ mvn clean install Then, you just need to run k8s:build goal of JKube’s Kubernetes Maven Plugin. By default, it builds the image using the Docker build strategy, which requires access to a Docker daemon. 
If you have access to a docker daemon, run this command: Shell $ mvn k8s:build If you don’t have access to any docker daemon, you can also build the image using the Jib build strategy: Shell $ mvn k8s:build -Djkube.build.strategy=jib You will notice that Eclipse JKube has created an opinionated container image for your application based on your project configuration. Here are some key points about JKube’s Kubernetes Maven Plugin to observe in this zero configuration mode: It used quay.io/jkube/jkube-java as a base image for the container image It added some labels to the container image (picked from pom.xml) It exposed some ports in the container image based on the project configuration It automatically copied relevant artifacts and libraries required to execute the jar in the container environment. Creating Container Image (Native Mode) In order to create a container image for the native executable, we need to generate the native executable first. In order to do that, let’s build our project in the native-image profile (as specified in Helidon GraalVM Native Image documentation): Shell $ mvn package -Pnative-image This creates a native executable file in the target folder of your project. In order to create a container image based on this executable, we just need to run k8s:build goal but also specify native-image profile: Shell $ mvn k8s:build -Pnative-image Like JVM mode, Eclipse JKube creates an opinionated container image but uses a lightweight base image: registry.access.redhat.com/ubi8/ubi-minimal and exposes only the required ports by application. Customizing Container Image as per Requirements Creating a container image with no configuration is a really nice way to get started. However, it might not suit everyone’s use case. Let’s take a look at how to configure various aspects of the generated container image. You can override basic aspects of the container image with some properties like this: Property Name Description jkube.generator.name Change Image Name jkube.generator.from Change Base Image jkube.generator.tags A comma-separated value of additional tags for the image If you want more control, you can provide a complete XML configuration for the image in the plugin configuration section: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>${jkube.version}</version> <configuration> <images> <image> <name>${project.artifactId}:${project.version}</name> <build> <from>openjdk:11-jre-slim</from> <ports>8080</ports> <assembly> <mode>dir</mode> <targetDir>/deployments</targetDir> <layers> <layer> <id>lib</id> <fileSets> <fileSet> <directory>${project.basedir}/target/libs</directory> <outputDirectory>libs</outputDirectory> <fileMode>0640</fileMode> </fileSet> </fileSets> </layer> <layer> <id>app</id> <files> <file> <source>${project.basedir}/target/${project.artifactId}.jar</source> <outputDirectory>.</outputDirectory> </file> </files> </layer> </layers> </assembly> <cmd>java -jar /deployments/${project.artifactId}.jar</cmd> </build> </image> </images> </configuration> </plugin> The same is also possible by providing your own Dockerfile in the project base directory. 
Kubernetes Maven Plugin automatically detects it and builds a container image based on its content: Dockerfile FROM openjdk:11-jre-slim COPY maven/target/helidon-quickstart-se.jar /deployments/ COPY maven/target/libs /deployments/libs CMD ["java", "-jar", "/deployments/helidon-quickstart-se.jar"] EXPOSE 8080 Pushing the Container Image to Quay.io: Once you’ve built a container image, you most likely want to push it to some public or private container registry. Before pushing the image, make sure you’ve renamed your image to include the registry name and registry user. If I want to push an image to Quay.io in the namespace of a user named rokumar, this is how I would need to rename my image: Shell $ mvn k8s:build -Djkube.generator.name=quay.io/rokumar/%a:%v %a and %v correspond to project artifactId and project version. For more information, you can check the Kubernetes Maven Plugin Image Configuration documentation. Once we’ve built an image with the correct name, the next step is to provide credentials for our registry to JKube’s Kubernetes Maven Plugin. We can provide registry credentials via the following sources: Docker login Local Maven Settings file (~/.m2/settings.xml) Provide it inline using jkube.docker.username and jkube.docker.password properties Once you’ve configured your registry credentials, you can issue the k8s:push goal to push the image to your specified registry: Shell $ mvn k8s:push Generating Kubernetes Manifests In order to generate opinionated Kubernetes manifests, you can use k8s:resource goal from JKube’s Kubernetes Maven Plugin: Shell $ mvn k8s:resource It generates Kubernetes YAML manifests in the target directory: Shell $ ls target/classes/META-INF/jkube/kubernetes helidon-quickstart-se-deployment.yml helidon-quickstart-se-service.yml JKube’s Kubernetes Maven Plugin automatically detects if the project contains io.helidon:helidon-health dependency and adds liveness, readiness, and startup probes: YAML $ cat target/classes/META-INF/jkube/kubernetes//helidon-quickstart-se-deployment. yml | grep -A8 Probe livenessProbe: failureThreshold: 3 httpGet: path: /health/live port: 8080 scheme: HTTP initialDelaySeconds: 0 periodSeconds: 10 successThreshold: 1 -- readinessProbe: failureThreshold: 3 httpGet: path: /health/ready port: 8080 scheme: HTTP initialDelaySeconds: 0 periodSeconds: 10 successThreshold: 1 Applying Kubernetes Manifests JKube’s Kubernetes Maven Plugin provides k8s:apply goal that is equivalent to kubectl apply command. It just applies the resources generated by k8s:resource in the previous step. Shell $ mvn k8s:apply Packaging Helm Charts Helm has established itself as the de facto package manager for Kubernetes. You can package generated manifests into a Helm Chart and apply it on some other cluster using Helm CLI. You can generate a Helm Chart of generated manifests using k8s:helm goal. The interesting thing is that JKube’s Kubernetes Maven Plugin doesn’t rely on Helm CLI for generating the chart. Shell $ mvn k8s:helm You’d notice Helm Chart is generated in target/jkube/helm/ directory: Shell $ ls target/jkube/helm/helidon-quickstart-se/kubernetes Chart.yaml helidon-quickstart-se-0.0.1-SNAPSHOT.tar.gz README.md templates values.yaml Pushing Helm Charts to Helm Registries Usually, after generating a Helm Chart locally, you would want to push it to some Helm registry. JKube’s Kubernetes Maven Plugin provides k8s:helm-push goal for achieving this task. 
But first, we need to provide the registry details in the plugin configuration: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>1.14.0</version> <configuration> <helm> <snapshotRepository> <name>ChartMuseum</name> <url>http://example.com/api/charts</url> <type>CHARTMUSEUM</type> <username>user1</username> </snapshotRepository> </helm> </configuration> </plugin> JKube’s Kubernetes Maven Plugin supports pushing Helm Charts to ChartMuseum, Nexus, Artifactory, and OCI registries. You have to provide the applicable Helm repository type and URL. You can provide the credentials via environment variables, properties, or ~/.m2/settings.xml. Once everything is set up, you can run the k8s:helm-push goal to push the chart: Shell $ mvn k8s:helm-push -Djkube.helm.snapshotRepository.password=yourpassword Deploying To Red Hat OpenShift If you’re deploying to Red Hat OpenShift, you can use JKube’s OpenShift Maven Plugin to deploy your Helidon application to an OpenShift cluster. It contains some OpenShift-specific add-ons, such as the S2I build strategy and support for Routes. You also need to add JKube’s OpenShift Maven Plugin to your pom.xml, perhaps in a separate profile: XML <profile> <id>openshift</id> <build> <plugins> <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>openshift-maven-plugin</artifactId> <version>${jkube.version}</version> </plugin> </plugins> </build> </profile> Then, you can deploy the application with a combination of these goals: Shell $ mvn oc:build oc:resource oc:apply -Popenshift Conclusion In this article, you learned how smoothly you can deploy your Helidon applications to Kubernetes using Eclipse JKube’s Kubernetes Maven Plugin. We saw how effortless it is to package a Helidon application into a container image and publish it to a container registry. We can also generate Helm Charts from our Kubernetes YAML manifests and publish them to a Helm registry. Finally, we learned about JKube’s OpenShift Maven Plugin, which is specifically designed for Red Hat OpenShift users who want to deploy their Helidon applications to Red Hat OpenShift. You can find the code used in this blog post in this GitHub repository. If you’re interested in learning more about Eclipse JKube, you can check these links: Documentation GitHub Issue Tracker StackOverflow YouTube Channel Twitter Gitter Chat
With the advent of cloud computing, managing network traffic and ensuring optimal performance have become critical aspects of system architecture. Amazon Web Services (AWS), a leading cloud service provider, offers a suite of load balancers to manage network traffic effectively for applications running on its platform. Two such offerings are the Application Load Balancer (ALB) and Network Load Balancer (NLB). This extensive guide aims to provide an in-depth comparison between these two types of load balancers, helping you choose the most suitable option for your application's needs. Overview The primary role of a load balancer is to distribute network traffic evenly among multiple servers or 'targets' to ensure smooth performance and prevent any single server from being overwhelmed. AWS provides three types of load balancers: Classic Load Balancer (CLB), Application Load Balancer (ALB), and Network Load Balancer (NLB). The ALB operates at Layer 7 of the OSI model, handling HTTP/HTTPS traffic. It offers advanced request routing based on the content of the request, making it ideal for complex web applications. On the other hand, the NLB operates at Layer 4, dealing with TCP traffic. It's designed for extreme performance and low latencies, offering static IP addresses per Availability Zone (AZ). Choosing the right load balancer is crucial as it directly impacts your application’s performance, availability, security, and cost. For instance, if your application primarily handles HTTP requests and requires sophisticated routing rules, an ALB would be more appropriate. Conversely, if your application requires high throughput, low latency, or a static IP address, you should opt for an NLB. Fundamentals of Load Balancing The Network Load Balancer is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. Unpredictable traffic patterns do not affect its performance, thanks to its ability to handle sudden and volatile traffic. Furthermore, it supports long-lived TCP connections that are ideal for WebSocket-type applications. The Application Load Balancer, on the other hand, is best suited for load balancing HTTP and HTTPS traffic. It operates at the request level, allowing advanced routing, microservices, and container-based architecture. It can route requests to different services based on the content of the request, which is ideal for modern, complex web applications. Key Features and Capabilities The NLB provides several important features, such as static IP support, zonal isolation, and low-latency performance. It distributes traffic across multiple targets within one or more AZs, ensuring a robust and reliable performance. Furthermore, it offers connection multiplexing and stickiness, enabling efficient utilization of resources. On the other hand, the ALB comes with built-in features like host and path-based routing, SSL/TLS decryption, and integration with AWS WAF, protecting your applications from various threats. It also supports advanced routing algorithms, slow start mode for new targets, and integration with container services. These features make it ideal for modern, modular, and microservices-based applications. Both ALB and NLB offer unique advantages. While ALB's strength lies in flexible application management and advanced routing features, NLB shines in areas of extreme performance and support for static IP addresses. 
It's also worth noting that while ALB can handle HTTP/1, HTTP/2, and gRPC protocols, NLB is designed for lower-level TCP and UDP traffic. Performance and Efficiency NLB excels in terms of performance due to its design. As it operates at the transport layer (Layer 4), it merely forwards incoming TCP or UDP connections to a target without inspecting the details of every request. This makes NLB significantly faster and more efficient in forwarding incoming requests, reducing latency. In contrast, ALB operates at the application layer (Layer 7), inspecting details of every incoming HTTP/HTTPS request. While this introduces a slight overhead compared to NLB, it allows ALB to perform advanced routing based on the content of the request, providing flexibility and control. When it comes to raw performance and low latency, NLB has an advantage due to its simple operation at Layer 4. However, ALB offers additional flexibility and control at Layer 7, which can lead to more efficient request handling in complex applications. Handling Traffic Spikes NLB is designed to handle sudden and massive spikes in traffic without requiring any pre-warming or scaling. This is because NLB does not need to scale the number of nodes processing incoming connections, allowing it to adapt instantly to increased traffic. ALB, on the other hand, adapts to an increase in connections and requests automatically. However, this scaling process takes some time, so during sudden, substantial traffic spikes, ALB might not be able to handle all incoming requests immediately. In such cases, AWS recommends informing them in advance about expected traffic spikes so they can pre-warm the ALB. While both NLB and ALB can handle traffic spikes, NLB's design allows it to respond more quickly to sudden increases in traffic, making it a better choice for applications with unpredictable or highly volatile traffic patterns. However, with proper planning and communication with AWS, ALB can also effectively manage large traffic spikes. Security NLB provides robust security features, including TLS termination and integration with VPC security groups. However, it lacks some advanced security features, such as support for AWS WAF and user authentication, which are available in ALB. ALB offers advanced security features like integration with AWS WAF, SSL/TLS termination, and user authentication using OpenID Connect and SAML. It also allows the creation of custom security policies, making it more flexible in terms of security. Both NLB and ALB offer robust security features, but ALB provides additional flexibility and control with its support for AWS WAF and user authentication. However, the choice between the two should be based on your specific security requirements. If your application primarily deals with HTTP/HTTPS traffic and requires advanced security controls, ALB would be a better choice. On the other hand, for applications requiring high throughput and low latency, NLB might be a more suitable option despite its limited advanced security features. Costs and Pricing The cost of using an NLB is largely dependent on the amount of data processed, the duration of usage, and whether you use additional features like cross-zone load balancing. While NLB pricing is relatively lower than ALB, it can cause more connections and hence, a higher load on targets, potentially leading to increased costs. Like NLB, the cost of ALB is based on the amount of data processed and the duration of usage. 
However, due to its additional features, ALB generally costs more than NLB. That said, ALB's sophisticated routing and management features could lead to more efficient resource usage, potentially offsetting its higher price. While NLB may appear cheaper at first glance, the total cost of operation should take into account the efficiency of resource usage, which is where ALB excels with its advanced routing and management features. Ultimately, the most cost-effective choice will depend on your application's specific needs and architecture. Integration and Compatibility NLB integrates seamlessly with other AWS services, such as AWS Auto Scaling Groups, Amazon EC2 Container Service (ECS), and Amazon EC2 Spot Fleet. It also works well with containerized applications and supports both IPv4 and IPv6 addresses. ALB offers extensive integration options with a wide range of AWS services, including AWS Auto Scaling Groups, Amazon ECS, AWS Fargate, and AWS Lambda. It also supports both IPv4 and IPv6 addresses and integrates with container-based and serverless architectures. Both NLB and ALB integrate seamlessly into existing AWS infrastructure. They support various AWS services, making them versatile choices for different application architectures. However, with its additional features and capabilities, ALB may require slightly more configuration than NLB. Conclusion While both ALB and NLB are powerful tools for managing network traffic in AWS, they cater to different needs and scenarios. ALB operates at the application layer, handling HTTP/HTTPS traffic with advanced request routing capabilities, making it suitable for complex web applications. NLB operates at the transport layer, dealing with TCP/UDP traffic, providing high performance and low latency, making it ideal for applications requiring high throughput. The choice between ALB and NLB depends on your specific application requirements. If your application handles HTTP/HTTPS traffic and requires advanced routing capabilities, ALB is the right choice. If your application requires high performance, low latency, and static IP addresses, then NLB is more suitable. For microservices architecture or container-based applications that require advanced routing and flexible management, go for ALB. For applications requiring high throughput and low latency, such as multiplayer gaming, real-time streaming, or IoT applications, choose NLB. As always, the best choice depends on understanding your application's requirements and choosing the tool that best fits those needs.
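For a concrete sense of how the two types are provisioned, here is a minimal AWS CLI sketch that creates one load balancer of each kind; the names, subnet IDs, and security group ID are placeholders, and in a real setup you would also create target groups and listeners:
Shell
# Application Load Balancer (Layer 7), placed in two subnets with a security group attached.
$ aws elbv2 create-load-balancer \
    --name demo-alb \
    --type application \
    --subnets subnet-aaaa1111 subnet-bbbb2222 \
    --security-groups sg-cccc3333

# Network Load Balancer (Layer 4), which provides a static IP per Availability Zone.
$ aws elbv2 create-load-balancer \
    --name demo-nlb \
    --type network \
    --subnets subnet-aaaa1111 subnet-bbbb2222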
In this blog, you will take a closer look at Podman Desktop, a graphical tool when you are working with containers. Enjoy! Introduction Podman is a container engine, just as Docker is. Podman commands are to be executed by means of a CLI (Command Line Interface), but it would come in handy when a GUI would be available. That is exactly the purpose of Podman Desktop! As stated on the Podman Desktop website: “Podman Desktop is an open source graphical tool enabling you to seamlessly work with containers and Kubernetes from your local environment.” In the next sections, you will execute most of the commands as executed in the two previous posts. If you are new to Podman, it is strongly advised to read those two posts first before continuing. Is Podman a Drop-in Replacement for Docker? Podman Equivalent for Docker Compose Sources used in this blog can be found on GitHub. Prerequisites Prerequisites for this blog are: Basic Linux knowledge, Ubuntu 22.04 is used during this blog; Basic Podman knowledge, see the previous blog posts; Podman version 3.4.4 is used in this blog because that is the version available for Ubuntu although the latest stable release is version 4.6.0 at the time of writing. Installation and Startup First of all, Podman Desktop needs to be installed, of course. Go to the downloads page. When using the Download button, a flatpak file will be downloaded. Flatpak is a framework for distributing desktop applications across various Linux distributions. However, this requires you to install flatpak. A tar.gz file is also available for download, so use this one. After downloading, extract the file to /opt: Shell $ sudo tar -xvf podman-desktop-1.2.1.tar.gz -C /opt/ In order to start Podman Desktop, you only need to double-click the podman-desktop file. The Get Started with Podman Desktop screen is shown. Click the Go to Podman Desktop button, which will open the Podman Desktop main screen. As you can see from the screenshot, Podman Desktop detects that Podman is running but also that Docker is running. This is already a nice feature because this means that you can use Podman Desktop for Podman as well as for Docker. At the bottom, a Docker Compatibility warning is shown, indicating that the Docker socket is not available and some Docker-specific tools will not function correctly. But this can be fixed, of course. In the left menu, you can find the following items from top to bottom: the dashboard, the containers, the pods, the images, and the volumes. Build an Image The container image you will try to build consists out of a Spring Boot application. It is a basic application containing one Rest endpoint, which returns a hello message. There is no need to build the application. You do need to download the jar-file and put it into a target directory at the root of the repository. The Dockerfile you will be using is located in the directory podman-desktop. Choose in the left menu the Images tab. Also note that in the screenshot, both Podman images and Docker images are shown. Click the Build an Image button and fill it in as follows: Containerfile path: select file podman-desktop/1-Dockerfile. Build context directory: This is automatically filled out for you with the podman-desktop directory. However, you need to change this to the root of the repository; otherwise, the jar-file is not part of the build context and cannot be found by Podman. Image Name: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Container Engine: Podman Click the Build button. 
This results in the following error: Shell Uploading the build context from <user directory>/mypodmanplanet...Can take a while... Error:(HTTP code 500) server error - potentially insufficient UIDs or GIDs available in user namespace (requested 262143:262143 for /var/tmp/libpod_builder2108531042/bError:Error: (HTTP code 500) server error - potentially insufficient UIDs or GIDs available in user namespace (requested 262143:262143 for /var/tmp/libpod_builder2108531042/build/.git): Check /etc/subuid and /etc/subgid: lchown /var/tmp/libpod_builder2108531042/build/.git: invalid argument This error sounds familiar because the error was also encountered in a previous blog. Let’s try to build the image via the command line: Shell $ podman build . --tag docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f podman-desktop/1-Dockerfile The image is built without any problem. An issue has been raised for this problem. At the time of writing, building an image via Podman Desktop is not possible. Start a Container Let’s see whether you can start the container. Choose in the left menu the Containers tab and click the Create a Container button. A choice menu is shown. Choose an Existing image. The Images tab is shown. Click the Play button on the right for the mypodmanplanet image. A black screen is shown, and no container is started. Start the container via CLI: Shell $ podman run -p 8080:8080 --name mypodmanplanet -d docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The running container is now visible in Podman Desktop. Test the endpoint, and this functions properly. Shell $ curl http://localhost:8080/hello Hello Podman! Same conclusion as for building the image. At the time of writing, it is not possible to start a container via Podman Desktop. What is really interesting is the actions menu. You can view the container logs. The Inspect tab shows you the details of the container. The Kube tab shows you what the Kubernetes deployment yaml file will look like. The Terminal tab gives you access to a terminal inside the container. You can also stop, restart, and remove the container from Podman Desktop. Although starting the container did not work, Podman Desktop offers some interesting features that make it easier to work with containers. Volume Mount Remove the container from the previous section. You will create the container again, but this time with a volume mount to a specific application.properties file, which will ensure that the Spring Boot application runs on port 8082 inside the container. Execute the following command from the root of the repository: Shell $ podman run -p 8080:8082 --volume ./properties/application.properties:/opt/app/application.properties:ro --name mypodmanplanet -d docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The container is started successfully, but an error message is shown in Podman Desktop. This error will show up regularly from now on. Restarting Podman Desktop resolves the issue. An issue has been filed for this problem. Unfortunately, the issue cannot be reproduced consistently. The volume is not shown in the Volumes tab, but that’s because it is an anonymous volume. Let’s create a volume and see whether this shows up in the Volumes tab. Shell $ podman volume create myFirstVolume myFirstVolume The volume is not shown in Podman Desktop. It is available via the command line, however. Shell $ podman volume ls DRIVER VOLUME NAME local myFirstVolume Viewing volumes is not possible with Podman Desktop at the time of writing. Delete the volume. 
Shell $ podman volume rm myFirstVolume myFirstVolume Create Pod In this section, you will create a Pod containing two containers. The setup is based on the one used for a previous blog. Choose in the left menu the Pods tab and click the Play Kubernetes YAML button. Select the YAML file Dockerfiles/hello-pod-2-with-env.yaml. Click the Play button. The Pod has started. Check the Containers tab, and you will see the three containers that are part of the Pod (the two application containers plus the Pod's infra container). Verify whether the endpoints are accessible. Shell $ curl http://localhost:8080/hello Hello Podman! $ curl http://localhost:8081/hello Hello Podman! The Pod can be stopped and deleted via Podman Desktop. Sometimes, Podman Desktop stops responding when deleting the Pod. After restarting Podman Desktop, the Pod can be deleted without this issue. Conclusion Podman Desktop is a nice tool with some fine features. However, quite a few bugs were encountered while using Podman Desktop (I did not create an issue for all of them). This might be due to the older version of Podman available for Ubuntu, but in that case I would have expected an incompatibility warning to be raised when starting Podman Desktop. Still, it is a nice tool, and I will keep on using it for the time being.
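As an aside, the Pod workflow above can also be driven entirely from the Podman CLI, which is handy when Podman Desktop misbehaves. A minimal sketch, assuming the same YAML file from the repository and that the mypodmanplanet container from earlier is still running:
Shell
# Start the Pod from the Kubernetes YAML file (CLI equivalent of the Play Kubernetes YAML button).
$ podman play kube Dockerfiles/hello-pod-2-with-env.yaml

# Inspect the Pod and its containers.
$ podman pod ps
$ podman ps --pod

# Generate Kubernetes YAML from a running container, similar to what the Kube tab shows.
$ podman generate kube mypodmanplanet > mypodmanplanet.yaml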
Learn how to record SSH sessions on a Red Hat Enterprise Linux VSI in a Private VPC network using in-built packages. The VPC private network is provisioned through Terraform and the RHEL packages are installed using Ansible automation. What Is Session Recording and Why Is It Required? As noted in "Securely record SSH sessions on RHEL in a private VPC network," a Bastion host and a jump server are both security mechanisms used in network and server environments to control and enhance security when connecting to remote systems. They serve similar purposes but have some differences in their implementation and use cases. The Bastion host is placed in front of the private network to take SSH requests from public traffic and pass the request to the downstream machine. Bastion hosts and jump servers are vulnerable to intrusion as they are exposed to public traffic. Session recording helps an administrator of a system to audit user SSH sessions and comply with regulatory requirements. In the event of a security breach, you as an administrator would like to audit and analyze the user sessions. This is critical for a security-sensitive system. Before deploying the session recording solution, you need to provision a private VPC network following the instructions in the article, "Architecting a Completely Private VPC Network and Automating the Deployment." Alternatively, if you are planning to use your own VPC infrastructure, you need to attach a floating IP to the virtual server instance and a public gateway to each of the subnets. Additionally, you need to allow network traffic from public internet access. Deploy Session Recording Using Ansible To be able to deploy the Session Recording solution you need to have the following packages installed on the RHEL VSI: tlog SSSD cockpit-session-recording The packages will be installed through Ansible automation on all the VSIs both bastion hosts and RHEL VSI. If you haven't done so yet, clone the GitHub repository and move to the Ansible folder. Shell git clone https://github.com/VidyasagarMSC/private-vpc-network cd ansible Create hosts.ini from the template file. Shell cp hosts_template.ini hosts.ini Update the hosts.ini entries as per your VPC IP addresses. Plain Text [bastions] 10.10.0.13 10.10.65.13 [servers] 10.10.128.13 [bastions:vars] ansible_port=22 ansible_user=root ansible_ssh_private_key_file=/Users/vmac/.ssh/ssh_vpc packages="['tlog','cockpit-session-recording','systemd-journal-remote']" [servers:vars] ansible_port=22 ansible_user=root ansible_ssh_private_key_file=/Users/vmac/.ssh/ssh_vpc ansible_ssh_common_args='-J root@10.10.0.13' packages="['tlog','cockpit-session-recording','systemd-journal-remote']" Run the Ansible playbook to install the packages from an IBM Cloud private mirror/repository. Shell ansible-playbook main_playbook.yml -i hosts.ini --flush-cache Running Ansible playbooks You can see in the image that after you SSH into the RHEL machine now, you will see a note saying that the current session is being recorded. Check the Session Recordings, Logs, and Reports If you closely observe the messages post SSH, you will see a URL to the web console that can be accessed using the machine name or private IP over port 9090. To allow traffic on port 9090, in the Terraform code, Change the value of the allow_port_9090 variable to true and run terraform apply. The latest terraform apply will add ACL and security group rules to allow traffic on port 9090. Now, open a browser and navigate to http://10.10.128.13:9090 . 
To access the web console using the VSI name, you need to set up a private DNS (out of scope for this article). You need the root password to access the web console. RHEL web console Navigate to Session Recording to see the list of session recordings. Along with session recordings, you can check the logs, diagnostic reports, etc. Session recording on the Web console If you prefer the command line, recorded sessions can also be replayed with the tlog-play utility; see the sketch after the reading list below. Recommended Reading How to use Schematics - Terraform UI to provision the cloud resources
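A minimal sketch of replaying a recording from the command line, assuming the tlog package installed above; the recording ID is a placeholder taken from the journal, and the exact options are as described in the tlog documentation, so verify them with tlog-play --help on your system:
Shell
# Find recording identifiers in the systemd journal (recorded sessions carry a TLOG_REC field).
$ journalctl -o verbose | grep TLOG_REC= | sort -u

# Replay one recorded session directly from the journal.
$ tlog-play -r journal -M TLOG_REC=<recording-id>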
In this article, we’ll explain how to use Ansible to build and deploy a Quarkus application. Quarkus is an exciting, lightweight Java development framework designed for cloud and Kubernetes deployments, and Red Hat Ansible Automation Platform is one of the most popular automation tools and a star product from Red Hat. Set Up Your Ansible Environment Before discussing how to automate a Quarkus application deployment using Ansible, we need to ensure the prerequisites are in place. First, you have to install Ansible on your development environment. On a Fedora or a Red Hat Enterprise Linux machine, this is achieved easily by utilizing the dnf package manager: Shell $ dnf install ansible-core The only other requirement is to install the Ansible collection dedicated to Quarkus: Shell $ ansible-galaxy collection install middleware_automation.quarkus This is all you need to prepare the Ansible control machine (the name given to the machine executing Ansible). Generally, the control node is used to set up other systems that are designated under the name targets. For the purpose of this tutorial, and for simplicity's sake, we are going to utilize the same system for both the control node and our (only) target. This will make it easier to reproduce the content of this article on a single development machine. Note that you don’t need to set up any kind of Java development environment, because the Ansible collection will take care of that. The Ansible collection dedicated to Quarkus is a community project, and it’s not supported by Red Hat. However, both Quarkus and Ansible are Red Hat products and thus fully supported. The Quarkus collection might be supported at some point in the future but is not at the time of the writing of this article. Inventory File Before we can execute Ansible, we need to provide the tool with an inventory of the targets. There are many ways to achieve that, but the simplest solution for a tutorial such as this one is to write up an inventory file of our own. As mentioned above, we are going to use the same host for both the controller and the target, so the inventory file has only one host. Here again, for simplicity's sake, this machine is going to be the localhost: Shell $ cat inventory [all] localhost ansible_connection=local Refer to the Ansible documentation for more information on Ansible inventory. Build and Deploy the App With Ansible For this demonstration, we are going to utilize one of the sample applications provided as part of the Quarkus quick starts project. We will use Ansible to build and deploy the Getting Started application. All we need to provide to Ansible is the application name, repository URL, and the destination folder, where to deploy the application on the target. Because of the directory structure of the Quarkus quick start, containing several projects, we'll also need to specify the directory containing the source code: Shell $ ansible-playbook -i inventory middleware_automation.quarkus.playbook \ -e app_name='optaplanner-quickstart' \ -e quarkus_app_source_folder='optaplanner-quickstart' \ -e quarkus_path_to_folder_to_deploy=/opt/optplanner \ -e quarkus_app_repo_url='https://github.com/quarkusio/quarkus-quickstarts.git' Below is the output of this command: PLAY [Build and deploy a Quarkus app using Ansible] **************************** TASK [Gathering Facts] ********************************************************* ok: [localhost] TASK [Build the Quarkus from https://github.com/quarkusio/quarkus-quickstarts.git.] 
*** TASK [middleware_automation.quarkus.quarkus : Ensure required parameters are provided.] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Define path to mvnw script.] ***** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure that builder host localhost has appropriate JDK installed: java-17-openjdk] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Delete previous workdir (if requested).] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure app workdir exists: /tmp/workdir] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Checkout the application source code.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Build the App using Maven] ******* ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Display build application log] *** skipping: [localhost] TASK [Deploy Quarkus app on target.] ******************************************* TASK [middleware_automation.quarkus.quarkus : Ensure required parameters are provided.] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure requirements on target system are fullfilled.] *** included: /root/.ansible/collections/ansible_collections/middleware_automation/quarkus/roles/quarkus/tasks/deploy/prereqs.yml for localhost TASK [middleware_automation.quarkus.quarkus : Ensure required OpenJDK is installed on target.] *** skipping: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure Quarkus system group exists on target system] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure Quarkus user exists on target system.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure deployement directory exits: /opt/optplanner.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Set Quarkus app source dir (if not defined).] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Deploy application as a systemd service on target system.] *** included: /root/.ansible/collections/ansible_collections/middleware_automation/quarkus/roles/quarkus/tasks/deploy/service.yml for localhost TASK [middleware_automation.quarkus.quarkus : Deploy application from to target system] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Deploy Systemd configuration for Quarkus app] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Perform daemon-reload to ensure the changes are picked up] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure Quarkus app service is running.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure firewalld configuration is appropriate (if requested).] *** skipping: [localhost] PLAY RECAP ********************************************************************* localhost : ok=19 changed=8 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0 As you can see, the Ansible collection for Quarkus does all the heavy lifting for us: its content takes care of checking out the source code from GitHub and builds the application. It also ensures the system used for this step has the required OpenJDK installed on the target machine. Once the application is successfully built, the collection takes care of the deployment. Here again, it checks that the appropriate OpenJDK is available on the target system. Then, it verifies that the required user and group exist on the target and if not, creates them. 
This is recommended primarily so that the Quarkus application runs under a regular user account rather than as root. With those requirements in place, the jars produced during the build phase are copied over to the target, along with the configuration required to integrate the application into systemd as a service. Any change to the systemd configuration requires reloading its daemon, which the collection ensures happens whenever it is needed. With all of that in place, the collection starts the service itself. Validate the Execution Results Let's take a minute to verify that all went well and that the service is indeed running: Shell # systemctl status optaplanner-quickstart.service ● optaplanner-quickstart.service - A Quarkus service named optaplanner-quickstart Loaded: loaded (/usr/lib/systemd/system/optaplanner-quickstart.service; enabled; vendor preset: disabled) Active: active (running) since Wed 2023-04-26 09:40:13 UTC; 3h 19min ago Main PID: 934 (java) CGroup: /system.slice/optaplanner-quickstart.service └─934 /usr/bin/java -jar /opt/optplanner/quarkus-run.jar Apr 26 09:40:13 be44b3acb1f3 systemd[1]: Started A Quarkus service named optaplanner-quickstart. Apr 26 09:40:14 be44b3acb1f3 java[934]: __ ____ __ _____ ___ __ ____ ______ Apr 26 09:40:14 be44b3acb1f3 java[934]: --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ Apr 26 09:40:14 be44b3acb1f3 java[934]: -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \ Apr 26 09:40:14 be44b3acb1f3 java[934]: --\___\_\____/_/ |_/_/|_/_/|_|\____/___/ Apr 26 09:40:14 be44b3acb1f3 java[934]: 2023-04-26 09:40:14,843 INFO [io.quarkus] (main) optaplanner-quickstart 1.0.0-SNAPSHOT on JVM (powered by Quarkus 2.16.6.Final) started in 1.468s. Listening on: http://0.0.0.0:8080 Apr 26 09:40:14 be44b3acb1f3 java[934]: 2023-04-26 09:40:14,848 INFO [io.quarkus] (main) Profile prod activated. Apr 26 09:40:14 be44b3acb1f3 java[934]: 2023-04-26 09:40:14,848 INFO [io.quarkus] (main) Installed features: [agroal, cdi, hibernate-orm, hibernate-orm-panache, hibernate-orm-rest-data-panache, jdbc-h2, narayana-jta, optaplanner, optaplanner-jackson, resteasy-reactive, resteasy-reactive-jackson, resteasy-reactive-links, smallrye-context-propagation, vertx, webjars-locator] Having the service running is certainly good, but it does not guarantee by itself that the application is available. To double-check, we can simply confirm that the application is accessible by connecting to it: Shell # curl -I http://localhost:8080/ HTTP/1.1 200 OK accept-ranges: bytes content-length: 8533 cache-control: public, immutable, max-age=86400 last-modified: Wed, 26 Apr 2023 10:00:18 GMT date: Wed, 26 Apr 2023 13:00:19 GMT Writing up a Playbook The default playbook provided with the Ansible collection for Quarkus is quite handy and allows you to bootstrap your automation with a single command. However, most likely, you'll need to write your own playbook so you can add the automation required around the deployment of your Quarkus app. Here is the content of the playbook provided with the collection, which you can simply use as a base for your own: YAML --- - name: "Build and deploy a Quarkus app using Ansible" hosts: all gather_facts: false vars: quarkus_app_repo_url: 'https://github.com/quarkusio/quarkus-quickstarts.git' app_name: 'optaplanner-quickstart' quarkus_app_source_folder: 'optaplanner-quickstart' quarkus_path_to_folder_to_deploy: '/opt/optaplanner' pre_tasks: - name: "Build the Quarkus from {{ quarkus_app_repo_url }}."
ansible.builtin.include_role: name: quarkus tasks_from: build.yml tasks: - name: "Deploy Quarkus app on target." ansible.builtin.include_role: name: quarkus tasks_from: deploy.yml To run this playbook, you again use the ansible-playbook command, but this time you provide the path to the playbook: Shell $ ansible-playbook -i inventory playbook.yml Conclusion Thanks to the Ansible collection for Quarkus, the work needed to automate the deployment of a Quarkus application is minimal. The collection takes care of most of the heavy lifting and lets its users focus on the automation that is specific to their application and business requirements.
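The tutorial above keeps everything on localhost for simplicity, but the same playbook can target a real server once the inventory points at it. Below is a minimal sketch, assuming SSH access to the target machine is already configured; the hostname and user are placeholders rather than values from the collection.

Shell
# Hypothetical inventory entry for a remote target (replace host and user).
$ cat inventory
[all]
quarkus-host.example.com ansible_user=devops

# Run the same playbook against the remote target over SSH.
$ ansible-playbook -i inventory playbook.yml

Variables declared in the playbook's vars section can still be overridden on the command line with -e, exactly as in the first run shown earlier.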
As businesses shift operations to the cloud, robust security is crucial. DDoS attacks pose significant threats to cloud-based services, aiming to disrupt infrastructure and cause downtime and financial losses. AWS Shield from Amazon Web Services provides comprehensive DDoS protection, fortifying cloud security. This article explores how AWS Shield safeguards applications and resources from evolving DDoS threats. Understanding DDoS Attacks To understand the role of AWS Shield, it's essential to grasp how DDoS attacks work. They involve compromised devices flooding a target with excessive traffic, blocking legitimate users from accessing it. DDoS attacks can target different network layers, which is why specialized protection is needed to mitigate them effectively. Introducing AWS Shield AWS Shield is a DDoS protection service provided by AWS. It offers two tiers of protection: AWS Shield Standard and AWS Shield Advanced. AWS Shield Standard Automatic protection: AWS Shield Standard is automatically integrated with AWS resources such as Amazon CloudFront, Amazon Route 53, and Elastic Load Balancing (ELB). It provides automatic protection against common DDoS attacks at no extra cost. Global network resilience: By leveraging the robust AWS global network, Shield Standard can distribute and absorb DDoS traffic across multiple Availability Zones, ensuring uninterrupted services. Cost-effective solution: Shield Standard is included in the AWS resource fees, providing a cost-effective security solution that requires minimal setup and management. AWS Shield Advanced Real-time attack monitoring: AWS Shield Advanced allows proactive monitoring and analysis of ongoing DDoS attacks in real time, providing visibility into potential threats. Advanced DDoS mitigation: Shield Advanced offers enhanced protection against complex and sophisticated DDoS attacks by employing additional security features like AWS Web Application Firewall (WAF) and AWS Firewall Manager. 24/7 DDoS Response Team (DRT): Subscribers to Shield Advanced can rely on the AWS DDoS Response Team, a group of DDoS mitigation experts available 24/7, for personalized assistance during active attacks. Integration With Other AWS Services AWS CloudWatch Integration AWS Shield integrates with AWS CloudWatch to monitor and analyze DDoS protection metrics, enabling automated threat responses. AWS CloudTrail Integration Integrating with AWS CloudTrail gives users enhanced visibility into security logs and events, strengthening cloud security. Scalable Mitigation and Resilience AWS Shield scales effectively to handle large-scale DDoS attacks, distributing traffic and mitigating attacks closer to their source. This reduces latency and improves application availability. A Layered Approach to Cloud Security AWS Shield provides a foundational layer of security for cloud-based applications. To create a comprehensive security strategy, businesses can combine AWS Shield with other security services like AWS WAF, AWS Firewall Manager, and AWS Security Hub. This layered approach addresses various security concerns. Conclusion As the cloud landscape expands, safeguarding cloud-based applications and resources from DDoS attacks becomes crucial. AWS Shield provides a reliable solution to defend against DDoS threats, fortifying cloud security and ensuring uninterrupted availability of essential services.
Whether utilizing AWS Shield Standard's automatic protection or AWS Shield Advanced's enhanced capabilities, businesses can rely on AWS's expertise to protect their cloud infrastructure. This allows them to concentrate on innovation and growth with confidence in their cloud security.
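As a concrete illustration of the CloudWatch integration described above, Shield Advanced publishes DDoS detection metrics that can drive alarms and automated responses. The following AWS CLI sketch assumes Shield Advanced is enabled and uses placeholder ARNs for the protected resource and the SNS notification topic; adjust the names and values to your own environment.

Shell
# Hedged sketch: alarm when Shield Advanced reports a detected DDoS event
# for a protected resource. The ARNs below are placeholders, not real resources.
$ aws cloudwatch put-metric-alarm \
    --alarm-name ddos-detected-my-app \
    --namespace AWS/DDoSProtection \
    --metric-name DDoSDetected \
    --dimensions Name=ResourceArn,Value=arn:aws:cloudfront::123456789012:distribution/EXAMPLE123 \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:security-alerts

Wiring the alarm action to an SNS topic (or further automation) is what turns Shield's monitoring data into the automated threat response mentioned earlier.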
In today's data-driven world, organizations face the challenge of handling massive volumes of data across various systems. To extract valuable insights, powerful tools are needed. Enter Trino, an open-source distributed SQL query engine that empowers organizations to process and query large datasets from multiple sources. But to unleash Trino's full potential, you need a trusty sidekick like DbVisualizer. This superhero of database management and development tools offers a user-friendly interface and a complete platform for working with different databases. DbVisualizer acts as a centralized hub, effortlessly connecting you to Trino and other data stores like Hadoop, Cassandra, and MySQL. With DbVisualizer, exploring databases, building queries, and visualizing data becomes a breeze. Its query builder tool simplifies query construction, making it easy to manipulate data visually. Moreover, DbVisualizer's data visualization powers are truly impressive, allowing you to create stunning charts, graphs, and dashboards. By connecting DbVisualizer with Trino, you seamlessly blend data from various sources into these visualizations, revealing a world of insights. Prerequisites 1. Basic knowledge of databases and SQL. 2. Docker 3. DbVisualizer What Is Trino? Trino, formerly known as PrestoSQL, is a powerful open-source distributed SQL query engine designed for large-scale data processing and analysis. It offers a unified interface to query data from various sources, including traditional databases and distributed storage systems. With its distributed architecture, Trino scales horizontally and processes queries in parallel, enabling efficient handling of massive datasets. It supports standard SQL syntax and provides advanced functions for complex data manipulation. Trino can push down query execution to data sources, reducing data movement and improving performance. Widely adopted by organizations, Trino is valued for its flexibility, speed, and ease of use, making it an indispensable tool for data analytics and real-time insights. What Is Trino SQL? Trino SQL is a powerful language used to query data in Trino, the distributed SQL query engine. It follows the SQL standard and provides a familiar syntax for data analysis tasks. Trino SQL supports a wide range of operations, including querying, filtering, joining, aggregating, and transforming datasets. It includes advanced features such as subqueries and a rich set of functions for data manipulation. Trino SQL leverages the distributed nature of Trino for fast and scalable query execution. It supports various data sources and formats, making it versatile for heterogeneous environments. Overall, Trino SQL offers a robust and efficient solution for querying and analyzing data in Trino. Setting up Trino For this tutorial, we will be running Trino locally on a docker container. Follow these steps to install Trino on your docker container: Step 1: Pull the Trino Docker Image The Trino project provides the "trinodb/trino" Docker image, which includes the Trino server and a default configuration. Pull the image from Docker Hub using the following command: docker pull trinodb/trino This command will download the latest version of the Trino Docker image. Step 2: Run the Trino Container Create a container from the Trino image using the following command: docker run --name trino -d -p 8080:8080 trinodb/trino This command creates a container named "trino" from the "trinodb/trino" image. 
The container runs in the background and maps the default Trino port, 8080, from inside the container to port 8080 on your workstation. Step 3: Verify the Container To verify that the Trino container is running, use the following command: docker ps This command displays all the running containers. Look for the "trino" container and ensure that it is listed with the appropriate status and port mapping. Step 4: Wait for Trino To Start When the Trino container starts, it might take a few moments for it to become fully ready. You can follow its progress using the following command: docker logs trino This command displays the container logs; once Trino is ready, a "SERVER STARTED" message appears. The STATUS column of docker ps also changes from "(health: starting)" to "(healthy)" when the container's health check passes. Congratulations! You have successfully installed Trino in a Docker container. You can now open the Trino web UI at http://localhost:8080 in your web browser and start running SQL queries against your Trino cluster. Setting up the Trino Connection in DbVisualizer Setting up the Trino connection in DbVisualizer is a straightforward process that allows you to unleash the power of Trino's distributed SQL query capabilities within the user-friendly environment of DbVisualizer. Here's how you can get started: Now that we have Trino running in Docker, we can connect DbVisualizer to it by following the steps below: 1. Go to the Connection tab. Click the "Create a Connection" button to create a new connection. Creating a database connection in DbVisualizer 2. Select your server type. For this tutorial, we will be choosing Trino as the driver. Choosing the driver in DbVisualizer 3. In the Driver Connection tab, enter the following information: Database server: localhost Database Port: 8080 Userid: "user_name" (any user name works; Trino's default configuration does not require a password) Connection Details for the Trino Server in DbVisualizer 4. Click the "Connect" button to test the connection. If you haven't updated your Trino driver, you will receive a prompt to do so. Driver download Open the Driver Manager tab and update the driver to connect to your Trino database. Trino JDBC driver download in DbVisualizer Click on "Connect" again to test your connection. If the connection is successful, you should see a message indicating that the connection was established. You can now browse the database using DbVisualizer. A Message Signifying a Successful Connection 5. Explore and Query Trino Data With the Trino connection established in DbVisualizer, you are now ready to explore and query your Trino data. Utilize DbVisualizer's intuitive interface, query builder, and visualization tools to interact with Trino and extract valuable insights from your distributed datasets. The Trino server tree Now follow along as we walk you through the CLI capabilities of Trino as well! Trino CLI Trino CLI is your go-to command-line buddy for seamless interaction with Trino. The command-line interface allows interaction with Trino, providing capabilities to execute queries, manage connections, and retrieve results directly from your terminal. With its SQL prowess, you can write queries with ease, thanks to nifty features like auto-completion and syntax highlighting. Trino CLI goes the extra mile by allowing you to fine-tune your query experience through configurable session properties and optimized performance options. And guess what? It offers a plethora of output formats to jazz up your query results!
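Before jumping into the interactive session shown next, it may help to see those output formats in action from a one-off, non-interactive call. This is a small sketch that assumes the standard Trino CLI options available in the trinodb/trino image.

Shell
# List the catalogs available in the default image, non-interactively.
$ docker exec trino trino --execute "SHOW CATALOGS"

# The same statement rendered in a different output format, e.g. CSV.
$ docker exec trino trino --execute "SHOW CATALOGS" --output-format CSV

The --execute flag runs a single statement and exits, which makes it handy for scripting; without it, the CLI drops you into the interactive prompt used below.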
To run the Trino CLI in your Docker container, use the following command: docker exec -it trino trino Then, enter your Trino SQL query in the terminal and run it to execute the query on your Trino server. Executing a query in the Trino CLI But hold on! There's an exciting alternative that takes your Trino journey to the next level. Imagine stepping into a world of graphical interfaces and advanced visualization wonders. That's where tools like DbVisualizer enter the scene. By harnessing the power of a JDBC driver, you can connect with Trino in DbVisualizer and unlock a universe of interactive exploration, query building, and mind-blowing visualizations. It's like adding a touch of magic to your Trino experience. So, whether you're a command-line aficionado or prefer the captivating realm of graphical tools, Trino CLI and DbVisualizer offer you the best of both worlds. Get ready to embark on an exhilarating data exploration journey fueled by the boundless potential of Trino and the seamless connectivity of DbVisualizer. Executing Queries in DbVisualizer With Trino DbVisualizer provides a powerful interface for writing and executing SQL queries against Trino. You can leverage its user-friendly query editor to compose SQL statements efficiently. Simply expand the Trino server tree, pick any catalog from the list, and open a new SQL Commander by clicking on the play icon with a plus next to it. The Create SQL Commander button You can start writing SQL queries in the SQL Commander editor. A good example is a query that counts the number of nations in the nation table: select count(*) from tpch.sf1.nation; Click on the play button above the SQL Commander to execute the query. You will get the result shown in the image below: The Trino query result Now, we'll visualize the queries in Trino with DbVisualizer. Follow along! Visualizing Trino Queries With DbVisualizer By using SQL, we have the power to create a wide range of analytical queries on this table. For example, let's calculate the average length of the nation names across all regions: SELECT regionkey, AVG(LENGTH(name)) AS avg_name_length FROM tpch.sf1.nation GROUP BY regionkey; This query retrieves data from the nation table in Trino and calculates the average length of nation names (avg_name_length) for each region (regionkey). By grouping the results based on the regionkey column, the query provides a summary of the average name length for nations within each specific region. Running the query above will provide you with the results seen in the table below: You can use this statistic to create a visualization such as a line chart, bar chart, or area chart. To create a visualization for this table, click on the rightmost button in the result tab toolbar. The Show as Chart button Then, select the values for the x- and y-axes of your chart by clicking on the select button above the chart panel. Select avg_name_length as the x-axis and regionkey as the y-axis. Setting the chart axis Great! We have successfully created a line chart visualization of our Trino query data. The Trino Line chart By default, the visualization displays a line chart, but don't let that limit you. Get creative and explore the various customization options available to you. You can try out options like line chart, point chart, area chart, stacked area chart, bar chart, stacked bar chart, and pie chart by clicking on the chart icon above the chart panel to reveal a dropdown menu of various chart types. The Chart Type Dropdown Impressive, isn't it?
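If you would like more to chart than the five averages above, joining against the region table gives the chart friendlier labels. The query below is only an illustrative example against the built-in tpch catalog; you can paste the SQL into the SQL Commander, or run it through the CLI as sketched here.

Shell
# Illustrative follow-up query: average nation-name length per region,
# joined with the region table so the chart axis can show region names.
$ docker exec trino trino --execute "
    SELECT r.name AS region_name,
           AVG(LENGTH(n.name)) AS avg_name_length
    FROM tpch.sf1.nation n
    JOIN tpch.sf1.region r ON n.regionkey = r.regionkey
    GROUP BY r.name
    ORDER BY region_name"

Selecting region_name for one axis and avg_name_length for the other produces the same kind of chart as before, just with readable region names instead of numeric keys.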
DbVisualizer offers a range of customizable features. To explore these options, simply click on the tool button located at the top of the chart tab. From there, you have the freedom to fine-tune your charts according to your preferences. Once you've crafted the ideal chart, it's a breeze to export it as an image: just click on the document icon situated at the top of the chart tab. Configure chart and export chart buttons Conclusion In this tutorial, we've uncovered the power of Trino and DbVisualizer by unleashing the capabilities of distributed SQL queries for data analysis. Trino, the open-source SQL query engine, offers the muscle to handle massive data volumes across various systems. With DbVisualizer as our trusty sidekick, we effortlessly connect to Trino and other data stores. Its user-friendly interface and comprehensive tools make exploring and querying data a breeze. We've learned how to establish the Trino connection in DbVisualizer, execute SQL queries, and retrieve results with ease. But the excitement doesn't stop there! DbVisualizer's visualization capabilities let us create stunning charts to bring our data to life. We can customize these visualizations to suit our needs and, with a simple click, export them as image masterpieces. By mastering the Trino connection with DbVisualizer, we can gain valuable insights and supercharge our data analysis. So, don't stop here: explore, experiment, and unlock the full potential of Trino and DbVisualizer with the help of their documentation and blogs on your data-driven journey. Until next time! FAQ (Frequently Asked Questions) 1. How do I install Trino in a Docker container? To install Trino in a Docker container, use the command docker pull trinodb/trino to download the Trino Docker image. Then, create a container from the image using docker run --name trino -d -p 8080:8080 trinodb/trino. Verify the container status with docker ps and ensure it is running. 2. How do I connect DbVisualizer to Trino? In DbVisualizer, go to the Connection tab and click "Create a Connection." Choose Trino as the driver and enter the connection details, such as localhost for the Database server and 8080 for the Database Port. Click "Connect" to establish the connection. 3. How can I execute SQL queries in DbVisualizer with Trino? To execute SQL queries in DbVisualizer with Trino, expand the Trino server tree, open an SQL Commander, and write your SQL query in the editor. Click the play button to execute the query and view the results. 4. How can I visualize Trino queries using DbVisualizer? DbVisualizer allows you to visualize Trino queries by creating charts. After executing a query, click on the rightmost button in the result tab toolbar to show the chart panel. Select the desired values for the x and y axes, and customize the chart type and appearance as needed. 5. Can I export the charts created in DbVisualizer as images? Yes, you can export charts created in DbVisualizer as images. In the chart tab, click on the document icon located at the top to export the chart as an image file.
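One housekeeping note before moving on: when you are done experimenting, the tutorial container can be stopped and removed so it no longer occupies port 8080. This is plain Docker cleanup, nothing Trino-specific.

Shell
# Stop and remove the tutorial container.
$ docker stop trino
$ docker rm trino

# Optional: remove the downloaded image to reclaim disk space.
$ docker rmi trinodb/trino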
What Is Git? Git is a distributed revision control system. This definition sounds complicated, so let's break it down and look at the individual parts. The definition can be broken down into two parts: Git is distributed. Git is a revision control system. In this article, we'll elaborate on each of these characteristics of Git in order to understand how Git does what it does. Revision Control System A revision control system tracks content as it changes over time, which makes it a content tracker. Git tracks changes to content by computing its SHA1 hash. If the hash of an object that Git is tracking has changed, Git treats it as a new object. To provide persistence, Git stores this map, with the SHA1 as the key and the object as the value, in a repository inside the project's directory. So, at its very core, Git is essentially a persistent map. This is illustrated in the figure below. We'll start our journey from the core and explore Git layer by layer as we move outwards to understand the complete picture. Persistent Map In programming languages, a map is an interface that represents a collection of key-value pairs, where each key is associated with a unique value. Git computes the hash of an object that it stores in its repository. Shell $ echo "Flash 9000" | git hash-object --stdin This returns the SHA1 of the string object having the content "Flash 9000". The Git repository is instantiated with the following command. This creates a repository in a hidden folder named .git. Shell $ git init Initialized empty Git repository in D:/git/test/.git/ To store an object in the repository, we can pass the '-w' flag. Shell $ echo "Flash 9000" | git hash-object --stdin -w fc75e0215a2fcaeea1b949dab29c6014a2333399 Every object in Git has its own SHA1. Git is a map where the keys are SHA1 hashes and the values are the content. Persistence is provided by the -w flag, which writes the object into the repository. Notice how Git stored the object in its repository in the .git folder. Shell $ ls -ltr .git/objects/fc/ total 1 -r--r--r-- 1 ragha 197121 27 Sep 13 16:01 75e0215a2fcaeea1b949dab29c6014a2333399 Now that we understand the very core of Git, i.e., that it is a persistent map, let's look at the next layer, the content tracker, i.e., how Git tracks changes made to an object over time. Content Tracker We store content in files and directories. So, let's create a file and store some content that we want to track. We initialize a Git repository and store our content in the repository. Create a file with the content shown below. Add the file and commit it to the repository. Shell # create a file $ touch storage_insights.txt $ echo "Flash 9000" >> storage_insights.txt $ git add . $ git commit -m "First commit" [master (root-commit) b7d1ea0] First commit 1 file changed, 1 insertion(+) create mode 100644 storage_insights.txt Let's check the .git folder to find out what objects Git created to track the single file we have in our repository.
Shell $ ls -ltR .git/objects/ .git/objects/: total 0 drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 b7/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 af/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:56 fc/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 info/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 pack/ .git/objects/b7: total 1 -r--r--r-- 1 ragha 197121 137 Sep 13 16:57 d1ea0ff44167b0daa2b3016d3fced984618612 .git/objects/af: total 1 -r--r--r-- 1 ragha 197121 65 Sep 13 16:57 ddf78df335c4a85c0e05ba3804fa1ab64fd4fd .git/objects/fc: total 1 -r--r--r-- 1 ragha 197121 27 Sep 13 16:56 75e0215a2fcaeea1b949dab29c6014a2333399 .git/objects/info: total 0 .git/objects/pack: total 0 Git created three objects with three SHA1. Let's check the kind of objects created and their contents. Git provides utility methods for this purpose. To fetch the content of an object, Git provides the utility method. Shell git cat-file -p <SHA1> There are different types of objects stored in the Git repository, Git provides the following method to find the type of an object. Shell git cat-file -t <SHA1> The types of three objects are shown below. Shell $ git cat-file -t b7d1ea0ff44167b0daa2b3016d3fced984618612 commit $ git cat-file -t afddf78df335c4a85c0e05ba3804fa1ab64fd4fd tree $ git cat-file -t fc75e0215a2fcaeea1b949dab29c6014a2333399 blob Let's show their contents to understand better. Shell $ git cat-file -p b7d1ea0ff44167b0daa2b3016d3fced984618612 tree afddf78df335c4a85c0e05ba3804fa1ab64fd4fd author Randhir Singh <randhirkumar.singh@gmail.com> 1694604421 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694604421 +0530 First commit $ git cat-file -p afddf78df335c4a85c0e05ba3804fa1ab64fd4fd 100644 blob fc75e0215a2fcaeea1b949dab29c6014a2333399 storage_insights.txt $ git cat-file -p fc75e0215a2fcaeea1b949dab29c6014a2333399 Flash 9000 The first object is a commit that is created as a result of the git commit command. The commit object is pointing to a tree object that refers to the file that we created. The tree object points to a blob object that has the content that we put in the file. Pictorially, the Git repository at this point can be depicted as shown below. Let's modify the content and commit the updated file. Shell $ echo "Storwize" >> storage_insights.txt $ git add . $ git commit -m "Second commit" [master b228401] Second commit 1 file changed, 1 insertion(+) How many objects are there in the Git repository now? Shell $ git count-objects 6 objects, 0 kilobytes Let's check the content of our Git repository. 
Shell $ ls -ltR .git/objects/ .git/objects/: total 0 drwxr-xr-x 1 ragha 197121 0 Sep 13 17:21 b2/ drwxr-xr-x 1 ragha 197121 0 Sep 13 17:21 b1/ drwxr-xr-x 1 ragha 197121 0 Sep 13 17:20 b0/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 b7/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 af/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:56 fc/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 info/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 pack/ .git/objects/b2: total 1 -r--r--r-- 1 ragha 197121 167 Sep 13 17:21 28401ce532180aa8fdffaa54731d9d2085f15d .git/objects/b1: total 1 -r--r--r-- 1 ragha 197121 65 Sep 13 17:21 f1a7abd35ad7178efe94a13ccf6de2868f68ce .git/objects/b0: total 1 -r--r--r-- 1 ragha 197121 36 Sep 13 17:20 0f271ba3e94459a48af1620ec9d2050df8e8f5 .git/objects/b7: total 1 -r--r--r-- 1 ragha 197121 137 Sep 13 16:57 d1ea0ff44167b0daa2b3016d3fced984618612 .git/objects/af: total 1 -r--r--r-- 1 ragha 197121 65 Sep 13 16:57 ddf78df335c4a85c0e05ba3804fa1ab64fd4fd .git/objects/fc: total 1 -r--r--r-- 1 ragha 197121 27 Sep 13 16:56 75e0215a2fcaeea1b949dab29c6014a2333399 .git/objects/info: total 0 .git/objects/pack: total 0 Git created three new objects. Shell $ git cat-file -t b228401ce532180aa8fdffaa54731d9d2085f15d commit $ git cat-file -t b1f1a7abd35ad7178efe94a13ccf6de2868f68ce tree $ git cat-file -t b00f271ba3e94459a48af1620ec9d2050df8e8f5 blob Let's check their contents. Shell $ git cat-file -p b228401ce532180aa8fdffaa54731d9d2085f15d tree b1f1a7abd35ad7178efe94a13ccf6de2868f68ce parent b7d1ea0ff44167b0daa2b3016d3fced984618612 author Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 Second commit $ git cat-file -p b1f1a7abd35ad7178efe94a13ccf6de2868f68ce 100644 blob b00f271ba3e94459a48af1620ec9d2050df8e8f5 storage_insights.txt $ git cat-file -p b00f271ba3e94459a48af1620ec9d2050df8e8f5 Flash 9000 Storwize The new commit object now has a parent, which is the previous commit object. The new commit refers to the new tree object, which is the updated file, and the new tree refers to the new blob, which is the updated content. Pictorially, the situation at this point is shown below. In a nutshell, the Git repository stores these objects, and the objects are linked with each other via pointers. The objects are immutable; each time they are modified, a new object is created, and references are updated. This is how Git tracks the content as it changes over time. Now that we understand how Git tracks the content let's move on to the next layer and understand what makes Git a revision control system. Revision Control System Building upon the persistent map and the content tracker, Git is a revision control system that allows developers and teams to: Track changes made to files. Maintain a history of revisions, making it possible to revert to the previous version. Manage code branches and merge changes from different contributors. In order to achieve these, Git provides some artifacts that make it a revision control system. We'll explain these one by one in this section. Branches Branches allow developers to experiment with different changes to their code without affecting the main codebase. This can help them to avoid introducing bugs into the main codebase. Branches can also be used to collaborate with other developers on the same project. Just as Git stores various objects in its repository, branches are also stored there. Let's take a look. 
Shell $ cat .git/refs/heads/master b228401ce532180aa8fdffaa54731d9d2085f15d $ git cat-file -p b228401ce532180aa8fdffaa54731d9d2085f15d tree b1f1a7abd35ad7178efe94a13ccf6de2868f68ce parent b7d1ea0ff44167b0daa2b3016d3fced984618612 author Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 Second commit The branch is pointing to the second commit. A branch is just a reference to a commit. The master branch was created when we initialized the Git repository. To create a new branch, Git provides a method. Shell git branch <branchname> Notice another reference named HEAD. HEAD is a reference to a branch. HEAD changes as we switch branches. This is explained in the diagram below. To change to a different branch. Shell $ git switch branch Switched to branch 'branch' HEAD will now move to the branch branch. To check where the current HEAD is pointing to. Shell $ cat .git/HEAD ref: refs/heads/branch As we switch branches, files, and folders in the working area change. Git doesn't track them unless they are committed (i.e., available in the Git repository). Merge Next, let's look at the concept of merging. A merge in Git is the process of combining two or more branches into a single branch. This is typically done when you have finished working on a feature branch and want to integrate your changes into the main codebase. To merge the changes from <branch> into the current branch. Shell git merge <branch> Let's see what happens if we merge a branch. We will add one commit to the branch branch and another commit to the branch master. When done, we'll merge the branch branch into the master branch. Shell $ git status On branch branch nothing to commit, working tree clean $ echo "DS8000" >> storage_insights.txt $ git add . $ git commit -m "Added DS8000" [branch aac6280] Added DS8000 1 file changed, 1 insertion(+) Now, switch to the master branch and add some content to the file. Shell $ git switch master Switched to branch 'master' $ echo "XIV" >> storage_insights.txt $ git add . $ git commit -m "Added XIV" [master d8a6319] Added XIV 1 file changed, 1 insertion(+) Merge the branch branch into the master branch. Since the same line of the file is modified in both branches, this will give rise to a conflict. Shell $ git merge branch Auto-merging storage_insights.txt CONFLICT (content): Merge conflict in storage_insights.txt Automatic merge failed; fix conflicts and then commit the result. Resolve the conflict in the file and commit it. This will create another kind of Git object called merge commit. Shell $ git log --graph --decorate --oneline * fcc8bae (HEAD -> master) Resolved merge conflict |\ | * aac6280 (branch) Added DS8000 * | d8a6319 Added XIV |/ * b228401 Second commit * b7d1ea0 First commit Let's examine the merge commit. It has two parents; one is the latest commit from the branch branch, and the other parent is the latest commit from the branch master. Shell $ git cat-file -p fcc8bae tree 6031b0e96170c70d7ae4ad264840168c3fc0b1fa parent d8a63194e4054fcd5c4289b5c0488514691c6beb parent aac6280973f401bfe7a7d5a6904794b9133bac6c author Randhir Singh <randhirkumar.singh@gmail.com> 1694609622 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694609622 +0530 Resolved merge conflict Pictorially, the Git repository at this point in time looks like this. Git creates a merge commit only if it is required. A fast-forward merge is a type of merge in Git that combines two branches without creating a new merge commit. 
This is only possible if the two branches have a linear history, meaning that the branch being merged is a direct descendant of the branch you are merging into. Losing HEAD Normally, HEAD points to the branch that points to the latest commit. However, it is possible for HEAD to not point to a branch. In that case, HEAD is said to be detached. A detached HEAD is a state where the HEAD pointer is not pointing to a branch but instead pointing to a specific commit. This can happen if we check out a commit instead of a branch. Shell $ git checkout b228401 Note: switching to 'b228401'. You are in 'detached HEAD' state. When you are in a detached HEAD state, any new commits you create do not belong to any branch and may eventually be garbage collected. To get out of a detached HEAD state, you can do one of the following: Create a new branch and check it out. Merge the changes made in the detached HEAD state into your current branch. Check out a different branch. Git Object Model This is a good time to review the Git object model, as we've covered all the main Git objects. A Git repository is a bunch of objects linked to each other in a graph. A branch is a reference to a commit, and HEAD is a reference to a branch. Objects are immutable, meaning that they cannot be changed once they are created. This makes them very efficient for storing data, as Git can simply compare the contents of two objects to determine if they are different. There are four main types of objects in the Git object model: Blobs: Blobs store the contents of files. Trees: Trees store the contents of directories. Commits: Commits store metadata about changes to files, such as the author, date, and commit message. Tags: Tags are used to mark specific commits as being important. Objects are stored in the .git directory of your repository. When you make a change to a file and commit it, Git creates a new blob object to store the contents of the changed file and a new commit object to store metadata about the change. Git maintains objects in the repository by following three rules: The current branch tracks new commits. When you move to another commit, Git updates your working directory. Unreachable objects are garbage collected. Rebase Rebase is the process of replaying a sequence of commits onto a new base commit. This means that Git creates new commits, one for each commit in the original sequence, and applies them to the new base commit. To rebase a branch, you can use the git rebase command. For example, to rebase the branch branch onto the master branch, you would run the following command: Shell git rebase master branch Let's look at the history of the branch branch. Contrast this to the earlier scenario when we merged the branch branch into the master branch. Shell $ git log --graph --decorate --oneline * d1f27ee (HEAD -> branch) Added DS8000 * d8a6319 (master) Added XIV * b228401 Second commit * b7d1ea0 First commit Pictorially, the following diagram explains what happens when we rebase. The choice between merge and rebase comes down to your preferences. Remember: merge preserves history; rebase rewrites history. Tags A tag is like a branch that doesn't move. To create a tag: Shell git tag release An annotated tag has a message that can be displayed, while a tag without annotation is just a named pointer to a commit. Where is the tag stored? In the Git repository, like other objects. It points to the commit that was checked out when the tag was created.
Shell $ cat .git/refs/tags/release d8a63194e4054fcd5c4289b5c0488514691c6beb $ git cat-file -p d8a63194e4054fcd5c4289b5c0488514691c6beb tree e4e282cde76ffa017c2d837a6d39bd0259715e2a parent b228401ce532180aa8fdffaa54731d9d2085f15d author Randhir Singh <randhirkumar.singh@gmail.com> 1694609414 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694609414 +0530 Added XIV We've covered the major concepts behind a revision control system and how those concepts are used to achieve its objectives. This completes the part that explains how Git serves as a revision control system. Next, let us discuss the "distributed" nature of Git. Git Is a Distributed Version Control System Because Git is a distributed version control system, every developer has a complete copy of the repository. This is in contrast to centralized version control systems, where there is a single central repository that all developers must access. When you clone a Git repository, you create a local copy of the entire project, including all files and the entire commit history. This local repository contains everything you need to work on the project independently, without needing a constant internet connection or access to a central server. A remote repository can be cloned using: Shell git clone <remote> In our case, we have a Git repository created locally. We'll create a remote repository on GitHub and set it as the remote. The configured remote repository is stored in .git/config. Shell $ git remote add origin https://github.com/Randhir123/test.git $ git push -u origin master Enumerating objects: 9, done. Counting objects: 100% (9/9), done. Delta compression using up to 8 threads Compressing objects: 100% (3/3), done. Writing objects: 100% (9/9), 744 bytes | 372.00 KiB/s, done. Total 9 (delta 0), reused 0 (delta 0), pack-reused 0 To https://github.com/Randhir123/test.git * [new branch] master -> master branch 'master' set up to track 'origin/master'. $ cat .git/config [core] repositoryformatversion = 0 filemode = false bare = false logallrefupdates = true symlinks = false ignorecase = true [remote "origin"] url = https://github.com/Randhir123/test.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master Like a local branch, a remote branch is just a reference to a commit. Shell $ git show-ref master d8a63194e4054fcd5c4289b5c0488514691c6beb refs/heads/master d8a63194e4054fcd5c4289b5c0488514691c6beb refs/remotes/origin/master Pushing Commits on the local branches can be pushed to remote branches, as shown below. Shell $ echo "SVC" >> storage_insights.txt $ git add . $ git commit -m "Added SVC" [master 7552e51] Added SVC 1 file changed, 1 insertion(+) $ git push Enumerating objects: 5, done. Counting objects: 100% (5/5), done. Writing objects: 100% (3/3), 291 bytes | 291.00 KiB/s, done. Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 To https://github.com/Randhir123/test.git d8a6319..7552e51 master -> master Pulling Commits on the remote branches can be fetched using: Shell git fetch And merged into a local branch using: Shell git merge origin/master These two steps can be combined into one using: Shell git pull This will fetch the remote commits and merge them into the local branch with a single command. We can configure multiple remotes for our repository. All the remotes can be displayed using the command:
Shell $ git remote -v origin https://github.com/Randhir123/test.git (fetch) origin https://github.com/Randhir123/test.git (push) Pull Request A pull request (PR), a concept used by Git code collaboration platforms like GitHub, GitLab, and Bitbucket, is a way of proposing and discussing changes to a codebase. Pull requests are typically created to ask the maintainers of the upstream repository to merge, or pull, our changes. Summary In this article, we described the layers that Git is made of. We started our journey of understanding Git from the core, which is the persistent map. Next, we looked at how Git builds upon the persistent map to track content. The content tracker layer forms the basis of the revision control system. Finally, we looked at the distributed nature of Git, which makes it such a powerful revision control system.
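To make the pull request paragraph above concrete, here is the command sequence that typically precedes opening a PR. The branch name and the extra line added to the file are placeholders in the spirit of the earlier examples, and the final step happens on the hosting platform (GitHub, GitLab, or Bitbucket) rather than in Git itself.

Shell
# Sketch of a typical pull-request flow (names are placeholders).
$ git switch -c add-flash-9100          # create a feature branch
$ echo "Flash 9100" >> storage_insights.txt
$ git add .
$ git commit -m "Added Flash 9100"
$ git push -u origin add-flash-9100     # publish the branch to the remote
# Then open a pull request from add-flash-9100 into master in the platform's
# web UI, where the change can be reviewed, discussed, and finally merged.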
Bartłomiej Żyliński
Software Engineer,
SoftwareMill
Vishnu Vasudevan
Head of Product Engineering & Management,
Opsera
Abhishek Gupta
Principal Developer Advocate,
AWS
Yitaek Hwang
Software Engineer,
NYDIG