Open Source Resources for Engineers

DZone's Featured Open Source Resources

TDD With FastAPI Is Easy

By Aleksei Sharypov

Working more than 15 years in IT, I rarely met programmers who enjoy writing tests and only a few people who use something like TDD. Is this really such an uninteresting part of the software development process? In this article, I’d like to share my experience of using TDD. In most of the teams I worked with, programmers wrote code. Often, they didn't write tests at all or added them symbolically. The only mention of the TDD abbreviation made programmers panic. The main reason is that many people misunderstand the meaning of TDD and try to avoid writing tests. It is generally assumed that TDD are usual tests but written before implementation. But this is not quite true. TDD is a culture of writing code. This approach implies a certain order of solving a task and a specific way of thinking. TDD implies solving a task using loops or iterations. Formally, the cycle consists of three phases: Writing a test that gives something to the input and checks the output. In this case, the test doesn’t pass. Writing the simplest implementation with which the test passes. Refactoring. Changing the code without changing the test. The cycle repeats itself until the problem is solved. TDD Cycle I use a slightly different algorithm. In my approach, refactoring is most often the cycle. That is, I write the test, and then I write the code. Next, I write the test again and write the code because refactoring still often requires editing to the test (various mocks, generation of instances, links to existing modules, etc.), but not always. The general algorithm of what we will do I think I won’t describe it as it is done in textbooks. I'll just show you an example of how it works for me. Example Imagine we got a task. No matter how it is described, we can clarify it ourselves, coordinate it with the customer, and solve it. Let’s suppose that the task is described something like this: "Add an endpoint that returns the current time and user information (id, first name, last name, phone number). Also, it is necessary to sign this information based on a secret key." I will not complicate the task to demonstrate it. But in real life, you may need to make a full-fledged digital signature and supplement it with encryption, and this endpoint needs to be added to an existing project. For academic purposes, we will have to create it from scratch. Let's do it using FastAPI. Most programmers just start working on this task without detailed study. They keep everything they can in their head. After all, such a task does not need to be divided into subtasks since it is quite simple and quickly implemented. While working on it, they clarify the requirements of stakeholders and ask questions. And at the end, they write tests anxiously. But we will do it differently. It may seem unexpected, but let's take something from the Agile methodology. Firstly, this task can be divided into logically completed subtasks. You can immediately clarify all the requirements. Secondly, it can be done iteratively, having something working at each step (even incorrectly) that can be demonstrated. Planning Let’s start with the following partition. The First Subtask Make an empty FastAPI app work with one method. Acceptance Criteria There is a FastAPI app, and it can be launched. The GET request "/my-info" returns a response with code 200 and body {} - empty json. The Second Subtask Add a model/User object. At this stage, it will just be a pedantic scheme for the response. You will have to agree with the business on the name of the fields and whether it is necessary to somehow convert the values (filter, clean, or something else). Acceptance Criteria The GET request "/my-info" returns a response with code 200 and body {"user":{"id":1,"firstname":"John","lastname":"Smith","phone":"+995000000000"}. The Third Subtask Add the current time to the response. Again, we need to agree on the time format and the name of the field. Acceptance Criteria The GET request "/my-info" returns a response with code 200 and body {"user":{added earlier},"timestamp":1691377675}. The Fourth Subtask Add a signature. Immediately, some questions to the business appear: Where to add? How to form it? Where to get the key? Where to store? Who has access? And so on… As a result, we use a simple algorithm: We get base64 from the JSON response body. We concatenate with the private key. First, we use an empty string as a key. Then, we take md5 from the received string. We add the result to the X-Signature header. Acceptance Criteria The GET request "/my-info" returns a response described earlier without changes, but with an additional header: "X-Signature":"638e4c9e30b157cc56fadc9296af813a" For this step, the X-Signature is calculated manually. Base64 = eyJ1c2VyIjp7ImlkIjoxLCJmaXJzdG5hbWUiOiJKb2huIiwibGFzdG5hbWUiOiJTbWl0aCIsInBob25lIjoiKzk5NTAwMDAwMDAwMCJ9LCJ0aW1lc3RhbXAiOjE2OTEzNzc2NzV9. Note that the endpoint returns hard-coded values. To what level tasks should be split is up to you. This is just an example. The most important thing will be described further. These four subtasks result in the endpoint that always returns the same response. But there is a question: why have we described the stub in such detail? Here is the reason: these subtasks don’t have to be physically present. They are just steps. They are needed to use the TDD practice. However, their presence on any storage medium other than our memory will make our work much easier. So, let’s begin. Implementation The First Subtask We add the main.py file to the app directory. Python from fastapi import FastAPI app = FastAPI() @app.get("/my-info") async def my_info(): return {} Right after that, we add one test. For example, to the same directory: test_main.py. Python from fastapi.testclient import TestClient from .main import app client = TestClient(app) def test_my_info_success(): response = client.get("/my-info") assert response.status_code == 200 assert response.json() == {} As a result of the first subtask, we added just a few lines of code and a test. At the very beginning, a simple test appeared. It does not cover business requirements at all. It checks only one case — one step. Obviously, writing such a test does not cause much negativity. And at the same time, we have a working code that can be demonstrated. The Second Subtask We add JSON to the verification. To do this, replace the last line in the test. Python result = { "user": { "id": 1, "firstname": "John", "lastname": "Smith", "phone": "+995000000000", }, } assert response.json() == result ❌ Now, the test fails. We change the code so that the test passes. We add the schema file. Python from pydantic import BaseModel class User(BaseModel): id: int firstname: str lastname: str phone: str class MyInfoResponse(BaseModel): user: User We change the main file. We add import. Python from .scheme import MyInfoResponse, User We change the router function. Python @app.get("/my-info", response_model=MyInfoResponse) async def my_info(): my_info_response = MyInfoResponse( user=User( id=1, firstname="John", lastname="Smith", phone="+995000000000", ), ) return my_info_response ✅ Now, the test passes. And we got a working code again. The Third Subtask We add "timestamp": 1691377675 to the test. Python result = { "user": { "id": 1, "firstname": "John", "lastname": "Smith", "phone": "+995000000000", }, "timestamp": 1691377675, } ❌ The test fails again. We change the code so that the test passes. To do this, we add timestamp to the scheme. Python class MyInfoResponse(BaseModel): user: User timestamp: int We add its initialization to the main file. Python my_info_response = MyInfoResponse( user=User( id=1, firstname="John", lastname="Smith", phone="+995000000000", ), timestamp=1691377675, ) ✅ The test passes again. The Fourth Subtask We add the "X-Signature" header verification to the test: "54977504fbe6c7aec318722d9fbcaec8". Python assert response.headers.get("X-Signature") == "638e4c9e30b157cc56fadc9296af813a" ❌ The test fails again. We add this header to the application's response. To do this, we add middleware. After all, we will most likely need a signature for other endpoints of the application. But this is just our choice, which in reality can be different so as not to complicate the code. Let's do it to understand this. We add import Request. Python from fastapi import FastAPI, Request And the middleware function. Python @app.middleware("http") async def add_signature_header(request: Request, call_next): response = await call_next(request) response.headers["X-Signature"] = "638e4c9e30b157cc56fadc9296af813a" return response ✅ The test passes again. At this stage, we have received a ready-made working test for the endpoint. Next, we will change the application, converting it from a stub into a fully working code while checking it with just one ready-made test. This step can already be considered as refactoring. But we will do it in exactly the same small steps. The Fifth Subtask Implement signature calculation. The algorithm is described above, as well as the acceptance criteria, but the signature should change depending on the user's data and timestamp. Let's implement it. ✅ The test passes, and we don't do anything to it at this step. That is, we do a full-fledged refactoring. We add the signature.py file. It contains the following code: Python import base64 import hashlib def generate_signature(data: bytes) -> str: m = hashlib.md5() b64data = base64.b64encode(data) m.update(b64data + b"") return m.hexdigest() We change main.py. We add import. Python from fastapi import FastAPI, Request, Response from .signature import generate_signature We change middleware. Python @app.middleware("http") async def add_signature_header(request: Request, call_next): response = await call_next(request) body = b"" async for chunk in response.body_iterator: body += chunk response.headers["X-Signature"] = generate_signature(body) return Response( content=body, status_code=response.status_code, headers=dict(response.headers), media_type=response.media_type, ) Here is the result of our complication, which wasn’t necessary for us to do. We did not get the best solution since we have to calculate the entire body of the response and form our own Response. But it is quite suitable for our purposes. ✅ The test still passes. The Sixth Subtask Replace timestamp with the actual value of the current time. Acceptance Criteria timestamp in the response returns the actual current time value. The signature is generated correctly. To generate the time, we will use int(time.time()) First, we edit the test. Now, we have to freeze the current time. Import: Python from datetime import datetime from freezegun import freeze_time We make the test look like the one below. Since freezegun accepts either an object or a string with a date, but not unix timestamp, it will have to be converted. Python def test_my_info_success(): initial_datetime = 1691377675 with freeze_time(datetime.utcfromtimestamp(initial_datetime)): response = client.get("/my-info") assert response.status_code == 200 result = { "user": { "id": 1, "firstname": "John", "lastname": "Smith", "phone": "+995000000000", }, "timestamp": initial_datetime, } assert response.json() == result assert response.headers.get("X-Signature") == "638e4c9e30b157cc56fadc9296af813a" Nothing has changed. ✅ That’s why the test still passes. So, we continue refactoring. Changes to the main.py code. Import: Python import time In the response, we replace the time-hard code with a method call. Python timestamp=int(time.time()), ✅ We launch the test — it works. In tests, one often tries to dynamically generate input data and write duplicate functions to calculate the results. I don’t share the idea of this approach as it can potentially contain errors and requires testing as well. The simplest and most reliable way is to input and output data prepared in advance. The only thing that can be used at the same time is configuration data, settings, and some proven fixtures. Now, we will add the settings. The Seventh Subtask Add a private key. We will take it from the settings environment variables. Acceptance Criteria There is a private key (not an empty string). It is part of the signature generation process according to the algorithm described above. The application gets it from the environment variables. For the test, we use the private key: 6hsjkJnsd)s-_=2$%723 As a result, our signature will change to: 479bb02f0f5f1249760573846de2dbc1 We replace the signature verification in the test: Python assert response.headers.get("X-Signature") == "479bb02f0f5f1249760573846de2dbc1" ❌ Now, the test fails. We add file settings.py to get the settings from environment variables. Python from pydantic_settings import BaseSettings class Settings(BaseSettings): security_key: bytes settings = Settings() We add the code for using this key to signature.py. Import: Python from .settings import settings And we replace the string with concatenation with: Python m.update(b64data + settings.security_key) ❌ Now, the test fails. Now, before running the tests, we need to set the environment variable with the correct key. This can be done right before the call, for example, like this: export security_key='6hsjkJnsd)s-_=2$%723' ✅ Now, the test passes. I would not recommend setting the default value in the settings.py file. The variable must be defined. Setting a default incorrect value can lead to hiding an error in production if the value of this variable is not set during deployment. The application will start without errors, but it will give incorrect results. However, in some cases, a working application with incorrect functionality is better than error 503. It's up to you as a developer to decide. The next steps may be replacing the stub of the User object with real values from the database writing additional validation tests and negative scenarios. In any case, you will have to add more acceptance tests at the end. The most important thing here is dividing the task into micro-tasks, writing a simple test for each subtask, and then writing the application code, and after that, if necessary, refactoring. This order in development really helps: Focus on the problem See the result of the subtask clearly Be able to quickly verify the written code Reduce negativity when writing tests Always have at least one test per task As a result, the number of situations when a programmer "overdoes it" and spends much more time solving a problem than he could with a structured approach decreases. Thus, the development time of the feature is reduced, and the quality of the code is improved. In the long term, changes, refactoring, and updates of package versions in the code are easily controlled and implemented with minimal losses. And here is what’s important: TDD should improve development, make it faster, and strengthen it. This is what the word Driven in the abbreviation means. Therefore, it is not necessary to try to write a complete test or acceptance test of the entire task before the start of development. An iterative approach is needed. Tests are only needed to verify the next small step in development. TDD helps answer the question: how do I know that I have achieved my goal (I mean, that the code fragment I wrote works)? The examples can be found here. More

Docker and Kubernetes Transforming Modern Deployment

By Sohail Shaikh

In today's rapidly evolving world of software development and deployment, containerization has emerged as a transformative technology. It has revolutionized the way applications are built, packaged, and deployed, providing agility, scalability, and consistency to development and operations teams alike. Two of the most popular containerization tools, Docker and Kubernetes, play pivotal roles in this paradigm shift. In this blog, we'll dive deep into containerization technologies, explore how Docker and Kubernetes work together, and understand their significance in modern application deployment. Understanding Containerization A containerization is a lightweight form of virtualization that allows you to package an application and its dependencies into a single, portable unit called a container. Containers are isolated, ensuring that an application runs consistently across different environments, from development to production. Unlike traditional virtual machines (VMs), containers share the host OS kernel, which makes them extremely efficient in terms of resource utilization and startup times. Example: Containerizing a Python Web Application Let's consider a Python web application using Flask, a microweb framework. We'll containerize this application using Docker, a popular containerization tool. Step 1: Create the Python Web Application Python # app.py from flask import Flask app = Flask(__name__) @app.route('/') def hello(): return "Hello, Containerization!" if __name__ == '__main__': app.run(debug=True, host='0.0.0.0') Step 2: Create a Dockerfile Dockerfile # Use an official Python runtime as a parent image FROM python:3.9-slim # Set the working directory to /app WORKDIR /app # Copy the current directory contents into the container at /app COPY . /app # Install any needed packages specified in requirements.txt RUN pip install -r requirements.txt # Make port 80 available to the world outside this container EXPOSE 80 # Define environment variable ENV NAME World # Run app.py when the container launches CMD ["python", "app.py"] Step 3: Build and Run the Docker Container Shell # Build the Docker image docker build -t flask-app . # Run the Docker container, mapping host port 4000 to container port 80 docker run -p 4000:80 flask-app This demonstrates containerization by encapsulating the Python web application and its dependencies within a Docker container. The containerized app can be run consistently on various environments, promoting portability and ease of deployment. Containerization simplifies application deployment, ensures consistency, and optimizes resource utilization, making it a crucial technology in modern software development and deployment pipelines. Docker: The Containerization Pioneer Docker, developed in 2013, is widely regarded as the pioneer of containerization technology. It introduced a simple yet powerful way to create, manage, and deploy containers. Here are some key Docker components: Docker Engine The Docker Engine is the core component responsible for running containers. It includes the Docker daemon, which manages containers, and the Docker CLI (Command Line Interface), which allows users to interact with Docker. Docker Images Docker images are lightweight, stand-alone, and executable packages that contain all the necessary code and dependencies to run an application. They serve as the blueprints for containers. Docker Containers Containers are instances of Docker images. They are isolated environments where applications run. Containers are highly portable and can be executed consistently across various environments. Docker's simplicity and ease of use made it a go-to choice for developers and operators. However, managing a large number of containers at scale and ensuring high availability required a more sophisticated solution, which led to the rise of Kubernetes. Kubernetes: Orchestrating Containers at Scale Kubernetes, often abbreviated as K8s, is an open-source container orchestration platform originally developed by Google. It provides a framework for automating the deployment, scaling, and management of containerized applications. Here's a glimpse of Kubernetes' core components: Master Node The Kubernetes master node is responsible for controlling the cluster. It manages container orchestration, scaling, and load balancing. Worker Nodes Worker nodes, also known as Minions, host containers and run the tasks assigned by the master node. They provide the computing resources needed to run containers. Pods Pods are the smallest deployable units in Kubernetes. They can contain one or more containers that share the same network namespace, storage, and IP address. Services Kubernetes services enable network communication between different sets of pods. They abstract the network and ensure that applications can discover and communicate with each other reliably. Deployments Deployments in Kubernetes allow you to declaratively define the desired state of your application and ensure that the current state matches it. This enables rolling updates and automatic rollbacks in case of failures. The Docker-Kubernetes Synergy Docker and Kubernetes are often used together to create a comprehensive containerization and orchestration solution. Docker simplifies the packaging and distribution of containerized applications, while Kubernetes takes care of their deployment and management at scale. Here's how Docker and Kubernetes work together: Building Docker Images: Developers use Docker to build and package their applications into Docker images. These images are then pushed to a container registry, such as Docker Hub or Google Container Registry. Kubernetes Deployments: Kubernetes takes the Docker images and orchestrates the deployment of containers across a cluster of nodes. Developers define the desired state of their application using Kubernetes YAML manifests, including the number of replicas, resource requirements, and networking settings. Scaling and Load Balancing: Kubernetes can automatically scale the number of container replicas based on resource utilization or traffic load. It also manages load balancing to ensure high availability and efficient resource utilization. Service Discovery: Kubernetes services enable easy discovery and communication between different parts of an application. Services can be exposed internally or externally, depending on the use case. Rolling Updates: Kubernetes supports rolling updates and rollbacks, allowing applications to be updated with minimal downtime and the ability to revert to a previous version in case of issues. The Significance in Modern Application Deployment The adoption of Docker and Kubernetes has had a profound impact on modern application deployment practices. Here's why they are crucial: Portability: Containers encapsulate everything an application needs, making it highly portable. Developers can build once and run anywhere, from their local development environment to a public cloud or on-premises data center. Efficiency: Containers are lightweight and start quickly, making them efficient in terms of resource utilization and time to deployment. Scalability: Kubernetes allows applications to scale up or down automatically based on demand, ensuring optimal resource allocation and high availability. Consistency: Containers provide consistency across different environments, reducing the "it works on my machine" problem and streamlining the development and operations pipeline. DevOps Enablement: Docker and Kubernetes promote DevOps practices by enabling developers and operators to collaborate seamlessly, automate repetitive tasks, and accelerate the software delivery lifecycle. Conclusion In conclusion, Docker and Kubernetes are at the forefront of containerization and container orchestration technologies. They have reshaped the way applications are developed, deployed, and managed in the modern era. By combining the simplicity of Docker with the power of Kubernetes, organizations can achieve agility, scalability, and reliability in their application deployment processes. Embracing these technologies is not just a trend but a strategic move for staying competitive in the ever-evolving world of software development. As you embark on your containerization journey with Docker and Kubernetes, remember that continuous learning and best practices are key to success. Stay curious, explore new features, and leverage the vibrant communities surrounding these technologies to unlock their full potential in your organization's quest for innovation and efficiency. Containerization is not just a technology; it's a mindset that empowers you to build, ship, and run your applications with confidence in a rapidly changing digital landscape. More

Spring Authentication With MetaMask

By Alexander Makeev

AI in Java: Building a ChatGPT Clone With Spring Boot and LangChain

By Marcus Hellberg

Spring WebFlux Retries

By Emmanouil Gkatziouras CORE

Build a Serverless App Fast With Zipper: Write TypeScript, Offload Everything Else

I remember the first time I saw a demonstration of Ruby on Rails. With very little effort, demonstrators created a full-stack web application that could be used for real business purposes. I was impressed – especially when I thought about how much time it took me to deliver similar solutions using the Seam and Struts frameworks. Ruby was created in 1993 to be an easy-to-use scripting language that also included object-oriented features. Ruby on Rails took things to the next level in the mid 2000s – arriving at the right time to become the tech-of-choice for the initial startup efforts of Twitter, Shopify, GitHub, and Airbnb. I began to ask the question, “Is it possible to have a product, like Ruby on Rails, without needing to worry about the infrastructure or underlying data tier?” That’s when I discovered the Zipper platform. About Zipper Zipper is a platform for building web services using simple TypeScript functions. You use Zipper to create applets (not related to Java, though they share the same name), which are then built and deployed on Zipper’s platform. The coolest thing about Zipper is that it lets you focus on coding your solution using TypeScript, and you don’t need to worry about anything else. Zipper takes care of: User interface Infrastructure to host your solution Persistence layer APIs to interact with your applet Authentication Although the platform is currently in beta, it’s open for consumers to use. At the time I wrote this article, there were four templates in place to help new adopters get started: Hello World – a basic applet to get you started CRUD Template – offers a ToDo list where items can be created, viewed, updated, and deleted Slack App Template – provides an example on how to interact with the Slack service AI-Generated Code – expresses your solution in human language and lets AI create an applet for you There is also a gallery on the Zipper platform that provides applets that can be forked in the same manner as Git-based repositories. I thought I would put the Zipper platform to the test and create a ballot applet. HOA Ballot Use Case The homeowner’s association (HOA) concept started to gain momentum in the United States back in the 20th century. Subdivisions formed HOAs to handle things like the care of common areas and for establishing rules and guidelines for residents. Their goal is to maintain the subdivision’s quality of living as a whole, long after the home builder has finished development. HOAs often hold elections to allow homeowners to vote on the candidate they feel best matches their views and perspectives. In fact, last year I published an article on how an HOA ballot could be created using Web3 technologies. For this article, I wanted to take the same approach using Zipper. Ballot Requirements The requirements for the ballot applet are: As a ballot owner, I need the ability to create a list of candidates for the ballot. As a ballot owner, I need the ability to create a list of registered voters. As a voter, I need the ability to view the list of candidates. As a voter, I need the ability to cast one vote for a single candidate. As a voter, I need the ability to see a current tally of votes that have been cast for each candidate. Additionally, I thought some stretch goals would be nice too: As a ballot owner, I need the ability to clear all candidates. As a ballot owner, I need the ability to clear all voters. As a ballot owner, I need the ability to set a title for the ballot. As a ballot owner, I need the ability to set a subtitle for the ballot. Designing the Ballot Applet To start working on the Zipper platform, I navigated to Zipper's website and clicked the Sign In button. Next, I selected an authentication source: Once logged in, I used the Create Applet button from the dashboard to create a new applet: A unique name is generated, but that can be changed to better identify your use case. For now, I left all the defaults the same and pushed the Next button – which allowed me to select from four different templates for applet creation. I started with the CRUD template because it provides a solid example of how the common create, view, update, and delete flows work on the Zipper platform. Once the code was created, the screen appears as shown below: With a fully functional applet in place, we can now update the code to meet the HOA ballot use requirements. Establish Core Elements For the ballot applet, the first thing I wanted to do was update the types.ts file as shown below: TypeScript export type Candidate = { id: string; name: string; votes: number; }; export type Voter = { email: string; name: string; voted: boolean; }; I wanted to establish constant values for the ballot title and subtitle within a new file called constants.ts: TypeScript export class Constants { static readonly BALLOT_TITLE = "Sample Ballot"; static readonly BALLOT_SUBTITLE = "Sample Ballot Subtitle"; }; To allow only the ballot owner to make changes to the ballot, I used the Secrets tab for the applet to create an owner secret with the value of my email address. Then I introduced a common.ts file which contained the validateRequest() function: TypeScript export function validateRequest(context: Zipper.HandlerContext) { if (context.userInfo?.email !== Deno.env.get('owner')) { return ( <> <Markdown> {`### Error: You are not authorized to perform this action`} </Markdown> </> ); } }; This way I could pass in the context to this function to make sure only the value in the owner secret would be allowed to make changes to the ballot and voters. Establishing Candidates After understanding how the ToDo item was created in the original CRUD applet, I was able to introduce the create-candidate.ts file as shown below: TypeScript import { Candidate } from "./types.ts"; import { validateRequest } from "./common.ts"; type Input = { name: string; }; export async function handler({ name }: Input, context: Zipper.HandlerContext) { validateRequest(context); const candidates = (await Zipper.storage.get<Candidate[]>("candidates")) || []; const newCandidate: Candidate = { id: crypto.randomUUID(), name: name, votes: 0, }; candidates.push(newCandidate); await Zipper.storage.set("candidates", candidates); return newCandidate; } For this use case, we just need to provide a candidate name, but the Candidate object contains a unique ID and the number of votes received. While here, I went ahead and wrote the delete-all-candidates.ts file, which removes all candidates from the key/value data store: TypeScript import { validateRequest } from "./common.ts"; type Input = { force: boolean; }; export async function handler( { force }: Input, context: Zipper.HandlerContext ) { validateRequest(context); if (force) { await Zipper.storage.set("candidates", []); } } At this point, I used the Preview functionality to create Candidate A, Candidate B, and Candidate C: Registering Voters With the ballot ready, I needed the ability to register voters for the ballot. So I added a create-voter.ts file with the following content: TypeScript import { Voter } from "./types.ts"; import { validateRequest } from "./common.ts"; type Input = { email: string; name: string; }; export async function handler( { email, name }: Input, context: Zipper.HandlerContext ) { validateRequest(context); const voters = (await Zipper.storage.get<Voter[]>("voters")) || []; const newVoter: Voter = { email: email, name: name, voted: false, }; voters.push(newVoter); await Zipper.storage.set("voters", voters); return newVoter; } To register a voter, I decided to provide inputs for email address and name. There is also a boolean property called voted which will be used to enforce the vote-only-once rule. Like before, I went ahead and created the delete-all-voters.ts file: TypeScript import { validateRequest } from "./common.ts"; type Input = { force: boolean; }; export async function handler( { force }: Input, context: Zipper.HandlerContext ) { validateRequest(context); if (force) { await Zipper.storage.set("voters", []); } } Now that we were ready to register some voters, I registered myself as a voter for the ballot: Creating the Ballot The last thing I needed to do was establish the ballot. This involved updating the main.ts as shown below: TypeScript import { Constants } from "./constants.ts"; import { Candidate, Voter } from "./types.ts"; type Input = { email: string; }; export async function handler({ email }: Input) { const voters = (await Zipper.storage.get<Voter[]>("voters")) || []; const voter = voters.find((v) => v.email == email); const candidates = (await Zipper.storage.get<Candidate[]>("candidates")) || []; if (email && voter && candidates.length > 0) { return { candidates: candidates.map((candidate) => { return { Candidate: candidate.name, Votes: candidate.votes, actions: [ Zipper.Action.create({ actionType: "button", showAs: "refresh", path: "vote", text: `Vote for ${candidate.name}`, isDisabled: voter.voted, inputs: { candidateId: candidate.id, voterId: voter.email, }, }), ], }; }), }; } else if (!email) { <> <h4>Error:</h4> <p> You must provide a valid email address in order to vote for this ballot. </p> </>; } else if (!voter) { return ( <> <h4>Invalid Email Address:</h4> <p> The email address provided ({email}) is not authorized to vote for this ballot. </p> </> ); } else { return ( <> <h4>Ballot Not Ready:</h4> <p>No candidates have been configured for this ballot.</p> <p>Please try again later.</p> </> ); } } export const config: Zipper.HandlerConfig = { description: { title: Constants.BALLOT_TITLE, subtitle: Constants.BALLOT_SUBTITLE, }, }; I added the following validations as part of the processing logic: The email property must be included or else a “You must provide a valid email address in order to vote for this ballot” message will be displayed. The email value provided must match a registered voter or else a “The email address provided is not authorized to vote for this ballot” message will be displayed. There must be at least one candidate to vote on or else a “No candidates have been configured for this ballot” message will be displayed. If the registered voter has already voted, the voting buttons will be disabled for all candidates on the ballot. The main.ts file contains a button for each candidate, all of which call the vote.ts file, displayed below: TypeScript import { Candidate, Voter } from "./types.ts"; type Input = { candidateId: string; voterId: string; }; export async function handler({ candidateId, voterId }: Input) { const candidates = (await Zipper.storage.get<Candidate[]>("candidates")) || []; const candidate = candidates.find((c) => c.id == candidateId); const candidateIndex = candidates.findIndex(c => c.id == candidateId); const voters = (await Zipper.storage.get<Voter[]>("voters")) || []; const voter = voters.find((v) => v.email == voterId); const voterIndex = voters.findIndex(v => v.email == voterId); if (candidate && voter) { candidate.votes++; candidates[candidateIndex] = candidate; voter.voted = true; voters[voterIndex] = voter; await Zipper.storage.set("candidates", candidates); await Zipper.storage.set("voters", voters); return `${voter.name} successfully voted for ${candidate.name}`; } return `Could not vote. candidate=${ candidate }, voter=${ voter }`; } At this point, the ballot applet was ready for use. HOA Ballot In Action For each registered voter, I would send them an email with a link similar to what is listed below: https://squeeking-echoing-cricket.zipper.run/run/main.ts?email=some.email@example.com The link would be customized to provide the appropriate email address for the email query parameter. Clicking the link runs the main.ts file and passes in the email parameter, avoiding the need for the registered voter to have to type in their email address. The ballot appears as shown below: I decided to cast my vote for Candidate B. Once I pushed the button, the ballot was updated as shown: The number of votes for Candidate B increased by one, and all of the voting buttons were disabled. Success! Conclusion Looking back on the requirements for the ballot applet, I realized I was able to meet all of the criteria, including the stretch goals in about two hours—and this included having a UI, infrastructure, and deployment. The best part of this experience was that 100% of my time was focused on building my solution, and I didn’t need to spend any time dealing with infrastructure or even the persistence store. My readers may recall that I have been focused on the following mission statement, which I feel can apply to any IT professional: “Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else.” - J. Vester The Zipper platform adheres to my personal mission statement 100%. In fact, they have been able to take things a step further than Ruby on Rails did, because I don’t have to worry about where my service will run or what data store I will need to configure. Using the applet approach, my ballot is already deployed and ready for use. If you are interested in giving applets a try, simply login to zipper.dev and start building. Currently, using the Zipper platform is free. Give the AI-Generated Code template a try, as it is really cool to provide a paragraph of what you want to build and see how closely the resulting applet matches what you have in mind. If you want to give my ballot applet a try, it is also available to fork in the Zipper gallery. Have a really great day!

By John Vester CORE

Ansible by Example

In my previous posting, I explained how to run Ansible scripts using a Linux virtual machine on Windows Hyper-V. This article aims to ease novices into Ansible IAC at the hand of an example. The example being booting one's own out-of-cloud Kubernetes cluster. As such, the intricacies of the steps required to boot a local k8s cluster are beyond the scope of this article. The steps can, however, be studied at the GitHub repo, where the Ansible scripts are checked in. The scripts were tested on Ubuntu20, running virtually on Windows Hyper-V. Network connectivity was established via an external virtual network switch on an ethernet adaptor shared between virtual machines but not with Windows. Dynamic memory was switched off from the Hyper-V UI. An SSH service daemon was pre-installed to allow Ansible a tty terminal to run commands from. Bootstrapping the Ansible User Repeatability through automation is a large part of DevOps. It cuts down on human error, after all. Ansible, therefore, requires a standard way to establish a terminal for the various machines under its control. This can be achieved using a public/private key pairing for SSH authentication. The keys can be generated for an Elliptic Curve Algorithm as follows: ssh-keygen -f ansible -t ecdsa -b 521 The Ansible script to create and match an account to the keys is: YAML --- - name: Bootstrap ansible hosts: all become: true tasks: - name: Add ansible user ansible.builtin.user: name: ansible shell: /bin/bash become: true - name: Add SSH key for ansible ansible.posix.authorized_key: user: ansible key: "{{ lookup('file', 'ansible.pub') }" state: present exclusive: true # to allow revocation # Join the key options with comma (no space) to lock down the account: key_options: "{{ ','.join([ 'no-agent-forwarding', 'no-port-forwarding', 'no-user-rc', 'no-x11-forwarding' ]) }" # noqa jinja[spacing] become: true - name: Configure sudoers community.general.sudoers: name: ansible user: ansible state: present commands: ALL nopassword: true runas: ALL # ansible user should be able to impersonate someone else become: true Ansible is declarative, and this snippet depicts a series of tasks that ensure that: The Ansible user exists; The keys are added for SSH authentication and The Ansible user can execute with elevated privilege using sudo Towards the top is something very important, and it might go unnoticed under a cursory gaze: hosts: all What does this mean? The answer to this puzzle can be easily explained at the hand of the Ansible inventory file: YAML masters: hosts: host1: ansible_host: "192.168.68.116" ansible_connection: ssh ansible_user: atmin ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none" ansible_ssh_private_key_file: ./bootstrap/ansible comasters: hosts: co-master_vivobook: ansible_connection: ssh ansible_host: "192.168.68.109" ansible_user: atmin ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none" ansible_ssh_private_key_file: ./bootstrap/ansible workers: hosts: client1: ansible_connection: ssh ansible_host: "192.168.68.115" ansible_user: atmin ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none" ansible_ssh_private_key_file: ./bootstrap/ansible client2: ansible_connection: ssh ansible_host: "192.168.68.130" ansible_user: atmin ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none" ansible_ssh_private_key_file: ./bootstrap/ansible It is the register of all machines the Ansible project is responsible for. Since our example project concerns a high availability K8s cluster, it consists of sections for the master, co-masters, and workers. Each section can contain more than one machine. The root-enabled account atmin on display here was created by Ubuntu during installation. The answer to the question should now be clear — the host key above specifies that every machine in the cluster will have an account called Ansible created according to the specification of the YAML. The command to run the script is: ansible-playbook --ask-pass bootstrap/bootstrap.yml -i atomika/atomika_inventory.yml -K The locations of the user bootstrapping YAML and the inventory files are specified. The command, furthermore, requests password authentication for the user from the inventory file. The -K switch, on its turn, asks that the superuser password be prompted. It is required by tasks that are specified to be run as root. It can be omitted should the script run from the root. Upon successful completion, one should be able to login to the machines using the private key of the ansible user: ssh ansible@172.28.110.233 -i ansible Note that since this account is not for human use, the bash shell is not enabled. Nevertheless, one can access the home of root (/root) using 'sudo ls /root' The user account can now be changed to ansible and the location of the private key added for each machine in the inventory file: YAML host1: ansible_host: "192.168.68.116" ansible_connection: ssh ansible_user: ansible ansible_ssh_common_args: "-o ControlMaster=no -o ControlPath=none" ansible_ssh_private_key_file: ./bootstrap/ansible One Master To Rule Them All We are now ready to boot the K8s master: ansible-playbook atomika/k8s_master_init.yml -i atomika/atomika_inventory.yml --extra-vars='kubectl_user=atmin' --extra-vars='control_plane_ep=192.168.68.119' The content of atomika/k8s_master_init.yml is: YAML # k8s_master_init.yml - hosts: masters become: yes become_method: sudo become_user: root gather_facts: yes connection: ssh roles: - atomika_base vars_prompt: - name: "control_plane_ep" prompt: "Enter the DNS name of the control plane load balancer?" private: no - name: "kubectl_user" prompt: "Enter the name of the existing user that will execute kubectl commands?" private: no tasks: - name: Initializing Kubernetes Cluster become: yes # command: kubeadm init --pod-network-cidr 10.244.0.0/16 --control-plane-endpoint "{{ ansible_eno1.ipv4.address }:6443" --upload-certs command: kubeadm init --pod-network-cidr 10.244.0.0/16 --control-plane-endpoint "{{ control_plane_ep }:6443" --upload-certs #command: kubeadm init --pod-network-cidr 10.244.0.0/16 --upload-certs run_once: true #delegate_to: "{{ k8s_master_ip }" - pause: seconds=30 - name: Create directory for kube config of {{ ansible_user }. become: yes file: path: /home/{{ ansible_user }/.kube state: directory owner: "{{ ansible_user }" group: "{{ ansible_user }" mode: 0755 - name: Copy /etc/kubernetes/admin.conf to user home directory /home/{{ ansible_user }/.kube/config. copy: src: /etc/kubernetes/admin.conf dest: /home/{{ ansible_user }/.kube/config remote_src: yes owner: "{{ ansible_user }" group: "{{ ansible_user }" mode: '0640' - pause: seconds=30 - name: Remove the cache directory. file: path: /home/{{ ansible_user }/.kube/cache state: absent - name: Create directory for kube config of {{ kubectl_user }. become: yes file: path: /home/{{ kubectl_user }/.kube state: directory owner: "{{ kubectl_user }" group: "{{ kubectl_user }" mode: 0755 - name: Copy /etc/kubernetes/admin.conf to user home directory /home/{{ kubectl_user }/.kube/config. copy: src: /etc/kubernetes/admin.conf dest: /home/{{ kubectl_user }/.kube/config remote_src: yes owner: "{{ kubectl_user }" group: "{{ kubectl_user }" mode: '0640' - pause: seconds=30 - name: Remove the cache directory. file: path: /home/{{ kubectl_user }/.kube/cache state: absent - name: Create Pod Network & RBAC. become_user: "{{ ansible_user }" become_method: sudo become: yes command: "{{ item }" with_items: kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml - pause: seconds=30 - name: Configure kubectl command auto-completion for {{ ansible_user }. lineinfile: dest: /home/{{ ansible_user }/.bashrc line: 'source <(kubectl completion bash)' insertafter: EOF - name: Configure kubectl command auto-completion for {{ kubectl_user }. lineinfile: dest: /home/{{ kubectl_user }/.bashrc line: 'source <(kubectl completion bash)' insertafter: EOF ... From the host keyword, one can see these tasks are only enforced on the master node. However, two things are worth explaining. The Way Ansible Roles The first is the inclusion of the atomika_role towards the top: YAML roles: - atomika_base The official Ansible documentation states that: "Roles let you automatically load related vars, files, tasks, handlers, and other Ansible artifacts based on a known file structure." The atomika_base role is included in all three of the Ansible YAML scripts that maintain the master, co-masters, and workers of the cluster. Its purpose is to lay the base by making sure that tasks common to all three member types have been executed. As stated above, an ansible role follows a specific directory structure that can contain file templates, tasks, and variable declaration, amongst other things. The Kubernetes and ContainerD versions are, for example, declared in the YAML of variables: YAML k8s_version: 1.28.2-00 containerd_version: 1.6.24-1 In short, therefore, development can be fast-tracked through the use of roles developed by the Ansible community that open-sourced it at Ansible Galaxy. Dealing the Difference The second thing of interest is that although variables can be passed in from the command line using the --extra-vars switch, as can be seen, higher up, Ansible can also be programmed to prompt when a value is not set: YAML vars_prompt: - name: "control_plane_ep" prompt: "Enter the DNS name of the control plane load balancer?" private: no - name: "kubectl_user" prompt: "Enter the name of the existing user that will execute kubectl commands?" private: no Here, prompts are specified to ask for the user that should have kubectl access and the IP address of the control plane. Should the script execute without error, the state of the cluster should be: atmin@kxsmaster2:~$ kubectl get pods -o wide -A NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES kube-flannel kube-flannel-ds-mg8mr 1/1 Running 0 114s 192.168.68.111 kxsmaster2 <none> <none> kube-system coredns-5dd5756b68-bkzgd 1/1 Running 0 3m31s 10.244.0.6 kxsmaster2 <none> <none> kube-system coredns-5dd5756b68-vzkw2 1/1 Running 0 3m31s 10.244.0.7 kxsmaster2 <none> <none> kube-system etcd-kxsmaster2 1/1 Running 0 3m45s 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-apiserver-kxsmaster2 1/1 Running 0 3m45s 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-controller-manager-kxsmaster2 1/1 Running 7 3m45s 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-proxy-69cqq 1/1 Running 0 3m32s 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-scheduler-kxsmaster2 1/1 Running 7 3m45s 192.168.68.111 kxsmaster2 <none> <none> All the pods required to make up the control plane run on the one master node. Should you wish to run a single-node cluster for development purposes, do not forget to remove the taint that prevents scheduling on the master node(s). kubectl taint node --all node-role.kubernetes.io/control-plane:NoSchedule- However, a cluster consisting of one machine is not a true cluster. This will be addressed next. Kubelets of the Cluster, Unite! Kubernetes, as an orchestration automaton, needs to be resilient by definition. Consequently, developers and a buggy CI/CD pipeline should not touch the master nodes by scheduling load on it. Therefore, Kubernetes increases resilience by expecting multiple worker nodes to join the cluster and carry the load: ansible-playbook atomika/k8s_workers.yml -i atomika/atomika_inventory.yml The content of k8x_workers.yml is: YAML # k8s_workers.yml --- - hosts: workers, vmworkers remote_user: "{{ ansible_user }" become: yes become_method: sudo gather_facts: yes connection: ssh roles: - atomika_base - hosts: masters tasks: - name: Get the token for joining the nodes with Kuberenetes master. become_user: "{{ ansible_user }" shell: kubeadm token create --print-join-command register: kubernetes_join_command - name: Generate the secret for joining the nodes with Kuberenetes master. become: yes shell: kubeadm init phase upload-certs --upload-certs register: kubernetes_join_secret - name: Copy join command to local file. become: false local_action: copy content="{{ kubernetes_join_command.stdout_lines[0] } --certificate-key {{ kubernetes_join_secret.stdout_lines[2] }" dest="/tmp/kubernetes_join_command" mode=0700 - hosts: workers, vmworkers #remote_user: k8s5gc #become: yes #become_metihod: sudo become_user: root gather_facts: yes connection: ssh tasks: - name: Copy join command to worker nodes. become: yes become_method: sudo become_user: root copy: src: /tmp/kubernetes_join_command dest: /tmp/kubernetes_join_command mode: 0700 - name: Join the Worker nodes with the master. become: yes become_method: sudo become_user: root command: sh /tmp/kubernetes_join_command register: joined_or_not - debug: msg: "{{ joined_or_not.stdout }" ... There are two blocks of tasks — one with tasks to be executed on the master and one with tasks for the workers. This ability of Ansible to direct blocks of tasks to different member types is vital for cluster formation. The first block extracts and augments the join command from the master, while the second block executes it on the worker nodes. The top and bottom portions from the console output can be seen here: YAML janrb@dquick:~/atomika$ ansible-playbook atomika/k8s_workers.yml -i atomika/atomika_inventory.yml [WARNING]: Could not match supplied host pattern, ignoring: vmworkers PLAY [workers, vmworkers] ********************************************************************************************************************************************************************* TASK [Gathering Facts] ************************************************************************************************************************************************************************ok: [client1] ok: [client2] ........................................................................... TASK [debug] **********************************************************************************************************************************************************************************ok: [client1] => { "msg": "[preflight] Running pre-flight checks\n[preflight] Reading configuration from the cluster...\n[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Starting the kubelet\n[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...\n\nThis node has joined the cluster:\n* Certificate signing request was sent to apiserver and a response was received.\n* The Kubelet was informed of the new secure connection details.\n\nRun 'kubectl get nodes' on the control-plane to see this node join the cluster." } ok: [client2] => { "msg": "[preflight] Running pre-flight checks\n[preflight] Reading configuration from the cluster...\n[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Starting the kubelet\n[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...\n\nThis node has joined the cluster:\n* Certificate signing request was sent to apiserver and a response was received.\n* The Kubelet was informed of the new secure connection details.\n\nRun 'kubectl get nodes' on the control-plane to see this node join the cluster." } PLAY RECAP ************************************************************************************************************************************************************************************client1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 client1 : ok=23 changed=6 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 client2 : ok=23 changed=6 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0 host1 : ok=4 changed=3 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0 Four tasks were executed on the master node to determine the join command, while 23 commands ran on each of the two clients to ensure they were joined to the cluster. The tasks from the atomika-base role accounts for most of the worker tasks. The cluster now consists of the following nodes, with the master hosting the pods making up the control plane: atmin@kxsmaster2:~$ kubectl get nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME k8xclient1 Ready <none> 23m v1.28.2 192.168.68.116 <none> Ubuntu 20.04.6 LTS 5.4.0-163-generic containerd://1.6.24 kxsclient2 Ready <none> 23m v1.28.2 192.168.68.113 <none> Ubuntu 20.04.6 LTS 5.4.0-163-generic containerd://1.6.24 kxsmaster2 Ready control-plane 34m v1.28.2 192.168.68.111 <none> Ubuntu 20.04.6 LTS 5.4.0-163-generic containerd://1.6.24 With Nginx deployed, the following pods will be running on the various members of the cluster: atmin@kxsmaster2:~$ kubectl get pods -A -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES default nginx-7854ff8877-g8lvh 1/1 Running 0 20s 10.244.1.2 kxsclient2 <none> <none> kube-flannel kube-flannel-ds-4dgs5 1/1 Running 1 (8m58s ago) 26m 192.168.68.116 k8xclient1 <none> <none> kube-flannel kube-flannel-ds-c7vlb 1/1 Running 1 (8m59s ago) 26m 192.168.68.113 kxsclient2 <none> <none> kube-flannel kube-flannel-ds-qrwnk 1/1 Running 0 35m 192.168.68.111 kxsmaster2 <none> <none> kube-system coredns-5dd5756b68-pqp2s 1/1 Running 0 37m 10.244.0.9 kxsmaster2 <none> <none> kube-system coredns-5dd5756b68-rh577 1/1 Running 0 37m 10.244.0.8 kxsmaster2 <none> <none> kube-system etcd-kxsmaster2 1/1 Running 1 37m 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-apiserver-kxsmaster2 1/1 Running 1 37m 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-controller-manager-kxsmaster2 1/1 Running 8 37m 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-proxy-bdzlv 1/1 Running 1 (8m58s ago) 26m 192.168.68.116 k8xclient1 <none> <none> kube-system kube-proxy-ln4fx 1/1 Running 1 (8m59s ago) 26m 192.168.68.113 kxsclient2 <none> <none> kube-system kube-proxy-ndj7w 1/1 Running 0 37m 192.168.68.111 kxsmaster2 <none> <none> kube-system kube-scheduler-kxsmaster2 1/1 Running 8 37m 192.168.68.111 kxsmaster2 <none> <none> All that remains is to expose the Nginx pod using an instance of NodePort, LoadBalancer, or Ingress to the outside world. Maybe more on that in another article... Conclusion This posting explained the basic concepts of Ansible at the hand of scripts booting up a K8s cluster. The reader should now grasp enough concepts to understand tutorials and search engine results and to make a start at using Ansible to set up infrastructure using code.

By Jan-Rudolph Bührmann

Microservices With Apache Camel and Quarkus (Part 5)

In part three of this series, we have seen how to deploy our Quarkus/Camel-based microservices in Minikube, which is one of the most commonly used Kubernetes local implementations. While such a local Kubernetes implementation is very practical for testing purposes, its single-node feature doesn't satisfy real production environment requirements. Hence, in order to check our microservices behavior in a production-like environment, we need a multi-node Kubernetes implementation. And one of the most common is OpenShift. What Is OpenShift? OpenShift is an open-source, enterprise-grade platform for container application development, deployment, and management based on Kubernetes. Developed by Red Hat as a component layer on top of a Kubernetes cluster, it comes both as a commercial product and a free platform or both as on-premise and cloud infrastructure. The figure below depicts this architecture. As with any Kubernetes implementation, OpenShift has its complexities, and installing it as a standalone on-premise platform isn't a walk in the park. Using it as a managed platform on a dedicated cloud like AWS, Azure, or GCP is a more practical approach, at least in the beginning, but it requires a certain enterprise organization. For example, ROSA (Red Hat OpenShift Service on AWS) is a commercial solution that facilitates the rapid creation and the simple management of a full Kubernetes infrastructure, but it isn't really a developer-friendly environment allowing it to quickly develop, deploy and test cloud-native services. For this later use case, Red Hat offers the OpenShift Developer's Sandbox, a development environment that gives immediate access to OpenShift without any heavy installation or subscription process and where developers can start practicing their skills and learning cycle, even before having to work on real projects. This totally free service, which doesn't require any credit card but only a Red Hat account, provides a private OpenShift environment in a shared, multi-tenant Kubernetes cluster that is pre-configured with a set of developer tools, like Java, Node.js, Python, Go, C#, including a catalog of Helm charts, the s2i build tool, and OpenShift Dev Spaces. In this post, we'll be using OpenShift Developer's Sandbox to deploy our Quarkus/Camel microservices. Deploying on OpenShift In order to deploy on OpenShift, Quarkus applications need to include the OpenShift extension. This might be done using the Qurakus CLI, of course, but given that our project is a multi-module maven one, a more practical way of doing it is to directly include the following dependency in the master POM: XML <dependency> <groupId>io.quarkus</groupId> <artifactId>quarkus-openshift</artifactId> </dependency> <dependency> <groupId>io.quarkus</groupId> <artifactId>quarkus-container-image-openshift</artifactId> </dependency> This way, all the sub-modules will inherit the dependencies. OpenShift is supposed to work with vanilla Kubernetes resources; hence, our previous recipe, where we deployed our microservices on Minikube, should also apply here. After all, both Minikube and OpenShift are implementations of the same de facto standard: Kubernetes. If we look back at part three of this series, our Jib-based build and deploy process was generating vanilla Kubernetes manifest files (kubernetes.yaml), as well as Minikube ones (minikube.yaml). Then, we had the choice between using the vanilla-generated Kubernetes resources or the more specific Minikube ones, and we preferred the latter alternative. While the Minikube-specific manifest files could only work when deployed on Minikube, the vanilla Kubernetes ones are supposed to work the same way on Minikube as well as on any other Kubernetes implementation, like OpenShift. However, in practice, things are a bit more complicated, and, as far as I'm concerned, I failed to successfully deploy on OpenShift vanilla Kubernetes manifests generated by Jib. What I needed to do was to rename most of the properties whose names satisfy the pattern quarkus.kubernetes.* by quarkus.openshift.*. Also, some vanilla Kubernetes properties, for example quarkus.kubernetes.ingress.expose, have a completely different name for OpenShift. In this case quarkus.openshift.route.expose. But with the exception of these almost cosmetic alterations, everything remains on the same site as in our previous recipe of part three. Now, in order to deploy our microservices on OpenShift Developer's Sandbox, proceed as follows. Log in to OpenShift Developer's Sandbox Here are the required steps to log in to OpenShift Developer Sandbox: Fire your preferred browser and go to the OpenShift Developer's Sandbox site Click on the Login link in the upper right corner (you need to already have registered with the OpenShift Developer Sandbox) Click on the red button labeled Start your sandbox for free in the center of the screen In the upper right corner, unfold your user name and click on the Copy login command button In the new dialog labeled Log in with ... click on the DevSandbox link A new page is displayed with a link labeled Display Token. Click on this link. Copy and execute the displayed oc command, for example: Shell $ oc login --token=... --server=https://api.sandbox-m3.1530.p1.openshiftapps.com:6443 Clone the Project From GitHub Here are the steps required to clone the project's GitHub repository: Shell $ git clone https://github.com/nicolasduminil/aws-camelk.git $ cd aws-camelk $ git checkout openshift Create the OpenShift Secret In order to connect to AWS resources, like S3 buckets and SQS queues, we need to provide AWS credentials. These credentials are the Access Key ID and the Secret Access Key. There are several ways to provide these credentials, but here, we chose to use Kubernetes secrets. Here are the required steps: First, encode your Access Key ID and Secret Access Key in Base64 as follows: Shell $ echo -n <your AWS access key ID> | base64 $ echo -n <your AWS secret access key> | base64 Edit the file aws-secret.yaml and amend the following lines such that to replace ... by the Base64 encoded values: Shell AWS_ACCESS_KEY_ID: ... AWS_SECRET_ACCESS_KEY: ... Create the OpenShift secret containing the AWS access key ID and secret access key: Shell $ kubectl apply -f aws-secret.yaml Start the Microservices In order to start the microservices, run the following script: Shell $ ./start-ms.sh This script is the same as the one in our previous recipe in part three: Shell #!/bin/sh ./delete-all-buckets.sh ./create-queue.sh sleep 10 mvn -DskipTests -Dquarkus.kubernetes.deploy=true clean install sleep 3 ./copy-xml-file.sh The copy-xml-file.sh script that is used here in order to trigger the Camel file poller has been amended slightly: Shell #!/bin/sh aws_camel_file_pod=$(oc get pods | grep aws-camel-file | grep -wv -e build -e deploy | awk '{print $1}') cat aws-camelk-model/src/main/resources/xml/money-transfers.xml | oc exec -i $aws_camel_file_pod -- sh -c "cat > /tmp/input/money-transfers.xml" Here, we replaced the kubectl commands with the oc ones. Also, given that OpenShift has this particularity of creating pods not only for the microservices but also for the build and the deploy commands, we need to filter out in the list of the running pods the ones having string occurrences of build and deploy. Running this script might take some time. Once finished, make sure that all the required OpenShift controllers are running: Shell $ oc get is NAME IMAGE REPOSITORY TAGS UPDATED aws-camel-file default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-file 1.0.0-SNAPSHOT 17 minutes ago aws-camel-jaxrs default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-jaxrs 1.0.0-SNAPSHOT 9 minutes ago aws-camel-s3 default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-s3 1.0.0-SNAPSHOT 16 minutes ago aws-camel-sqs default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/aws-camel-sqs 1.0.0-SNAPSHOT 13 minutes ago openjdk-11 default-route-openshift-image-registry.apps.sandbox-m3.1530.p1.openshiftapps.com/nicolasduminil-dev/openjdk-11 1.10,1.10-1,1.10-1-source,1.10-1.1634738701 + 46 more... 18 minutes ago $ oc get pods NAME READY STATUS RESTARTS AGE aws-camel-file-1-build 0/1 Completed 0 19m aws-camel-file-1-d72w5 1/1 Running 0 18m aws-camel-file-1-deploy 0/1 Completed 0 18m aws-camel-jaxrs-1-build 0/1 Completed 0 14m aws-camel-jaxrs-1-deploy 0/1 Completed 0 10m aws-camel-jaxrs-1-pkf6n 1/1 Running 0 10m aws-camel-s3-1-76sqz 1/1 Running 0 17m aws-camel-s3-1-build 0/1 Completed 0 18m aws-camel-s3-1-deploy 0/1 Completed 0 17m aws-camel-sqs-1-build 0/1 Completed 0 17m aws-camel-sqs-1-deploy 0/1 Completed 0 14m aws-camel-sqs-1-jlgkp 1/1 Running 0 14m oc get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE aws-camel-jaxrs ClusterIP 172.30.192.74 <none> 80/TCP 11m modelmesh-serving ClusterIP None <none> 8033/TCP,8008/TCP,8443/TCP,2112/TCP 18h As shown in the listing above, all the required image streams have been created, and all the pods are either completed or running. The completed pods are the ones associated with the build and deploy operations. The running ones are associated with the microservices. There is only one service running: aws-camel-jaxrs. This service makes it possible to communicate with the pod that runs the aws-camel-jaxrs microservice by exposing the route to it. This is automatically done in effect to the quarkus.openshift.route.expose=true property. And the microservice aws-camel-sqs needs, as a matter of fact, to communicate with aws-camel-sqs and, consequently, it needs to know the route to it. To get this route, you may proceed as follows: Shell $ oc get routes NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD aws-camel-jaxrs aws-camel-jaxrs-nicolasduminil-dev.apps.sandbox-m3.1530.p1.openshiftapps.com aws-camel-jaxrs http None Now open the application.properties file associated with the aws-camel-sqs microservice and modify the property rest-uri such that to read as follows: Properties files rest-uri=aws-camel-jaxrs-nicolasduminil-dev.apps.sandbox-m3.1530.p1.openshiftapps.com/xfer Here, you have to replace the namespace nicolasduminil-dev with the value which makes sense in your case. Now, you need to stop the microservices and start them again: Shell $ ./kill-ms.sh ... $ ./start-ms.sh ... Your microservices should run as expected now, and you may check the log files by using commands like: Shell $ oc logs aws-camel-jaxrs-1-pkf6n As you may see, in order to get the route to the aws-camel-jaxrs service, we need to start, to stop, and to start our microservices again. This solution is far from being elegant, but I didn't find any other, and I'm relying on the advised reader to help me improve it. It's probably possible to use the OpenShift Java client in order to perform, in Java code, the same thing as the oc get routes command is doing, but I didn't find how, and the documentation isn't too explicit. I would like to present my apologies for not being able to provide here the complete solution, but enjoy it nevertheless!

By Nicolas Duminil CORE

Janus vs. MediaSoup: The Ultimate Guide To Choosing Your WebRTC Server

When building real-time multimedia applications, the choice of server technology is pivotal. Two big players in this space are Janus and MediaSoup, both enabling WebRTC capabilities but doing so in distinctly different ways. This comprehensive guide aims to provide a deep dive into the architecture, code examples, and key differentiators of each, helping you make an informed choice for your next project. The Role of a WebRTC Server Before diving into the specifics, let’s clarify what a WebRTC server does. Acting as the middleman in real-time web applications, a WebRTC server manages a plethora of tasks like signaling, NAT traversal, and media encoding/decoding. The choice of server can significantly affect the performance, scalability, and flexibility of your application. Janus: The General-Purpose WebRTC Gateway Janus is an open-source, general-purpose WebRTC gateway designed for real-time communication. Its modular architecture, broad protocol support, and extensive capabilities make it one of the most popular solutions in the realm of real-time multimedia applications. Janus serves as a bridge between different multimedia components, translating protocols and enabling varied real-time functionalities. It's not just restricted to WebRTC but also supports SIP, RTSP, and plain RTP, among other protocols. Janus can be extended using a plugin architecture, making it suitable for various use cases like streaming, video conferencing, and recording. Architecture Core Design Janus follows a modular architecture, acting as a gateway that can be customized using plugins. Plugins The real functionality of Janus is implemented via plugins, which can be loaded and unloaded dynamically. Pre-built plugins for popular tasks like SIP gateway or Video Room are available. API Layer Janus exposes APIs via HTTP and WebSocket. It communicates with clients through a JSON-based messaging format, offering a higher-level interface for applications. Media Engine Janus leans on GStreamer and libav for media-related tasks but isn't exclusively tied to them. You can use different engines if desired. Scalability Janus is designed for horizontal scalability, which means you can add more instances behind a load balancer to handle more connections. Session Management Janus maintains user sessions and allows for complex state management. Code Example Here’s a simplified JavaScript snippet using Janus.js to create a video room: JavaScript var janus = new Janus({ server: "wss://your-janus-instance", success: function() { janus.attach({ plugin: "janus.plugin.videoroom", success: function(pluginHandle) { pluginHandle.send({ message: { request: "join", room: 1234 } }); }, onmessage: function(msg, jsep) { // Handle incoming messages and media } }); } }); Advantages Modular and extensible design. A rich ecosystem of pre-built plugins. Wide protocol support. Horizontal scalability. Disadvantages Steeper learning curve due to its modular nature. The primary language is C, which may not be preferred for web services. MediaSoup: The WebRTC Specialist MediaSoup is an open-source WebRTC Selective Forwarding Unit (SFU) that specializes in delivering a highly efficient, low-latency server-side WebRTC engine. Designed for simplicity, performance, and scalability, MediaSoup is often the go-to choice for building cutting-edge, real-time video conferencing solutions. In this article, we’ll dive deep into what MediaSoup is, its architecture, design, advantages, and disadvantages, topped off with a code snippet to get you started. MediaSoup serves as a powerful WebRTC SFU, facilitating real-time transport of video, audio, and data between multiple clients. Its primary focus is on achieving low latency, high efficiency, and performance in multi-party communication scenarios. Unlike some other solutions that attempt to be protocol-agnostic, MediaSoup is a WebRTC specialist. Architecture Worker Processes MediaSoup utilizes a multi-core architecture with workers running in separate Node.js processes, taking full advantage of modern CPU capabilities. Routers Within each worker process, routers manage the media streams. They decide which media streams to forward to which connected clients. Transports Transports handle the underlying communication layer, dealing with DTLS, ICE, and other transport protocols essential for WebRTC. Producers and Consumers MediaSoup uses producers to represent incoming media streams and consumers for outgoing media streams. Code Example On the server-side using Node.js: JavaScript const mediaSoup = require("mediasoup"); const mediaSoupWorker = mediaSoup.createWorker(); let router; (async () => { const mediaCodecs = [{ kind: "audio", mimeType: "audio/opus", clockRate: 48000 }]; router = await mediaSoupWorker.createRouter({ mediaCodecs }); })(); On the client-side: JavaScript const device = new Device(); await device.load({ routerRtpCapabilities: router.rtpCapabilities }); const sendTransport = await device.createSendTransport(transportOptions); Advantages Low-latency and high-efficiency. Vertical scalability. Modern C++ and JavaScript codebase. Disadvantages Limited to WebRTC. Requires you to build higher-level features. Architectural Comparisons Modularity vs. focus: Janus is modular, enabling a wide range of functionalities through plugins. MediaSoup, however, offers a more streamlined and focused architecture. API and usability: Janus provides a higher-level API exposed through HTTP and WebSocket, while MediaSoup offers a more programmatic, lower-level API. Scalability: Janus focuses on horizontal scalability, whereas MediaSoup is optimized for vertical scalability within a single server. Session management: Janus manages sessions internally, while MediaSoup expects this to be managed by the application layer. Protocol support: Janus supports multiple protocols, but MediaSoup is specialized for WebRTC. Conclusion: Making the Right Choice Both Janus and MediaSoup are robust and capable servers but serve different needs: Choose Janus if you want a modular, highly extensible solution that can handle a variety of real-time communication needs and protocols. Choose MediaSoup if your primary concern is performance and low latency in a WebRTC-centric environment. Understanding their architectural differences, advantages, and disadvantages will help you align your choice with your project’s specific needs. Whether it's the modular and expansive ecosystem of Janus or the high-performance, WebRTC-focused architecture of MediaSoup, knowing what each offers can equip you to make a well-informed decision.

By Nilesh Savani

How To Deploy Helidon Application to Kubernetes With Kubernetes Maven Plugin

In this article, we delve into the exciting realm of containerizing Helidon applications, followed by deploying them effortlessly to a Kubernetes environment. To achieve this, we'll harness the power of JKube’s Kubernetes Maven Plugin, a versatile tool for Java applications for Kubernetes deployments that has recently been updated to version 1.14.0. What's exciting about this release is that it now supports the Helidon framework, a Java Microservices gem open-sourced by Oracle in 2018. If you're curious about Helidon, we've got some blog posts to get you up to speed: Building Microservices With Oracle Helidon Ultra-Fast Microservices: When MicroStream Meets Helidon Helidon: 2x Productivity With Microprofile REST Client In this article, we will closely examine the integration between JKube’s Kubernetes Maven Plugin and Helidon. Here's a sneak peek of the exciting journey we'll embark on: We'll kick things off by generating a Maven application from Helidon Starter Transform your Helidon application into a nifty Docker image. Craft Kubernetes YAML manifests tailored for your Helidon application. Apply those manifests to your Kubernetes cluster. We'll bundle those Kubernetes YAML manifests into a Helm Chart. We'll top it off by pushing that Helm Chart to a Helm registry. Finally, we'll deploy our Helidon application to Red Hat OpenShift. An exciting aspect worth noting is that JKube’s Kubernetes Maven Plugin can be employed with previous versions of Helidon projects as well. The only requirement is to provide your custom image configuration. With this latest release, Helidon users can now easily generate opinionated container images. Furthermore, the plugin intelligently detects project dependencies and seamlessly incorporates Kubernetes health checks into the generated manifests, streamlining the deployment process. Setting up the Project You can either use an existing Helidon project or create a new one from Helidon Starter. If you’re on JDK 17 use 3.x version of Helidon. Otherwise, you can stick to Helidon 2.6.x which works with older versions of Java. In the starter form, you can choose either Helidon SE or Helidon Microprofile, choose application type, and fill out basic details like project groupId, version, and artifactId. Once you’ve set your project, you can add JKube’s Kubernetes Maven Plugin to your pom.xml: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>1.14.0</version> </plugin> Also, the plugin version is set to 1.14.0, which is the latest version at the time of writing. You can check for the latest version on the Eclipse JKube releases page. It’s not really required to add the plugin if you want to execute it directly from some CI pipeline. You can just provide a fully qualified name of JKube’s Kubernetes Maven Plugin while issuing some goals like this: Shell $ mvn org.eclipse.jkube:kubernetes-maven-plugin:1.14.0:resource Now that we’ve added the plugin to the project, we can start using it. Creating Container Image (JVM Mode) In order to build a container image, you do not need to provide any sort of configuration. First, you need to build your project. Shell $ mvn clean install Then, you just need to run k8s:build goal of JKube’s Kubernetes Maven Plugin. By default, it builds the image using the Docker build strategy, which requires access to a Docker daemon. If you have access to a docker daemon, run this command: Shell $ mvn k8s:build If you don’t have access to any docker daemon, you can also build the image using the Jib build strategy: Shell $ mvn k8s:build -Djkube.build.strategy=jib You will notice that Eclipse JKube has created an opinionated container image for your application based on your project configuration. Here are some key points about JKube’s Kubernetes Maven Plugin to observe in this zero configuration mode: It used quay.io/jkube/jkube-java as a base image for the container image It added some labels to the container image (picked from pom.xml) It exposed some ports in the container image based on the project configuration It automatically copied relevant artifacts and libraries required to execute the jar in the container environment. Creating Container Image (Native Mode) In order to create a container image for the native executable, we need to generate the native executable first. In order to do that, let’s build our project in the native-image profile (as specified in Helidon GraalVM Native Image documentation): Shell $ mvn package -Pnative-image This creates a native executable file in the target folder of your project. In order to create a container image based on this executable, we just need to run k8s:build goal but also specify native-image profile: Shell $ mvn k8s:build -Pnative-image Like JVM mode, Eclipse JKube creates an opinionated container image but uses a lightweight base image: registry.access.redhat.com/ubi8/ubi-minimal and exposes only the required ports by application. Customizing Container Image as per Requirements Creating a container image with no configuration is a really nice way to get started. However, it might not suit everyone’s use case. Let’s take a look at how to configure various aspects of the generated container image. You can override basic aspects of the container image with some properties like this: Property Name Description jkube.generator.name Change Image Name jkube.generator.from Change Base Image jkube.generator.tags A comma-separated value of additional tags for the image If you want more control, you can provide a complete XML configuration for the image in the plugin configuration section: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>${jkube.version}</version> <configuration> <images> <image> <name>${project.artifactId}:${project.version}</name> <build> <from>openjdk:11-jre-slim</from> <ports>8080</ports> <assembly> <mode>dir</mode> <targetDir>/deployments</targetDir> <layers> <layer> <id>lib</id> <fileSets> <fileSet> <directory>${project.basedir}/target/libs</directory> <outputDirectory>libs</outputDirectory> <fileMode>0640</fileMode> </fileSet> </fileSets> </layer> <layer> <id>app</id> <files> <file> <source>${project.basedir}/target/${project.artifactId}.jar</source> <outputDirectory>.</outputDirectory> </file> </files> </layer> </layers> </assembly> <cmd>java -jar /deployments/${project.artifactId}.jar</cmd> </build> </image> </images> </configuration> </plugin> The same is also possible by providing your own Dockerfile in the project base directory. Kubernetes Maven Plugin automatically detects it and builds a container image based on its content: Dockerfile FROM openjdk:11-jre-slim COPY maven/target/helidon-quickstart-se.jar /deployments/ COPY maven/target/libs /deployments/libs CMD ["java", "-jar", "/deployments/helidon-quickstart-se.jar"] EXPOSE 8080 Pushing the Container Image to Quay.io: Once you’ve built a container image, you most likely want to push it to some public or private container registry. Before pushing the image, make sure you’ve renamed your image to include the registry name and registry user. If I want to push an image to Quay.io in the namespace of a user named rokumar, this is how I would need to rename my image: Shell $ mvn k8s:build -Djkube.generator.name=quay.io/rokumar/%a:%v %a and %v correspond to project artifactId and project version. For more information, you can check the Kubernetes Maven Plugin Image Configuration documentation. Once we’ve built an image with the correct name, the next step is to provide credentials for our registry to JKube’s Kubernetes Maven Plugin. We can provide registry credentials via the following sources: Docker login Local Maven Settings file (~/.m2/settings.xml) Provide it inline using jkube.docker.username and jkube.docker.password properties Once you’ve configured your registry credentials, you can issue the k8s:push goal to push the image to your specified registry: Shell $ mvn k8s:push Generating Kubernetes Manifests In order to generate opinionated Kubernetes manifests, you can use k8s:resource goal from JKube’s Kubernetes Maven Plugin: Shell $ mvn k8s:resource It generates Kubernetes YAML manifests in the target directory: Shell $ ls target/classes/META-INF/jkube/kubernetes helidon-quickstart-se-deployment.yml helidon-quickstart-se-service.yml JKube’s Kubernetes Maven Plugin automatically detects if the project contains io.helidon:helidon-health dependency and adds liveness, readiness, and startup probes: YAML $ cat target/classes/META-INF/jkube/kubernetes//helidon-quickstart-se-deployment. yml | grep -A8 Probe livenessProbe: failureThreshold: 3 httpGet: path: /health/live port: 8080 scheme: HTTP initialDelaySeconds: 0 periodSeconds: 10 successThreshold: 1 -- readinessProbe: failureThreshold: 3 httpGet: path: /health/ready port: 8080 scheme: HTTP initialDelaySeconds: 0 periodSeconds: 10 successThreshold: 1 Applying Kubernetes Manifests JKube’s Kubernetes Maven Plugin provides k8s:apply goal that is equivalent to kubectl apply command. It just applies the resources generated by k8s:resource in the previous step. Shell $ mvn k8s:apply Packaging Helm Charts Helm has established itself as the de facto package manager for Kubernetes. You can package generated manifests into a Helm Chart and apply it on some other cluster using Helm CLI. You can generate a Helm Chart of generated manifests using k8s:helm goal. The interesting thing is that JKube’s Kubernetes Maven Plugin doesn’t rely on Helm CLI for generating the chart. Shell $ mvn k8s:helm You’d notice Helm Chart is generated in target/jkube/helm/ directory: Shell $ ls target/jkube/helm/helidon-quickstart-se/kubernetes Chart.yaml helidon-quickstart-se-0.0.1-SNAPSHOT.tar.gz README.md templates values.yaml Pushing Helm Charts to Helm Registries Usually, after generating a Helm Chart locally, you would want to push it to some Helm registry. JKube’s Kubernetes Maven Plugin provides k8s:helm-push goal for achieving this task. But first, we need to provide registry details in plugin configuration: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>1.14.0</version> <configuration> <helm> <snapshotRepository> <name>ChartMuseum</name> <url>http://example.com/api/charts</url> <type>CHARTMUSEUM</type> <username>user1</username> </snapshotRepository> </helm> </configuration> </plugin> JKube’s Kubernetes Maven Plugin supports pushing Helm Charts to ChartMuseum, Nexus, Artifactory, and OCI registries. You have to provide the applicable Helm repository type and URL. You can provide the credentials via environment variables, properties, or ~/.m2/settings.xml. Once you’ve all set up, you can run k8s:helm-push goal to push chart: Shell $ mvn k8s:helm-push -Djkube.helm.snapshotRepository.password=yourpassword Deploying To Red Hat OpenShift If you’re deploying to Red Hat OpenShift, you can use JKube’s OpenShift Maven Plugin to deploy your Helidon application to an OpenShift cluster. It contains some add-ons specific to OpenShift like S2I build strategy, support for Routes, etc. You also need to add the JKube’s OpenShift Maven Plugin plugin to your pom.xml. Maybe you can add it in a separate profile: XML <profile> <id>openshift</id> <build> <plugins> <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>openshift-maven-plugin</artifactId> <version>${jkube.version}</version> </plugin> </plugins> </build> </profile> Then, you can deploy the application with a combination of these goals: Shell $ mvn oc:build oc:resource oc:apply -Popenshift Conclusion In this article, you learned how smoothly you can deploy your Helidon applications to Kubernetes using Eclipse JKube’s Kubernetes Maven Plugin. We saw how effortless it is to package your Helidon application into a container image and publish it to some container image registry. We can alternatively generate Helm Charts of our Kubernetes YAML manifests and publish Helm Charts to some Helm registry. In the end, we learned about JKube’s OpenShift Maven Plugin, which is specifically designed for Red Hat OpenShift users who want to deploy their Helidon applications to Red Hat OpenShift. You can find the code used in this blog post in this GitHub repository. In case you’re interested in knowing more about Eclipse JKube, you can check these links: Documentation Github Issue Tracker StackOverflow YouTube Channel Twitter Gitter Chat

By Rohan Kumar

Building AI Applications With Java and Gradle

Artificial intelligence (AI) is transforming various industries and changing the way businesses operate. Although Python is often regarded as the go-to language for AI development, Java provides robust libraries and frameworks that make it an equally strong contender for creating AI-based applications. In this article, we explore using Java and Gradle for AI development by discussing popular libraries, providing code examples, and demonstrating end-to-end working examples. Java Libraries for AI Development Java offers several powerful libraries and frameworks for building AI applications, including: Deeplearning4j (DL4J) - A deep learning library for Java that provides a platform for building, training, and deploying neural networks, DL4J supports various neural network architectures and offers GPU acceleration for faster computations. Weka - A collection of machine learning algorithms for data mining tasks, Weka offers tools for data pre-processing, classification, regression, clustering, and visualization. Encog - A machine learning framework supporting various advanced algorithms, including neural networks, support vector machines, genetic programming, and Bayesian networks Setting up Dependencies With Gradle To begin AI development in Java using Gradle, set up the required dependencies in your project by adding the following to your build.gradle file: Groovy dependencies { implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-M1.1' implementation 'nz.ac.waikato.cms.weka:weka-stable:3.8.5' implementation 'org.encog:encog-core:3.4' } Code Examples Building a Simple Neural Network With DL4J This example demonstrates creating a basic neural network using the Deeplearning4j (DL4J) library. The code sets up a two-layer neural network architecture consisting of a DenseLayer with 4 input neurons and 10 output neurons, using the ReLU activation function, and an OutputLayer with 10 input neurons and 3 output neurons, using the Softmax activation function and Negative Log Likelihood as the loss function. The model is then initialized and can be further trained on data and used for predictions. Java import org.deeplearning4j.nn.api.OptimizationAlgorithm; import org.deeplearning4j.nn.conf.MultiLayerConfiguration; import org.deeplearning4j.nn.conf.NeuralNetConfiguration; import org.deeplearning4j.nn.conf.layers.DenseLayer; import org.deeplearning4j.nn.conf.layers.OutputLayer; import org.deeplearning4j.nn.multilayer.MultiLayerNetwork; import org.deeplearning4j.nn.weights.WeightInit; import org.nd4j.linalg.activations.Activation; import org.nd4j.linalg.learning.config.Sgd; import org.nd4j.linalg.lossfunctions.LossFunctions; public class SimpleNeuralNetwork { public static void main(String[] args) { MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder() .seed(123) .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) .updater(new Sgd(0.01)) .list() .layer(0, new DenseLayer.Builder().nIn(4).nOut(10) .weightInit(WeightInit.XAVIER) .activation(Activation.RELU) .build()) .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD) .nIn(10).nOut(3) .weightInit(WeightInit.XAVIER) .activation(Activation.SOFTMAX) .build()) .pretrain(false).backprop(true) .build(); MultiLayerNetwork model = new MultiLayerNetwork(conf); model.init(); } } Classification Using Weka This example shows how to use the Weka library for classification on the Iris dataset. The code loads the dataset from an ARFF file, sets the class attribute (the attribute we want to predict) to be the last attribute in the dataset, builds a Naive Bayes classifier using the loaded data, and classifies a new instance. Java import weka.classifiers.bayes.NaiveBayes; import weka.core.Instance; import weka.core.Instances; import weka.core.converters.ConverterUtils.DataSource; public class WekaClassification { public static void main(String[] args) throws Exception { DataSource source = new DataSource("data/iris.arff"); Instances data = source.getDataSet(); data.setClassIndex(data.numAttributes() - 1); NaiveBayes nb = new NaiveBayes(); nb.buildClassifier(data); Instance newInstance = data.instance(0); double result = nb.classifyInstance(newInstance); System.out.println("Predicted class: " + data.classAttribute().value((int) result)); } } Conclusion Java, with its rich ecosystem of libraries and frameworks for AI development, is a viable choice for building AI-based applications. By leveraging popular libraries like Deeplearning4j, Weka, and Encog, and using Gradle as the build tool, developers can create powerful AI solutions using the familiar Java programming language. The provided code examples demonstrate the ease of setting up and configuring AI applications using Java and Gradle. The DL4J example shows how to create a basic deep learning model that can be applied to tasks such as image recognition or natural language processing. The Weka example demonstrates how to use Java and the Weka library for machine learning tasks, specifically classification, which can be valuable for implementing machine learning solutions in Java applications, such as predicting customer churn or classifying emails as spam or not spam. Happy Learning!!

By Arun Pandey

Exploring Apache Ignite With Spring Boot

For the use cases that I am going to describe here, I have 2 services: courses-service basically provides CRUD operations for dealing with courses and instructors. reviews-service is another CRUD operations provider for dealing with reviews for courses that are totally agnostic of courses from courses-service. Both apps are written in Kotlin using Spring Boot and other libraries. Having these 2 services, we are going to discuss distributed caching with Apache Ignite and Spring Boot, and we’ll see how we can use code-deployment to invoke remote code execution via Apache Ignite on a service. Spoiler alert: The examples/usecases presented here are designed purely for the sake of demonstrating integration with some of Apache Ignite’s capabilities; the discussed problems here can be solved in various ways and maybe even in better ways, so don’t spend too much on thinking “why." So, without further ado, let’s dive into the code. Note: here is the source code in case you want to follow along. Simple Distributed Caching We’ll focus on the courses-service for now, having this entity: Java @Entity @Table(name = "courses") class Course( var name: String, @Column(name = "programming_language") var programmingLanguage: String, @Column(name = "programming_language_description", length = 3000, nullable = true) var programmingLanguageDescription: String? = null, @Enumerated(EnumType.STRING) var category: Category, @ManyToOne(fetch = FetchType.LAZY) @JoinColumn(name = "instructor_id") var instructor: Instructor? = null ) : AbstractEntity() { override fun toString(): String { return "Course(id=$id, name='$name', category=$category)" } } And this method in CourseServiceImpl: Java @Transactional override fun save(course: Course): Course { return courseRepository.save(course) } I want to enhance every course that is saved with a programming language description for the programming language that has been sent by the user. For this, I created a Wikipedia API client that will make the following request every time a new course is added. Plain Text GET https://en.wikipedia.org/api/rest_v1/page/summary/java_(programming_language) So, my method looks like this now: Java @Transactional override fun save(course: Course): Course { enhanceWithProgrammingLanguageDescription(course) return courseRepository.save(course) } private fun enhanceWithProgrammingLanguageDescription(course: Course) { wikipediaApiClient.fetchSummaryFor("${course.programmingLanguage}_(programming_language)")?.let { course.programmingLanguageDescription = it.summary } } That’s great. Now here comes our use case: we want to cache the Wikipedia response so we don’t call the Wikipedia API every time. Our courses will be mostly oriented to a set of popular programming languages like Java, Kotlin, C#, and other popular programming languages. We don’t want to decrease our save’s performance querying every time for mostly the same language. Also, this can act as a guard in case the API server is down. Time to introduce Apache Ignite! Apache Ignite is a distributed database for high-performance computing with in-memory speed. Data in Ignite is stored in-memory and/or on-disk, and is either partitioned or replicated across a cluster of multiple nodes. This provides scalability, performance, and resiliency. You can read about lots of places where Apache Ignite is the appropriate solution and about all the advantages on their FAQ page. When it comes to integrating a Spring Boot app with Apache Ignite (embedded), it is quite straightforward and simple, but – there is a but – it has its corner cases that we are going to discuss, especially when you want, let’s say, Java 17 code deployment or Spring Data. There are a few ways of configuring Apache Ignite, via XML or the programmatic way. I picked the programmatic way of configuring Apache Ignite. Here are the dependencies: Groovy implementation("org.apache.ignite:ignite-core:2.15.0") implementation("org.apache.ignite:ignite-kubernetes:2.15.0") implementation("org.apache.ignite:ignite-indexing:2.15.0") implementation("org.apache.ignite:ignite-spring-boot-autoconfigure-ext:1.0.0") Here is the configuration that we are going to add to courses-service: Java @Configuration @Profile("!test") @EnableConfigurationProperties(value = [IgniteProperties::class]) class IgniteConfig(val igniteProperties: IgniteProperties) { @Bean(name = ["igniteInstance"]) fun igniteInstance(ignite: Ignite): Ignite { return ignite } @Bean fun configurer(): IgniteConfigurer { return IgniteConfigurer { igniteConfiguration: IgniteConfiguration -> igniteConfiguration.setIgniteInstanceName(igniteProperties.instanceName) igniteConfiguration.setDiscoverySpi(configureDiscovery()) // allow possibility to switch to Kubernetes } } private fun configureDiscovery(): TcpDiscoverySpi { val spi = TcpDiscoverySpi() var ipFinder: TcpDiscoveryIpFinder? = null; if (igniteProperties.discovery.tcp.enabled) { ipFinder = TcpDiscoveryMulticastIpFinder() ipFinder.setMulticastGroup(DFLT_MCAST_GROUP) } else if (igniteProperties.discovery.kubernetes.enabled) { ipFinder = TcpDiscoveryKubernetesIpFinder() ipFinder.setNamespace(igniteProperties.discovery.kubernetes.namespace) ipFinder.setServiceName(igniteProperties.discovery.kubernetes.serviceName) } spi.setIpFinder(ipFinder) return spi } } First, as you might have noticed, there is the IgniteProperties class that I created in order to allow flexible configuration based on the profile. In my case, local is going to be multicast discovery, and on prod, it will be Kubernetes discovery, but this class is not mandatory. Java @ConstructorBinding @ConfigurationProperties(prefix = "ignite") data class IgniteProperties( val instanceName: String, val discovery: DiscoveryProperties = DiscoveryProperties() ) @ConstructorBinding data class DiscoveryProperties( val tcp: TcpProperties = TcpProperties(), val kubernetes: KubernetesProperties = KubernetesProperties() ) @ConstructorBinding data class TcpProperties( val enabled: Boolean = false, val host: String = "localhost" ) data class KubernetesProperties( val enabled: Boolean = false, val namespace: String = "evil-inc", val serviceName: String = "course-service" ) And here are its corresponding values from application.yaml: YAML ignite: instanceName: ${spring.application.name}-server-${random.uuid} discovery: tcp: enabled: true host: localhost kubernetes: enabled: false namespace: evil-inc service-name: course-service Then we define a bean name igniteInstance, which is going to be our main entry point for all Ignite APIs. Via the provided IgniteConfigurer from ignite-spring-boot-autoconfigure-ext:1.0.0, we start the configuration of our igniteInstance, and provide a name that is picked up from the properties. Then we configure the discovery service provider interface via TcpDiscoverySpi. As I mentioned earlier, based on the properties provided I will either use the TcpDiscoveryMulticastIpFinder or the TcpDiscoveryKubernetesIpFinder. With this, our basic configuration is done, and we can start it! Not so fast! Apache Ignite is backed by an H2 in-memory database, and being in the Spring Boot realm, you’ll get it automatically. This is as much of a blessing as it is a curse because Ignite supports only a specific H2 version and we need to declare it explicitly in our build.gradle like this: Groovy ext["h2.version"] = "1.4.197" Also, if you’re like me running on Java 17, you might’ve gotten this exception: Plain Text Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.ignite.IgniteJdbcThinDriver To address this exception, we have to add the following VM arguments to our run configuration: Plain Text --add-opens=jdk.management/com.sun.management.internal=ALL-UNNAMED --add-opens=java.base/jdk.internal.misc=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED --add-opens=jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED --add-opens=java.base/sun.reflect.generics.reflectiveObjects=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED Now we can start it! Plain Text INFO 11116 --- [4-6ceb9d7d547b%] o.a.i.i.m.d.GridDiscoveryManager : Topology snapshot [ver=2, locNode=9087c6ef, servers=1, clients=0, state=ACTIVE, CPUs=16, offheap=6.3GB, heap=4.0GB … INFO 11116 --- [4-6ceb9d7d547b%] o.a.i.i.m.d.GridDiscoveryManager : ^-- Baseline [id=0, size=1, online=1, offline=0] INFO 32076 --- [ main] o.a.i.s.c.tcp.TcpCommunicationSpi : Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=8, selectorSpins=0, pairedConn=false] INFO 32076 --- [ main] o.a.i.spi.discovery.tcp.TcpDiscoverySpi : Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=84e5553d-a7a9-46d9-a98c-81f34bf84673] Once you see this log, Ignite is up and running, The topology snapshot states that there is one server running, and no clients, and we can see that the discovery/communication took place by binding to ports 47100/47500. Also, in the logs, you might’ve observed some warnings like these. Let’s see how we can get rid of them: 1. Plain Text ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options) Add the following VM argument: -XX:MaxDirectMemorySize=256m 2. Plain Text ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options) Add the following VM arguments: -Xms512m -Xmx2g 3. Plain Text Metrics for local node (to disable set 'metricsLogFrequency' to 0) This one is not really an issue and it might be very convenient during development, but at the moment it just spams the logs which I don’t like, so we’re going to disable it by adding this line in our configure: igniteConfiguration.setMetricsLogFrequency(0) 4. Plain Text Message queue limit is set to 0 which may lead to potential OOMEs This one is complaining about the parameter that is responsible for the limit of incoming and outgoing messages which has the default value to 0 which in other words is limitless. So we are going to set a limit by configuring the TcpCommunicationSpi like this: Java igniteConfiguration.setCommunicationSpi(configureTcpCommunicationSpi()) private fun configureTcpCommunicationSpi(): TcpCommunicationSpi { val tcpCommunicationSpi = TcpCommunicationSpi() tcpCommunicationSpi.setMessageQueueLimit(1024) return tcpCommunicationSpi } Okay, now that everything is set up we can move on. Let’s configure a cache in IgniteConfig class and see how we can fix our Wikipedia responses caching problem. In Apache Ignite we can configure a cache at the configuration level, or in runtime (in runtime, you can use a template for that, too). For this demo, I’ll show you how we can configure it in the configuration. Java @Bean fun configurer(): IgniteConfigurer { return IgniteConfigurer { igniteConfiguration: IgniteConfiguration -> igniteConfiguration.setIgniteInstanceName(igniteProperties.instanceName) igniteConfiguration.setDiscoverySpi(configureDiscovery()) igniteConfiguration.setMetricsLogFrequency(0) igniteConfiguration.setCommunicationSpi(configureTcpCommunicationSpi()) igniteConfiguration.setCacheConfiguration(wikipediaSummaryCacheConfiguration()) //vararg } } Again our entry point for configuring Ignite is IgniteConfiguration - igniteConfiguration.setCacheConfiguration. This line accepts a variety of CacheConfiguration(s). Java private fun wikipediaSummaryCacheConfiguration(): CacheConfiguration<String, WikipediaApiClientImpl.WikipediaSummary> { val wikipediaCache = CacheConfiguration<String, WikipediaApiClientImpl.WikipediaSummary>(WIKIPEDIA_SUMMARIES) wikipediaCache.setIndexedTypes(String::class.java, WikipediaApiClientImpl.WikipediaSummary::class.java) wikipediaCache.setEagerTtl(true) wikipediaCache.setCacheMode(CacheMode.REPLICATED) wikipediaCache.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_ASYNC) wikipediaCache.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL) wikipediaCache.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration(TimeUnit.MINUTES, 60))) return wikipediaCache } wikipediaSummaryCacheConfiguration returns a CacheConfiguration<String, WikipediaApiClientImpl.WikipediaSummary>: as per our requirement, one Wikipedia summary per programming language. This class defines grid cache configuration. It defines all configuration parameters required to start a cache within a grid instance. Now let’s see how we configured it: setIndexedTypes(): This function is used to specify an array of key and value types that will be indexed. setEagerTtl(): By setting this to true, Ignite will proactively remove cache entries that have expired. setExpiryPolicyFactory(): This configuration sets the cache to expire entries after 60 minutes. setCacheMode(): When you choose the REPLICATED mode, all keys are distributed to every participating node. The default mode is PARTITIONED, where keys are divided into partitions and distributed among nodes. You can also control the number of backup copies using setBackups(), and specify the partition loss policy. setWriteSynchronizationMode(): This flag determines whether Ignite will wait for write or commit responses from other nodes. The default is PRIMARY_SYNC, where Ignite waits for the primary node to complete the write or commit but not for backups to update. setAtomicityMode(): Setting this to TRANSACTIONAL enables fully ACID-compliant transactions for key-value operations. In contrast, ATOMIC mode disables distributed transactions and locking, providing higher performance but sacrificing transactional features. Having this configuration, all that’s left is to adjust our enhanceWithProgrammingLanguageDescription method to cache fetched Wikipedia summaries: Java private fun enhanceWithProgrammingLanguageDescription(course: Course) { val summaries = igniteInstance.cache<String, WikipediaApiClientImpl.WikipediaSummary>(WIKIPEDIA_SUMMARIES) log.debug("Fetched ignite cache [$WIKIPEDIA_SUMMARIES] = size(${summaries.size()})]") summaries[course.programmingLanguage]?.let { log.debug("Cache value found, using cache's response $it to update $course programming language description") course.programmingLanguageDescription = it.summary } ?: wikipediaApiClient.fetchSummaryFor("${course.programmingLanguage}_(programming_language)")?.let { log.debug("No cache value found, using wikipedia's response $it to update $course programming language description") summaries.putIfAbsent(course.programmingLanguage, it) it }?.let { course.programmingLanguageDescription = it.summary } } Basically, we are using the bean of the Ignite instance to retrieve our configured cache. Each instance is a member and/or client in an Apache Ignite cluster. After getting a hold of the replicated cache, it is a matter of some simple checks: if we have a summary for the programming language key in our map, then we use that one. If not, we fetch it from the Wikipedia API, add it to the map, and use it. Now let’s see it in action. If we execute the following HTTP request: Plain Text ### POST http://localhost:8080/api/v1/courses Content-Type: application/json { "name": "C++ Development", "category": "TUTORIAL", "programmingLanguage" : "C++", "instructor": { "name": "Bjarne Stroustrup" } } We’ll see in the logs: Plain Text DEBUG 32076 --- [nio-8080-exec-1] i.e.c.s.i.CourseServiceImpl$Companion : Fetched ignite cache [WIKIPEDIA_SUMMARIES] = size(0)] DEBUG 32076 --- [nio-8080-exec-1] i.e.c.s.i.CourseServiceImpl$Companion : No cache value found, using wikipedia's response We retrieved the previously configured cache for Wikipedia summaries, but its size is 0. Therefore, the update took place using Wikipedia’s API. Now if we are to execute the same request again, we’ll notice a different behavior: Plain Text DEBUG 32076 --- [nio-8080-exec-2] i.e.c.s.i.CourseServiceImpl$Companion : Fetched ignite cache [WIKIPEDIA_SUMMARIES] = size(1)] DEBUG 32076 --- [nio-8080-exec-2] i.e.c.s.i.CourseServiceImpl$Companion : Cache value found, using cache's response… Now the cache has size 1, and since it was populated by our previous request, no request to Wikipedia’s API can be observed. However, what truly highlights the elegance and ease of Apache Ignite's integration is when we launch another instance of our application on a different port using the -Dserver.port=8060 option. This is when we can see the replicated cache mechanism in action. Plain Text INFO 37600 --- [ main] o.a.i.s.c.tcp.TcpCommunicationSpi : Successfully bound communication NIO server to TCP port [port=47101, locHost=0.0.0.0/0.0.0.0, selectorsCnt=8, selectorSpins=0, pairedConn=false] INFO 37600 --- [ main] o.a.i.spi.discovery.tcp.TcpDiscoverySpi : Successfully bound to TCP port [port=47501, localHost=0.0.0.0/0.0.0.0, locNodeId=4770d2ff-2979-4b4b-8d0e-30565aeff75e] INFO 37600 --- [1-d0db3c4f0d78%] a.i.i.p.c.d.d.p.GridDhtPartitionDemander : Starting rebalance routine [WIKIPEDIA_SUMMARIES] INFO 37600 --- [ main] o.a.i.i.m.d.GridDiscoveryManager : Topology snapshot [ver=6, locNode=4770d2ff, servers=2, clients=0, state=ACTIVE, CPUs=16, offheap=13.0GB, heap=4.0GB... INFO 37600 --- [ main] o.a.i.i.m.d.GridDiscoveryManager : ^-- Baseline [id=0, size=2, online=2, offline=0] We see that our TcpDiscoveryMulticastIpFinder discovered an already running Apache Ignite node on ports 47100/47500 running together with our first courses-service instance on port 8080. Therefore, additionally, a new cluster connection is established on ports 47101/47501. This triggers the rebalancing routine for our cache. In the end, we observe in the topology log line that the number of servers now is 2. Now if we are to make a new HTTP request to create the same course on 8060 instance, we’ll see the following: Plain Text DEBUG 37600 --- [nio-8060-exec-2] i.e.c.s.i.CourseServiceImpl$Companion : Fetched ignite cache [WIKIPEDIA_SUMMARIES] = size(1)] DEBUG 37600 --- [nio-8060-exec-2] i.e.c.s.i.CourseServiceImpl$Companion : Cache value found, using cache's response So, we used the same cache which has the size 1, and no requests to Wikipedia’s API were made. As you might think, the same goes if we are to make some requests on 8060 for another language: the cache being populated will be seen on 8080 on request for that language, too. Spring Data Support A quite surprising feature that comes with Apache Ignite is the Spring Data support, which allows us to interact with our cache in a more elegant/familiar way. The Spring Data framework offers a widely adopted API that abstracts the underlying data storage from the application layer. Apache Ignite seamlessly integrates with Spring Data by implementing the Spring Data CrudRepository interface. This integration further enhances the flexibility and adaptability of our application's data layer. Let’s configure it by adding the following dependency: Groovy implementation("org.apache.ignite:ignite-spring-data-ext:2.0.0") Let’s declare our repository, by extending the IgniteRepository. Java @Repository @RepositoryConfig(cacheName = WIKIPEDIA_SUMMARIES) interface WikipediaSummaryRepository : IgniteRepository<WikipediaApiClientImpl.WikipediaSummary, String> Having both Ignite’s Spring Data support and Spring Data JPA on the classpath might generate some bean scanning issues, which we can address by specifically instructing both the JPA and Ignite where to look for their beans like this: Java @EnableIgniteRepositories(basePackages = ["inc.evil.coursecatalog.ignite"]) @EnableJpaRepositories( basePackages = ["inc.evil.coursecatalog.repo"], excludeFilters = [ComponentScan.Filter(type = FilterType.ANNOTATION, value = [RepositoryConfig::class])] ) Having such a configuration, we ensure that Ignite will scan for its repositories only in the Ignite package, JPA will scan for its repositories only in the repo package, and will exclude any classes that have the @RepositoryConfig on them. Now let’s refactor our CourseServiceImpl so it will use the newly created WikipediaSummaryRepository: Java private fun enhanceWithProgrammingLanguageDescription(course: Course) { val summaries = wikipediaSummaryRepository.cache() log.debug("Fetched ignite cache [$WIKIPEDIA_SUMMARIES] = size(${summaries.size()})]") wikipediaSummaryRepository.findById(course.programmingLanguage).orElseGet { wikipediaApiClient.fetchSummaryFor("${course.programmingLanguage}_(programming_language)")?.let { log.debug("No cache value found, using wikipedia's response $it to update $course programming language description") wikipediaSummaryRepository.save(course.programmingLanguage, it) it } }?.let { course.programmingLanguageDescription = it.summary } } Instead of interacting directly with the low-level cache/map, we've transitioned to directing our requests to a new high-level class called WikipediaSummaryRepository. This approach is not only more elegant in the implementation/usage, but also resonates much better with Spring fans, doesn't it? Also, you might’ve noticed that we no longer need the igniteInstance to access the cache. The repository can give it to us via .cache() method, so even if we use the IgniteRepository we don’t lose access to our cache and its low-level operations. If we are to play with it in the same manner as we did with the cache, we’ll notice that the behavior didn’t change. But wait, there’s more! Integration with Spring Data brings an abundance of advantages: query abstraction/query generation, manual queries, pagination/sorting, projections, query with Cache.Entry return type or entity-like type – you name it – and IgniteRepository will have it. For this purpose, I will experiment with the CommandLineRunner since I don’t expose any API to integrate directly with the WikipediaSummaryRepository. First, let’s write some queries: Java @Repository @RepositoryConfig(cacheName = WIKIPEDIA_SUMMARIES) interface WikipediaSummaryRepository : IgniteRepository<WikipediaSummary, String> { fun findByTitle(title: String): List<WikipediaSummary> fun findByDescriptionContains(keyword: String): List<Cache.Entry<String, WikipediaSummary>> @Query(value = "select description, count(description) as \"count\" from WIKIPEDIA_SUMMARIES.WIKIPEDIASUMMARY group by description") fun countPerDescription(): List<CountPerProgrammingLanguageType> interface CountPerProgrammingLanguageType { fun getDescription(): String fun getCount(): Int } } And here is the CommandLineRunner: Java @Bean fun init(client: WikipediaApiClient, repo: WikipediaSummaryRepository): CommandLineRunner = CommandLineRunner { run { client.fetchSummaryFor("Java programming language")?.let { repo.save("Java", it) } client.fetchSummaryFor("Kotlin programming language")?.let { repo.save("Kotlin", it) } client.fetchSummaryFor("C++")?.let { repo.save("C++", it) } client.fetchSummaryFor("Python programming language")?.let { repo.save("C#", it) } client.fetchSummaryFor("Javascript")?.let { repo.save("Javascript", it) } repo.findAll().forEach { log.info("Fetched {}", it) } repo.findByTitle("Kotlin").forEach { log.info("Fetched by title {}", it) } repo.findByDescriptionContains("programming language").forEach { log.info(" Fetched by description {}", it) } repo.countPerDescription().forEach { log.info("Count per description {}", it) } } } Before we can run it we’ll have to adjust a bit our cached entity like this: Java @JsonIgnoreProperties(ignoreUnknown = true) data class WikipediaSummary( @JsonProperty("title") @QuerySqlField(name = "title", index = true) val title: String, @JsonProperty("description") @QuerySqlField(name = "description", index = false) val description: String, @JsonProperty("extract") @QuerySqlField(name = "summary", index = false) val summary: String ) You might notice the @QuerySqlField on each of the fields, all fields that will be involved in SQL clauses must have this annotation. This annotation is needed in order to instruct Ignite to create a column for each of our fields; otherwise, it will create a single huge column containing our payload. This is a bit intrusive, but that is a small price to pay for the plethora of possibilities we gain. Now once we run it, we have the following log line: Plain Text INFO 3252 --- [ main] i.e.c.CourseCatalogApplication$Companion : Fetched WikipediaSummary(title=Python (programming language)… … INFO 3252 --- [ main] i.e.c.CourseCatalogApplication$Companion : Fetched by description Entry [key=C#, val=WikipediaSummary(title=Python (programming language)… … INFO 3252 --- [ main] i.e.c.CourseCatalogApplication$Companion : Count per description {count=1, description=General-purpose programming language derived from Java} … This proves that our implementation works as expected. Note: If you want to connect to connect to the Ignite’s in-memory database during your research, you might stumble on this VM argument: -DIGNITE_H2_DEBUG_CONSOLE=true. I wanted to mention that the Ignite team deprecated IGNITE_H2_DEBUG_CONSOLE in 2.8 version in favor of their thin JDBC driver. So if you want to connect to the DB, please refer to the updated documentation, but long story short: the JDBC URL is jdbc:ignite:thin://127.0.0.1/ with the default port 10800, and IntelliJ provides first-class support in their database datasources. Distributed Locks Another useful feature that comes with Apache Ignite is the API for distributed locks. Suppose our enhanceWithProgrammingLanguageDescription method is a slow intensive operation dealing with cache and other resources, and we wouldn’t want other threads on the same instance or even other requests from a different instance to interfere or alter something until the operation is complete. Here comes IgniteLock into play: this interface offers a comprehensive API for managing distributed reentrant locks, similar to java.util.concurrent.ReentrantLock. You can create instances of these locks using Ignite's reentrantLock() method. IgniteLock provides protection from node failures via the failoverSafe flag when set to true: the lock will automatically recover. If the owning node fails, ensure uninterrupted lock management across the cluster. On the other hand, if failoverSafe is set to false, a node failure will result in an IgniteException, rendering the lock unusable. So with this in mind let’s try and guard our so-called “critical section." Java private fun enhanceWithProgrammingLanguageDescription(course: Course) { val lock = igniteInstance.reentrantLock(SUMMARIES_LOCK, true, true, true) if (!lock.tryLock()) throw LockAcquisitionException(SUMMARIES_LOCK, "enhanceWithProgrammingLanguageDescription") log.debug("Acquired lock {}", lock) Thread.sleep(2000) val summaries = wikipediaSummaryRepository.cache() log.debug("Fetched ignite cache [$WIKIPEDIA_SUMMARIES] = size(${summaries.size()})]") wikipediaSummaryRepository.findById(course.programmingLanguage).orElseGet { wikipediaApiClient.fetchSummaryFor("${course.programmingLanguage}_(programming_language)")?.let { log.debug("No cache value found, using wikipedia's response $it to update $course programming language description") wikipediaSummaryRepository.save(course.programmingLanguage, it) it } }?.let { course.programmingLanguageDescription = it.summary } lock.unlock() } As you can see, the implementation is quite simple: we obtain the lock via the igniteInstance’s reentrantLock method and then we try locking it with tryLock(). The locking will succeed if the acquired lock is available or already held by the current thread, and it will immediately return true. Otherwise, it will return false and a LockAcquisitionException will be thrown. Then we simulate some intensive work by sleeping for 2 seconds with Thread.sleep(2000), and in the end, we release the acquired lock with unlock(). Now if we run a single instance of our app on port 8080 and try 2 subsequent requests, one will pass and the other one will fail: Plain Text ERROR 36580 --- [nio-8080-exec-2] e.c.w.r.e.RESTExceptionHandler$Companion : Exception while handling request [summaries-lock] could not be acquired for [enhanceWithProgrammingLanguageDescription] operation. Please try again. inc.evil.coursecatalog.common.exceptions.LockAcquisitionException: [summaries-lock] could not be acquired for [enhanceWithProgrammingLanguageDescription] operation. Please try again. The same goes if we are to make 1 request to an 8080 instance of our app and the next one in the 2-second timeframe to the 8060 instance - the first request will succeed while the second one will fail. Code Deployment Now let’s switch our attention to reviews-service, and remember – this service is totally unaware of courses: it is just a way to add reviews for some course_id. With this in mind, we have this entity: Java @Table("reviews") data class Review( @Id var id: Int? = null, var text: String, var author: String, @Column("created_at") @CreatedDate var createdAt: LocalDateTime? = null, @LastModifiedDate @Column("last_modified_at") var lastModifiedAt: LocalDateTime? = null, @Column("course_id") var courseId: Int? = null ) And we have this method in ReviewServiceImpl. So, our new silly feature request would be to somehow check for the existence of the course that the review has been written for. How can we do that? The most obvious choice would be to invoke a REST endpoint on courses-service to check if we have a course for the review’s course_id, but that is not what this article is about. We have Apache Ignite, right? We are going to invoke code from course-service from reviews-service via Ignite’s cluster. To do that, we need to create some kind of API or Gateway module that we are going to publish as an artifact so courses-service can implement it and reviews-service can depend on and use it to invoke the code. Okay - first things first: let’s design the new module as a courses-api module: Groovy plugins { id("org.springframework.boot") version "2.7.3" id("io.spring.dependency-management") version "1.0.13.RELEASE" kotlin("jvm") version "1.6.21" kotlin("plugin.spring") version "1.6.21" kotlin("plugin.jpa") version "1.3.72" `maven-publish` } group = "inc.evil" version = "0.0.1-SNAPSHOT" repositories { mavenCentral() } publishing { publications { create<MavenPublication>("maven") { groupId = "inc.evil" artifactId = "courses-api" version = "1.1" from(components["java"]) } } } dependencies { implementation("org.springframework.boot:spring-boot-starter-actuator") implementation("org.springframework.boot:spring-boot-starter-web") implementation("com.fasterxml.jackson.module:jackson-module-kotlin") implementation("org.jetbrains.kotlin:kotlin-reflect") implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8") implementation("org.jetbrains.kotlinx:kotlinx-coroutines-rx2:1.6.4") implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.6.4") implementation("org.apache.commons:commons-lang3:3.12.0") implementation("org.apache.ignite:ignite-core:2.15.0") testImplementation("org.junit.jupiter:junit-jupiter-api:5.8.1") testRuntimeOnly("org.junit.jupiter:junit-jupiter-engine:5.8.1") } tasks.getByName<Test>("test") { useJUnitPlatform() } Nothing fancy here, except the maven-publish plugin that we’ll use to publish the artifact to the local Maven repository. Here is the interface that courses-service will implement, and reviews-service will use: Java interface CourseApiFacade: Service { companion object { const val COURSE_API_FACADE_SERVICE_NAME = "CourseApiFacade" } fun findById(id: Int): CourseApiResponse } data class InstructorApiResponse( val id: Int?, val name: String?, val summary: String?, val description: String? ) data class CourseApiResponse( val id: Int?, val name: String, val category: String, val programmingLanguage: String, val programmingLanguageDescription: String?, val createdAt: String, val updatedAt: String, val instructor: InstructorApiResponse ) You might’ve noticed that CourseApiFacade extends org.apache.ignite.services.Service interface – an instance of grid-managed service, our entry point in the services that may be deployed. Having this module properly configured, we can add it as a dependency in courses-service: Groovy implementation(project(":courses-api")) And implement the exposed interface like this: Java @Component class CourseApiFacadeImpl : CourseApiFacade { @Transient @SpringResource(resourceName = "courseService") lateinit var courseService: CourseServiceImpl @Transient @IgniteInstanceResource //spring constructor injection won't work since ignite is not ready lateinit var igniteInstance: Ignite companion object { private val log: Logger = LoggerFactory.getLogger(this::class.java) } override fun findById(id: Int): CourseApiResponse = courseService.findById(id).let { CourseApiResponse( id = it.id, name = it.name, category = it.category.toString(), programmingLanguage = it.programmingLanguage, programmingLanguageDescription = it.programmingLanguageDescription, createdAt = it.createdAt.toString(), updatedAt = it.updatedAt.toString(), instructor = InstructorApiResponse(it.instructor?.id, it.instructor?.name, it.instructor?.summary, it.instructor?.description) ) } override fun cancel() { log.info("Canceling service") } override fun init() { log.info("Before deployment :: Pre-initializing service before execution on node {}", igniteInstance.cluster().forLocal().node()) } override fun execute() { log.info("Deployment :: The service is deployed on grid node {}", igniteInstance.cluster().forLocal().node()) } } As you can see, CourseFacadeImpl implements CourseFacade method findById and overrides some methods from the Service interface for debugging purposes. When a service is deployed on a cluster node, Ignite will invoke the execute() method of that service. Likewise, when a deployed service is canceled, Ignite will automatically invoke the cancel() method of that service. init() is guaranteed to be called before execute(). Also, there are some new annotations: @SpringResource(resourceName = "courseService") - Annotates a field or a setter method for injection of resources from Spring ApplicationContext. Since this is IgniteService now, we need to let Ignite take care of the bean injections. resourceName is a mandatory field that is equal to the bean name in the Spring applicationContext. @IgniteInstanceResource – Again, since this is going to be deployed, we can’t rely on Spring anymore for the auto-wiring, so Ignite offers this annotation that offers the possibility to inject an igniteInstance into grid tasks and grid jobs. @Transient/transient in java – This annotation/keyword makes sure that we don’t serialize unnecessary hierarchies of objects in the cluster. For everything mentioned above to work, we have to slightly modify our build.gradle dependencies for Ignite. Groovy implementation("org.apache.ignite:ignite-kubernetes:2.15.0") implementation("org.apache.ignite:ignite-indexing:2.15.0") implementation("org.apache.ignite:ignite-core:2.15.0") implementation("org.apache.ignite:ignite-spring:2.15.0") implementation("org.apache.ignite:ignite-spring-data-ext:2.0.0") We got rid of ignite-spring-boot-autoconfigure in favor of ignite-spring, since I couldn’t make Ignite aware of the Spring’s application context with the autoconfiguration. As you might’ve guessed, since we don’t have IgniteAutoConfiguration anymore, we have to write the Igniteconfiguration manually, but don’t you worry: they are quite similar. Here’s the updated IgniteConfig in courses-service: Java @Configuration @Profile("!test") @EnableConfigurationProperties(value = [IgniteProperties::class]) @EnableIgniteRepositories(basePackages = ["inc.evil.coursecatalog.ignite"]) class IgniteConfig( val igniteProperties: IgniteProperties, val applicationContext: ApplicationContext ) { companion object { const val WIKIPEDIA_SUMMARIES = "WIKIPEDIA_SUMMARIES" } @Bean(name = ["igniteInstance"]) fun igniteInstance(igniteConfiguration: IgniteConfiguration): Ignite { return IgniteSpring.start(igniteConfiguration, applicationContext) } @Bean fun igniteConfiguration(): IgniteConfiguration { val igniteConfiguration = IgniteConfiguration() igniteConfiguration.setIgniteInstanceName(igniteProperties.instanceName) igniteConfiguration.setMetricsLogFrequency(0) // no spam igniteConfiguration.setCommunicationSpi(configureTcpCommunicationSpi()) // avoid OOM due to message limit igniteConfiguration.setDiscoverySpi(configureDiscovery()) // allow possibility to switch to Kubernetes igniteConfiguration.setCacheConfiguration(wikipediaSummaryCacheConfiguration()) //vararg return igniteConfiguration } } Not that much of a change, right? Instead of IgniteConfigurer we declared a bean named IgniteConfiguration that takes care of our configuration. We injected the applicationContext in our config so we can pass it in the rewritten igniteInstance bean that now is a Spring-aware IgniteSpring. Now that we’ve updated our configuration, we’ll have to tell Ignite about our new IgniteService – CourseApiFacade. Java @Bean fun igniteConfiguration(): IgniteConfiguration { val igniteConfiguration = IgniteConfiguration() igniteConfiguration.setIgniteInstanceName(igniteProperties.instanceName) igniteConfiguration.setPeerClassLoadingEnabled(true) igniteConfiguration.setMetricsLogFrequency(0) // no spam igniteConfiguration.setCommunicationSpi(configureTcpCommunicationSpi()) // avoid OOM due to message limit igniteConfiguration.setDiscoverySpi(configureDiscovery()) // allow possibility to switch to Kubernetes igniteConfiguration.setCacheConfiguration(wikipediaSummaryCacheConfiguration()) //vararg igniteConfiguration.setServiceConfiguration(courseApiFacadeConfiguration()) //vararg return igniteConfiguration } private fun courseApiFacadeConfiguration(): ServiceConfiguration { val serviceConfiguration = ServiceConfiguration() serviceConfiguration.service = courseApiFacade serviceConfiguration.name = CourseApiFacade.COURSE_API_FACADE_SERVICE_NAME serviceConfiguration.maxPerNodeCount = 1 return serviceConfiguration } We create a ServiceConfiguration which is bound to courseApiFacade with the name from the exposed interface in courses-api, and with a setting stating one service per node, lastly we set courseApiFacadeConfiguration in the IgniteConfiguration. Now back to reviews-service. First of all, we want to add the required dependencies for Apache Ignite, since reviews-service is much simpler and doesn’t need the Spring-aware Ignite. We’ll go with ignite-spring-boot-autoconfigure here: Groovy implementation("org.apache.ignite:ignite-core:2.15.0") implementation("org.apache.ignite:ignite-kubernetes:2.15.0") implementation("org.apache.ignite:ignite-indexing:2.15.0") implementation("org.apache.ignite:ignite-spring-boot-autoconfigure-ext:1.0.0") implementation("org.apache.ignite:ignite-spring-data-ext:2.0.0") Also, previously I mentioned that we are going to use that interface from courses-api. We can run the publishMavenPublicationToMavenLocal gradle task on courses-api to get our artifact published and then we can add the following dependency to reviews-service. Groovy implementation("inc.evil:courses-api:1.1") Now we need to configure Ignite here as well as we did previously in courses-service: Java @Configuration @EnableConfigurationProperties(value = [IgniteProperties::class]) @EnableIgniteRepositories(basePackages = ["inc.evil.reviews.ignite"]) class IgniteConfig(val igniteProperties: IgniteProperties) { @Bean(name = ["igniteInstance"]) fun igniteInstance(ignite: Ignite): Ignite { return ignite } @Bean fun configurer(): IgniteConfigurer { return IgniteConfigurer { igniteConfiguration: IgniteConfiguration -> igniteConfiguration.setIgniteInstanceName(igniteProperties.instanceName) igniteConfiguration.setClientMode(true) igniteConfiguration.setMetricsLogFrequency(0) // no spam igniteConfiguration.setCommunicationSpi(configureTcpCommunicationSpi()) // avoid OOM due to message limit igniteConfiguration.setDiscoverySpi(configureDiscovery()) // allow possibility to switch to Kubernetes } } private fun configureTcpCommunicationSpi(): TcpCommunicationSpi { val tcpCommunicationSpi = TcpCommunicationSpi() tcpCommunicationSpi.setMessageQueueLimit(1024) return tcpCommunicationSpi } private fun configureDiscovery(): TcpDiscoverySpi { val spi = TcpDiscoverySpi() var ipFinder: TcpDiscoveryIpFinder? = null; if (igniteProperties.discovery.tcp.enabled) { ipFinder = TcpDiscoveryMulticastIpFinder() ipFinder.setMulticastGroup(DFLT_MCAST_GROUP) } else if (igniteProperties.discovery.kubernetes.enabled) { ipFinder = TcpDiscoveryKubernetesIpFinder() ipFinder.setNamespace(igniteProperties.discovery.kubernetes.namespace) ipFinder.setServiceName(igniteProperties.discovery.kubernetes.serviceName) } spi.setIpFinder(ipFinder) return spi } } The only difference from courses-service is that reviews-service will run in client mode. Other than that, everything is the same. Okay, with Ignite properly configured, it is time to make use of our IgniteService from courses-service in reviews-service. For this purpose, I created this class: Java @Component class IgniteCoursesGateway(private val igniteInstance: Ignite) { fun findCourseById(id: Int) = courseApiFacade().findById(id) private fun courseApiFacade(): CourseApiFacade { return igniteInstance.services(igniteInstance.cluster().forServers()) .serviceProxy(CourseApiFacade.COURSE_API_FACADE_SERVICE_NAME, CourseApiFacade::class.java, false) } } IgniteCoursesGateway is an entry point in the courses domain world via the Ignite cluster. Via the autowired igniteInstance, we retrieve a serviceProxy of type CourseApiFacade for the name COURSE_API_FACADE_SERVICE_NAME. We also tell Ignite to always try to load-balance between services by setting the sticky flag to false. Then in the findCourseById(), we simply use the obtained serviceProxy to query by id for the desired course. All that’s left is to use IgniteCoursesGateway in ReviewServiceImpl to fulfill the feature’s requirements. Java override suspend fun save(review: Review): Review { runCatching { igniteCoursesGateway.findCourseById(review.courseId!!).also { log.info("Call to ignite ended with $it") } }.onFailure { log.error("Oops, ignite remote execution failed due to ${it.message}", it) } .getOrNull() ?: throw NotFoundException(CourseApiResponse::class, "course_id", review.courseId.toString()) return reviewRepository.save(review).awaitFirst() } The logic is as follows: before saving, we try to find the course by course_id from review by invoking the findCourseById in our Ignite cluster. If we have an exception (CourseApiFacadeImpl will throw a NotFoundException if the requested course was not found), we swallow it and throw a reviews-service NotFoundException stating that the course could’ve not been retrieved. If a course was returned by our method we proceed to save it – that’s it. Now let’s restart course-service and observe the logs: Plain Text INFO 23372 --- [a-67c579c6ea47%] i.e.c.f.i.CourseApiFacadeImpl$Companion : Before deployment :: Pre-initializing service before execution on node TcpDiscoveryNode … INFO 23372 --- [a-67c579c6ea47%] o.a.i.i.p.s.IgniteServiceProcessor : Starting service instance [name=CourseApiFacade, execId=52de6edc-ac6f-49d4-8c9e-17d6a6ebc8d5] INFO 23372 --- [a-67c579c6ea47%] i.e.c.f.i.CourseApiFacadeImpl$Companion : Deployment :: The service is deployed on grid node TcpDiscoveryNode … We see that according to our overridden methods of the Service interface, CourseApiFacade was successfully deployed. Now we have courses-service running, and if we are to start reviews-service, we’ll see the following log: Plain Text INFO 13708 --- [ main] o.a.i.i.m.d.GridDiscoveryManager : Topology snapshot [ver=2, locNode=cb90109d, servers=1, clients=1, state=ACTIVE, CPUs=16, offheap=6.3GB, heap=4.0GB... INFO 13708 --- [ main] o.a.i.i.m.d.GridDiscoveryManager : ^-- Baseline [id=0, size=1, online=1, offline=0] You may notice that we have 1 server running and 1 client. Now let’s try a request to add a review for an existing course (reviews-service is using GraphQL). Plain Text GRAPHQL http://localhost:8070/graphql Content-Type: application/graphql mutation { createReview(request: {text: "Amazing, loved it!" courseId: 39 author: "Mike Scott"}) { id text author courseId createdAt lastModifiedAt } } In the logs, we’ll notice: Plain Text INFO 13708 --- [actor-tcp-nio-1] i.e.r.s.i.ReviewServiceImpl$Companion : Call to ignite ended with CourseApiResponse(id=39, name=C++ Development, category=TUTORIAL … And in the courses-service logs, we’ll notice the code execution: Plain Text DEBUG 29316 --- [2-64cc57b09c89%] i.e.c.c.aop.LoggingAspect$Companion : before :: execution(public inc.evil.coursecatalog.model.Course inc.evil.coursecatalog.service.impl.CourseServiceImpl.findById(int)) This means that the request was executed successfully. If we try the same request for a non-existent course - let’s say, for ID 999, we’ll observe the NotFoundException in reviews-service. Plain Text WARN 33188 --- [actor-tcp-nio-1] .w.g.e.GraphQLExceptionHandler$Companion : Exception while handling request: CourseApiResponse with course_id equal to [999] could not be found! Conclusion Alright, everyone, that's a wrap! I trust you now have a good grasp of what Apache Ignite is all about. We delved into designing a simple distributed cache using Ignite and Spring Boot, explored Ignite's Spring Data Support, distributed locks for guarding critical sections of code, and, finally, witnessed how Apache Ignite's code deployment can execute code within the cluster.Once again, if you missed it, you can access all the code we discussed in the link at the beginning of this article. Happy coding!

By Ion Pascari

Building Real-Time Applications to Process Wikimedia Streams Using Kafka and Hazelcast

In this tutorial, developers, solution architects, and data engineers can learn how to build high-performance, scalable, and fault-tolerant applications that react to real-time data using Kafka and Hazelcast. We will be using Wikimedia as a real-time data source. Wikimedia provides various streams and APIs (Application Programming Interfaces) to access real-time data about edits and changes made to their projects. For example, this source provides a continuous stream of updates on recent changes, such as new edits or additions to Wikipedia articles. Developers and solution architects often use such streams to monitor and analyze the activity on Wikimedia projects in real-time or to build applications that rely on this data, like this tutorial. Kafka is great for event streaming architectures, continuous data integration (ETL), and messaging systems of record (database). Hazelcast is a unified real-time stream data platform that enables instant action on data in motion by combining stream processing and a fast data store for low-latency querying, aggregation, and stateful computation against event streams and traditional data sources. It allows you to build resource-efficient, real-time applications quickly. You can deploy it at any scale from small edge devices to a large cluster of cloud instances. In this tutorial, we will guide you through setting up and integrating Kafka and Hazelcast to enable real-time data ingestion and processing for reliable streaming processing. By the end, you will have a deep understanding of how to leverage the combined capabilities of Hazelcast and Kafka to unlock the potential of streaming processing and instant action for your applications. So, let's get started! Wikimedia Event Streams in Motion First, let’s understand what we are building: Most of us use or read Wikipedia, so let’s use Wikipedia's recent changes as an example. Wikipedia receives changes from multiple users in real time, and these changes contain details about the change such as title, request_id, URI, domain, stream, topic, type, user, topic, title_url, bot, server_name, and parsedcomment. We will read recent changes from Wikimedia Event Streams. Event Streams is a web service that exposes streams of structured event data in real time. It does it over HTTP with chunked transfer encoding in accordance with the Server-Sent Events protocol (SSE). Event Streams can be accessed directly through HTTP, but they are more often used through a client library. An example of this is a “recentchange”. But what if you want to process or enrich changes in real time? For example, what if you want to determine if a recent change is generated by a bot or human? How can you do this in real time? There are actually multiple options, but here we’ll show you how to use Kafka to transport data and how to use Hazelcast for real-time stream processing for simplicity and performance. Here’s a quick diagram of the data pipeline architecture: Prerequisites If you are new to Kafka or you’re just getting started, I recommend you start with Kafka Documentation. If you are new to Hazelcast or you’re just getting started, I recommend you start with Hazelcast Documentation. For Kafka, you need to download Kafka, start the environment, create a topic to store events, write some events to your topic, and finally read these events. Here’s a Kafka Quick Start. For Hazelcast, you can use either the Platform or the Cloud. I will use a local cluster. Step #1: Start Kafka Run the following commands to start all services in the correct order: Markdown # Start the ZooKeeper service $ bin/zookeeper-server-start.sh config/zookeeper.properties Open another terminal session and run: Markdown # Start the Kafka broker service $ bin/kafka-server-start.sh config/server.properties Once all services have successfully launched, you will have a basic Kafka environment running and ready to use. Step #2: Create a Java Application Project The pom.xml should include the following dependencies in order to run Hazelcast and connect to Kafka: XML <dependencies> <dependency> <groupId>com.hazelcast</groupId> <artifactId>hazelcast</artifactId> <version>5.3.1</version> </dependency> <dependency> <groupId>com.hazelcast.jet</groupId> <artifactId>hazelcast-jet-kafka</artifactId> <version>5.3.1</version> </dependency> </dependencies> Step #3: Create a Wikimedia Publisher Class Basically, the class reads from a URL connection, creates a Kafka Producer, and sends messages to a Kafka topic: Java public static void main(String[] args) throws Exception { String topicName = "events"; URLConnection conn = new URL ("https://stream.wikimedia.org/v2/stream/recentchange").openConnection(); BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)); try (KafkaProducer<Long, String> producer = new KafkaProducer<>(kafkaProps())) { for (long eventCount = 0; ; eventCount++) { String event = reader.readLine(); producer.send(new ProducerRecord<>(topicName, eventCount, event)); System.out.format("Published '%s' to Kafka topic '%s'%n", event, topicName); Thread.sleep(20 * (eventCount % 20)); } } } private static Properties kafkaProps() { Properties props = new Properties(); props.setProperty("bootstrap.servers", "127.0.0.1:9092"); props.setProperty("key.serializer", LongSerializer.class.getCanonicalName()); props.setProperty("value.serializer", StringSerializer.class.getCanonicalName()); return props; } Step #4: Create a Main Stream Processing Class This class creates a pipeline that reads from a Kafka source using the same Kafka topic, and then it filters out messages that were created by bots (bot:true), keeping only messages created by humans. It sends the output to a logger: Java public static void main(String[] args) { Pipeline p = Pipeline.create(); p.readFrom(KafkaSources.kafka(kafkaProps(), "events")) .withNativeTimestamps(0) .filter(event-> Objects.toString(event.getValue()).contains("bot\":false")) .writeTo(Sinks.logger()); JobConfig cfg = new JobConfig().setName("kafka-traffic-monitor"); HazelcastInstance hz = Hazelcast.bootstrappedInstance(); hz.getJet().newJob(p, cfg); } private static Properties kafkaProps() { Properties props = new Properties(); props.setProperty("bootstrap.servers", "127.0.0.1:9092"); props.setProperty("key.deserializer", LongDeserializer.class.getCanonicalName()); props.setProperty("value.deserializer", StringDeserializer.class.getCanonicalName()); props.setProperty("auto.offset.reset", "earliest"); return props; } Step #5: Enriching a Stream If you want to enrich real-time messages with batch or static data such as location details, labels, or some features, you can follow the next step: Create a Hazelcast Map and load static data into it. Use the Map to enrich the Message stream using mapUsingIMap. Conclusion In this post, we explained how to build a real-time application to process Wikimedia streams using Kafka and Hazelcast. Hazelcast allows you to quickly build resource-efficient, real-time applications. You can deploy it at any scale, from small-edge devices to a large cluster of cloud instances. A cluster of Hazelcast nodes shares the data storage and computational load, which can dynamically scale up and down. Referring to the Wikimedia example, it means that this solution is reliable, even when there are significantly higher volumes of users making changes to Wikimedia. We look forward to your feedback and comments about this blog post!

By Fawaz Ghali, PhD

Automate Migration Assessment With XML Linter

When people think of linting, the first thing that comes to mind is usually static code analysis for programming languages, but rarely for markup languages. In this article, I would like to share how our team developed ZK Client MVVM Linter, an XML linter that automates migration assessment for our new Client MVVM feature in the upcoming ZK 10 release. The basic idea is to compile a catalog of known compatibility issues as lint rules to allow users to assess the potential issues flagged by the linter before committing to the migration. For those unfamiliar with ZK, ZK is a Java framework for building enterprise applications; ZUL (ZK User Interface Markup Language) is its XML-based language for simplifying user interface creation. Through sharing our experience developing ZK Client MVVM Linter, we hope XML linters can find broader applications. File Parsing The Problem Like other popular linters, our ZUL linter starts by parsing source code into AST (abstract syntax tree). Although Java provides several libraries for XML parsing, they lose the original line and column numbers of elements in the parsing process. As the subsequent analysis stage will need this positional information to report compatibility issues precisely, our first task is to find a way to obtain and store the original line and column numbers in AST. How We Address This After exploring different online sources, we found a Stack Overflow solution that leverages the event-driven property of SAX Parser to store the end position of each start tag in AST. Its key observation was that the parser invokes the startElement method whenever it encounters the ending ‘>’ character. Therefore, the parser position returned by the locator must be equivalent to the end position of the start tag, making the startElement method the perfect opportunity for creating new AST nodes and storing their end positions. Java public static Document parse(File file) throws Exception { Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument(); SAXParser parser = SAXParserFactory.newInstance().newSAXParser(); parser.parse(file, new DefaultHandler() { private Locator _locator; private final Stack<Node> _stack = new Stack<>(); @Override public void setDocumentLocator(Locator locator) { _locator = locator; _stack.push(document); } @Override public void startElement(String uri, String localName, String qName, Attributes attributes) { // Create a new AST node Element element = document.createElement(qName); for (int i = 0; i < attributes.getLength(); i++) element.setAttribute(attributes.getQName(i), attributes.getValue(i)); // Store its end position int lineNumber = _locator.getLineNumber(), columnNumber = _locator.getColumnNumber(); element.setUserData("position", lineNumber + ":" + columnNumber, null); _stack.push(element); } @Override public void endElement(String uri, String localName, String qName) { Node element = _stack.pop(); _stack.peek().appendChild(element); } }); return document; } Building on the solution above, we implemented a more sophisticated parser capable of storing the position of each attribute. Our parser uses the end positions returned by the locator as reference points to reduce the task into finding attribute positions relative to the end position. Initially, we started with a simple idea of iteratively finding and removing the last occurrence of each attribute-value pair from the buffer. For example, if <elem attr1="value" attr2="value"> ends at 3:34 (line 3: column 34), our parser will perform the following steps: Plain Text Initialize buffer = <elem attr1="value" attr2="value"> Find buffer.lastIndexOf("value") = 28 → Update buffer = <elem attr1="value" attr2=" Find buffer.lastIndexOf("attr2") = 21 → Update buffer = <elem attr1="value" Find buffer.lastIndexOf("value") = 14 → Update buffer = <elem attr1=" Find buffer.lastIndexOf("attr1") = 7 → Update buffer = <elem From steps 3 and 6, we can conclude that attr1 and attr2 start at 3:7 and 3:21, respectively. Then, we further improved the mechanism to handle other formatting variations, such as a single start tag across multiple lines and multiple start tags on a single line, by introducing the start index and leading space stack to store the buffer indices where new lines start and the number of leading spaces of each line. For example, if there is a start tag that starts from line 1 and ends at 3:20 (line 3: column 20): XML <elem attr1="value across 2 lines" attr2 = "value"> Our parser will perform the following steps: Plain Text Initialize buffer = <elem attr1="value across 2 lines" attr2 = "value"> Initialize startIndexes = [0, 19, 35] and leadingSpaces = [0, 4, 4] Find buffer.lastIndexOf("value") = 45 Find buffer.lastIndexOf("attr2") = 36 → lineNumber = 3, startIndexes = [0, 19, 35] and leadingSpaces = [0, 4, 4] → columnNumber = 36 - startIndexes.peek() + leadingSpaces.peek() = 5 Find buffer.lastIndexOf("value across 2 lines") = 14 Find buffer.lastIndexOf("attr1") = 7 → Update lineNumber = 1, startIndexes = [0], and leadingSpaces = [0] → columnNumber = 7 - startIndexes.peek() + leadingSpaces.peek() = 7 From steps 4 and 8, we can conclude that attr1 and attr2 start at 1:7 and 3:5, respectively. As a result of the code provided below: Java public void startElement(String uri, String localName, String qName, Attributes attributes) { // initialize buffer, startIndexes, and leadingSpaces int endLineNumber = _locator.getLineNumber(), endColNumber = _locator.getColumnNumber(); for (int i = 0; _readerLineNumber <= endLineNumber; i++, _readerLineNumber++) { startIndexes.push(buffer.length()); if (i > 0) _readerCurrentLine = _reader.readLine(); buffer.append(' ').append((_readerLineNumber < endLineNumber ? _readerCurrentLine : _readerCurrentLine.substring(0, endColNumber - 1)).stripLeading()); leadingSpaces.push(countLeadingSpaces(_readerCurrentLine)); } _readerLineNumber--; // recover attribute positions int lineNumber = endLineNumber, columnNumber; Element element = document.createElement(qName); for (int i = attributes.getLength() - 1; i >= 0; i--) { String[] words = attributes.getValue(i).split("\\s+"); for (int j = words.length - 1; j >= 0; j--) buffer.delete(buffer.lastIndexOf(words[j]), buffer.length()); buffer.delete(buffer.lastIndexOf(attributes.getQName(i)), buffer.length()); while (buffer.length() < startIndexes.peek()) { lineNumber--; leadingSpaces.pop(); startIndexes.pop(); } columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek(); Attr attr = document.createAttribute(attributes.getQName(i)); attr.setUserData("position", lineNumber + ":" + columnNumber, null); element.setAttributeNode(attr); } // recover element position buffer.delete(buffer.lastIndexOf(element.getTagName()), buffer.length()); while (buffer.length() < startIndexes.peek()) { lineNumber--; leadingSpaces.pop(); startIndexes.pop(); } columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek(); element.setUserData("position", lineNumber + ":" + columnNumber, null); _stack.push(element); } File Analysis Now that we have a parser that converts ZUL files into ASTs, we are ready to move on to the file analysis stage. Our ZulFileVisitor class encapsulates the AST traversal logic and delegates the responsibility of implementing specific checking mechanisms to its subclasses. This design allows lint rules to be easily created by extending the ZulFileVisitor class and overriding the visit method for the node type the lint rule needs to inspect. Java public class ZulFileVisitor { private Stack<Element> _currentPath = new Stack<>(); protected void report(Node node, String message) { System.err.println(node.getUserData("position") + " " + message); } protected void visit(Node node) { if (node.getNodeType() == Node.ELEMENT_NODE) { Element element = (Element) node; _currentPath.push(element); visitElement(element); NamedNodeMap attributes = element.getAttributes(); for (int i = 0; i < attributes.getLength(); i++) visitAttribute((Attr) attributes.item(i)); } NodeList children = node.getChildNodes(); for (int i = 0; i < children.getLength(); i++) visit(children.item(i)); if (node.getNodeType() == Node.ELEMENT_NODE) _currentPath.pop(); } protected void visitAttribute(Attr node) {} protected void visitElement(Element node) {} } Conclusion The Benefits For simple lint rules such as "row elements not supported," developing an XML linter may seem like an overkill when manual checks would suffice. However, as the codebase expands or the number of lint rules increases over time, the advantages of linting will quickly become noticeable compared to manual checks, which are both time-consuming and prone to human errors. Java class SimpleRule extends ZulFileVisitor { @Override protected void visitElement(Element node) { if ("row".equals(node.getTagName())) report(node, "`row` not supported"); } } On the other hand, complicated rules involving ancestor elements are where XML linters truly shine. Consider a lint rule that only applies to elements inside certain ancestor elements, such as "row elements not supported outside rows elements," our linter would be able to efficiently identify the infinite number of variations that satisfy the rule, which cannot be done manually or with a simple file search. Java class ComplexRule extends ZulFileVisitor { @Override protected void visitElement(Element node) { if ("row".equals(node.getTagName())) { boolean outsideRows = getCurrentPath().stream() .noneMatch(element -> "rows".equals(element.getTagName())); if (outsideRows) report(node, "`row` not supported outside `rows`"); } } } Now It's Your Turn Despite XML linting not being widely adopted in the software industry, we hope our ZK Client MVVM Linter, which helps us to automate migration assessment, will be able to show the benefits of XML linting or even help you to develop your own XML linter.

By Rebecca Lai

Can Redis Be Used as a Relational Database?

Let's start with the question, "How do you use Redis?" I'm sure most use it as a cache for the service. I hope you know that it can do more than just that. Recently, I spoke at a conference with a report on how we moved part of the data to Redis and requests fly to it in the first place. Now I want to tell you not about how we applied it, but about the fact that when working with Spring and its abstractions, you may not immediately notice the substitution. Let's try to write a small Spring app that will use two PostgreSQL and Redis databases. I want to note that we will store in the databases not some kind of flat object, but a full-fledged object from a relational database with nested fields (inner join). To do this, we need plugins that need to be installed in Redis such as RedisJSON and RediSearch. The first allows us to store our object in JSON format, and the second allows us to search by any field of our object, even nested fields. To work with a relational database, we will choose Spring Data JPA. And to work with Redis, we will use the excellent Redis OM Spring library, which allows you to work with the database at the abstraction level. This is an analog of Data JPA. Under the hood, Redis OM Spring has all the necessary dependencies for Spring and Jedis to work with the database. We will not dwell on the details, since the article is not about that. Let's Write Code So let's write code. Let's say we need to write a certain entity called "downtime" to the database. In this entity, I added other objects such as "place", "reason", and others. Entity for a relational database: Java @Entity @Table(schema = "test", name = "downtime") public class Downtime { @Id private String id; private LocalDateTime beginDate; private LocalDateTime endDate; @ManyToOne(fetch = FetchType.EAGER) @JoinColumn(name = "area") private Place area; @ManyToOne(fetch = FetchType.EAGER) @JoinColumn(name = "cause") private Cause cause; ... This piece of code does not need comments. We need to do the same for Redis. Object for Redis: Java @Document public class DowntimeDoc { @Id @Indexed private String id; @Indexed private LocalDateTime beginDate; private LocalDateTime endDate; @Indexed private PlaceDoc area; @Indexed private CauseDoc cause; .... In this case, instead of @Entity, we use @Document. This annotation indicates that our object is an entity. It will be stored in the database under the key “package path + class name + Idx.” The @Indexed annotation means that it will be indexed for search. If you do not specify this annotation, then this field will be saved in the database, but searching for it will return an empty result. You can add this annotation as needed. Data that is already in the database will be indexed asynchronously; new data will be indexed synchronously. Next, we will make a repository, which basically works to get data from the database. An example for a relational database: Java public interface DowntimeRepository extends JpaRepository<Downtime, String> { } Example for Redis: Java public interface DowntimeRedisRepository extends RedisDocumentRepository<DowntimeDoc, String> { } The difference is that we extend the current interface from RedisDocumentRepository, which extends the standard CRUD interface for Spring. Let's add a method to find the first downtime for the reason we specified. Java public interface DowntimeRepository extends JpaRepository<Downtime, String> { Downtime findFirstByCauseIdOrderByBeginDate(String causeId); } And the same for Redis: Java public interface DowntimeRedisRepository extends RedisDocumentRepository<DowntimeDoc, String> { DowntimeDoc findTopByCause_IdOrderByBeginDateAsc(String causeId); } As you noticed, if you write code working with the database through abstractions, then the difference is almost not noticeable. In addition, Redis OM Spring allows you to write queries yourself using the @Query annotation, as in Spring Data JPA. Here is an example of an HQL query: Java @Query("SELECT d FROM Downtime d" + " JOIN FETCH d.area " + " JOIN FETCH d.cause" + " JOIN FETCH d.fixer" + " JOIN FETCH d.area.level " + " WHERE d.area IN ?1 AND (d.beginDate BETWEEN ?2 AND ?3 OR d.cause IN ?4) ") List<Downtime> findAllByParams(List<Place> workPlace, LocalDateTime start, LocalDateTime end, List<Cause> causes); Same for Redis: Java @Query("(@area_id:{$areas} ) & (@beginDate:[$start $end] | @cause_id:{$causes})") Page<DowntimeDoc> findByParams(@Param("areas") List<String> areas, @Param("start") long start, @Param("end") long end, @Param("causes") List<String> causes, Pageable pageable); In the case of Redis, we simply specify the conditions for the “WHERE” section. It is not necessary to indicate which fields need to be attached since they are always pulled from the database. However, we can not pull up all the fields but specify with the additional “returnFields” parameter what exactly we need. You can also specify sorting, limit, and offset - the latter, by the way, is impossible in HQL. In this example, I passed Pageable to the method, and it will work at the database level, not pull all the data into the service, and trim it in it (as would be the case with Hibernate). Also, Redis OM Spring allows you to write queries using EntityStream, which is analogous to Stream API. Here is an example of the above queries using EntityStream. Java … entityStream .of(DowntimeDoc.class) .filter(DowntimeDoc$.AREA_ID.in(filter.getWorkPlace().toArray(String[]::new))) .filter(between + " | " + causes) .map(mapper::toEntity) .collect(Collectors.toList()); In this example, I'm using one filter using the metamodel, passing the parameters as a string to the second filter to show that both options are valid. You guessed it: EntityStream accepts a set of intermediate operations and executes this set when calling a terminal operation. Nuances of Redis OM Spring Let me tell you about some of the nuances of using Redis OM Spring: You will not be able to use a UUID as a primary key. You can specify it as a string and it will be indexed. But when searching, you will need to escape spaces @id: {2e5af82m\-02af\-553b\-7961\-168878aa521е} And one more thing: if you search through the RedisDocumentRepository repository, nothing will work, because there is such an expression in the code that will remove all screens: String regex = "(\\$" + key + ")(\\W+|\\*|\\+)(.*)"; Therefore, in order to search by such fields, you will have to write a query directly in RediSearch. I have an example of how to do this in the demo project. When searching through the RedisDocumentRepository methods, if you expect a collection, then you must pass a Pageable indicating the size of the expected rows or specifying the size in @Query; otherwise, you will receive a maximum of 10 records. The FT.SEARCH (@Query) method supports only one parameter for sorting. This is solved by writing a query through FT.AGGREGATE (@Aggregation). The above list is not exhaustive. While working with these libraries, I found many different things, but this is all just a specificity of the database implementation. Finally, I did not put information about Redis plugins in this article and did not talk about all the features of Redis OM Spring; otherwise, this article will be huge and not readable. Conclusion I showed that currently, Redis allows you to store an object with a large nesting and allows you to search through the fields of this object. If you are working with data through abstractions in the repository, then some may not see any difference from Spring Data JPA, especially if you use some simple queries like Save, delete, findAllBy, etc., as well as queries through the name of the method. Examples can be found on GitHub. All success.

By Artem Artemev

Open Source

DZone's Featured Open Source Resources

Top Open Source Experts

The Latest Open Source Topics