Development and programming tools are used to build frameworks and to create, debug, and maintain programs, among many other tasks. The resources in this Zone cover topics such as compilers, database management systems, code editors, and other software tools, and they can help ensure engineers are writing clean code.
Docker is a compelling platform to package and run web applications, especially when paired with one of the many Platform-as-a-Service (PaaS) offerings provided by cloud platforms. NGINX has long provided DevOps teams with the ability to host web applications on Linux and also provides an official Docker image to use as the base for custom web applications. In this post, I explain how DevOps teams can use the NGINX Docker image to build and run web applications on Docker. Getting Started With the Base Image NGINX is a versatile tool with many uses, including a load balancer, reverse proxy, and network cache. However, when running NGINX in a Docker container, most of these high-level functions are delegated to other specialized platforms or other instances of NGINX. Typically, NGINX fulfills the function of a web server when running in a Docker container. To create an NGINX container with the default website, run the following command: docker run -p 8080:80 nginx This command will download the nginx image (if it hasn't already been downloaded) and create a container exposing port 80 in the container to port 8080 on the host machine. You can then open http://localhost:8080/index.html to view the default "Welcome to nginx!" website. To allow the NGINX container to expose custom web assets, you can mount a local directory inside the Docker container. Save the following HTML code to a file called index.html: <html> <body> Hello from Octopus! </body> </html> Next, run the following command to mount the current directory under /usr/share/nginx/html inside the NGINX container with read-only access: docker run -v $(pwd):/usr/share/nginx/html:ro -p 8080:80 nginx Open http://localhost:8080/index.html again and you see the custom HTML page displayed. One of the benefits of Docker images is the ability to bundle all related files into a single distributable artifact. To realize this benefit, you must create a new Docker image based on the NGINX image. Creating Custom Images Based on NGINX To create your own Docker image, save the following text to a file called Dockerfile: FROM nginx COPY index.html /usr/share/nginx/html/index.html Dockerfile contains instructions for building a custom Docker image. Here you use the FROM command to base your image on the NGINX one, and then use the COPY command to copy your index.html file into the new image under the /usr/share/nginx/html directory. Build the new image with the command: docker build . -t mynginx This builds a new image called mynginx. Run the new image with the command: docker run -p 8080:80 mynginx Note that you didn't mount any directories this time. However, when you open http://localhost:8080/index.html your custom HTML page is displayed because it was embedded in your custom image. NGINX is capable of much more than hosting static files. To unlock this functionality, you must use custom NGINX configuration files. Advanced NGINX Configuration NGINX exposes its functionality via configuration files. The default NGINX image comes with a simple default configuration file designed to host static web content. 
This file is located at /etc/nginx/nginx.conf in the default image, and has the following contents: user nginx; worker_processes auto; error_log /var/log/nginx/error.log notice; pid /var/run/nginx.pid; events { worker_connections 1024; } http { include /etc/nginx/mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log /var/log/nginx/access.log main; sendfile on; #tcp_nopush on; keepalive_timeout 65; #gzip on; include /etc/nginx/conf.d/*.conf; } There's no need to understand this configuration file in detail, but there is one line of interest that instructs NGINX to load additional configuration files from the /etc/nginx/conf.d directory: include /etc/nginx/conf.d/*.conf; The default /etc/nginx/conf.d file configures NGINX to function as a web server. Specifically the location / block-loading files from /usr/share/nginx/html is why you mounted your HTML files to that directory previously: server { listen 80; server_name localhost; #access_log /var/log/nginx/host.access.log main; location / { root /usr/share/nginx/html; index index.html index.htm; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root /usr/share/nginx/html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # #location ~ \.php$ { # root html; # fastcgi_pass 127.0.0.1:9000; # fastcgi_index index.php; # fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name; # include fastcgi_params; #} # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # #location ~ /\.ht { # deny all; #} } You can take advantage of the instructions to load any *.conf configuration files in /etc/nginx to customize NGINX. In this example, you add a health check via a custom location listening on port 90 that responds to requests to the /nginx-health path with an HTTP 200 OK. Save the following text to a file called health-check.conf: server { listen 90; server_name localhost; location /nginx-health { return 200 "healthy\n"; add_header Content-Type text/plain; } } Modify the Dockerfile to copy the configuration file to /etc/nginx/conf.d: FROM nginx COPY index.html /usr/share/nginx/html/index.html COPY health-check.conf /etc/nginx/conf.d/health-check.conf Build the image with the command: docker build . -t mynginx Run the new image with the command. Note the new port exposed on 9090: docker run -p 8080:80 -p 9090:90 mynginx Now open http://localhost:9090/nginx-health. The health check response is returned to indicate that the web server is up and running. The examples above base your custom images on the default nginx image. However, there are other variants that provide much smaller image sizes without sacrificing any functionality. Choosing NGINX Variants The default nginx image is based on Debian. However, NGINX also provides images based on Alpine. Alpine is frequently used as a lightweight base for Docker images. 
To view the sizes of the Docker images, first pull them down to your local workstation: docker pull nginx docker pull nginx:stable-alpine You can then find the image sizes with the command: docker image ls From this, you can see the Debian image weighs around 140 MB while the Alpine image weighs around 24 MB, which is a significant saving in image size. To base your images on the Alpine variant, you need to update the Dockerfile: FROM nginx:stable-alpine COPY index.html /usr/share/nginx/html/index.html COPY health-check.conf /etc/nginx/conf.d/health-check.conf Build and run the image with the commands: docker build . -t mynginx docker run -p 8080:80 -p 9090:90 mynginx Once again, open http://localhost:9090/nginx-health or http://localhost:8080/index.html to view the web pages. Everything continues to work as it did previously, but your custom image is now much smaller. Conclusion NGINX is a powerful web server, and the official NGINX Docker image allows DevOps teams to host custom web applications in Docker. NGINX also supports advanced scenarios thanks to its ability to read configuration files copied into a custom Docker image. In this post, you learned how to create a custom Docker image hosting a static web application, added advanced NGINX configuration files to provide a health check endpoint, and compared the sizes of the Debian and Alpine NGINX images. Resources NGINX Docker image source code Dockerfile reference Happy deployments!
What Is Kubernetes RBAC? Often, when organizations start their Kubernetes journey, they look up to implementing least privilege roles and proper authorization to secure their infrastructure. That’s where Kubernetes RBAC is implemented to secure Kubernetes resources such as sensitive data, including deployment details, persistent storage settings, and secrets. Kubernetes RBAC provides the ability to control who can access each API resource with what kind of access. You can use RBAC for both human (individual or group) and non-human users (service accounts) to define their types of access to various Kubernetes resources. For example, there are three different environments, Dev, Staging, and Production, which have to be given access to the team, such as developers, DevOps, SREs, App owners, and product managers. Before we get started, we would like to stress that we will treat users and service accounts as the same, from a level of abstraction- every request, either from a user or a service account, is finally an HTTP request. Yes, we understand users and service accounts (for non-human users) are different in nature in Kubernetes. How To Enable Kubernetes RBAC One can enable RBAC in Kubernetes by starting the API server with an authorization-mode flag on. Kubernetes resources used to apply RBAC on users are: Role, ClusterRole, RoleBinding, ClusterRoleBinding Service Account To manage users, Kubernetes provides an authentication mechanism, but it is usually advisable to integrate Kubernetes with your enterprise identity management for users such as Active Directory or LDAP. When it comes to non-human users (or machines or services) in a Kubernetes cluster, the concept of a Service Account comes into the picture. For example, The Kubernetes resources need to be accessed by a CD application such as Spinnaker or Argo to deploy applications, or one pod of service A needs to talk to another pod of service B. In such cases, a Service Account is used to create an account of a non-human user and specify the required authorization (using RoleBinding or ClusterRoleBinding). You can create a Service Account by creating a yaml like the below: YAML apiVersion: v1 kind: ServiceAccount metadata: name: nginx-sa spec: automountServiceAccountToken: false And then apply it. Shell $ kubectl apply -f nginx-sa.yaml serviceaccount/nginx-sa created And now you have to ServiceAccount for pods in the Deployments resource. YAML kind: Deployment metadata: name: nginx1 labels: app: nginx1 spec: replicas: 2 selector: matchLabels: app: nginx1 template: metadata: labels: app: nginx1 spec: serviceAccountName: nginx-sa containers: - name: nginx1 image: nginx ports: - containerPort: 80 In case you don’t specify about serviceAccountName in the Deployment resources, then the pods will belong to the default Service Account. Note there is a default Service Account for each namespace and one for clusters. All the default authorization policies as per the default Service Account will be applied to the pods where Service Account info is not mentioned. In the next section, we will see how to assign various permissions to a Service Account using RoleBinding and ClusterRoleBinding. Role and ClusterRole Role and ClusterRole are the Kubernetes resources used to define the list of actions a user can perform within a namespace or a cluster, respectively. In Kubernetes, the actors, such as users, groups, or ServiceAccount, are called subjects. A subject's actions, such as create, read, write, update, and delete, are called verbs. 
YAML apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: read-only namespace: dev-namespace rules: - apiGroups: - "" resources: ["*"] verbs: - get - list - watch In the above Role resource, we have specified that the read-only role is only applicable to the deb-ns namespace and to all the resources inside the namespace. Any ServiceAccount or users that would be bound to the read-only role can take these actions- get, list, and watch. Similarly, the ClusterRole resource will allow you to create roles pertinent to clusters. An example is given below: YAML apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: chief-role rules: - apiGroups: - "" resources: ["*"] verbs: - get - list - watch - create - update - patch - delete Any user/group/ServiceAccount bound to the chief-role will be able to take any action in the cluster. In the next section, we will see how to grant roles to subjects using RoleBinding and ClusterRoleBinding. Also, note Kubernetes allows you to configure custom roles using Role resources or use default user-facing roles such as the following: Cluster-admin: For cluster administrators, Kubernetes provides a superuser Role. The Cluster admin can perform any action on any resource in a cluster. One can use a superuser in a ClusterRoleBinding to grant full control over every resource in the cluster (and in all namespaces) or in a RoleBinding to grant full control over every resource in the respective namespace. Admin: Kubernetes provides an admin Role to permit unlimited read/write access to resources within a namespace. admin role can create roles and role bindings within a particular namespace. It does not permit write access to the namespace itself. This can be used in the RoleBinding resource. Edit: edit role grants read/write access within a given Kubernetes namespace. It cannot view or modify roles or role bindings. View: view role allows read-only access within a given namespace. It does not allow viewing or modifying of roles or role bindings. RoleBinding and ClusterRoleBinding To apply the Role to a subject (user/group/ServiceAccount), you must define a RoleBinding. This will give the user the least privileged access to required resources within the namespace with the permissions defined in the Role configuration. YAML apiVersion: rbac.authorization.k8s.io/v1beta1 kind: RoleBinding metadata: name: Role-binding-dev roleRef: kind: Role name: read-only #The role name you defined in the Role configuration apiGroup: rbac.authorization.k8s.io subjects: - kind: User name: Roy #The name of the user to give the role to apiGroup: rbac.authorization.k8s.io - kind: ServiceAccount name: nginx-sa#The name of the ServiceAccount to give the role to apiGroup: rbac.authorization.k8s.io Similarly, ClusterRoleBinding resources can be created to define the Role of users. Note we have used the default superuser ClusterRole reference provided by Kubernetes instead of using our custom role. This can be applied to cluster administrators. YAML apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRoleBinding metadata: name: superuser-binding roleRef: kind: ClusterRole name: superuser apiGroup: rbac.authorization.k8s.io subjects: - kind: User name: Aditi apiGroup: rbac.authorization.k8s.io Benefits of Kubernetes RBAC The advantage of Kubernetes RBAC is it allows you to “natively” implement the least privileges to various users and machines in your cluster. 
The key benefits are: Proper Authorization By granting least-privilege access to Kubernetes resources for users and Service Accounts, DevOps engineers and architects can implement one of the main pillars of zero trust. Organizations can reduce the risk of data breaches and data leakage and also avoid internal employees accidentally deleting or manipulating critical resources. Separation of Duties Applying RBAC to Kubernetes resources facilitates separation of duties among users such as developers, DevOps engineers, testers, and SREs in an organization. For example, developers should not depend on an admin to create or delete a new resource in a dev environment. Similarly, deploying new applications onto test servers and deleting the pods after testing should not be a bottleneck for DevOps engineers or testers. Applying authorization and permissions for users such as developers and CI/CD deployment agents in their respective workspaces (say, namespaces or clusters) reduces dependencies and removes bottlenecks. Adherence to Compliance Many industry regulations, such as HIPAA, GDPR, and SOX, demand tight authentication and authorization mechanisms. Using Kubernetes RBAC, DevOps engineers and architects can quickly implement authorization in their Kubernetes cluster and improve their posture toward those standards. Disadvantages of Kubernetes RBAC For small and medium enterprises, native Kubernetes RBAC is usually sufficient, but for larger organizations it can fall short for the reasons below: There can be many users and machines, and Kubernetes RBAC can become cumbersome to implement and maintain at that scale. Granular visibility of who performed what operation is difficult. For example, large enterprises would require information such as violations or malicious attempts against RBAC permissions. Before moving on, the sketch below shows a quick way to check what a given subject is actually allowed to do.
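This is a minimal sketch using kubectl's built-in authorization check, assuming the dev-namespace, read-only Role, and nginx-sa ServiceAccount from the earlier examples, and assuming your own credentials permit impersonation; the result depends on the bindings actually in place in your cluster:
Shell
# Can the nginx-sa ServiceAccount list pods in dev-namespace? Prints "yes" or "no".
$ kubectl auth can-i list pods \
    --as=system:serviceaccount:dev-namespace:nginx-sa \
    --namespace dev-namespace

# Show everything that subject is allowed to do in the namespace.
$ kubectl auth can-i --list \
    --as=system:serviceaccount:dev-namespace:nginx-sa \
    --namespace dev-namespace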
In this article, we delve into the exciting realm of containerizing Helidon applications, followed by deploying them effortlessly to a Kubernetes environment. To achieve this, we'll harness the power of JKube’s Kubernetes Maven Plugin, a versatile tool for Java applications for Kubernetes deployments that has recently been updated to version 1.14.0. What's exciting about this release is that it now supports the Helidon framework, a Java Microservices gem open-sourced by Oracle in 2018. If you're curious about Helidon, we've got some blog posts to get you up to speed: Building Microservices With Oracle Helidon Ultra-Fast Microservices: When MicroStream Meets Helidon Helidon: 2x Productivity With Microprofile REST Client In this article, we will closely examine the integration between JKube’s Kubernetes Maven Plugin and Helidon. Here's a sneak peek of the exciting journey we'll embark on: We'll kick things off by generating a Maven application from Helidon Starter Transform your Helidon application into a nifty Docker image. Craft Kubernetes YAML manifests tailored for your Helidon application. Apply those manifests to your Kubernetes cluster. We'll bundle those Kubernetes YAML manifests into a Helm Chart. We'll top it off by pushing that Helm Chart to a Helm registry. Finally, we'll deploy our Helidon application to Red Hat OpenShift. An exciting aspect worth noting is that JKube’s Kubernetes Maven Plugin can be employed with previous versions of Helidon projects as well. The only requirement is to provide your custom image configuration. With this latest release, Helidon users can now easily generate opinionated container images. Furthermore, the plugin intelligently detects project dependencies and seamlessly incorporates Kubernetes health checks into the generated manifests, streamlining the deployment process. Setting up the Project You can either use an existing Helidon project or create a new one from Helidon Starter. If you’re on JDK 17 use 3.x version of Helidon. Otherwise, you can stick to Helidon 2.6.x which works with older versions of Java. In the starter form, you can choose either Helidon SE or Helidon Microprofile, choose application type, and fill out basic details like project groupId, version, and artifactId. Once you’ve set your project, you can add JKube’s Kubernetes Maven Plugin to your pom.xml: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>1.14.0</version> </plugin> Also, the plugin version is set to 1.14.0, which is the latest version at the time of writing. You can check for the latest version on the Eclipse JKube releases page. It’s not really required to add the plugin if you want to execute it directly from some CI pipeline. You can just provide a fully qualified name of JKube’s Kubernetes Maven Plugin while issuing some goals like this: Shell $ mvn org.eclipse.jkube:kubernetes-maven-plugin:1.14.0:resource Now that we’ve added the plugin to the project, we can start using it. Creating Container Image (JVM Mode) In order to build a container image, you do not need to provide any sort of configuration. First, you need to build your project. Shell $ mvn clean install Then, you just need to run k8s:build goal of JKube’s Kubernetes Maven Plugin. By default, it builds the image using the Docker build strategy, which requires access to a Docker daemon. 
If you have access to a docker daemon, run this command: Shell $ mvn k8s:build If you don’t have access to any docker daemon, you can also build the image using the Jib build strategy: Shell $ mvn k8s:build -Djkube.build.strategy=jib You will notice that Eclipse JKube has created an opinionated container image for your application based on your project configuration. Here are some key points about JKube’s Kubernetes Maven Plugin to observe in this zero configuration mode: It used quay.io/jkube/jkube-java as a base image for the container image It added some labels to the container image (picked from pom.xml) It exposed some ports in the container image based on the project configuration It automatically copied relevant artifacts and libraries required to execute the jar in the container environment. Creating Container Image (Native Mode) In order to create a container image for the native executable, we need to generate the native executable first. In order to do that, let’s build our project in the native-image profile (as specified in Helidon GraalVM Native Image documentation): Shell $ mvn package -Pnative-image This creates a native executable file in the target folder of your project. In order to create a container image based on this executable, we just need to run k8s:build goal but also specify native-image profile: Shell $ mvn k8s:build -Pnative-image Like JVM mode, Eclipse JKube creates an opinionated container image but uses a lightweight base image: registry.access.redhat.com/ubi8/ubi-minimal and exposes only the required ports by application. Customizing Container Image as per Requirements Creating a container image with no configuration is a really nice way to get started. However, it might not suit everyone’s use case. Let’s take a look at how to configure various aspects of the generated container image. You can override basic aspects of the container image with some properties like this: Property Name Description jkube.generator.name Change Image Name jkube.generator.from Change Base Image jkube.generator.tags A comma-separated value of additional tags for the image If you want more control, you can provide a complete XML configuration for the image in the plugin configuration section: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>${jkube.version}</version> <configuration> <images> <image> <name>${project.artifactId}:${project.version}</name> <build> <from>openjdk:11-jre-slim</from> <ports>8080</ports> <assembly> <mode>dir</mode> <targetDir>/deployments</targetDir> <layers> <layer> <id>lib</id> <fileSets> <fileSet> <directory>${project.basedir}/target/libs</directory> <outputDirectory>libs</outputDirectory> <fileMode>0640</fileMode> </fileSet> </fileSets> </layer> <layer> <id>app</id> <files> <file> <source>${project.basedir}/target/${project.artifactId}.jar</source> <outputDirectory>.</outputDirectory> </file> </files> </layer> </layers> </assembly> <cmd>java -jar /deployments/${project.artifactId}.jar</cmd> </build> </image> </images> </configuration> </plugin> The same is also possible by providing your own Dockerfile in the project base directory. 
Kubernetes Maven Plugin automatically detects it and builds a container image based on its content: Dockerfile FROM openjdk:11-jre-slim COPY maven/target/helidon-quickstart-se.jar /deployments/ COPY maven/target/libs /deployments/libs CMD ["java", "-jar", "/deployments/helidon-quickstart-se.jar"] EXPOSE 8080 Pushing the Container Image to Quay.io: Once you’ve built a container image, you most likely want to push it to some public or private container registry. Before pushing the image, make sure you’ve renamed your image to include the registry name and registry user. If I want to push an image to Quay.io in the namespace of a user named rokumar, this is how I would need to rename my image: Shell $ mvn k8s:build -Djkube.generator.name=quay.io/rokumar/%a:%v %a and %v correspond to project artifactId and project version. For more information, you can check the Kubernetes Maven Plugin Image Configuration documentation. Once we’ve built an image with the correct name, the next step is to provide credentials for our registry to JKube’s Kubernetes Maven Plugin. We can provide registry credentials via the following sources: Docker login Local Maven Settings file (~/.m2/settings.xml) Provide it inline using jkube.docker.username and jkube.docker.password properties Once you’ve configured your registry credentials, you can issue the k8s:push goal to push the image to your specified registry: Shell $ mvn k8s:push Generating Kubernetes Manifests In order to generate opinionated Kubernetes manifests, you can use k8s:resource goal from JKube’s Kubernetes Maven Plugin: Shell $ mvn k8s:resource It generates Kubernetes YAML manifests in the target directory: Shell $ ls target/classes/META-INF/jkube/kubernetes helidon-quickstart-se-deployment.yml helidon-quickstart-se-service.yml JKube’s Kubernetes Maven Plugin automatically detects if the project contains io.helidon:helidon-health dependency and adds liveness, readiness, and startup probes: YAML $ cat target/classes/META-INF/jkube/kubernetes//helidon-quickstart-se-deployment. yml | grep -A8 Probe livenessProbe: failureThreshold: 3 httpGet: path: /health/live port: 8080 scheme: HTTP initialDelaySeconds: 0 periodSeconds: 10 successThreshold: 1 -- readinessProbe: failureThreshold: 3 httpGet: path: /health/ready port: 8080 scheme: HTTP initialDelaySeconds: 0 periodSeconds: 10 successThreshold: 1 Applying Kubernetes Manifests JKube’s Kubernetes Maven Plugin provides k8s:apply goal that is equivalent to kubectl apply command. It just applies the resources generated by k8s:resource in the previous step. Shell $ mvn k8s:apply Packaging Helm Charts Helm has established itself as the de facto package manager for Kubernetes. You can package generated manifests into a Helm Chart and apply it on some other cluster using Helm CLI. You can generate a Helm Chart of generated manifests using k8s:helm goal. The interesting thing is that JKube’s Kubernetes Maven Plugin doesn’t rely on Helm CLI for generating the chart. Shell $ mvn k8s:helm You’d notice Helm Chart is generated in target/jkube/helm/ directory: Shell $ ls target/jkube/helm/helidon-quickstart-se/kubernetes Chart.yaml helidon-quickstart-se-0.0.1-SNAPSHOT.tar.gz README.md templates values.yaml Pushing Helm Charts to Helm Registries Usually, after generating a Helm Chart locally, you would want to push it to some Helm registry. JKube’s Kubernetes Maven Plugin provides k8s:helm-push goal for achieving this task. 
But first, we need to provide the registry details in the plugin configuration: XML <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>kubernetes-maven-plugin</artifactId> <version>1.14.0</version> <configuration> <helm> <snapshotRepository> <name>ChartMuseum</name> <url>http://example.com/api/charts</url> <type>CHARTMUSEUM</type> <username>user1</username> </snapshotRepository> </helm> </configuration> </plugin> JKube’s Kubernetes Maven Plugin supports pushing Helm Charts to ChartMuseum, Nexus, Artifactory, and OCI registries. You have to provide the applicable Helm repository type and URL. You can provide the credentials via environment variables, properties, or ~/.m2/settings.xml. Once everything is set up, you can run the k8s:helm-push goal to push the chart: Shell $ mvn k8s:helm-push -Djkube.helm.snapshotRepository.password=yourpassword Deploying To Red Hat OpenShift If you’re deploying to Red Hat OpenShift, you can use JKube’s OpenShift Maven Plugin to deploy your Helidon application to an OpenShift cluster. It contains some OpenShift-specific add-ons, such as the S2I build strategy and support for Routes. You also need to add JKube’s OpenShift Maven Plugin to your pom.xml, perhaps in a separate profile: XML <profile> <id>openshift</id> <build> <plugins> <plugin> <groupId>org.eclipse.jkube</groupId> <artifactId>openshift-maven-plugin</artifactId> <version>${jkube.version}</version> </plugin> </plugins> </build> </profile> Then, you can deploy the application with a combination of these goals: Shell $ mvn oc:build oc:resource oc:apply -Popenshift Conclusion In this article, you learned how smoothly you can deploy your Helidon applications to Kubernetes using Eclipse JKube’s Kubernetes Maven Plugin. We saw how effortless it is to package a Helidon application into a container image and publish it to a container registry. We can also generate Helm Charts from our Kubernetes YAML manifests and publish them to a Helm registry. Finally, we learned about JKube’s OpenShift Maven Plugin, which is specifically designed for Red Hat OpenShift users who want to deploy their Helidon applications to Red Hat OpenShift. You can find the code used in this blog post in this GitHub repository. If you’re interested in learning more about Eclipse JKube, you can check these links: Documentation GitHub Issue Tracker StackOverflow YouTube Channel Twitter Gitter Chat
With the advent of cloud computing, managing network traffic and ensuring optimal performance have become critical aspects of system architecture. Amazon Web Services (AWS), a leading cloud service provider, offers a suite of load balancers to manage network traffic effectively for applications running on its platform. Two such offerings are the Application Load Balancer (ALB) and Network Load Balancer (NLB). This extensive guide aims to provide an in-depth comparison between these two types of load balancers, helping you choose the most suitable option for your application's needs. Overview The primary role of a load balancer is to distribute network traffic evenly among multiple servers or 'targets' to ensure smooth performance and prevent any single server from being overwhelmed. AWS provides three types of load balancers: Classic Load Balancer (CLB), Application Load Balancer (ALB), and Network Load Balancer (NLB). The ALB operates at Layer 7 of the OSI model, handling HTTP/HTTPS traffic. It offers advanced request routing based on the content of the request, making it ideal for complex web applications. On the other hand, the NLB operates at Layer 4, dealing with TCP traffic. It's designed for extreme performance and low latencies, offering static IP addresses per Availability Zone (AZ). Choosing the right load balancer is crucial as it directly impacts your application’s performance, availability, security, and cost. For instance, if your application primarily handles HTTP requests and requires sophisticated routing rules, an ALB would be more appropriate. Conversely, if your application requires high throughput, low latency, or a static IP address, you should opt for an NLB. Fundamentals of Load Balancing The Network Load Balancer is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. Unpredictable traffic patterns do not affect its performance, thanks to its ability to handle sudden and volatile traffic. Furthermore, it supports long-lived TCP connections that are ideal for WebSocket-type applications. The Application Load Balancer, on the other hand, is best suited for load balancing HTTP and HTTPS traffic. It operates at the request level, allowing advanced routing, microservices, and container-based architecture. It can route requests to different services based on the content of the request, which is ideal for modern, complex web applications. Key Features and Capabilities The NLB provides several important features, such as static IP support, zonal isolation, and low-latency performance. It distributes traffic across multiple targets within one or more AZs, ensuring a robust and reliable performance. Furthermore, it offers connection multiplexing and stickiness, enabling efficient utilization of resources. On the other hand, the ALB comes with built-in features like host and path-based routing, SSL/TLS decryption, and integration with AWS WAF, protecting your applications from various threats. It also supports advanced routing algorithms, slow start mode for new targets, and integration with container services. These features make it ideal for modern, modular, and microservices-based applications. Both ALB and NLB offer unique advantages. While ALB's strength lies in flexible application management and advanced routing features, NLB shines in areas of extreme performance and support for static IP addresses. 
It's also worth noting that while ALB can handle HTTP/1, HTTP/2, and gRPC protocols, NLB is designed for lower-level TCP and UDP traffic. Performance and Efficiency NLB excels in terms of performance due to its design. As it operates at the transport layer (Layer 4), it merely forwards incoming TCP or UDP connections to a target without inspecting the details of every request. This makes NLB significantly faster and more efficient in forwarding incoming requests, reducing latency. In contrast, ALB operates at the application layer (Layer 7), inspecting details of every incoming HTTP/HTTPS request. While this introduces a slight overhead compared to NLB, it allows ALB to perform advanced routing based on the content of the request, providing flexibility and control. When it comes to raw performance and low latency, NLB has an advantage due to its simple operation at Layer 4. However, ALB offers additional flexibility and control at Layer 7, which can lead to more efficient request handling in complex applications. Handling Traffic Spikes NLB is designed to handle sudden and massive spikes in traffic without requiring any pre-warming or scaling. This is because NLB does not need to scale the number of nodes processing incoming connections, allowing it to adapt instantly to increased traffic. ALB, on the other hand, adapts to an increase in connections and requests automatically. However, this scaling process takes some time, so during sudden, substantial traffic spikes, ALB might not be able to handle all incoming requests immediately. In such cases, AWS recommends informing them in advance about expected traffic spikes so they can pre-warm the ALB. While both NLB and ALB can handle traffic spikes, NLB's design allows it to respond more quickly to sudden increases in traffic, making it a better choice for applications with unpredictable or highly volatile traffic patterns. However, with proper planning and communication with AWS, ALB can also effectively manage large traffic spikes. Security NLB provides robust security features, including TLS termination and integration with VPC security groups. However, it lacks some advanced security features, such as support for AWS WAF and user authentication, which are available in ALB. ALB offers advanced security features like integration with AWS WAF, SSL/TLS termination, and user authentication using OpenID Connect and SAML. It also allows the creation of custom security policies, making it more flexible in terms of security. Both NLB and ALB offer robust security features, but ALB provides additional flexibility and control with its support for AWS WAF and user authentication. However, the choice between the two should be based on your specific security requirements. If your application primarily deals with HTTP/HTTPS traffic and requires advanced security controls, ALB would be a better choice. On the other hand, for applications requiring high throughput and low latency, NLB might be a more suitable option despite its limited advanced security features. Costs and Pricing The cost of using an NLB is largely dependent on the amount of data processed, the duration of usage, and whether you use additional features like cross-zone load balancing. While NLB pricing is relatively lower than ALB, it can cause more connections and hence, a higher load on targets, potentially leading to increased costs. Like NLB, the cost of ALB is based on the amount of data processed and the duration of usage. 
However, due to its additional features, ALB generally costs more than NLB. That said, ALB's sophisticated routing and management features could lead to more efficient resource usage, potentially offsetting its higher price. While NLB may appear cheaper at first glance, the total cost of operation should take into account the efficiency of resource usage, which is where ALB excels with its advanced routing and management features. Ultimately, the most cost-effective choice will depend on your application's specific needs and architecture. Integration and Compatibility NLB integrates seamlessly with other AWS services, such as AWS Auto Scaling Groups, Amazon EC2 Container Service (ECS), and Amazon EC2 Spot Fleet. It also works well with containerized applications and supports both IPv4 and IPv6 addresses. ALB offers extensive integration options with a wide range of AWS services, including AWS Auto Scaling Groups, Amazon ECS, AWS Fargate, and AWS Lambda. It also supports both IPv4 and IPv6 addresses and integrates with container-based and serverless architectures. Both NLB and ALB integrate seamlessly into existing AWS infrastructure. They support various AWS services, making them versatile choices for different application architectures. However, with its additional features and capabilities, ALB may require slightly more configuration than NLB. Conclusion While both ALB and NLB are powerful tools for managing network traffic in AWS, they cater to different needs and scenarios. ALB operates at the application layer, handling HTTP/HTTPS traffic with advanced request routing capabilities, making it suitable for complex web applications. NLB operates at the transport layer, dealing with TCP/UDP traffic, providing high performance and low latency, making it ideal for applications requiring high throughput. The choice between ALB and NLB depends on your specific application requirements. If your application handles HTTP/HTTPS traffic and requires advanced routing capabilities, ALB is the right choice. If your application requires high performance, low latency, and static IP addresses, then NLB is more suitable. For microservices architecture or container-based applications that require advanced routing and flexible management, go for ALB. For applications requiring high throughput and low latency, such as multiplayer gaming, real-time streaming, or IoT applications, choose NLB. As always, the best choice depends on understanding your application's requirements and choosing the tool that best fits those needs.
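For a concrete sense of how the two types are provisioned, here is a minimal AWS CLI sketch that creates one load balancer of each kind; the names, subnet IDs, and security group ID are placeholders, and in a real setup you would also create target groups and listeners:
Shell
# Application Load Balancer (Layer 7), placed in two subnets with a security group attached.
$ aws elbv2 create-load-balancer \
    --name demo-alb \
    --type application \
    --subnets subnet-aaaa1111 subnet-bbbb2222 \
    --security-groups sg-cccc3333

# Network Load Balancer (Layer 4), which provides a static IP per Availability Zone.
$ aws elbv2 create-load-balancer \
    --name demo-nlb \
    --type network \
    --subnets subnet-aaaa1111 subnet-bbbb2222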
In this blog, you will take a closer look at Podman Desktop, a graphical tool when you are working with containers. Enjoy! Introduction Podman is a container engine, just as Docker is. Podman commands are to be executed by means of a CLI (Command Line Interface), but it would come in handy when a GUI would be available. That is exactly the purpose of Podman Desktop! As stated on the Podman Desktop website: “Podman Desktop is an open source graphical tool enabling you to seamlessly work with containers and Kubernetes from your local environment.” In the next sections, you will execute most of the commands as executed in the two previous posts. If you are new to Podman, it is strongly advised to read those two posts first before continuing. Is Podman a Drop-in Replacement for Docker? Podman Equivalent for Docker Compose Sources used in this blog can be found on GitHub. Prerequisites Prerequisites for this blog are: Basic Linux knowledge, Ubuntu 22.04 is used during this blog; Basic Podman knowledge, see the previous blog posts; Podman version 3.4.4 is used in this blog because that is the version available for Ubuntu although the latest stable release is version 4.6.0 at the time of writing. Installation and Startup First of all, Podman Desktop needs to be installed, of course. Go to the downloads page. When using the Download button, a flatpak file will be downloaded. Flatpak is a framework for distributing desktop applications across various Linux distributions. However, this requires you to install flatpak. A tar.gz file is also available for download, so use this one. After downloading, extract the file to /opt: Shell $ sudo tar -xvf podman-desktop-1.2.1.tar.gz -C /opt/ In order to start Podman Desktop, you only need to double-click the podman-desktop file. The Get Started with Podman Desktop screen is shown. Click the Go to Podman Desktop button, which will open the Podman Desktop main screen. As you can see from the screenshot, Podman Desktop detects that Podman is running but also that Docker is running. This is already a nice feature because this means that you can use Podman Desktop for Podman as well as for Docker. At the bottom, a Docker Compatibility warning is shown, indicating that the Docker socket is not available and some Docker-specific tools will not function correctly. But this can be fixed, of course. In the left menu, you can find the following items from top to bottom: the dashboard, the containers, the pods, the images, and the volumes. Build an Image The container image you will try to build consists out of a Spring Boot application. It is a basic application containing one Rest endpoint, which returns a hello message. There is no need to build the application. You do need to download the jar-file and put it into a target directory at the root of the repository. The Dockerfile you will be using is located in the directory podman-desktop. Choose in the left menu the Images tab. Also note that in the screenshot, both Podman images and Docker images are shown. Click the Build an Image button and fill it in as follows: Containerfile path: select file podman-desktop/1-Dockerfile. Build context directory: This is automatically filled out for you with the podman-desktop directory. However, you need to change this to the root of the repository; otherwise, the jar-file is not part of the build context and cannot be found by Podman. Image Name: docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT Container Engine: Podman Click the Build button. 
This results in the following error: Shell Uploading the build context from <user directory>/mypodmanplanet...Can take a while... Error:(HTTP code 500) server error - potentially insufficient UIDs or GIDs available in user namespace (requested 262143:262143 for /var/tmp/libpod_builder2108531042/bError:Error: (HTTP code 500) server error - potentially insufficient UIDs or GIDs available in user namespace (requested 262143:262143 for /var/tmp/libpod_builder2108531042/build/.git): Check /etc/subuid and /etc/subgid: lchown /var/tmp/libpod_builder2108531042/build/.git: invalid argument This error sounds familiar because the error was also encountered in a previous blog. Let’s try to build the image via the command line: Shell $ podman build . --tag docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT -f podman-desktop/1-Dockerfile The image is built without any problem. An issue has been raised for this problem. At the time of writing, building an image via Podman Desktop is not possible. Start a Container Let’s see whether you can start the container. Choose in the left menu the Containers tab and click the Create a Container button. A choice menu is shown. Choose an Existing image. The Images tab is shown. Click the Play button on the right for the mypodmanplanet image. A black screen is shown, and no container is started. Start the container via CLI: Shell $ podman run -p 8080:8080 --name mypodmanplanet -d docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The running container is now visible in Podman Desktop. Test the endpoint, and this functions properly. Shell $ curl http://localhost:8080/hello Hello Podman! Same conclusion as for building the image. At the time of writing, it is not possible to start a container via Podman Desktop. What is really interesting is the actions menu. You can view the container logs. The Inspect tab shows you the details of the container. The Kube tab shows you what the Kubernetes deployment yaml file will look like. The Terminal tab gives you access to a terminal inside the container. You can also stop, restart, and remove the container from Podman Desktop. Although starting the container did not work, Podman Desktop offers some interesting features that make it easier to work with containers. Volume Mount Remove the container from the previous section. You will create the container again, but this time with a volume mount to a specific application.properties file, which will ensure that the Spring Boot application runs on port 8082 inside the container. Execute the following command from the root of the repository: Shell $ podman run -p 8080:8082 --volume ./properties/application.properties:/opt/app/application.properties:ro --name mypodmanplanet -d docker.io/mydeveloperplanet/mypodmanplanet:0.0.1-SNAPSHOT The container is started successfully, but an error message is shown in Podman Desktop. This error will show up regularly from now on. Restarting Podman Desktop resolves the issue. An issue has been filed for this problem. Unfortunately, the issue cannot be reproduced consistently. The volume is not shown in the Volumes tab, but that’s because it is an anonymous volume. Let’s create a volume and see whether this shows up in the Volumes tab. Shell $ podman volume create myFirstVolume myFirstVolume The volume is not shown in Podman Desktop. It is available via the command line, however. Shell $ podman volume ls DRIVER VOLUME NAME local myFirstVolume Viewing volumes is not possible with Podman Desktop at the time of writing. Delete the volume. 
Shell $ podman volume rm myFirstVolume myFirstVolume Create Pod In this section, you will create a Pod containing two containers. The setup is based on the one used for a previous blog. Choose in the left menu the Pods tab and click the Play Kubernetes YAML button. Select the YAML file Dockerfiles/hello-pod-2-with-env.yaml. Click the Play button. The Pod has started. Check the Containers tab, and you will see the three containers that are part of the Pod (the two application containers plus the Pod's infra container). Verify whether the endpoints are accessible. Shell $ curl http://localhost:8080/hello Hello Podman! $ curl http://localhost:8081/hello Hello Podman! The Pod can be stopped and deleted via Podman Desktop. Sometimes, Podman Desktop stops responding when deleting the Pod. After restarting Podman Desktop, the Pod can be deleted without this issue. Conclusion Podman Desktop is a nice tool with some fine features. However, quite a few bugs were encountered while using Podman Desktop (I did not create an issue for all of them). This might be due to the older version of Podman available for Ubuntu, but in that case I would have expected an incompatibility warning to be raised when starting Podman Desktop. Still, it is a nice tool, and I will keep on using it for the time being.
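As an aside, the Pod workflow above can also be driven entirely from the Podman CLI, which is handy when Podman Desktop misbehaves. A minimal sketch, assuming the same YAML file from the repository and that the mypodmanplanet container from earlier is still running:
Shell
# Start the Pod from the Kubernetes YAML file (CLI equivalent of the Play Kubernetes YAML button).
$ podman play kube Dockerfiles/hello-pod-2-with-env.yaml

# Inspect the Pod and its containers.
$ podman pod ps
$ podman ps --pod

# Generate Kubernetes YAML from a running container, similar to what the Kube tab shows.
$ podman generate kube mypodmanplanet > mypodmanplanet.yaml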
Learn how to record SSH sessions on a Red Hat Enterprise Linux VSI in a Private VPC network using in-built packages. The VPC private network is provisioned through Terraform and the RHEL packages are installed using Ansible automation. What Is Session Recording and Why Is It Required? As noted in "Securely record SSH sessions on RHEL in a private VPC network," a Bastion host and a jump server are both security mechanisms used in network and server environments to control and enhance security when connecting to remote systems. They serve similar purposes but have some differences in their implementation and use cases. The Bastion host is placed in front of the private network to take SSH requests from public traffic and pass the request to the downstream machine. Bastion hosts and jump servers are vulnerable to intrusion as they are exposed to public traffic. Session recording helps an administrator of a system to audit user SSH sessions and comply with regulatory requirements. In the event of a security breach, you as an administrator would like to audit and analyze the user sessions. This is critical for a security-sensitive system. Before deploying the session recording solution, you need to provision a private VPC network following the instructions in the article, "Architecting a Completely Private VPC Network and Automating the Deployment." Alternatively, if you are planning to use your own VPC infrastructure, you need to attach a floating IP to the virtual server instance and a public gateway to each of the subnets. Additionally, you need to allow network traffic from public internet access. Deploy Session Recording Using Ansible To be able to deploy the Session Recording solution you need to have the following packages installed on the RHEL VSI: tlog SSSD cockpit-session-recording The packages will be installed through Ansible automation on all the VSIs both bastion hosts and RHEL VSI. If you haven't done so yet, clone the GitHub repository and move to the Ansible folder. Shell git clone https://github.com/VidyasagarMSC/private-vpc-network cd ansible Create hosts.ini from the template file. Shell cp hosts_template.ini hosts.ini Update the hosts.ini entries as per your VPC IP addresses. Plain Text [bastions] 10.10.0.13 10.10.65.13 [servers] 10.10.128.13 [bastions:vars] ansible_port=22 ansible_user=root ansible_ssh_private_key_file=/Users/vmac/.ssh/ssh_vpc packages="['tlog','cockpit-session-recording','systemd-journal-remote']" [servers:vars] ansible_port=22 ansible_user=root ansible_ssh_private_key_file=/Users/vmac/.ssh/ssh_vpc ansible_ssh_common_args='-J root@10.10.0.13' packages="['tlog','cockpit-session-recording','systemd-journal-remote']" Run the Ansible playbook to install the packages from an IBM Cloud private mirror/repository. Shell ansible-playbook main_playbook.yml -i hosts.ini --flush-cache Running Ansible playbooks You can see in the image that after you SSH into the RHEL machine now, you will see a note saying that the current session is being recorded. Check the Session Recordings, Logs, and Reports If you closely observe the messages post SSH, you will see a URL to the web console that can be accessed using the machine name or private IP over port 9090. To allow traffic on port 9090, in the Terraform code, Change the value of the allow_port_9090 variable to true and run terraform apply. The latest terraform apply will add ACL and security group rules to allow traffic on port 9090. Now, open a browser and navigate to http://10.10.128.13:9090 . 
To access the web console using the VSI name, you need to set up a private DNS (out of scope for this article). You need the root password to access the web console. RHEL web console Navigate to Session Recording to see the list of session recordings. Along with session recordings, you can check the logs, diagnostic reports, etc. Session recording on the Web console If you prefer the command line, recorded sessions can also be replayed with the tlog-play utility; see the sketch after the reading list below. Recommended Reading How to use Schematics - Terraform UI to provision the cloud resources
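A minimal sketch of replaying a recording from the command line, assuming the tlog package installed above; the recording ID is a placeholder taken from the journal, and the exact options are as described in the tlog documentation, so verify them with tlog-play --help on your system:
Shell
# Find recording identifiers in the systemd journal (recorded sessions carry a TLOG_REC field).
$ journalctl -o verbose | grep TLOG_REC= | sort -u

# Replay one recorded session directly from the journal.
$ tlog-play -r journal -M TLOG_REC=<recording-id>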
In this article, we’ll explain how to use Ansible to build and deploy a Quarkus application. Quarkus is an exciting, lightweight Java development framework designed for cloud and Kubernetes deployments, and Red Hat Ansible Automation Platform is one of the most popular automation tools and a star product from Red Hat. Set Up Your Ansible Environment Before discussing how to automate a Quarkus application deployment using Ansible, we need to ensure the prerequisites are in place. First, you have to install Ansible on your development environment. On a Fedora or a Red Hat Enterprise Linux machine, this is achieved easily by utilizing the dnf package manager: Shell $ dnf install ansible-core The only other requirement is to install the Ansible collection dedicated to Quarkus: Shell $ ansible-galaxy collection install middleware_automation.quarkus This is all you need to prepare the Ansible control machine (the name given to the machine executing Ansible). Generally, the control node is used to set up other systems that are designated under the name targets. For the purpose of this tutorial, and for simplicity's sake, we are going to utilize the same system for both the control node and our (only) target. This will make it easier to reproduce the content of this article on a single development machine. Note that you don’t need to set up any kind of Java development environment, because the Ansible collection will take care of that. The Ansible collection dedicated to Quarkus is a community project, and it’s not supported by Red Hat. However, both Quarkus and Ansible are Red Hat products and thus fully supported. The Quarkus collection might be supported at some point in the future but is not at the time of the writing of this article. Inventory File Before we can execute Ansible, we need to provide the tool with an inventory of the targets. There are many ways to achieve that, but the simplest solution for a tutorial such as this one is to write up an inventory file of our own. As mentioned above, we are going to use the same host for both the controller and the target, so the inventory file has only one host. Here again, for simplicity's sake, this machine is going to be the localhost: Shell $ cat inventory [all] localhost ansible_connection=local Refer to the Ansible documentation for more information on Ansible inventory. Build and Deploy the App With Ansible For this demonstration, we are going to utilize one of the sample applications provided as part of the Quarkus quick starts project. We will use Ansible to build and deploy the Getting Started application. All we need to provide to Ansible is the application name, repository URL, and the destination folder, where to deploy the application on the target. Because of the directory structure of the Quarkus quick start, containing several projects, we'll also need to specify the directory containing the source code: Shell $ ansible-playbook -i inventory middleware_automation.quarkus.playbook \ -e app_name='optaplanner-quickstart' \ -e quarkus_app_source_folder='optaplanner-quickstart' \ -e quarkus_path_to_folder_to_deploy=/opt/optplanner \ -e quarkus_app_repo_url='https://github.com/quarkusio/quarkus-quickstarts.git' Below is the output of this command: PLAY [Build and deploy a Quarkus app using Ansible] **************************** TASK [Gathering Facts] ********************************************************* ok: [localhost] TASK [Build the Quarkus from https://github.com/quarkusio/quarkus-quickstarts.git.] 
*** TASK [middleware_automation.quarkus.quarkus : Ensure required parameters are provided.] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Define path to mvnw script.] ***** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure that builder host localhost has appropriate JDK installed: java-17-openjdk] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Delete previous workdir (if requested).] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure app workdir exists: /tmp/workdir] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Checkout the application source code.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Build the App using Maven] ******* ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Display build application log] *** skipping: [localhost] TASK [Deploy Quarkus app on target.] ******************************************* TASK [middleware_automation.quarkus.quarkus : Ensure required parameters are provided.] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure requirements on target system are fullfilled.] *** included: /root/.ansible/collections/ansible_collections/middleware_automation/quarkus/roles/quarkus/tasks/deploy/prereqs.yml for localhost TASK [middleware_automation.quarkus.quarkus : Ensure required OpenJDK is installed on target.] *** skipping: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure Quarkus system group exists on target system] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure Quarkus user exists on target system.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure deployement directory exits: /opt/optplanner.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Set Quarkus app source dir (if not defined).] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Deploy application as a systemd service on target system.] *** included: /root/.ansible/collections/ansible_collections/middleware_automation/quarkus/roles/quarkus/tasks/deploy/service.yml for localhost TASK [middleware_automation.quarkus.quarkus : Deploy application from to target system] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Deploy Systemd configuration for Quarkus app] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Perform daemon-reload to ensure the changes are picked up] *** ok: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure Quarkus app service is running.] *** changed: [localhost] TASK [middleware_automation.quarkus.quarkus : Ensure firewalld configuration is appropriate (if requested).] *** skipping: [localhost] PLAY RECAP ********************************************************************* localhost : ok=19 changed=8 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0 As you can see, the Ansible collection for Quarkus does all the heavy lifting for us: its content takes care of checking out the source code from GitHub and builds the application. It also ensures the system used for this step has the required OpenJDK installed on the target machine. Once the application is successfully built, the collection takes care of the deployment. Here again, it checks that the appropriate OpenJDK is available on the target system. Then, it verifies that the required user and group exist on the target and if not, creates them. 
This is recommended primarily so that the Quarkus application runs under a regular user account rather than as root. With those requirements in place, the jars produced during the build phase are copied over to the target, along with the configuration required to integrate the application into systemd as a service. Any change to the systemd configuration requires reloading its daemon, which the collection ensures happens whenever it is needed. With all of that in place, the collection starts the service itself. Validate the Execution Results Let's take a minute to verify that all went well and that the service is indeed running: Shell # systemctl status optaplanner-quickstart.service ● optaplanner-quickstart.service - A Quarkus service named optaplanner-quickstart Loaded: loaded (/usr/lib/systemd/system/optaplanner-quickstart.service; enabled; vendor preset: disabled) Active: active (running) since Wed 2023-04-26 09:40:13 UTC; 3h 19min ago Main PID: 934 (java) CGroup: /system.slice/optaplanner-quickstart.service └─934 /usr/bin/java -jar /opt/optplanner/quarkus-run.jar Apr 26 09:40:13 be44b3acb1f3 systemd[1]: Started A Quarkus service named optaplanner-quickstart. Apr 26 09:40:14 be44b3acb1f3 java[934]: __ ____ __ _____ ___ __ ____ ______ Apr 26 09:40:14 be44b3acb1f3 java[934]: --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ Apr 26 09:40:14 be44b3acb1f3 java[934]: -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \ Apr 26 09:40:14 be44b3acb1f3 java[934]: --\___\_\____/_/ |_/_/|_/_/|_|\____/___/ Apr 26 09:40:14 be44b3acb1f3 java[934]: 2023-04-26 09:40:14,843 INFO [io.quarkus] (main) optaplanner-quickstart 1.0.0-SNAPSHOT on JVM (powered by Quarkus 2.16.6.Final) started in 1.468s. Listening on: http://0.0.0.0:8080 Apr 26 09:40:14 be44b3acb1f3 java[934]: 2023-04-26 09:40:14,848 INFO [io.quarkus] (main) Profile prod activated. Apr 26 09:40:14 be44b3acb1f3 java[934]: 2023-04-26 09:40:14,848 INFO [io.quarkus] (main) Installed features: [agroal, cdi, hibernate-orm, hibernate-orm-panache, hibernate-orm-rest-data-panache, jdbc-h2, narayana-jta, optaplanner, optaplanner-jackson, resteasy-reactive, resteasy-reactive-jackson, resteasy-reactive-links, smallrye-context-propagation, vertx, webjars-locator] Having the service running is certainly good, but it does not guarantee by itself that the application is available. To double-check, we can simply confirm that the application is accessible by connecting to it: Shell # curl -I http://localhost:8080/ HTTP/1.1 200 OK accept-ranges: bytes content-length: 8533 cache-control: public, immutable, max-age=86400 last-modified: Wed, 26 Apr 2023 10:00:18 GMT date: Wed, 26 Apr 2023 13:00:19 GMT Writing up a Playbook The default playbook provided with the Ansible collection for Quarkus is quite handy and allows you to bootstrap your automation with a single command. However, most likely, you'll need to write your own playbook so you can add the automation required around the deployment of your Quarkus app. Here is the content of the playbook provided with the collection, which you can simply use as a base for your own: YAML --- - name: "Build and deploy a Quarkus app using Ansible" hosts: all gather_facts: false vars: quarkus_app_repo_url: 'https://github.com/quarkusio/quarkus-quickstarts.git' app_name: 'optaplanner-quickstart' quarkus_app_source_folder: 'optaplanner-quickstart' quarkus_path_to_folder_to_deploy: '/opt/optaplanner' pre_tasks: - name: "Build the Quarkus from {{ quarkus_app_repo_url }}."
ansible.builtin.include_role: name: quarkus tasks_from: build.yml tasks: - name: "Deploy Quarkus app on target." ansible.builtin.include_role: name: quarkus tasks_from: deploy.yml To run this playbook, you again use the ansible-playbook command, but this time you provide the path to the playbook: Shell $ ansible-playbook -i inventory playbook.yml Conclusion Thanks to the Ansible collection for Quarkus, the work needed to automate the deployment of a Quarkus application is minimal. The collection takes care of most of the heavy lifting and lets its users focus on the automation that is specific to their application and business requirements.
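The tutorial above keeps everything on localhost for simplicity, but the same playbook can target a real server once the inventory points at it. Below is a minimal sketch, assuming SSH access to the target machine is already configured; the hostname and user are placeholders rather than values from the collection.

Shell
# Hypothetical inventory entry for a remote target (replace host and user).
$ cat inventory
[all]
quarkus-host.example.com ansible_user=devops

# Run the same playbook against the remote target over SSH.
$ ansible-playbook -i inventory playbook.yml

Variables declared in the playbook's vars section can still be overridden on the command line with -e, exactly as in the first run shown earlier.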
As businesses shift operations to the cloud, robust security is crucial. DDoS attacks pose significant threats to cloud-based services, aiming to disrupt infrastructure and cause downtime and financial losses. AWS Shield from Amazon Web Services provides comprehensive DDoS protection, fortifying cloud security. This article explores how AWS Shield safeguards applications and resources from evolving DDoS threats. Understanding DDoS Attacks To understand the role of AWS Shield, it's essential to grasp how DDoS attacks work. They involve compromised devices flooding a target with excessive traffic, blocking legitimate users from accessing it. DDoS attacks can target different network layers, which is why specialized protection is needed to mitigate them effectively. Introducing AWS Shield AWS Shield is a DDoS protection service provided by AWS. It offers two tiers of protection: AWS Shield Standard and AWS Shield Advanced. AWS Shield Standard Automatic protection: AWS Shield Standard is automatically integrated with AWS resources such as Amazon CloudFront, Amazon Route 53, and Elastic Load Balancing (ELB). It provides automatic protection against common DDoS attacks at no extra cost. Global network resilience: By leveraging the robust AWS global network, Shield Standard can distribute and absorb DDoS traffic across multiple Availability Zones, ensuring uninterrupted services. Cost-effective solution: Shield Standard is included in the AWS resource fees, providing a cost-effective security solution that requires minimal setup and management. AWS Shield Advanced Real-time attack monitoring: AWS Shield Advanced allows proactive monitoring and analysis of ongoing DDoS attacks in real time, providing visibility into potential threats. Advanced DDoS mitigation: Shield Advanced offers enhanced protection against complex and sophisticated DDoS attacks by employing additional security features like AWS Web Application Firewall (WAF) and AWS Firewall Manager. 24/7 DDoS Response Team (DRT): Subscribers to Shield Advanced can rely on the AWS DDoS Response Team, a group of DDoS mitigation experts available 24/7, for personalized assistance during active attacks. Integration With Other AWS Services AWS CloudWatch Integration AWS Shield integrates with AWS CloudWatch to monitor and analyze DDoS protection metrics, enabling automated threat responses. AWS CloudTrail Integration Integrating with AWS CloudTrail gives users enhanced visibility into security logs and events, strengthening cloud security. Scalable Mitigation and Resilience AWS Shield scales effectively to handle large-scale DDoS attacks, distributing traffic and mitigating attacks closer to their source. This reduces latency and improves application availability. A Layered Approach to Cloud Security AWS Shield provides a foundational layer of security for cloud-based applications. To create a comprehensive security strategy, businesses can combine AWS Shield with other security services like AWS WAF, AWS Firewall Manager, and AWS Security Hub. This layered approach addresses various security concerns. Conclusion As the cloud landscape expands, safeguarding cloud-based applications and resources from DDoS attacks becomes crucial. AWS Shield provides a reliable solution to defend against DDoS threats, fortifying cloud security and ensuring uninterrupted availability of essential services.
Whether utilizing AWS Shield Standard's automatic protection or AWS Shield Advanced's enhanced capabilities, businesses can rely on AWS's expertise to protect their cloud infrastructure. This allows them to concentrate on innovation and growth with confidence in their cloud security.
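As a concrete illustration of the CloudWatch integration described above, Shield Advanced publishes DDoS detection metrics that can drive alarms and automated responses. The following AWS CLI sketch assumes Shield Advanced is enabled and uses placeholder ARNs for the protected resource and the SNS notification topic; adjust the names and values to your own environment.

Shell
# Hedged sketch: alarm when Shield Advanced reports a detected DDoS event
# for a protected resource. The ARNs below are placeholders, not real resources.
$ aws cloudwatch put-metric-alarm \
    --alarm-name ddos-detected-my-app \
    --namespace AWS/DDoSProtection \
    --metric-name DDoSDetected \
    --dimensions Name=ResourceArn,Value=arn:aws:cloudfront::123456789012:distribution/EXAMPLE123 \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:security-alerts

Wiring the alarm action to an SNS topic (or further automation) is what turns Shield's monitoring data into the automated threat response mentioned earlier.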
In today's data-driven world, organizations face the challenge of handling massive volumes of data across various systems. To extract valuable insights, powerful tools are needed. Enter Trino, an open-source distributed SQL query engine that empowers organizations to process and query large datasets from multiple sources. But to unleash Trino's full potential, you need a trusty sidekick like DbVisualizer. This superhero of database management and development tools offers a user-friendly interface and a complete platform for working with different databases. DbVisualizer acts as a centralized hub, effortlessly connecting you to Trino and other data stores like Hadoop, Cassandra, and MySQL. With DbVisualizer, exploring databases, building queries, and visualizing data becomes a breeze. Its query builder tool simplifies query construction, making it easy to manipulate data visually. Moreover, DbVisualizer's data visualization powers are truly impressive, allowing you to create stunning charts, graphs, and dashboards. By connecting DbVisualizer with Trino, you seamlessly blend data from various sources into these visualizations, revealing a world of insights. Prerequisites 1. Basic knowledge of databases and SQL. 2. Docker 3. DbVisualizer What Is Trino? Trino, formerly known as PrestoSQL, is a powerful open-source distributed SQL query engine designed for large-scale data processing and analysis. It offers a unified interface to query data from various sources, including traditional databases and distributed storage systems. With its distributed architecture, Trino scales horizontally and processes queries in parallel, enabling efficient handling of massive datasets. It supports standard SQL syntax and provides advanced functions for complex data manipulation. Trino can push down query execution to data sources, reducing data movement and improving performance. Widely adopted by organizations, Trino is valued for its flexibility, speed, and ease of use, making it an indispensable tool for data analytics and real-time insights. What Is Trino SQL? Trino SQL is a powerful language used to query data in Trino, the distributed SQL query engine. It follows the SQL standard and provides a familiar syntax for data analysis tasks. Trino SQL supports a wide range of operations, including querying, filtering, joining, aggregating, and transforming datasets. It includes advanced features such as subqueries and a rich set of functions for data manipulation. Trino SQL leverages the distributed nature of Trino for fast and scalable query execution. It supports various data sources and formats, making it versatile for heterogeneous environments. Overall, Trino SQL offers a robust and efficient solution for querying and analyzing data in Trino. Setting up Trino For this tutorial, we will be running Trino locally on a docker container. Follow these steps to install Trino on your docker container: Step 1: Pull the Trino Docker Image The Trino project provides the "trinodb/trino" Docker image, which includes the Trino server and a default configuration. Pull the image from Docker Hub using the following command: docker pull trinodb/trino This command will download the latest version of the Trino Docker image. Step 2: Run the Trino Container Create a container from the Trino image using the following command: docker run --name trino -d -p 8080:8080 trinodb/trino This command creates a container named "trino" from the "trinodb/trino" image. 
The container runs in the background and maps the default Trino port, 8080, from inside the container to port 8080 on your workstation. Step 3: Verify the Container To verify that the Trino container is running, use the following command: docker ps This command displays all the running containers. Look for the "trino" container and ensure that it is listed with the appropriate status and port mapping. Step 4: Wait for Trino To Start When the Trino container starts, it might take a few moments for it to become fully ready. You can follow its progress using the following command: docker logs trino This command displays the container logs; once Trino is ready, a "SERVER STARTED" message appears. The STATUS column of docker ps also changes from "(health: starting)" to "(healthy)" when the container's health check passes. Congratulations! You have successfully installed Trino in a Docker container. You can now open the Trino web UI at http://localhost:8080 in your web browser and start running SQL queries against your Trino cluster. Setting up the Trino Connection in DbVisualizer Setting up the Trino connection in DbVisualizer is a straightforward process that allows you to unleash the power of Trino's distributed SQL query capabilities within the user-friendly environment of DbVisualizer. Here's how you can get started: Now that we have Trino running in Docker, we can connect DbVisualizer to it by following the steps below: 1. Go to the Connection tab. Click the "Create a Connection" button to create a new connection. Creating a database connection in DbVisualizer 2. Select your server type. For this tutorial, we will be choosing Trino as the driver. Choosing the driver in DbVisualizer 3. In the Driver Connection tab, enter the following information: Database server: localhost Database Port: 8080 Userid: "user_name" (any user name works; Trino's default configuration does not require a password) Connection Details for the Trino Server in DbVisualizer 4. Click the "Connect" button to test the connection. If you haven't updated your Trino driver, you will receive a prompt to do so. Driver download Open the Driver Manager tab and update the driver to connect to your Trino database. Trino JDBC driver download in DbVisualizer Click on "Connect" again to test your connection. If the connection is successful, you should see a message indicating that the connection was established. You can now browse the database using DbVisualizer. A Message Signifying a Successful Connection 5. Explore and Query Trino Data With the Trino connection established in DbVisualizer, you are now ready to explore and query your Trino data. Utilize DbVisualizer's intuitive interface, query builder, and visualization tools to interact with Trino and extract valuable insights from your distributed datasets. The Trino server tree Now follow along as we walk you through the CLI capabilities of Trino as well! Trino CLI Trino CLI is your go-to command-line buddy for seamless interaction with Trino. The command-line interface allows interaction with Trino, providing capabilities to execute queries, manage connections, and retrieve results directly from your terminal. With its SQL prowess, you can write queries with ease, thanks to nifty features like auto-completion and syntax highlighting. Trino CLI goes the extra mile by allowing you to fine-tune your query experience through configurable session properties and optimized performance options. And guess what? It offers a plethora of output formats to jazz up your query results!
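Before jumping into the interactive session shown next, it may help to see those output formats in action from a one-off, non-interactive call. This is a small sketch that assumes the standard Trino CLI options available in the trinodb/trino image.

Shell
# List the catalogs available in the default image, non-interactively.
$ docker exec trino trino --execute "SHOW CATALOGS"

# The same statement rendered in a different output format, e.g. CSV.
$ docker exec trino trino --execute "SHOW CATALOGS" --output-format CSV

The --execute flag runs a single statement and exits, which makes it handy for scripting; without it, the CLI drops you into the interactive prompt used below.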
To run the Trino CLI in your Docker container, use the following command: docker exec -it trino trino Then, enter your Trino SQL query in the terminal and run it to execute the query on your Trino server. Executing a query in the Trino CLI But hold on! There's an exciting alternative that takes your Trino journey to the next level. Imagine stepping into a world of graphical interfaces and advanced visualization wonders. That's where tools like DbVisualizer enter the scene. By harnessing the power of a JDBC driver, you can connect with Trino in DbVisualizer and unlock a universe of interactive exploration, query building, and mind-blowing visualizations. It's like adding a touch of magic to your Trino experience. So, whether you're a command-line aficionado or prefer the captivating realm of graphical tools, Trino CLI and DbVisualizer offer you the best of both worlds. Get ready to embark on an exhilarating data exploration journey fueled by the boundless potential of Trino and the seamless connectivity of DbVisualizer. Executing Queries in DbVisualizer With Trino DbVisualizer provides a powerful interface for writing and executing SQL queries against Trino. You can leverage its user-friendly query editor to compose SQL statements efficiently. Simply expand the Trino server tree, pick any catalog from the list, and open a new SQL Commander by clicking on the play icon with a plus next to it. The Create SQL Commander button You can start writing SQL queries in the SQL Commander editor. A good example is a query that counts the number of nations in the nation table: select count(*) from tpch.sf1.nation; Click on the play button above the SQL Commander to execute the query. You will get the result shown in the image below: The Trino query result Now, we'll visualize the queries in Trino with DbVisualizer. Follow along! Visualizing Trino Queries With DbVisualizer By using SQL, we have the power to create a wide range of analytical queries on this table. For example, let's calculate the average length of the nation names across all regions: SELECT regionkey, AVG(LENGTH(name)) AS avg_name_length FROM tpch.sf1.nation GROUP BY regionkey; This query retrieves data from the nation table in Trino and calculates the average length of nation names (avg_name_length) for each region (regionkey). By grouping the results based on the regionkey column, the query provides a summary of the average name length for nations within each specific region. Running the query above will provide you with the results seen in the table below: You can use this statistic to create a visualization such as a line chart, bar chart, or area chart. To create a visualization for this table, click on the rightmost button in the result tab toolbar. The Show as Chart button Then, select the values for the x- and y-axes of your chart by clicking on the select button above the chart panel. Select avg_name_length as the x-axis and regionkey as the y-axis. Setting the chart axis Great! We have successfully created a line chart visualization of our Trino query data. The Trino Line chart By default, the visualization displays a line chart, but don't let that limit you. Get creative and explore the various customization options available to you. You can try out options like line chart, point chart, area chart, stacked area chart, bar chart, stacked bar chart, and pie chart by clicking on the chart icon above the chart panel to reveal a dropdown menu of various chart types. The Chart Type Dropdown Impressive, isn't it?
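If you would like more to chart than the five averages above, joining against the region table gives the chart friendlier labels. The query below is only an illustrative example against the built-in tpch catalog; you can paste the SQL into the SQL Commander, or run it through the CLI as sketched here.

Shell
# Illustrative follow-up query: average nation-name length per region,
# joined with the region table so the chart axis can show region names.
$ docker exec trino trino --execute "
    SELECT r.name AS region_name,
           AVG(LENGTH(n.name)) AS avg_name_length
    FROM tpch.sf1.nation n
    JOIN tpch.sf1.region r ON n.regionkey = r.regionkey
    GROUP BY r.name
    ORDER BY region_name"

Selecting region_name for one axis and avg_name_length for the other produces the same kind of chart as before, just with readable region names instead of numeric keys.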
DbVisualizer offers a range of customizable features. To explore these options, simply click on the tool button located at the top of the chart tab. From there, you have the freedom to fine-tune your charts according to your preferences. Once you've crafted the ideal chart, it's a breeze to export it as an image: just click on the document icon situated at the top of the chart tab. Configure chart and export chart buttons Conclusion In this tutorial, we've uncovered the power of Trino and DbVisualizer by unleashing the capabilities of distributed SQL queries for data analysis. Trino, the open-source SQL query engine, offers the muscle to handle massive data volumes across various systems. With DbVisualizer as our trusty sidekick, we effortlessly connect to Trino and other data stores. Its user-friendly interface and comprehensive tools make exploring and querying data a breeze. We've learned how to establish the Trino connection in DbVisualizer, execute SQL queries, and retrieve results with ease. But the excitement doesn't stop there! DbVisualizer's visualization capabilities let us create stunning charts to bring our data to life. We can customize these visualizations to suit our needs and, with a simple click, export them as image masterpieces. By mastering the Trino connection with DbVisualizer, we can gain valuable insights and supercharge our data analysis. So, don't stop here: explore, experiment, and unlock the full potential of Trino and DbVisualizer with the help of their documentation and blogs on your data-driven journey. Until next time! FAQ (Frequently Asked Questions) 1. How do I install Trino in a Docker container? To install Trino in a Docker container, use the command docker pull trinodb/trino to download the Trino Docker image. Then, create a container from the image using docker run --name trino -d -p 8080:8080 trinodb/trino. Verify the container status with docker ps and ensure it is running. 2. How do I connect DbVisualizer to Trino? In DbVisualizer, go to the Connection tab and click "Create a Connection." Choose Trino as the driver and enter the connection details, such as localhost for the Database server and 8080 for the Database Port. Click "Connect" to establish the connection. 3. How can I execute SQL queries in DbVisualizer with Trino? To execute SQL queries in DbVisualizer with Trino, expand the Trino server tree, open an SQL Commander, and write your SQL query in the editor. Click the play button to execute the query and view the results. 4. How can I visualize Trino queries using DbVisualizer? DbVisualizer allows you to visualize Trino queries by creating charts. After executing a query, click on the rightmost button in the result tab toolbar to show the chart panel. Select the desired values for the x and y axes, and customize the chart type and appearance as needed. 5. Can I export the charts created in DbVisualizer as images? Yes, you can export charts created in DbVisualizer as images. In the chart tab, click on the document icon located at the top to export the chart as an image file.
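One housekeeping note before moving on: when you are done experimenting, the tutorial container can be stopped and removed so it no longer occupies port 8080. This is plain Docker cleanup, nothing Trino-specific.

Shell
# Stop and remove the tutorial container.
$ docker stop trino
$ docker rm trino

# Optional: remove the downloaded image to reclaim disk space.
$ docker rmi trinodb/trino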
What Is Git? Git is a distributed revision control system. This definition sounds complicated, so let's break it down and look at the individual parts. The definition can be broken down into two parts: Git is distributed. Git is a revision control system. In this article, we'll elaborate on each of these characteristics of Git in order to understand how Git does what it does. Revision Control System A revision control system tracks content as it changes over time, which makes it a content tracker. Git tracks changes to content by computing its SHA1 hash. If the hash of an object that Git is tracking has changed, Git treats it as a new object. To provide persistence, Git stores this map, with the SHA1 as the key and the object as the value, in a repository inside the project's directory. So, at its very core, Git is essentially a persistent map. This is illustrated in the figure below. We'll start our journey from the core and explore Git layer by layer as we move outwards to understand the complete picture. Persistent Map In programming languages, a map is an interface that represents a collection of key-value pairs, where each key is associated with a unique value. Git computes the hash of an object that it stores in its repository. Shell $ echo "Flash 9000" | git hash-object --stdin This returns the SHA1 of the string object having the content "Flash 9000". The Git repository is instantiated with the following command. This creates a repository in a hidden folder named .git. Shell $ git init Initialized empty Git repository in D:/git/test/.git/ To store an object in the repository, we can pass the '-w' flag. Shell $ echo "Flash 9000" | git hash-object --stdin -w fc75e0215a2fcaeea1b949dab29c6014a2333399 Every object in Git has its own SHA1. Git is a map where the keys are SHA1 hashes and the values are the content. Persistence is provided by the -w flag, which writes the object into the repository. Notice how Git stored the object in its repository in the .git folder. Shell $ ls -ltr .git/objects/fc/ total 1 -r--r--r-- 1 ragha 197121 27 Sep 13 16:01 75e0215a2fcaeea1b949dab29c6014a2333399 Now that we understand the very core of Git, i.e., that it is a persistent map, let's look at the next layer, the content tracker, i.e., how Git tracks changes made to an object over time. Content Tracker We store content in files and directories. So, let's create a file and store some content that we want to track. We initialize a Git repository and store our content in the repository. Create a file with the content shown below. Add the file and commit it to the repository. Shell # create a file $ touch storage_insights.txt $ echo "Flash 9000" >> storage_insights.txt $ git add . $ git commit -m "First commit" [master (root-commit) b7d1ea0] First commit 1 file changed, 1 insertion(+) create mode 100644 storage_insights.txt Let's check the .git folder to find out what objects Git created to track the single file we have in our repository.
Shell $ ls -ltR .git/objects/ .git/objects/: total 0 drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 b7/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 af/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:56 fc/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 info/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 pack/ .git/objects/b7: total 1 -r--r--r-- 1 ragha 197121 137 Sep 13 16:57 d1ea0ff44167b0daa2b3016d3fced984618612 .git/objects/af: total 1 -r--r--r-- 1 ragha 197121 65 Sep 13 16:57 ddf78df335c4a85c0e05ba3804fa1ab64fd4fd .git/objects/fc: total 1 -r--r--r-- 1 ragha 197121 27 Sep 13 16:56 75e0215a2fcaeea1b949dab29c6014a2333399 .git/objects/info: total 0 .git/objects/pack: total 0 Git created three objects with three SHA1. Let's check the kind of objects created and their contents. Git provides utility methods for this purpose. To fetch the content of an object, Git provides the utility method. Shell git cat-file -p <SHA1> There are different types of objects stored in the Git repository, Git provides the following method to find the type of an object. Shell git cat-file -t <SHA1> The types of three objects are shown below. Shell $ git cat-file -t b7d1ea0ff44167b0daa2b3016d3fced984618612 commit $ git cat-file -t afddf78df335c4a85c0e05ba3804fa1ab64fd4fd tree $ git cat-file -t fc75e0215a2fcaeea1b949dab29c6014a2333399 blob Let's show their contents to understand better. Shell $ git cat-file -p b7d1ea0ff44167b0daa2b3016d3fced984618612 tree afddf78df335c4a85c0e05ba3804fa1ab64fd4fd author Randhir Singh <randhirkumar.singh@gmail.com> 1694604421 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694604421 +0530 First commit $ git cat-file -p afddf78df335c4a85c0e05ba3804fa1ab64fd4fd 100644 blob fc75e0215a2fcaeea1b949dab29c6014a2333399 storage_insights.txt $ git cat-file -p fc75e0215a2fcaeea1b949dab29c6014a2333399 Flash 9000 The first object is a commit that is created as a result of the git commit command. The commit object is pointing to a tree object that refers to the file that we created. The tree object points to a blob object that has the content that we put in the file. Pictorially, the Git repository at this point can be depicted as shown below. Let's modify the content and commit the updated file. Shell $ echo "Storwize" >> storage_insights.txt $ git add . $ git commit -m "Second commit" [master b228401] Second commit 1 file changed, 1 insertion(+) How many objects are there in the Git repository now? Shell $ git count-objects 6 objects, 0 kilobytes Let's check the content of our Git repository. 
Shell $ ls -ltR .git/objects/ .git/objects/: total 0 drwxr-xr-x 1 ragha 197121 0 Sep 13 17:21 b2/ drwxr-xr-x 1 ragha 197121 0 Sep 13 17:21 b1/ drwxr-xr-x 1 ragha 197121 0 Sep 13 17:20 b0/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 b7/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:57 af/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:56 fc/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 info/ drwxr-xr-x 1 ragha 197121 0 Sep 13 16:50 pack/ .git/objects/b2: total 1 -r--r--r-- 1 ragha 197121 167 Sep 13 17:21 28401ce532180aa8fdffaa54731d9d2085f15d .git/objects/b1: total 1 -r--r--r-- 1 ragha 197121 65 Sep 13 17:21 f1a7abd35ad7178efe94a13ccf6de2868f68ce .git/objects/b0: total 1 -r--r--r-- 1 ragha 197121 36 Sep 13 17:20 0f271ba3e94459a48af1620ec9d2050df8e8f5 .git/objects/b7: total 1 -r--r--r-- 1 ragha 197121 137 Sep 13 16:57 d1ea0ff44167b0daa2b3016d3fced984618612 .git/objects/af: total 1 -r--r--r-- 1 ragha 197121 65 Sep 13 16:57 ddf78df335c4a85c0e05ba3804fa1ab64fd4fd .git/objects/fc: total 1 -r--r--r-- 1 ragha 197121 27 Sep 13 16:56 75e0215a2fcaeea1b949dab29c6014a2333399 .git/objects/info: total 0 .git/objects/pack: total 0 Git created three new objects. Shell $ git cat-file -t b228401ce532180aa8fdffaa54731d9d2085f15d commit $ git cat-file -t b1f1a7abd35ad7178efe94a13ccf6de2868f68ce tree $ git cat-file -t b00f271ba3e94459a48af1620ec9d2050df8e8f5 blob Let's check their contents. Shell $ git cat-file -p b228401ce532180aa8fdffaa54731d9d2085f15d tree b1f1a7abd35ad7178efe94a13ccf6de2868f68ce parent b7d1ea0ff44167b0daa2b3016d3fced984618612 author Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 Second commit $ git cat-file -p b1f1a7abd35ad7178efe94a13ccf6de2868f68ce 100644 blob b00f271ba3e94459a48af1620ec9d2050df8e8f5 storage_insights.txt $ git cat-file -p b00f271ba3e94459a48af1620ec9d2050df8e8f5 Flash 9000 Storwize The new commit object now has a parent, which is the previous commit object. The new commit refers to the new tree object, which is the updated file, and the new tree refers to the new blob, which is the updated content. Pictorially, the situation at this point is shown below. In a nutshell, the Git repository stores these objects, and the objects are linked with each other via pointers. The objects are immutable; each time they are modified, a new object is created, and references are updated. This is how Git tracks the content as it changes over time. Now that we understand how Git tracks the content let's move on to the next layer and understand what makes Git a revision control system. Revision Control System Building upon the persistent map and the content tracker, Git is a revision control system that allows developers and teams to: Track changes made to files. Maintain a history of revisions, making it possible to revert to the previous version. Manage code branches and merge changes from different contributors. In order to achieve these, Git provides some artifacts that make it a revision control system. We'll explain these one by one in this section. Branches Branches allow developers to experiment with different changes to their code without affecting the main codebase. This can help them to avoid introducing bugs into the main codebase. Branches can also be used to collaborate with other developers on the same project. Just as Git stores various objects in its repository, branches are also stored there. Let's take a look. 
Shell $ cat .git/refs/heads/master b228401ce532180aa8fdffaa54731d9d2085f15d $ git cat-file -p b228401ce532180aa8fdffaa54731d9d2085f15d tree b1f1a7abd35ad7178efe94a13ccf6de2868f68ce parent b7d1ea0ff44167b0daa2b3016d3fced984618612 author Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694605866 +0530 Second commit The branch is pointing to the second commit. A branch is just a reference to a commit. The master branch was created when we initialized the Git repository. To create a new branch, Git provides a method. Shell git branch <branchname> Notice another reference named HEAD. HEAD is a reference to a branch. HEAD changes as we switch branches. This is explained in the diagram below. To change to a different branch. Shell $ git switch branch Switched to branch 'branch' HEAD will now move to the branch branch. To check where the current HEAD is pointing to. Shell $ cat .git/HEAD ref: refs/heads/branch As we switch branches, files, and folders in the working area change. Git doesn't track them unless they are committed (i.e., available in the Git repository). Merge Next, let's look at the concept of merging. A merge in Git is the process of combining two or more branches into a single branch. This is typically done when you have finished working on a feature branch and want to integrate your changes into the main codebase. To merge the changes from <branch> into the current branch. Shell git merge <branch> Let's see what happens if we merge a branch. We will add one commit to the branch branch and another commit to the branch master. When done, we'll merge the branch branch into the master branch. Shell $ git status On branch branch nothing to commit, working tree clean $ echo "DS8000" >> storage_insights.txt $ git add . $ git commit -m "Added DS8000" [branch aac6280] Added DS8000 1 file changed, 1 insertion(+) Now, switch to the master branch and add some content to the file. Shell $ git switch master Switched to branch 'master' $ echo "XIV" >> storage_insights.txt $ git add . $ git commit -m "Added XIV" [master d8a6319] Added XIV 1 file changed, 1 insertion(+) Merge the branch branch into the master branch. Since the same line of the file is modified in both branches, this will give rise to a conflict. Shell $ git merge branch Auto-merging storage_insights.txt CONFLICT (content): Merge conflict in storage_insights.txt Automatic merge failed; fix conflicts and then commit the result. Resolve the conflict in the file and commit it. This will create another kind of Git object called merge commit. Shell $ git log --graph --decorate --oneline * fcc8bae (HEAD -> master) Resolved merge conflict |\ | * aac6280 (branch) Added DS8000 * | d8a6319 Added XIV |/ * b228401 Second commit * b7d1ea0 First commit Let's examine the merge commit. It has two parents; one is the latest commit from the branch branch, and the other parent is the latest commit from the branch master. Shell $ git cat-file -p fcc8bae tree 6031b0e96170c70d7ae4ad264840168c3fc0b1fa parent d8a63194e4054fcd5c4289b5c0488514691c6beb parent aac6280973f401bfe7a7d5a6904794b9133bac6c author Randhir Singh <randhirkumar.singh@gmail.com> 1694609622 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694609622 +0530 Resolved merge conflict Pictorially, the Git repository at this point in time looks like this. Git creates a merge commit only if it is required. A fast-forward merge is a type of merge in Git that combines two branches without creating a new merge commit. 
This is only possible if the two branches have a linear history, meaning that the branch being merged is a direct descendant of the branch you are merging into. Losing HEAD Normally, HEAD points to the branch that points to the latest commit. However, it is possible for HEAD to not point to a branch. In that case, HEAD is said to be detached. A detached HEAD is a state where the HEAD pointer is not pointing to a branch but instead pointing to a specific commit. This can happen if we check out a commit instead of a branch. Shell $ git checkout b228401 Note: switching to 'b228401'. You are in 'detached HEAD' state. When you are in a detached HEAD state, any new commits you create do not belong to any branch and may eventually be garbage collected. To get out of a detached HEAD state, you can do one of the following: Create a new branch and check it out. Merge the changes made in the detached HEAD state into your current branch. Check out a different branch. Git Object Model This is a good time to review the Git object model, as we've covered all the main Git objects. A Git repository is a bunch of objects linked to each other in a graph. A branch is a reference to a commit, and HEAD is a reference to a branch. Objects are immutable, meaning that they cannot be changed once they are created. This makes them very efficient for storing data, as Git can simply compare the contents of two objects to determine if they are different. There are four main types of objects in the Git object model: Blobs: Blobs store the contents of files. Trees: Trees store the contents of directories. Commits: Commits store metadata about changes to files, such as the author, date, and commit message. Tags: Tags are used to mark specific commits as being important. Objects are stored in the .git directory of your repository. When you make a change to a file and commit it, Git creates a new blob object to store the contents of the changed file and a new commit object to store metadata about the change. Git maintains objects in the repository by following three rules: The current branch tracks new commits. When you move to another commit, Git updates your working directory. Unreachable objects are garbage collected. Rebase Rebase is the process of replaying a sequence of commits onto a new base commit. This means that Git creates new commits, one for each commit in the original sequence, and applies them to the new base commit. To rebase a branch, you can use the git rebase command. For example, to rebase the branch branch onto the master branch, you would run the following command: Shell git rebase master branch Let's look at the history of the branch branch. Contrast this to the earlier scenario when we merged the branch branch into the master branch. Shell $ git log --graph --decorate --oneline * d1f27ee (HEAD -> branch) Added DS8000 * d8a6319 (master) Added XIV * b228401 Second commit * b7d1ea0 First commit Pictorially, the following diagram explains what happens when we rebase. The choice between merge and rebase comes down to your preferences. Remember: merge preserves history; rebase rewrites history. Tags A tag is like a branch that doesn't move. To create a tag: Shell git tag release An annotated tag has a message that can be displayed, while a tag without annotation is just a named pointer to a commit. Where is the tag stored? In the Git repository, like other objects. It points to the commit that was checked out when the tag was created.
Shell $ cat .git/refs/tags/release d8a63194e4054fcd5c4289b5c0488514691c6beb $ git cat-file -p d8a63194e4054fcd5c4289b5c0488514691c6beb tree e4e282cde76ffa017c2d837a6d39bd0259715e2a parent b228401ce532180aa8fdffaa54731d9d2085f15d author Randhir Singh <randhirkumar.singh@gmail.com> 1694609414 +0530 committer Randhir Singh <randhirkumar.singh@gmail.com> 1694609414 +0530 Added XIV We've covered the major concepts behind a revision control system and how those concepts are used to achieve its objectives. This completes the part that explains how Git serves as a revision control system. Next, let us discuss the "distributed" nature of Git. Git Is a Distributed Version Control System Because Git is a distributed version control system, every developer has a complete copy of the repository. This is in contrast to centralized version control systems, where there is a single central repository that all developers must access. When you clone a Git repository, you create a local copy of the entire project, including all files and the entire commit history. This local repository contains everything you need to work on the project independently, without needing a constant internet connection or access to a central server. A remote repository can be cloned using: Shell git clone <remote> In our case, we have a Git repository created locally. We'll create a remote repository on GitHub and set it as the remote. The configured remote repository is stored in .git/config. Shell $ git remote add origin https://github.com/Randhir123/test.git $ git push -u origin master Enumerating objects: 9, done. Counting objects: 100% (9/9), done. Delta compression using up to 8 threads Compressing objects: 100% (3/3), done. Writing objects: 100% (9/9), 744 bytes | 372.00 KiB/s, done. Total 9 (delta 0), reused 0 (delta 0), pack-reused 0 To https://github.com/Randhir123/test.git * [new branch] master -> master branch 'master' set up to track 'origin/master'. $ cat .git/config [core] repositoryformatversion = 0 filemode = false bare = false logallrefupdates = true symlinks = false ignorecase = true [remote "origin"] url = https://github.com/Randhir123/test.git fetch = +refs/heads/*:refs/remotes/origin/* [branch "master"] remote = origin merge = refs/heads/master Like a local branch, a remote branch is just a reference to a commit. Shell $ git show-ref master d8a63194e4054fcd5c4289b5c0488514691c6beb refs/heads/master d8a63194e4054fcd5c4289b5c0488514691c6beb refs/remotes/origin/master Pushing Commits on the local branches can be pushed to remote branches, as shown below. Shell $ echo "SVC" >> storage_insights.txt $ git add . $ git commit -m "Added SVC" [master 7552e51] Added SVC 1 file changed, 1 insertion(+) $ git push Enumerating objects: 5, done. Counting objects: 100% (5/5), done. Writing objects: 100% (3/3), 291 bytes | 291.00 KiB/s, done. Total 3 (delta 0), reused 0 (delta 0), pack-reused 0 To https://github.com/Randhir123/test.git d8a6319..7552e51 master -> master Pulling Commits on the remote branches can be fetched using: Shell git fetch And merged into a local branch using: Shell git merge origin/master These two steps can be combined into one using: Shell git pull This will fetch the remote commits and merge them into the local branch with a single command. We can configure multiple remotes for our repository. All the remotes can be displayed using the command:
Shell $ git remote -v origin https://github.com/Randhir123/test.git (fetch) origin https://github.com/Randhir123/test.git (push) Pull Request A pull request (PR), a concept used by Git code collaboration platforms like GitHub, GitLab, and Bitbucket, is a way of proposing and discussing changes to a codebase. Pull requests are typically created to ask the maintainers of the upstream repository to merge, or pull, our changes. Summary In this article, we described the layers that Git is made of. We started our journey of understanding Git from the core, which is the persistent map. Next, we looked at how Git builds upon the persistent map to track content. The content tracker layer forms the basis of the revision control system. Finally, we looked at the distributed nature of Git, which makes it such a powerful revision control system.
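To make the pull request paragraph above concrete, here is the command sequence that typically precedes opening a PR. The branch name and the extra line added to the file are placeholders in the spirit of the earlier examples, and the final step happens on the hosting platform (GitHub, GitLab, or Bitbucket) rather than in Git itself.

Shell
# Sketch of a typical pull-request flow (names are placeholders).
$ git switch -c add-flash-9100          # create a feature branch
$ echo "Flash 9100" >> storage_insights.txt
$ git add .
$ git commit -m "Added Flash 9100"
$ git push -u origin add-flash-9100     # publish the branch to the remote
# Then open a pull request from add-flash-9100 into master in the platform's
# web UI, where the change can be reviewed, discussed, and finally merged.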
Bartłomiej Żyliński
Software Engineer,
SoftwareMill
Vishnu Vasudevan
Head of Product Engineering & Management,
Opsera
Abhishek Gupta
Principal Developer Advocate,
AWS
Yitaek Hwang
Software Engineer,
NYDIG