Software design and architecture focus on the development decisions made to improve a system's overall structure and behavior in order to achieve essential qualities such as modifiability, availability, and security. The Zones in this category are available to help developers stay up to date on the latest software design and architecture trends and techniques.
Cloud architecture refers to how technologies and components are built in a cloud environment. A cloud environment comprises a network of servers that are located in various places globally, and each serves a specific purpose. With the growth of cloud computing and cloud-native development, modern development practices are constantly changing to adapt to this rapid evolution. This Zone offers the latest information on cloud architecture, covering topics such as builds and deployments to cloud-native environments, Kubernetes practices, cloud databases, hybrid and multi-cloud environments, cloud computing, and more!
Containers allow applications to run quicker across many different development environments, and a single container encapsulates everything needed to run an application. Container technologies have exploded in popularity in recent years, leading to diverse use cases as well as new and unexpected challenges. This Zone offers insights into how teams can solve these challenges through its coverage of container performance, Kubernetes, testing, container orchestration, microservices usage to build and deploy containers, and more.
Integration refers to the process of combining software parts (or subsystems) into one system. An integration framework is a lightweight utility that provides libraries and standardized methods to coordinate messaging among different technologies. As software connects the world in increasingly complex ways, integration makes it all possible by facilitating app-to-app communication. Learn more about this necessity for modern software development by keeping a pulse on industry topics such as integrated development environments, API best practices, service-oriented architecture, enterprise service buses, communication architectures, integration testing, and more.
A microservices architecture is a development method for designing applications as modular services that seamlessly adapt to a highly scalable and dynamic environment. Microservices help solve complex issues such as speed and scalability, while also supporting continuous testing and delivery. This Zone will take you through breaking down the monolith step by step and designing a microservices architecture from scratch. Stay up to date on the industry's changes with topics such as container deployment, architectural design patterns, event-driven architecture, service meshes, and more.
Performance refers to how well an application conducts itself compared to an expected level of service. Today's environments are increasingly complex and typically involve loosely coupled architectures, making it difficult to pinpoint bottlenecks in your system. Whatever your performance troubles, this Zone has you covered with everything from root cause analysis, application monitoring, and log management to anomaly detection, observability, and performance testing.
The topic of security covers many different facets within the SDLC. From focusing on secure application design to designing systems to protect computers, data, and networks against potential attacks, it is clear that security should be top of mind for all developers. This Zone provides the latest information on application vulnerabilities, how to incorporate security earlier in your SDLC practices, data governance, and more.
Containers
The proliferation of containers in recent years has increased the speed, portability, and scalability of software infrastructure and deployments across all kinds of application architectures and cloud-native environments. Now, with more and more organizations having migrated to the cloud, what's next? Efficiently managing and monitoring containerized environments remains a crucial task for teams. With organizations looking to better leverage their containers — and some still working to migrate out of their own monolithic environments — the path to containerization and architectural modernization remains a perpetual climb. In DZone's 2023 Containers Trend Report, we will explore the current state of containers, key trends and advancements in global containerization strategies, and constructive content for modernizing your software architecture. This will be examined through DZone-led research, expert community articles, and other helpful resources for designing and building containerized applications.
Apache Kafka has emerged as a clear leader in corporate architecture for moving from data at rest (DB transactions) to event streaming. There are many presentations that explain how Kafka works and how to scale this technology stack (either on-premises or in the cloud). Building a microservice that uses ChatGPT-generated code to consume messages and then enrich, transform, and persist them is the next phase of this project. In this example, we will be consuming input from an IoT device (a Raspberry Pi) that sends a JSON temperature reading every few seconds.

Consume a Message

As each Kafka event message is produced (and logged), a Kafka microservice consumer is ready to handle it. I asked ChatGPT to generate some Python code, and it gave me the basics to poll and read from the named "topic." What I got was a pretty good start for consuming a topic, key, and JSON payload. ChatGPT also created code to persist this to a database using SQLAlchemy. I then wanted to transform the JSON payload and use API Logic Server (ALS, an open-source project on GitHub) rules to unwrap the JSON, validate, calculate, and produce a new set of message payloads whenever the source temperature falls outside a given range.

Shell
ChatGPT: "design a Python Event Streaming Kafka Consumer interface"

Note: ChatGPT selected the Confluent Kafka libraries (and their Docker Kafka container); you can modify your code to use other Python Kafka libraries.

SQLAlchemy Model

Using API Logic Server (ALS, a Python open-source platform), we connect to a MySQL database. ALS will read the tables and create an SQLAlchemy ORM model, a react-admin user interface, a safrs-JSON Open API (Swagger), and a running REST web service for each ORM endpoint. The new Temperature table will hold the timestamp, the IoT device ID, and the temperature reading. Here we use the ALS command-line utility to create the ORM model:

Shell
ApiLogicServer create --project_name=iot --db_url=mysql+pymysql://root:password@127.0.0.1:3308/iot

The API Logic Server generated class used to hold our Temperature values:

Python
class Temperature(SAFRSBase, Base):
    __tablename__ = 'Temperature'
    _s_collection_name = 'Temperature'  # type: ignore
    __bind_key__ = 'None'

    Id = Column(Integer, primary_key=True)
    DeviceId = Column(Integer, nullable=False)
    TempReading = Column(Integer, nullable=False)
    CreateDT = Column(TIMESTAMP, server_default=text("CURRENT_TIMESTAMP"), nullable=False)
    KafkaMessageSent = Column(Boolean, default=text("False"))

Changes

Instead of saving the raw Kafka JSON consumer message in a SQL database (and firing rules to do the work there), we unwrap the JSON payload (util.row_to_entity) and insert it into the Temperature table. We let the declarative rules handle each temperature reading.

Python
entity = models.Temperature()
util.row_to_entity(message_data, entity)
session.add(entity)

When the consumer receives a message, it adds the row to the session, which triggers the commit_event rule (below).

Declarative Logic: Produce a Message

Using API Logic Server (an automation framework built on SQLAlchemy, Flask, and the LogicBank spreadsheet-like rules engine: formula, sum, count, copy, constraint, event, etc.), we add a declarative commit_event rule on the ORM entity Temperature. As each message is persisted to the Temperature table, the commit_event rule is called. If the temperature reading exceeds MAX_TEMP or falls below MIN_TEMP, we will send a Kafka message on the topic "TempRangeAlert".
We also add a constraint to make sure we receive data within a normal range (32-132). We will let another event consumer handle the alert message.

Python
from confluent_kafka import Producer

conf = {'bootstrap.servers': 'localhost:9092'}
producer = Producer(conf)
MAX_TEMP = arg.MAX_TEMP or 102
MIN_TEMP = arg.MIN_TEMP or 78

def produce_message(
        row: models.Temperature,
        old_row: models.Temperature,
        logic_row: LogicRow):
    if logic_row.isInserted() and row.TempReading > MAX_TEMP:
        producer.produce(topic="TempRangeAlert", key=str(row.Id),
            value=f"The temperature {row.TempReading}F exceeds {MAX_TEMP}F on Device {row.DeviceId}")
        row.KafkaMessageSent = True
    if logic_row.isInserted() and row.TempReading < MIN_TEMP:
        producer.produce(topic="TempRangeAlert", key=str(row.Id),
            value=f"The temperature {row.TempReading}F is less than {MIN_TEMP}F on Device {row.DeviceId}")
        row.KafkaMessageSent = True

Rules.constraint(models.Temperature,
    as_expression=lambda row: row.TempReading < 32 or row.TempReading > 132,
    error_message="Temperature {row.TempReading} is out of range")
Rules.commit_event(models.Temperature, calling=produce_message)

We only produce an alert message if the temperature reading is greater than MAX_TEMP or less than MIN_TEMP. The constraint checks the temperature range before the commit event is called (note that rules are always unordered and can be introduced as specifications change).

TDD Behave Testing

Using TDD (Test-Driven Development), we can write a Behave test to insert records directly into the Temperature table and then check the return value of KafkaMessageSent. Behave begins with a Feature/Scenario (.feature file). For each scenario, we write a corresponding Python step implementation using Behave decorators.

Feature Definition

Plain Text
Feature: TDD Temperature Example

  Scenario: Temperature Processing
    Given A Kafka Message Normal (Temperature)
    When Transactions normal temperature is submitted
    Then Check KafkaMessageSent Flag is False

  Scenario: Temperature Processing
    Given A Kafka Message Abnormal (Temperature)
    When Transactions abnormal temperature is submitted
    Then Check KafkaMessageSent Flag is True

TDD Python Class

Python
from behave import *
import safrs

db = safrs.DB
session = db.session

def insertTemperature(temp: int) -> bool:
    entity = models.Temperature()
    entity.TempReading = temp
    entity.DeviceId = 'local_behave_test'
    session.add(entity)
    session.commit()  # commit fires the LogicBank rules, which set KafkaMessageSent
    return entity.KafkaMessageSent

@given('A Kafka Message Normal (Temperature)')
def step_impl(context):
    context.temp = 76
    assert True

@when('Transactions normal temperature is submitted')
def step_impl(context):
    context.response_text = insertTemperature(context.temp)

@then('Check KafkaMessageSent Flag is False')
def step_impl(context):
    assert context.response_text == False

Summary

Using ChatGPT to generate the Kafka message code for both the consumer and the producer seems like a good starting point (install Confluent's Docker image to run Kafka locally). Using API Logic Server for the declarative logic rules lets us add formulas, constraints, and events to the normal flow of transactions into our SQL database while producing (and transforming) new Kafka messages; together, this is a great combination. ChatGPT and declarative logic are the next level of "paired programming."
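For completeness, here is roughly what the consumer side that ChatGPT sketched can look like; only the producer code is shown above. This is a minimal sketch assuming the confluent_kafka library, a local broker at localhost:9092, and topic/group names invented for this example; adapt them to your setup.

Python
import json
from confluent_kafka import Consumer

conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'temperature-readers',       # assumed consumer group name
    'auto.offset.reset': 'earliest',
}
consumer = Consumer(conf)
consumer.subscribe(['temperature_reading'])  # assumed topic name for the IoT readings

try:
    while True:
        msg = consumer.poll(1.0)             # wait up to 1 second for a message
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        message_data = json.loads(msg.value().decode('utf-8'))
        # Hand the unwrapped JSON to the ALS/SQLAlchemy session as shown above:
        # entity = models.Temperature(); util.row_to_entity(message_data, entity); session.add(entity)
        print(f"key={msg.key()}, payload={message_data}")
finally:
    consumer.close()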
Building complex container-based architectures is not very different from programming in terms of applying design best practices and principles. The goal of this article is to present three popular extensibility architectural patterns from a developer's perspective using well-known programming principles. Let's start with the Single Responsibility Principle. According to Robert C. Martin, "A class should have only one reason to change." But classes are abstractions used to simplify real-world problems and represent software components. Hence, a component should have only one reason to change over time. Software services, and microservices in particular, are also components (runtime components) and should have only one reason to change. Microservices are supposed to be a single deployable unit, meaning they are deployed independently of other components and can have as many instances as needed. But is that always true? Are microservices always deployed as a single unit? In Kubernetes, the embodiment of a microservice is a Pod. A Pod is defined as a group of containers that share resources like file systems, kernel namespaces, and an IP address. The Pod is the atomic unit of scheduling in a Kubernetes cluster, and each Pod is meant to run a single instance of a given application. According to the documentation, "Pods are designed to support multiple cooperating processes (as containers) that form a cohesive unit of service. The containers in a Pod are automatically co-located and co-scheduled on the same physical or virtual machine in the cluster." Scaling an application horizontally means replicating Pods. According to the Kubernetes documentation, Pods can be configured using two strategies: Pods that run a single container: The "one-container-per-Pod" model is the most common Kubernetes use case; the Pod is a wrapper around a single container, and Kubernetes manages Pods rather than managing the containers directly. Pods that run multiple containers working together: A Pod can encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers form a single cohesive unit of service—for example, one container serving data stored in a shared volume to the public, while a separate sidecar container refreshes or updates those files. The Pod wraps these containers, storage resources, and an ephemeral network identity together as a single unit. The answer is: NO! Microservices are NOT always deployed as a single unit! Alongside popular architectural patterns for the cloud such as scalability, deployment, and reliability patterns sit the extensibility architectural patterns. We will have a closer look at the three most popular extensibility patterns for cloud architectures: the Sidecar pattern, the Ambassador pattern, and the Adapter pattern. Sidecar Pattern Problem Each deployable service/application has its own "reason to change," or responsibility. However, in addition to its core functionality, it needs to do other things known in software development terminology as "cross-cutting concerns." One example is the collection of performance metrics that need to be sent to a monitoring service. Another is logging events and sending them to a distributed logging service. These are called cross-cutting concerns because they do not directly relate to business logic and are needed by multiple services; they essentially represent reusable functionality that needs to be part of each deployed unit.
Solution The solution to that problem is called the sidecar pattern, and it involves the creation of an additional container called a sidecar container. Sidecar containers are an extension of the main container, following the Open-Closed design principle (open for extension, closed for modification). They are tightly coupled with the "main" container in terms of deployment, as they are deployed as part of the same Pod, but they are still easy to replace and do not break the single responsibility of the extended container. Furthermore, the achieved modularity allows for isolated testing of business-related functionality and of additional helper services like event logging or monitoring. Communication between the two containers is fast and reliable, and they share access to the same resources, enabling the helper component to provide reusable infrastructure-related services. In addition, the pattern is applicable to many types of services, solving the heterogeneity issue of different technologies being used. Upgrading the sidecar components is also straightforward, as it usually means bumping a Docker container version and redeploying using, for example, one of Kubernetes' zero-downtime strategies. Ambassador Containers Problem Deployed services do not function in isolation. They usually communicate over the network with other services, even outside the application or software platform controlled by a single organization. Integrations between components generally imply integration with external APIs and also dealing with failures or unavailability of external systems. A common practice for external systems integration is to define a so-called API Facade, an internal API that hides the complexity of external system APIs. The role of the API Facade is to isolate the external dependencies, providing an implementation of the internal API definition and taking care of security and routing if needed. In addition, failures and unavailability of external systems are usually handled using common patterns like the Retry pattern and the Circuit Breaker pattern, sometimes backed by local caching. All these technicalities would complicate the main service and appear to be candidates for a helper container. Solution The solution to that problem is called the Ambassador pattern and implies the creation of an additional container called an Ambassador container. Ambassador containers proxy a local connection to the outside world; they are basically a type of sidecar container. This composition of containers is powerful, not just because of the separation of concerns and the fact that different teams can easily own the components, but also because it allows for easy mocking of external services in local development environments. Adapter Containers Problem There are still many monolith systems planned for migration to more lightweight architectures. Migrations, though, cannot happen in one pass, and it is also risky to wait years for the rewriting of a whole system while also supporting the addition of new features in both versions of the system. Migrations should happen in small pieces, publishing separate services and integrating them one by one. That process repeats until the legacy monolith system is gone. So we have a new part of the system supporting new APIs and an old part that still supports old APIs. For example, we might have newly implemented REST services and still have some old SOAP-based services.
We need something that takes care of exposing the old functionality as if all the services had already been migrated, so that clients' systems can integrate with it. Solution The solution to that problem is called the Adapter (or Anti-Corruption) pattern. The Adapter container takes care of translating from one communication protocol to another and from one data model to another, while hiding the actual service from the external world. Furthermore, the Adapter container can provide two-way communication. If the legacy system needs to communicate with the new services, it could also be the adapting component for that communication, serving as a kind of Ambassador container until the migration is finalized. In this article, we saw how container composition provides an extensibility mechanism without any actual change to the main application container, providing stability and reusability by allowing the composite Pod to be treated like any other simple Pod that exposes a single, simple service in a microservices architecture. One might ask: why not use a library and share it across many containers? That is also a solution, but then we face the shared-responsibility problem of introducing coupling between all the services using it. In addition, heterogeneous services would require rewriting the library in each of the supported languages. That also breaks the Single Responsibility Principle, which we would in any case like to keep.
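To make the sidecar composition described above concrete, here is a minimal sketch that defines a two-container Pod (a main application container plus a log-shipping sidecar sharing a volume) using the official Kubernetes Python client. The image names, volume path, Pod name, and namespace are illustrative assumptions, not part of the original article.

Python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

# A shared emptyDir volume lets the sidecar read the logs the main container writes.
logs_volume = client.V1Volume(name="logs", empty_dir=client.V1EmptyDirVolumeSource())
logs_mount = client.V1VolumeMount(name="logs", mount_path="/var/log/app")

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="web-with-logging-sidecar"),
    spec=client.V1PodSpec(
        containers=[
            # Main container: owns the business logic, knows nothing about log shipping.
            client.V1Container(name="app", image="example/web-app:1.0",
                               volume_mounts=[logs_mount]),
            # Sidecar container: handles the cross-cutting concern (shipping logs).
            client.V1Container(name="log-shipper", image="example/log-shipper:1.0",
                               volume_mounts=[logs_mount]),
        ],
        volumes=[logs_volume],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Because both containers live in one Pod, they are co-scheduled and share the volume and network namespace, yet each can be developed, versioned, and replaced independently, which is exactly the extensibility property the sidecar pattern is after.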
In this Java 21 tutorial, we dive into virtual threads, a game-changing feature for developers. Virtual threads are a lightweight and efficient alternative to traditional platform threads, designed to simplify concurrent programming and enhance the performance of Java applications. In this article, we’ll explore the ins and outs of virtual threads, their benefits, compatibility, and the migration path to help you leverage this powerful Java 21 feature. Introducing Virtual Threads Virtual threads represent a significant evolution in the Java platform’s threading model. They are designed to address the challenges of writing, maintaining, and optimizing high-throughput concurrent applications. It’s essential to differentiate virtual threads from traditional platform threads to understand them. In traditional Java, every instance of java.lang.Thread is a platform thread. A platform thread runs Java code on an underlying OS thread and occupies that OS thread for the duration of its execution. It means that the number of platform threads is limited to the number of available OS threads, leading to potential resource constraints and suboptimal performance in highly concurrent applications. On the other hand, a virtual thread is also an instance of java.lang.Thread, but it operates differently. Virtual threads run Java code on an underlying OS thread without capturing the OS thread for its entire lifecycle. This crucial difference means multiple virtual threads can share the same OS thread, offering a highly efficient way to utilize system resources. Unlike platform threads, virtual threads do not monopolize precious OS threads, which can lead to a significantly higher number of virtual threads than the number of available OS threads. The Roots of Virtual Threads Virtual threads draw inspiration from user-mode threads, successfully employed in other multithreaded languages such as Go (with goroutines) and Erlang (with processes). In the early days of Java, user-mode threads were implemented as “green threads” due to the immaturity and limited support for OS threads. These green threads were eventually replaced by platform threads, essentially wrappers for OS threads, operating under a 1:1 scheduling model. Virtual threads take a more sophisticated approach, using an M:N scheduling model. In this model, many virtual threads (M) are scheduled to run on fewer OS threads (N). This M:N scheduling approach allows Java applications to achieve a high concurrency level without the resource constraints typically associated with platform threads. Leveraging Virtual Threads In Java 21, developers can easily harness the power of virtual threads. A new thread builder is introduced to create virtual and platform threads, providing flexibility and control over the threading model. To create a virtual thread, you can use the following code snippet: Java Thread.Builder builder = Thread.ofVirtual().name("Virtual Thread"); Runnable task = () -> System.out.println("Hello World"); Thread thread = builder.start(task); System.out.println(thread.getName()); thread.join(); It’s important to note that virtual threads are significantly cheaper in terms of resource usage when compared to platform threads. 
You can create multiple virtual threads, allowing you to fully exploit the advantages of this new threading model: Java Thread.Builder builder = Thread.ofVirtual().name("Virtual Thread", 0); Runnable task = () -> System.out.println("Hello World: " + Thread.currentThread().threadId()); Thread thread1 = builder.start(task); Thread thread2 = builder.start(task); thread1.join(); thread2.join(); Virtual threads can also be effectively utilized with the ExecutorService, as demonstrated in the code below: Java try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) { Future<String> future = executor.submit(() -> "Hello World"); System.out.println(future.get()); System.out.println("The end!"); } The Virtual vs. Platform Thread Trade-Off It’s crucial to understand that platform threads are not deprecated in Java 21, and virtual threads are not a one-size-fits-all solution. Each type of thread has its own set of trade-offs, and the choice between them should be made based on your application’s specific requirements. Virtual threads: Virtual threads are excellent for high-throughput concurrent tasks, especially when managing many lightweight threads without OS thread limitations. They are well-suited for I/O-bound operations, event-driven tasks, and workloads with many short-lived threads. Platform threads: Platform threads are still valuable for applications where fine-grained control over thread interactions is essential. They are ideal for CPU-bound operations, real-time applications, and scenarios that require precise thread management. In conclusion, Java 21’s virtual threads are a groundbreaking addition to the Java platform, offering developers a more efficient and scalable way to handle concurrency. By understanding the differences and trade-offs between virtual and platform threads, you can make informed decisions on when and how to leverage these powerful features to unlock the full potential of your Java applications.
Meltdown has definitely taken the internet by storm. The attack seems quite simple and elegant, yet the whitepaper leaves out critical details on the specific vulnerability. It relies mostly on a combination of cache timing side-channels and speculative execution that accesses globally mapped kernel pages. This deep dive assumes some familiarity with CPU architecture and OS kernel behavior. Read the background section first for a primer on paging and memory protection.

Simplified Version of the Attack

1. Speculatively read from a kernel-mapped (supervisor) page and then perform a calculation on the value.
2. Conditionally issue a load to some other non-cached memory location based on the result of that calculation.
3. While the 2nd load will be nuked from the pipeline when the faulting exception retires, it has already issued a load request out to the L2$ and beyond, ensuring the outstanding memory request still brings the line into the cache hierarchy, like a prefetch.
4. Finally, a separate process can issue loads to those same memory locations and measure the time for those loads. A cache hit will be much quicker than a cache miss, which can be used to represent binary 1s (i.e., hits) and binary 0s (i.e., misses).

Parts 1 and 2 have to do with speculative execution of instructions, while parts 3 and 4 enable the microarchitectural state (i.e., in cache or not) to be committed to an architectural state.

Is the Attack Believable?

What is not specified in the Meltdown whitepaper is what specific x86 instruction sequence or CPU state can enable the memory access to be speculatively executed AND allow the vector or integer unit to consume that value from the L1D$ or L2$. In modern Intel CPUs, when a fault happens, such as a page fault, the pipeline is not squashed/nuked until the retirement of the offending instruction. However, memory permission checks for page protection, segmentation limits, and canonical addresses are done in what is called the address generation (AGU) stage and the TLB lookup stage, before the load even looks up in the L1D$ or goes out to memory. More on this below.

Performing Memory Permission Checks

Intel CPUs implement physically tagged L1D$ and L1I$, which requires translating the linear (virtual) address to a physical address before the L1D$ can determine if it hits or misses in the cache via a tag match. This means the CPU will attempt to find the translation in the TLB (Translation Lookaside Buffer) cache. The TLB caches these translations along with the page table or page directory permissions (the privileges required to access a page are stored along with the physical address translation in the page tables). A TLB entry may contain the following: Valid, Physical Address (minus the page offset), Read/Write, User/Supervisor, Accessed, Dirty, and Memtype. Thus, even for a speculative load, the permissions required to access the page are already known and can be compared against the Current Privilege Level (CPL) and the privilege required by the op, so the load can be blocked from ever being consumed by any arithmetic unit. Such permission checks include segment limit checks, write faults, User/Supervisor faults, and page-not-present faults. This is what many x86 CPUs are in fact designed to do. The load would be rejected until the fault is later handled by software/uCode when the op is at retirement. The load would be zeroed out on the way to the integer/vector units.
In other words, a User/Supervisor protection fault would be treated similarly to a page-not-present fault or other page translation issue, meaning the line read out of the L1D$ should be thrown away immediately and the uOp simply put in a waiting state. Preventing integer/floating units from consuming faulting loads is beneficial not just to prevent such leaks, but can actually boost performance. I.e., loads that fault won't train the prefetchers with bad data, allocate buffers to track memory ordering, or allocate a fill buffer to fetch data from the L2$ if it missed the L1D$. These are limited resources in modern CPUs and shouldn't be consumed by loads that are not good anyway. In fact, if the load missed the TLBs and had to perform a page walk, some Intel CPUs will even kill the page walk in the PMH (Page Miss Handler) if a fault happens during the walk. Page walks perform a lot of pointer chasing and have to consume precious load cycles, so it makes sense to cancel one if it'll be thrown away later anyway. In addition, the PMH finite state machine can usually handle only a few page walks simultaneously. In other words, aborting the L1D load uOp can actually be a good thing from a performance standpoint. The press articles saying Intel slipped because it was trying to extract as much performance as possible at the cost of being less secure aren't true, unless they want to claim the basic concepts of speculation and caching are considered tradeoffs. The Fix This doesn't mean the Meltdown vulnerability doesn't exist. There is more to the story than what the whitepaper and most news posts discuss. Most posts claim that the mere act of having speculative memory accesses and cache timing attacks is enough to create the attack, and that Intel now has to completely redesign its CPUs or eliminate speculative execution. Meltdown is more of a logic bug that slipped Intel CPU validation rather than a "fundamental breakdown in modern CPU architecture" like the press is currently saying. It can probably be fixed with a few gate changes in hardware; the fix would likely be adding the correct rejection logic in the L1D$ pipelines to mask the load hit. Intel CPUs certainly have the information already, as the address generation and TLB lookup stages have to be complete before an L1D$ cache hit can be determined anyway. It is unknown what all the scenarios are that cause the vulnerability. Is it certain CPU designs that missed validation on this architectural behavior? Is it a special x86 instruction sequence that bypasses these checks, or some additional steps to set up the state of the CPU to ensure the load is actually executed? Project Zero believes the attack can only occur if the faulting load hits in the L1D$. Maybe Intel had the logic on the miss path but had a logic bug on the hit path? I wouldn't be surprised if certain Intel OoO designs are immune to Meltdown, as it's a specific CPU design and validation problem rather than a general CPU architecture problem. Unfortunately, x86 has many different flows through the Memory Execution Unit. For example, certain instructions like MOVNTDQA have different memory ordering and flows in the L1D$ than a standard cacheable load. Haswell Transactional Synchronization Extensions and locks add even more complexity to validate for correctness. Instruction fetches go through a different path than D-side loads. The validation state space is very large.
Throw in all the bypass networks and you can see how many different places there are where fault checks need to be validated. One thing is for certain: caching and speculation are not going away anytime soon. If it is a logic bug, it may be a simple fix for future Intel CPUs.

Why Is This Attack Easier Today?

Assuming there is some instruction sequence or CPU state that lets a faulting load be consumed anyway (or assuming I am wrong about the permission checks described above), why is this being discovered now rather than decades ago?

1. Today's CPUs have much deeper pipelines (cough Prescott cough), which provides a wider window between the speculative memory access and the actual nuke/squash of those faulting accesses. Faulting instructions are not handled until the instruction is up for retirement/committal to processor architectural state. Only at retirement is the pipeline nuked. Long pipelines allow for a large window between the execution of the faulting instruction and its retirement, allowing other speculative instructions to race ahead.
2. A larger cache hierarchy and slower memory fabric speed relative to fast CPU-only ops, such as cache hits and integer ops, provide a much larger time difference in cycles between a cache hit and a cache miss that goes to memory, enabling more robust cache timing attacks. Today's large multi-core server CPUs with elaborate mesh fabrics connecting tens or hundreds of cores exacerbate this.
3. The addition of performance-enhancing features for fine-granularity cache control, such as the x86 CLFLUSH and PREFETCHTx instructions, gives more control for cache timing attacks.
4. Wider-issue processors enable parallel integer, floating-point, and memory ops simultaneously. One could place long floating-point operations such as divide or sqrt right before the faulting instruction to keep the core busy while still keeping the integer and memory pipelines free for the attack. Since the faulting instruction will not nuke the pipeline until retirement, it has to wait until all earlier instructions in the instruction sequence have been committed, including long-running floating-point ops.
5. Virtualization and PaaS. Many web-scale companies are now running workloads on cloud providers like AWS and Azure. Before the cloud, Fortune 500 companies would run their own trusted applications on their own hardware. Thus, applications from different companies were physically separated, unlike today. While it is unknown whether Meltdown can allow a guest OS to break into the hypervisor or host OS, what is known is that many virtualization techniques are more lightweight than full-blown VT-x. For example, multiple apps in Heroku, AWS Beanstalk, or Azure Web Apps, along with Docker containers, run within the same VM. Companies no longer spin up a separate VM for each application. This could allow a rogue application to read kernel memory of the specific VM. Shared resources were not a thing in the '90s when OoO execution became mainstream with the Pentium Pro/Pentium III.
6. The use of the Global and User/Supervisor bits in x86 paging entries, which enables the kernel memory space to be mapped into every user process (but protected from Ring3 code execution) to reduce pressure on TLBs and avoid slow context switches to a separate kernel process. This performance optimization has been done since the 1990s.

Is This x86 Specific?

First of all, cache timing attacks and speculative execution are not specific to Intel or x86 CPUs. Most modern CPUs implement multi-level caches and heavy speculation, outside of a few embedded microprocessors for your watch or microwave.
This isn't an Intel-specific problem or an x86 problem but rather a fundamental property of general CPU architecture. There are now claims that specific OoO ARM CPUs, such as those in iPhones and smartphones, exhibit this flaw also. Out-of-order execution has been around since the Tomasulo algorithm introduced it. At the same time, cache timing attacks have been known for decades, as it has long been known that data may be loaded into caches when it shouldn't have been. However, cache timing attacks have traditionally been used to find the location of kernel memory rather than to actually read it. It's more of a race condition and window that is enabled depending on the microarchitecture. Some CPUs have shallower pipelines than others, causing the nuke to happen sooner. Modern desktop/server CPUs like x86 have more elaborate features, from CLFLUSH to PREFETCHTx, that can be additional tools to make the attack more robust. Background on Memory Paging Since the introduction of paging with the 386 and Windows 3.0, operating systems have used this feature to isolate the memory space of one process from another. A process is mapped to its own virtual address space, which is independent of any other running process's address space. These virtual address spaces are backed by physical memory (pages can be swapped out to disk, but that's beyond the scope of this post). For example, let's say Process 1 needs 4KB of memory, so the OS allocates a virtual memory space of 4KB, which has a byte-addressable range from 0x0 to 0xFFF. This range is backed by physical memory starting at location 0x1000. This means Process 1's [0x0-0xFFF] is "mounted" at the physical location [0x1000-0x1FFF]. If there is another process running, it also needs 4KB, so the OS will map a second virtual address space for this Process 2 with the range 0x0 to 0xFFF. This virtual memory space also needs to be backed by physical memory. Since Process 1 is already using 0x1000-0x1FFF, the OS will decide to allocate the next block of physical memory, [0x2000-0x2FFF], for Process 2. Given this setup, if Process 1 issues a load from memory to linear address 0x0, the OS will translate this to physical location 0x1000, whereas if Process 2 issues a load from memory to linear address 0x0, the OS will translate this to physical location 0x2000. Notice how there needs to be a translation. That is the job of the page tables. An analogy in the web world would be how two different Docker containers running on a single host can each mount the same /data dir inside the container to two different physical locations on the host machine, /data/node0 and /data/node1. A range of mapped memory is referred to as a page. CPU architectures have a defined page size, such as 4KB. Paging allows the memory to be fragmented across the physical memory space. In our above example, we assumed a page size of 4KB, so each process only mapped one page. Now, let's say Process 1 performs a malloc() and forces the kernel to map a second 4KB region. Since the next page of physical memory, [0x2000-0x2FFF], is already utilized by Process 2, the OS needs to allocate a free block of physical memory, [0x3000-0x3FFF], to Process 1. (Note: Modern OSes use deferred/lazy memory allocation, which means virtual memory may be created without being backed by any physical memory until the page is actually accessed, but that's beyond the scope of this post. See x86 Page Accessed/Dirty Bits for more.)
The address space appears contiguous to the process but in reality is fragmented across the physical memory space:

Process 1: virtual [0x0-0xFFF] -> physical [0x1000-0x1FFF]; virtual [0x1000-0x1FFF] -> physical [0x3000-0x3FFF]
Process 2: virtual [0x0-0xFFF] -> physical [0x2000-0x2FFF]

There is an additional translation step before this to translate a logical address to a linear address using x86 segmentation. However, most operating systems today do not use segmentation in the classical sense, so we'll ignore it for now. Memory Protection Besides creating virtual address spaces, paging is also used as a form of protection. The above translations are stored in a structure called a page table. Each 4KB page can have specific attributes and access rights stored along with the translation data itself. For example, pages can be defined as read-only. If a memory store is executed against a read-only page of memory, a fault is triggered by the CPU. Straight from the x86 reference manual, the following non-exhaustive list of attribute bits (which behave like boolean true/false flags) are stored with each page table entry:

P (Present): must be 1 to map a 4-MByte page
R/W (Read/write): if 0, writes may not be allowed to the page referenced by this entry
U/S (User/supervisor): if 0, user-mode accesses are not allowed to the page referenced by this entry
A (Accessed): indicates whether software has accessed the page referenced by this entry
D (Dirty): indicates whether software has written to the page referenced by this entry
G (Global): if CR4.PGE = 1, determines whether the translation is global; ignored otherwise
XD (Execution Disable): if IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the 4-KByte page controlled by this entry; see Section 4.6); otherwise, reserved (must be 0)

Minimizing Context Switching Cost We showed how each process has its own virtual address mapping. The kernel process is a process just like any other and also has a virtual memory mapping. When the CPU switches context from one process to another, there is a high switching cost, as much of the architectural state needs to be saved to memory so that the suspended process can resume executing with the saved state when it starts running again. However, many system calls need to be performed by the kernel, such as I/O, interrupts, etc. This means a CPU would constantly be switching between a user process and the kernel process to handle those system calls. To minimize this cost, kernel engineers and computer architects map the kernel pages right into the user virtual memory space to avoid the context switching. This is done via the User/Supervisor access rights bits. The OS maps the kernel space but designates it as supervisor (a.k.a. Ring0) access only so that user code cannot access those pages. Thus, those pages appear invisible to any code running at user privilege level (a.k.a. Ring3). While running in user mode, if the CPU sees an instruction access a page that requires supervisor rights, a page fault is triggered. In x86, page access rights are one of the paging-related reasons that can trigger a #PF (Page Fault). The Global Bit As noted above, each process has its own virtual address mapping, and most translations are private to the process. This ensures Process 1 cannot access Process 2's data, since there won't be any mapping to [0x2000-0x2FFF] physical memory from Process 1.
However, many system calls are shared across many processes to handle I/O calls, interrupts, etc. Normally, this means each process would replicate the kernel mapping, putting pressure on the caching of these translations and raising the cost of context switching between processes. The Global bit enables certain translations (i.e., the kernel memory space) to be visible across all processes. Closing Thoughts It's always interesting to dig into security issues. Systems are now expected to be secure, unlike in the '90s, and security is only becoming more critical with the growth of crypto, biometric verification, mobile payments, and digital health. A large breach is much scarier for consumers and businesses today than in the '90s. At the same time, we also need to keep the discussion going on new reports. The steps taken to trigger the Meltdown vulnerability have been proven by various parties. However, it probably isn't the mere act of having speculation and cache timing attacks that caused Meltdown, nor is it a fundamental breakdown in CPU architecture; rather, it seems like a logic bug that slipped validation. Meaning, speculation and caches are not going away anytime soon, nor will Intel require an entirely new architecture to fix Meltdown. Instead, the only change needed in future x86 CPUs is a few small gate changes to the combinational logic that determines whether a hit in the L1D$ (or any temporary buffers) is good.
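As a footnote to the background section above, the translation example (two processes whose identical virtual addresses map to different physical frames, plus the User/Supervisor check) can be sketched in a few lines of Python. This is purely an illustration of the bookkeeping described in the article, not of any real hardware structure; the page tables, flags, and the 0x5000 kernel frame below are made up for the example.

Python
PAGE_SIZE = 4096  # 4KB pages, as in the example

# Per-process "page tables": virtual page number -> (physical frame base, user_accessible)
process1_pages = {0x0: (0x1000, True), 0x1: (0x3000, True)}
process2_pages = {0x0: (0x2000, True)}
# A kernel page mapped into the process but marked supervisor-only (U/S = 0)
process1_pages[0x2] = (0x5000, False)

def translate(page_table, vaddr, user_mode=True):
    vpn, offset = divmod(vaddr, PAGE_SIZE)      # split address into page number + offset
    if vpn not in page_table:
        raise MemoryError("#PF: page not present")
    frame, user_ok = page_table[vpn]
    if user_mode and not user_ok:
        raise MemoryError("#PF: user/supervisor violation")  # the check Meltdown races against
    return frame + offset

print(hex(translate(process1_pages, 0x004)))   # 0x1004 - same virtual address...
print(hex(translate(process2_pages, 0x004)))   # 0x2004 - ...different physical location
print(hex(translate(process1_pages, 0x1008)))  # 0x3008 - Process 1's second, fragmented page
translate(process1_pages, 0x2000)              # raises: user code touching a supervisor-only page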
Microservices architecture has revolutionized modern software development, offering unparalleled agility, scalability, and maintainability. However, effectively implementing microservices necessitates a deep understanding of best practices to harness their full potential while avoiding common pitfalls. In this comprehensive guide, we will delve into the key best practices for microservices, providing detailed insights into each aspect. 1. Defining the "Micro" in Microservices Single Responsibility Principle (SRP) Best Practice: Microservices should adhere to the Single Responsibility Principle (SRP), having a well-defined scope of responsibility that encapsulates all tasks relevant to a specific business domain. Explanation: The Single Responsibility Principle, a fundamental concept in software design, applies to microservices. Each microservice should focus on a single responsibility, encapsulating all the tasks relevant to a specific business domain. This approach ensures that microservices are concise and maintainable, as they don't try to do too much, aligning with the SRP's principle of a class having only one reason to change. Simplifying Deployment Best Practice: Combine small teams with complete ownership, discrete responsibility, and infrastructure for continuous delivery to reduce the cost of deploying microservices. Explanation: The combination of small, self-sufficient teams, each responsible for a specific microservice, simplifies the deployment process. With complete ownership and infrastructure supporting continuous delivery, the cost and effort required to move microservices into production are significantly reduced. 2. Embracing Domain-Driven Design (DDD) Best Practice: Apply Domain-Driven Design (DDD) principles to design microservices with a strong focus on specific business domains rather than attempting to create universal solutions. Explanation: Domain-driven design (DDD) is a strategic approach to designing software systems, emphasizing the importance of aligning the software's structure with the organization's business domains. When implementing microservices, it's crucial to use DDD principles to ensure that each microservice accurately represents a specific business domain. This alignment helps in modeling and organizing microservices effectively, ensuring that they reflect the unique requirements and contexts of each area. 3. Encouraging Reusability Best Practice: Promote reuse of microservices within specific domains while allowing for adaptation for use in different contexts. Explanation: Reuse is a valuable principle in microservice design, but it should be restricted to specific domains within the organization. Teams can collaborate and agree on communication models for adapting microservices for use outside their original contexts. This approach fosters efficiency and consistency while avoiding unnecessary duplication of functionality. 4. Microservices in Comparison to Monolithic Systems Fostering Service Encapsulation Best Practice: Keep microservices small to ensure that a small group of developers can understand the entirety of a single microservice. Explanation: The size of microservices should be such that a small team or even a single developer can fully comprehend the entire service. This promotes agility, reduces complexity, and facilitates faster development and maintenance. 
Promoting Standardized Interfaces Best Practice: Expose microservices through standardized interfaces (e.g., RESTful APIs or AMQP exchanges) to enable reuse without tight coupling. Explanation: Microservices should communicate with each other through standardized interfaces that abstract the underlying implementation. This approach enables other services and applications to consume and reuse microservices without becoming tightly coupled to them, promoting flexibility and maintainability. Enabling Independent Scaling Best Practice: Ensure that microservices exist as independent deployment artifacts, allowing them to be scaled independently of other services. Explanation: Microservices should be designed to function as independent units that can be deployed and scaled separately. This flexibility allows organizations to allocate resources efficiently based on the specific demands of each microservice, improving performance and resource utilization. Automating Deployment Best Practice: Implement automation throughout the software development lifecycle, including deployment automation and continuous integration. Explanation: Automation is essential for microservices to achieve rapid development, testing, and deployment. Continuous integration and automated deployment pipelines allow organizations to streamline the release process, reducing manual intervention and ensuring consistent and reliable deployments. 5. Service Mesh and Management Practices Command Query Responsibility Segregation (CQRS) Best Practice: Consider separating microservices into command and query responsibilities, especially for high-traffic requirements. Explanation: In situations where specific business capabilities experience high traffic, it may be beneficial to separate the microservices responsible for handling queries (information retrieval) from those handling commands (state-changing functions). This pattern, known as Command Query Responsibility Segregation (CQRS), optimizes performance and scalability. Event Sourcing Best Practice: Embrace eventual consistency by storing changes to state as journaled business events. Explanation: To ensure consistency among microservices, especially when working asynchronously, consider adopting an event-sourcing approach. Instead of relying on distributed transactions, microservices can collaborate using domain events published to a message broker. This approach ensures eventual consistency once all microservices have completed their work. Continuous Delivery of Composed Applications Best Practice: Implement continuous delivery for composed microservice applications to ensure agility and real-time verification of business objectives. Explanation: Continuous delivery is essential for achieving agility and verifying that composed microservice applications meet their business objectives. Short release cycles, fast feedback on build failures, and automated deployment facilities are critical components of this approach. Reduce Complexity With Service Mesh Best Practice: Implement a service mesh architecture to simplify microservice management, ensuring secure, fast, and reliable service-to-service communications. Explanation: A service mesh is an architectural pattern that simplifies the management of microservices by providing secure and reliable communication between services. It abstracts governance considerations and enhances the security and performance of microservices interactions. 6. 
Fault Tolerance and Resilience Best Practice: Implement fault tolerance and resilience mechanisms to ensure that microservices can withstand and recover from failures gracefully. Explanation: Microservices should be designed to handle failures without causing widespread disruptions. This includes strategies such as circuit breakers, retry mechanisms, graceful degradation, and the ability to self-heal in response to failures. Prioritizing fault tolerance and resilience ensures that the system remains stable and responsive under adverse conditions. 7. Monitoring and Logging Best Practice: Establish comprehensive monitoring and logging practices to gain insights into the health and performance of microservices. Explanation: Monitoring and logging are essential for understanding how microservices are behaving in production. Implement robust monitoring tools and logging frameworks to track key performance metrics, detect anomalies, troubleshoot issues, and gain actionable insights. Proactive monitoring and logging enable timely responses to incidents and continuous improvement of microservices. By incorporating these two additional best practices—Fault Tolerance and Resilience, and Monitoring and Logging—organizations can further enhance the reliability and manageability of their microservices-based systems. 8. Decentralize Data Management Best Practice: In microservices architecture, each microservice should maintain its own copy of the data, avoiding multiple services accessing or sharing the same database. Explanation: Microservices benefit from data decentralization, where each microservice manages its own data independently. It is crucial not to set up multiple services to access or share the same database, as this would undermine the autonomy of microservices. Instead, design microservices to own and manage their data. To enable controlled access to a microservice's data, implement APIs that act as gateways for other services. This approach enforces centralized access control, allowing developers to incorporate features like audit logging and caching seamlessly. Strive for a data structure that includes one or two database tables per microservice, ensuring clean separation and encapsulation of data. 9. Promoting Loose Coupling Strategies Best Practice: Embrace strategies that promote loose coupling between microservices, both in terms of incoming and outgoing dependencies. Explanation: In a microservices architecture, maintaining loose coupling between services is crucial for flexibility and scalability. To achieve this, consider employing various strategies that encourage loose coupling: Point-to-point and Publish-Subscribe: Utilize messaging patterns such as point-to-point and publish-subscribe. These patterns help decouple senders and receivers, as they remain unaware of each other. In this setup, the contract of a reactive microservice, like a Kafka consumer, is defined by the name of the message queue and the structure of the message. This isolation minimizes dependencies between services. API-First Design: Adopt a contract-first design approach, where the API is designed independently of existing code. This practice prevents the creation of APIs tightly coupled to specific technologies and implementations. By defining the contract first, you ensure that it remains technology-agnostic and adaptable to changes, promoting loose coupling between services. 
By incorporating these strategies, you can enhance the loose coupling between microservices, making your architecture more resilient and adaptable to evolving requirements. Conclusion The core design principles outlined above serve as a solid foundation for crafting effective microservice architectures. While adhering to these principles is essential, the success of a microservice design goes beyond mere compliance. It requires a thorough understanding of quality attribute requirements and the ability to make informed design decisions while considering trade-offs. Additionally, familiarity with design patterns and architectural tactics that align with these principles is crucial. Equally important is a deep understanding of the available technology choices, as they play a pivotal role in the implementation and operation of microservices. Ultimately, a holistic approach that combines these design principles with careful consideration of requirements, design patterns, and technology options paves the way for successful microservice design and implementation.
Many libraries for AI app development are primarily written in Python or JavaScript. The good news is that several of these libraries have Java APIs as well. In this tutorial, I'll show you how to build a ChatGPT clone using Spring Boot, LangChain, and Hilla. The tutorial will cover simple synchronous chat completions and a more advanced streaming completion for a better user experience. Completed Source Code You can find the source code for the example in my GitHub repository. Requirements Java 17+ Node 18+ An OpenAI API key in an OPENAI_API_KEY environment variable Create a Spring Boot and React project, Add LangChain First, create a new Hilla project using the Hilla CLI. This will create a Spring Boot project with a React frontend. Shell npx @hilla/cli init ai-assistant Open the generated project in your IDE. Then, add the LangChain4j dependency to the pom.xml file: XML <dependency> <groupId>dev.langchain4j</groupId> <artifactId>langchain4j</artifactId> <version>0.22.0</version> <!-- TODO: use latest version --> </dependency> Simple OpenAI Chat Completions With Memory Using LangChain We'll begin exploring LangChain4j with a simple synchronous chat completion. In this case, we want to call the OpenAI chat completion API and get a single response. We also want to keep track of up to 1,000 tokens of the chat history. In the com.example.application.service package, create a ChatService.java class with the following content: Java @BrowserCallable @AnonymousAllowed public class ChatService { @Value("${openai.api.key}") private String OPENAI_API_KEY; private Assistant assistant; interface Assistant { String chat(String message); } @PostConstruct public void init() { var memory = TokenWindowChatMemory.withMaxTokens(1000, new OpenAiTokenizer("gpt-3.5-turbo")); assistant = AiServices.builder(Assistant.class) .chatLanguageModel(OpenAiChatModel.withApiKey(OPENAI_API_KEY)) .chatMemory(memory) .build(); } public String chat(String message) { return assistant.chat(message); } } @BrowserCallable makes the class available to the front end. @AnonymousAllowed allows anonymous users to call the methods. @Value injects the OpenAI API key from the OPENAI_API_KEY environment variable. Assistant is the interface that we will use to call the chat API. init() initializes the assistant with a 1,000-token memory and the gpt-3.5-turbo model. chat() is the method that we will call from the front end. Start the application by running Application.java in your IDE, or with the default Maven goal: Shell mvn This will generate TypeScript types and service methods for the front end. Next, open App.tsx in the frontend folder and update it with the following content: TypeScript-JSX export default function App() { const [messages, setMessages] = useState<MessageListItem[]>([]); async function sendMessage(message: string) { setMessages((messages) => [ ...messages, { text: message, userName: "You", }, ]); const response = await ChatService.chat(message); setMessages((messages) => [ ...messages, { text: response, userName: "Assistant", }, ]); } return ( <div className="p-m flex flex-col h-full box-border"> <MessageList items={messages} className="flex-grow" /> <MessageInput onSubmit={(e) => sendMessage(e.detail.value)} /> </div> ); } We use the MessageList and MessageInput components from the Hilla UI component library. sendMessage() adds the message to the list of messages, and calls the chat() method on the ChatService class. When the response is received, it is added to the list of messages. 
You now have a working chat application that uses the OpenAI chat API and keeps track of the chat history. It works great for short messages, but it is slow for long answers. To improve the user experience, we can use a streaming completion instead, displaying the response as it is received. Streaming OpenAI Chat Completions With Memory Using LangChain Let's update the ChatService class to use a streaming completion instead: Java @BrowserCallable @AnonymousAllowed public class ChatService { @Value("${openai.api.key}") private String OPENAI_API_KEY; private Assistant assistant; interface Assistant { TokenStream chat(String message); } @PostConstruct public void init() { var memory = TokenWindowChatMemory.withMaxTokens(1000, new OpenAiTokenizer("gpt-3.5-turbo")); assistant = AiServices.builder(Assistant.class) .streamingChatLanguageModel(OpenAiStreamingChatModel.withApiKey(OPENAI_API_KEY)) .chatMemory(memory) .build(); } public Flux<String> chatStream(String message) { Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer(); assistant.chat(message) .onNext(sink::tryEmitNext) .onComplete(sink::tryEmitComplete) .onError(sink::tryEmitError) .start(); return sink.asFlux(); } } The code is mostly the same as before, with some important differences: Assistant now returns a TokenStream instead of a String. init() uses streamingChatLanguageModel() instead of chatLanguageModel(). chatStream() returns a Flux<String> instead of a String. Update App.tsx with the following content: TypeScript-JSX export default function App() { const [messages, setMessages] = useState<MessageListItem[]>([]); function addMessage(message: MessageListItem) { setMessages((messages) => [...messages, message]); } function appendToLastMessage(chunk: string) { setMessages((messages) => { const lastMessage = messages[messages.length - 1]; lastMessage.text += chunk; return [...messages.slice(0, -1), lastMessage]; }); } async function sendMessage(message: string) { addMessage({ text: message, userName: "You", }); let first = true; ChatService.chatStream(message).onNext((chunk) => { if (first && chunk) { addMessage({ text: chunk, userName: "Assistant", }); first = false; } else { appendToLastMessage(chunk); } }); } return ( <div className="p-m flex flex-col h-full box-border"> <MessageList items={messages} className="flex-grow" /> <MessageInput onSubmit={(e) => sendMessage(e.detail.value)} /> </div> ); } The template is the same as before, but the way we handle the response is different. Instead of waiting for the response to be received, we start listening for chunks of the response. When the first chunk is received, we add it as a new message. When subsequent chunks are received, we append them to the last message. Re-run the application, and you should see that the response is displayed as it is received. Conclusion As you can see, LangChain makes it easy to build LLM-powered AI applications in Java and Spring Boot. With the basic setup in place, you can extend the functionality by chaining operations, adding external tools, and more following the examples on the LangChain4j GitHub page, linked earlier in this article. Learn more about Hilla in the Hilla documentation.
Let's start with a story: Have you heard the news about CircleCI's breach? No, not the one where they accidentally leaked some customer credentials a few years back. This time, it's a bit more serious. It seems that some unauthorized individuals were able to gain access to CircleCI's systems, compromising the secrets stored in CircleCI. CircleCI advised users to rotate "any and all secrets" stored in CircleCI, including those stored in project environment variables or contexts. The CircleCI breach serves as a stark reminder of the risks associated with storing sensitive information in CI/CD systems. Next, let's talk about CI/CD security a bit more.

CI/CD Security

CI/CD systems, like CircleCI, are platforms that developers use to automate build and deploy processes, which, by definition, means they need to access other systems, whether to deploy software or to consume services such as cloud platforms. For example, after building artifacts, you probably need to push them to a repository; likewise, when deploying your infrastructure as code, you need to access a public cloud provider to create resources. As you can imagine, this means a lot of sensitive information passes through CI/CD platforms every day, because for CI/CD to interact with other systems, some type of authentication and authorization is required, and in most cases, passwords are used for this. So, needless to say, the security of the CI/CD systems themselves is critical. Unfortunately, although CI/CD systems are designed to automate software development processes, they are not necessarily built with security in mind, and they are not 100% secure (well, nothing is).

Best Practices to Secure CI/CD Systems

Best Practice #1: No Long-Lived Credentials

One of the best practices, of course, is not to use long-lived credentials at all. For example, when you access AWS, always use temporary security credentials (IAM roles) instead of long-term access keys. Now, when you try to create an access key, AWS even reminds you not to do this and recommends SSO or other methods instead. In fact, in many scenarios, you don't need long-term access keys that never expire; instead, you can create IAM roles and generate temporary security credentials. Temporary security credentials consist of an access key ID and a secret access key, but they also include a security token that indicates when the credentials expire.

Best Practice #2: Don't Store Secrets in CI/CD Systems

By storing secrets in CI systems, we are essentially placing our trust in a third-party service to keep sensitive information safe. However, if that service is ever compromised, as was the case with CircleCI, then all of the secrets stored within it are suddenly at risk, which can result in serious consequences. What we can do instead is use a secrets manager to store secrets, and use a secure way in our CI/CD systems to retrieve them. If you are not familiar with data security or secrets managers, maybe give this blog a quick read.

Best Practice #3: Rotate/Refresh Your Passwords

Not all systems you access from your CI/CD pipelines support short-lived credentials the way AWS does. There are cases where you have to use long-lived passwords, and in those cases, you need to make sure you rotate and refresh them periodically. Certain secrets managers can even rotate secrets for you, reducing operational overhead. For example, HashiCorp's Vault supports multiple "engines" (components that store, generate, or encrypt data), and most of the database engines support root password rotation, where Vault manages the rotation automatically for you:
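As a rough sketch of what that can look like with Vault's database secrets engine (the connection name my-postgres below is hypothetical, and the connection must already be configured under database/config/ before rotation):

Shell
# Hypothetical example: rotate the root credentials of a configured database connection
vault secrets enable database
# ... configure a connection at database/config/my-postgres first ...
vault write -force database/rotate-root/my-postgres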
If you are interested in more best practices, there is a blog on how to secure your CI/CD pipeline.

How OIDC (OpenID Connect) Works

Following these best practices, let's dive deep into two hands-on tutorials to harden your CI/CD security. Before that, let's do a very short introduction to the technology that enables us to do so: OpenID Connect (OIDC). If you'd rather not read the official definition of OIDC on the official website, here's the TL;DR version: OIDC allows us to use short-lived tokens instead of long-lived passwords, following best practice #1 mentioned earlier. If integrated with CI, we can configure our CI to request short-lived access tokens and use them to access other systems (of course, those systems need to support OIDC on their end).

Tutorial: GitHub Actions OIDC With AWS

To use OIDC in GitHub Actions workflows, first, we need to configure AWS.

1. Create an OIDC Provider in AWS

For Configure provider, choose OpenID Connect. For the provider URL, use https://token.actions.githubusercontent.com, then choose "Get thumbprint" to verify the server certificate of your IdP. For the "Audience", use sts.amazonaws.com. After creation, copy the provider ARN, which will be used next. To learn more about this step, see the official document here.

2. Create a Role With Assume Role Policy

Next, let's configure the role and trust in IAM. Here, I created a role named "gha-oidc-role" and attached the AWS-managed policy "AmazonS3ReadOnlyAccess" (ARN: arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess). Then, the tricky part is the trust relationships; here's an example of the value I used:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::737236345234:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:IronCore864/vault-oidc-test:*"
        }
      }
    }
  ]
}

The Principal is the OIDC provider's ARN we copied from the previous step. The token.actions.githubusercontent.com:sub in the condition defines which org/repo can assume this role; here I used IronCore864/vault-oidc-test. After creation, copy the IAM role ARN, which will be used next. To learn more about creating roles for OIDC, see the official document here.

3. Test AWS Access in GitHub Actions Using OIDC

Let's create a simple test workflow:

name: AWS

on:
  workflow_dispatch:

jobs:
  s3:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: configure aws credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::737236345234:role/gha-oidc-role
          role-session-name: samplerolesession
          aws-region: us-west-1
      - name: ls
        run: |
          aws s3 ls

This workflow, named "AWS", is triggered manually, tries to assume the role we created in the previous step, and runs a simple AWS command to verify that we have access. The job or workflow run requires a permission setting with id-token: write. You won't be able to request the OIDC JWT ID token if the permissions setting for id-token is set to read or none. For your convenience, I put the workflow YAML file here.

After triggering the workflow, everything works with no access keys or secrets needed whatsoever:
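If you want to double-check which role the job actually assumed, you can add a step after the credentials configuration that prints the caller identity:

Shell
# Prints the account, user ID, and assumed-role ARN the job is running as
aws sts get-caller-identity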
Tutorial: GitHub Actions OIDC With HashiCorp Vault

Unfortunately, not all systems that you access from your CI/CD workflows support OIDC, and sometimes you still need to use passwords. However, using hardcoded passwords means we need to duplicate and store them in GitHub as secrets, which violates our aforementioned best practice. A better approach is to use a secrets manager to store secrets and set up OIDC between your CI and your secrets manager, so that secrets can be retrieved with no password used in the process.

1. Install HashiCorp Vault

In this tutorial, we will run a local dev server (DO NOT DO THIS IN PRODUCTION) and expose it to the public internet so that GitHub Actions can reach it. The quickest way to install Vault on a Mac is probably with Homebrew. First, install the HashiCorp tap, a repository of all of HashiCorp's Homebrew packages: brew tap hashicorp/tap. Then, install Vault: brew install hashicorp/tap/vault. For other systems, refer to the official doc here.

After installation, we can quickly start a local dev server by running:

vault server -dev

However, this is only running locally on our laptop and is not accessible from the public internet. To expose it to the internet so that GitHub Actions can reach it, we use ngrok, a fast way to put your app on the internet. For detailed installation and usage, see the official doc. After installation, we can simply run ngrok http 8200 to expose the Vault port. Take note of the public URL to your local Vault.

2. Enable JWT Auth

Execute the following to enable JWT auth in Vault:

vault auth enable jwt

Apply the configuration for GitHub Actions:

vault write auth/jwt/config \
  bound_issuer="https://token.actions.githubusercontent.com" \
  oidc_discovery_url="https://token.actions.githubusercontent.com"

Create a policy that grants access to the specified paths:

vault policy write myproject-production - <<EOF
path "secret/*" {
  capabilities = [ "read" ]
}
EOF

Create a role to use the policy:

vault write auth/jwt/role/myproject-production - <<EOF
{
  "role_type": "jwt",
  "user_claim": "repository",
  "bound_claims_type": "glob",
  "bound_claims": {"sub": "repo:IronCore864/*"},
  "policies": ["myproject-production"]
}
EOF

When creating the role, ensure that the bound_claims parameter is defined for your security requirements and has at least one condition. To check arbitrary claims in the received JWT payload, the bound_claims parameter contains a set of claims and their required values. In the example above, the role will accept any incoming authentication request from any repo owned by the user (or org) IronCore864. To see all the available claims supported by GitHub's OIDC provider, see "About security hardening with OpenID Connect".

3. Create a Secret in Vault

Next, let's create a secret in Vault for testing purposes; we will then use GitHub Actions to retrieve this secret using OIDC. Here we created a secret named "aws" under "secret", with a key named "accessKey" holding some random testing value. To verify, we can run:

$ vault kv get secret/aws
== Secret Path ==
secret/data/aws

======= Metadata =======
Key                Value
---                -----
created_time       2023-07-29T00:00:38.757487Z
custom_metadata    <nil>
deletion_time      n/a
destroyed          false
version            1

====== Data ======
Key          Value
---          -----
accessKey    test

Note that the "Secret Path" is actually secret/data/aws rather than secret/aws: with the kv secrets engine v2, the API path gains an extra "data" segment.
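For reference, the same secret can also be created from the CLI; the value test is just dummy data matching the output above:

Shell
# Create (or overwrite) the secret at secret/aws with a single key
vault kv put secret/aws accessKey=test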
4. Retrieve Secret From Vault in GitHub Actions Using OIDC

Let's create another simple test workflow:

name: Vault

on:
  workflow_dispatch:

jobs:
  retrieve-secret:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Import secret from Vault
        id: import-secrets
        uses: hashicorp/vault-action@v2
        with:
          method: jwt
          url: https://f2f6-185-212-61-32.ngrok-free.app
          role: myproject-production
          secrets: |
            secret/data/aws accessKey | AWS_ACCESS_KEY_ID;
      - name: Use secret from Vault
        run: |
          echo "${{ env.AWS_ACCESS_KEY_ID }}"
          echo "${{ steps.import-secrets.outputs.AWS_ACCESS_KEY_ID }}"

This workflow, named "Vault", is triggered manually, assumes the role we created in the previous steps, and retrieves the secret we just created. To use the secret, we can either use "env" or step outputs, as shown in the example above. Similarly to the previous AWS job, it requires a permission setting with id-token: write. For your convenience, I put the workflow YAML file here. After triggering the workflow, everything works with no secrets used to access our Vault:

Summary

In this article, we started with the infamous CircleCI breach, went on to talk about security in CI/CD systems with some best practices, did a quick introduction to OIDC, and walked through two hands-on tutorials on how to use it with your CI. After this tutorial, you should be able to configure secure access between GitHub Actions and your cloud providers and retrieve secrets securely using OIDC. See you in the next one!
.NET Core and ASP.NET Core are popular frameworks for creating powerful RESTful APIs. In this tutorial, we will use them to develop a simple Minimal API that simulates a credit score rating. Minimal APIs provide a streamlined approach to creating high-performing HTTP APIs using ASP.NET Core. They allow you to easily construct complete REST endpoints with minimal setup and code. Instead of relying on conventional scaffolding and controllers, you can fluently define API routes and actions to simplify the development process.

We will create an endpoint allowing a user to retrieve a credit score rating by sending a request to the API. We can also save and retrieve credit scores using POST and GET methods. However, it is essential to note that we will not be linking up to any existing backend systems to pull a credit score; instead, we will use a random number generator to generate the score and return it to the user. Although this API is relatively simple, it will demonstrate the basics of REST API development using .NET Core and ASP.NET. This tutorial will provide a hands-on introduction to building RESTful APIs with .NET Core 7 and the Minimal API approach.

Prerequisites

Before we start, we must ensure that we have completed several prerequisites. To follow along and run this tutorial, you will need the following:

A working .NET Core installation
An IDE or text editor of your choice
Postman to test our endpoint

Creating the Initial Project

We'll be using the .NET CLI tool to create our initial project. The .NET command line interface provides the ability to develop, build, run, and publish .NET applications. The .NET CLI new command provides many templates to create your project. You can also add the search command to find community-developed templates from NuGet or use dotnet new list to see available templates provided by Microsoft.

We'll be creating a Minimal API and starting from as clean a slate as possible, so we'll use the empty ASP.NET Core template. In the directory of your choosing, enter the following in the terminal:

dotnet new web

You'll notice that the directory structure will look something like this:

We'll be doing all of our work in the Program.cs file. Its starting code should look similar to the following:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/", () => "Hello World!");

app.Run();

We can see how concise and readable our starter code is. Let's break down the code provided by the template line by line:

The WebApplication.CreateBuilder(args) method creates a new instance of the WebApplicationBuilder class, which is used to configure and build the WebApplication instance. The args parameter is an optional array of command-line arguments that can be passed to the application at runtime.
The builder.Build() method is called to create a new instance of the WebApplication class, which represents the running application. This instance configures the application, defines routes, and handles requests.
The third line defines a route for the root path ("/") of the application using the app.MapGet() method. This means that when the root path is requested, the application will respond with the string "Hello World!".
We start the application by calling the app.Run() method.

Using the builder pattern, we can configure and customize the WebApplication instance. This allows us to define the application's behavior, including middleware, routes, and other settings, in a structured and extensible way.
For example, the WebApplication instance created by the builder can be thought of as the "entry point" of the application, which handles requests and generates responses. Overall, this code block creates a simple Minimal API in .NET 7 that responds with a "Hello World!" message when the application's root path is requested. Next, we'll customize our API to mimic retrieving a credit score rating.

Adding in the Code

In Program.cs, we will house our endpoints and business logic. We'll define our creditscore endpoint to provide GET and POST operations. We'll implement a list to store any credit score we would like. We'll also define an endpoint to retrieve the list of saved credit scores. We'll be utilizing a CreditScore record, a reference type introduced in C# 9 that behaves much like a struct for comparison purposes. A record is a lightweight, immutable data object optimized for comparison and equality checking. Populate Program.cs with the following code:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var userAddedCreditScores = new List<CreditScore>();

app.MapGet("/creditscore", () =>
{
    var score = new CreditScore(Random.Shared.Next(300, 850));
    return score;
});

app.MapPost("/creditscore", (CreditScore score) =>
{
    userAddedCreditScores.Add(score);
    return score;
});

app.MapGet("/userAddedCreditScores", () => userAddedCreditScores);

app.Run();

record CreditScore(int Score)
{
    public string? CreditRating
    {
        get => Score switch
        {
            >= 800 => "Excellent",
            >= 700 => "Good",
            >= 600 => "Fair",
            >= 500 => "Poor",
            _ => "Bad"
        };
    }
}

As mentioned, our code first creates a builder object for the web application and then uses it to build an application instance. It also defines a record type called CreditScore with a single property called Score and a read-only property called CreditRating. This may look a little strange, as we define our record after using it; however, when using top-level statements in Program.cs, type declarations must come after the statements.

The application exposes multiple endpoints using the app.MapGet() and app.MapPost() methods. The first endpoint, /creditscore, is a GET method that generates a new random CreditScore object with a score between 300 and 850. We'll define a POST method for the same endpoint that accepts a CreditScore object in the request body, adds it to a list called userAddedCreditScores, and returns the same CreditScore object to the caller. The other endpoint, /userAddedCreditScores, is a GET method that returns a list of all the CreditScore objects that have been added to userAddedCreditScores. Finally, the application starts running using app.Run().

Running and Testing the API

With our code written, run the following command to compile and run our project:

dotnet run

The API is now operational and ready for testing. After running the previous command, you will see in the console which port has been used to host your API. You can define which port you would like to use by editing the Properties > launchSettings.json file or by editing the app.Run() call in Program.cs like so, replacing 3000 with your desired port number:

app.Run("http://localhost:3000");

You can use a tool like Postman to send an HTTP request to the API. For me, the endpoint to get a credit score is localhost:5242/creditscore. When you send a request to this endpoint, you should receive a 200 OK status code, a credit score generated by the random number generator, and a credit rating. We can save a credit score by sending a POST request to the creditscore endpoint, as sketched below.
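For example, assuming the app is listening on port 5242 as in my run (your port will likely differ), a request could look like this from the command line; JSON property names are matched case-insensitively by default, so "score" maps to the record's Score property:

Shell
# Hypothetical local run: port 5242 comes from this tutorial's output and will differ on your machine
curl -X POST http://localhost:5242/creditscore \
  -H "Content-Type: application/json" \
  -d '{"score": 750}'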
We form the request’s body with a CreditScore object. Finally, we can retrieve all added scores by sending a GET request to the /userAddedCreditScores endpoint. Wrapping Up In summary, we have developed a basic RESTful Minimal API using .NET Core 7 and ASP.NET. This code can be a foundation for creating more complex APIs for your application. As you continue to develop the API, you may want to consider implementing security measures such as an API key, integrating with an API gateway, monitoring the usage of the API, or generating revenue through API monetization.
Docker is a compelling platform to package and run web applications, especially when paired with one of the many Platform-as-a-Service (PaaS) offerings provided by cloud platforms. NGINX has long provided DevOps teams with the ability to host web applications on Linux and also provides an official Docker image to use as the base for custom web applications. In this post, I explain how DevOps teams can use the NGINX Docker image to build and run web applications on Docker. Getting Started With the Base Image NGINX is a versatile tool with many uses, including a load balancer, reverse proxy, and network cache. However, when running NGINX in a Docker container, most of these high-level functions are delegated to other specialized platforms or other instances of NGINX. Typically, NGINX fulfills the function of a web server when running in a Docker container. To create an NGINX container with the default website, run the following command: docker run -p 8080:80 nginx This command will download the nginx image (if it hasn't already been downloaded) and create a container exposing port 80 in the container to port 8080 on the host machine. You can then open http://localhost:8080/index.html to view the default "Welcome to nginx!" website. To allow the NGINX container to expose custom web assets, you can mount a local directory inside the Docker container. Save the following HTML code to a file called index.html: <html> <body> Hello from Octopus! </body> </html> Next, run the following command to mount the current directory under /usr/share/nginx/html inside the NGINX container with read-only access: docker run -v $(pwd):/usr/share/nginx/html:ro -p 8080:80 nginx Open http://localhost:8080/index.html again and you see the custom HTML page displayed. One of the benefits of Docker images is the ability to bundle all related files into a single distributable artifact. To realize this benefit, you must create a new Docker image based on the NGINX image. Creating Custom Images Based on NGINX To create your own Docker image, save the following text to a file called Dockerfile: FROM nginx COPY index.html /usr/share/nginx/html/index.html Dockerfile contains instructions for building a custom Docker image. Here you use the FROM command to base your image on the NGINX one, and then use the COPY command to copy your index.html file into the new image under the /usr/share/nginx/html directory. Build the new image with the command: docker build . -t mynginx This builds a new image called mynginx. Run the new image with the command: docker run -p 8080:80 mynginx Note that you didn't mount any directories this time. However, when you open http://localhost:8080/index.html your custom HTML page is displayed because it was embedded in your custom image. NGINX is capable of much more than hosting static files. To unlock this functionality, you must use custom NGINX configuration files. Advanced NGINX Configuration NGINX exposes its functionality via configuration files. The default NGINX image comes with a simple default configuration file designed to host static web content. 
This file is located at /etc/nginx/nginx.conf in the default image, and has the following contents: user nginx; worker_processes auto; error_log /var/log/nginx/error.log notice; pid /var/run/nginx.pid; events { worker_connections 1024; } http { include /etc/nginx/mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log /var/log/nginx/access.log main; sendfile on; #tcp_nopush on; keepalive_timeout 65; #gzip on; include /etc/nginx/conf.d/*.conf; } There's no need to understand this configuration file in detail, but there is one line of interest that instructs NGINX to load additional configuration files from the /etc/nginx/conf.d directory: include /etc/nginx/conf.d/*.conf; The default /etc/nginx/conf.d file configures NGINX to function as a web server. Specifically the location / block-loading files from /usr/share/nginx/html is why you mounted your HTML files to that directory previously: server { listen 80; server_name localhost; #access_log /var/log/nginx/host.access.log main; location / { root /usr/share/nginx/html; index index.html index.htm; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root /usr/share/nginx/html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # #location ~ \.php$ { # root html; # fastcgi_pass 127.0.0.1:9000; # fastcgi_index index.php; # fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name; # include fastcgi_params; #} # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # #location ~ /\.ht { # deny all; #} } You can take advantage of the instructions to load any *.conf configuration files in /etc/nginx to customize NGINX. In this example, you add a health check via a custom location listening on port 90 that responds to requests to the /nginx-health path with an HTTP 200 OK. Save the following text to a file called health-check.conf: server { listen 90; server_name localhost; location /nginx-health { return 200 "healthy\n"; add_header Content-Type text/plain; } } Modify the Dockerfile to copy the configuration file to /etc/nginx/conf.d: FROM nginx COPY index.html /usr/share/nginx/html/index.html COPY health-check.conf /etc/nginx/conf.d/health-check.conf Build the image with the command: docker build . -t mynginx Run the new image with the command. Note the new port exposed on 9090: docker run -p 8080:80 -p 9090:90 mynginx Now open http://localhost:9090/nginx-health. The health check response is returned to indicate that the web server is up and running. The examples above base your custom images on the default nginx image. However, there are other variants that provide much smaller image sizes without sacrificing any functionality. Choosing NGINX Variants The default nginx image is based on Debian. However, NGINX also provides images based on Alpine. Alpine is frequently used as a lightweight base for Docker images. 
To view the sizes of Docker images, they must first be pulled down to your local workstation: docker pull nginx docker pull nginx:stable-alpine You can then find the image sizes with the command: docker image ls From this, you can see the Debian image weighs around 140 MB while the Alpine image weighs around 24 MB. This is quite a saving in image sizes. To base your images on the Alpine variant, you need to update the Dockerfile: FROM nginx:stable-alpine COPY index.html /usr/share/nginx/html/index.html COPY health-check.conf /etc/nginx/conf.d/health-check.conf Build and run the image with the commands: docker build . -t mynginx docker run -p 8080:80 -p 9090:90 mynginx Once again, open http://localhost:9090/nginx-health or http://localhost:8080/index.html to view the web pages. Everything continues to work as it did previously, but your custom image is now much smaller. Conclusion NGINX is a powerful web server, and the official NGINX Docker image allows DevOps teams to host custom web applications in Docker. NGINX also supports advanced scenarios thanks to its ability to read configuration files copied into a custom Docker image. In this post, you learned how to create a custom Docker image hosting a static web application, added advanced NGINX configuration files to provide a health check endpoint, and compared the sizes of Debian and Alpine NGINX images. Resources NGINX Docker image source code Dockerfile reference Happy deployments!
In this blog post, you will learn how to build a serverless solution for entity detection using Amazon Comprehend, AWS Lambda, and the Go programming language. Text files uploaded to Amazon Simple Storage Service (S3) will trigger a Lambda function, which will analyze them, extract entity metadata (name, type, etc.) using the AWS Go SDK, and persist it to an Amazon DynamoDB table. You will use Go bindings for AWS CDK to implement "Infrastructure-as-code" for the entire solution and deploy it with the AWS Cloud Development Kit (CDK) CLI. The code is available on GitHub.

Introduction

Amazon Comprehend leverages NLP to extract insights from documents, including entities, key phrases, language, sentiments, and other elements. It utilizes a pre-trained model that is continuously updated with a large body of text, eliminating the need for training data. Additionally, users can build their own custom models for classification and entity recognition with the help of Flywheels. The platform also offers built-in topic modeling to organize documents based on similar keywords. For document processing, there is a synchronous mode for a single document or a batch of up to 25 documents, while asynchronous jobs are recommended for processing large numbers of documents.

Let's learn Amazon Comprehend with a hands-on tutorial. We will be making use of the entity detection feature, wherein Comprehend analyzes the text and identifies all the entities present, as well as their corresponding entity types (e.g., person, organization, location). Comprehend can also identify relationships between entities, such as identifying that a particular person works for a specific company. Automatically identifying entities within large amounts of text data can help businesses save time and resources that would otherwise be spent manually analyzing and categorizing text data.

Prerequisites

Before you proceed, make sure you have the following installed:

Go programming language (v1.18 or higher)
AWS CDK
AWS CLI

Clone the project and change to the right directory:

git clone https://github.com/abhirockzz/ai-ml-golang-comprehend-entity-detection
cd ai-ml-golang-comprehend-entity-detection

Use AWS CDK To Deploy the Solution

The AWS Cloud Development Kit (AWS CDK) is a framework that lets you define your cloud infrastructure as code in one of its supported programming languages and provision it through AWS CloudFormation. To start the deployment, simply invoke cdk deploy and wait for a bit. You will see a list of resources that will be created and will need to provide your confirmation to proceed.

cd cdk
cdk deploy

# output
Bundling asset ComprehendEntityDetectionGolangStack/comprehend-entity-detection-function/Code/Stage...
✨  Synthesis time: 4.32
//.... omitted
Do you wish to deploy these changes (y/n)? y

Enter y to start creating the AWS resources required for the application. If you want to see the AWS CloudFormation template that will be used behind the scenes, run cdk synth and check the cdk.out folder. You can keep track of the stack creation progress in the terminal or navigate to the AWS console: CloudFormation > Stacks > ComprehendEntityDetectionGolangStack.

Once the stack creation is complete, you should have:

An S3 bucket - the source bucket to which text files are uploaded
A Lambda function to run entity detection on the file contents using Amazon Comprehend
A DynamoDB table to store the entity detection results for each file
A few other components (like IAM roles, etc.)
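If you prefer the CLI over the console for confirming what was created, one way (a sketch using the stack name from this project) is to list the stack's resources:

Shell
# List the resources created by the stack
aws cloudformation describe-stack-resources \
  --stack-name ComprehendEntityDetectionGolangStack \
  --query "StackResources[].{Type:ResourceType,Id:PhysicalResourceId}" \
  --output table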
You will also see the following output in the terminal (resource names will differ in your case). In this case, these are the names of the S3 bucket and the DynamoDB table created by CDK:

✅  ComprehendEntityDetectionGolangStack

✨  Deployment time: 139.02s

Outputs:
ComprehendEntityDetectionGolangStack.entityoutputtablename = comprehendentitydetection-textinputbucket293fcab7-8suwpesuz1oc_entity_output
ComprehendEntityDetectionGolangStack.textfileinputbucketname = comprehendentitydetection-textinputbucket293fcab7-8suwpesuz1oc
.....

You can now try out the end-to-end solution!

Detect Entities in Text File

To try the solution, you can either use a text file of your own or the sample files provided in the GitHub repository. I will be using the S3 CLI to upload the files, but you can use the AWS console as well.

export SOURCE_BUCKET=<enter source S3 bucket name - check the CDK output>

aws s3 cp ./file_1.txt s3://$SOURCE_BUCKET
aws s3 cp ./file_2.txt s3://$SOURCE_BUCKET

# verify that the files were uploaded
aws s3 ls s3://$SOURCE_BUCKET

The Lambda function will detect entities and store the results (entity name, type, and confidence score) in a DynamoDB table. Check the DynamoDB table in the AWS console:

You can also use the CLI to scan the table:

aws dynamodb scan --table-name <enter table name - check the CDK output>

Don't Forget To Clean Up

Once you're done, to delete all the services, simply use:

cdk destroy

#output prompt (choose 'y' to continue)
Are you sure you want to delete: ComprehendEntityDetectionGolangStack (y/n)?

You were able to set up and try the complete solution. Before we wrap up, let's quickly walk through some of the important parts of the code to get a better understanding of what's going on behind the scenes.

Code Walkthrough

We will only focus on the important parts - some code has been omitted for brevity.

CDK

You can refer to the complete CDK code here.

bucket := awss3.NewBucket(stack, jsii.String("text-input-bucket"), &awss3.BucketProps{
    BlockPublicAccess: awss3.BlockPublicAccess_BLOCK_ALL(),
    RemovalPolicy:     awscdk.RemovalPolicy_DESTROY,
    AutoDeleteObjects: jsii.Bool(true),
})

We start by creating the source S3 bucket.

table := awsdynamodb.NewTable(stack, jsii.String("entites-output-table"), &awsdynamodb.TableProps{
    PartitionKey: &awsdynamodb.Attribute{
        Name: jsii.String("entity_type"),
        Type: awsdynamodb.AttributeType_STRING},
    TableName: jsii.String(*bucket.BucketName() + "_entity_output"),
    SortKey: &awsdynamodb.Attribute{
        Name: jsii.String("entity_name"),
        Type: awsdynamodb.AttributeType_STRING},
})

Then, we create a DynamoDB table to store the entity detection results for each file.

function := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("comprehend-entity-detection-function"),
    &awscdklambdagoalpha.GoFunctionProps{
        Runtime:     awslambda.Runtime_GO_1_X(),
        Environment: &map[string]*string{"TABLE_NAME": table.TableName()},
        Entry:       jsii.String(functionDir),
    })

table.GrantWriteData(function)
bucket.GrantRead(function, "*")
function.Role().AddManagedPolicy(awsiam.ManagedPolicy_FromAwsManagedPolicyName(jsii.String("ComprehendReadOnly")))

Next, we create the Lambda function, passing the DynamoDB table name as an environment variable. We grant the function write access to the DynamoDB table and read access to the S3 bucket, and attach the ComprehendReadOnly managed policy to the function's role.
function.AddEventSource(awslambdaeventsources.NewS3EventSource(sourceBucket, &awslambdaeventsources.S3EventSourceProps{
    Events: &[]awss3.EventType{awss3.EventType_OBJECT_CREATED},
}))

We add an event source to the Lambda function to trigger it when a text file is uploaded to the source bucket.

awscdk.NewCfnOutput(stack, jsii.String("text-file-input-bucket-name"),
    &awscdk.CfnOutputProps{
        ExportName: jsii.String("text-file-input-bucket-name"),
        Value:      bucket.BucketName()})

awscdk.NewCfnOutput(stack, jsii.String("entity-output-table-name"),
    &awscdk.CfnOutputProps{
        ExportName: jsii.String("entity-output-table-name"),
        Value:      table.TableName()})

Finally, we export the S3 bucket and DynamoDB table names as CloudFormation outputs.

Lambda Function

You can refer to the complete Lambda function code here.

func handler(ctx context.Context, s3Event events.S3Event) {
    for _, record := range s3Event.Records {
        sourceBucketName := record.S3.Bucket.Name
        fileName := record.S3.Object.Key

        err := detectEntities(sourceBucketName, fileName)
        if err != nil {
            log.Fatal(err)
        }
    }
}

The Lambda function is triggered when a text file is uploaded to the source bucket. For each text file, the function extracts the text and invokes the detectEntities function. Let's go through it.

func detectEntities(sourceBucketName, fileName string) error {
    // read the uploaded object from the source bucket
    result, err := s3Client.GetObject(context.Background(), &s3.GetObjectInput{
        Bucket: aws.String(sourceBucketName),
        Key:    aws.String(fileName),
    })
    if err != nil {
        return err
    }

    buffer := new(bytes.Buffer)
    buffer.ReadFrom(result.Body)
    text := buffer.String()

    // call Amazon Comprehend to detect entities in the text
    resp, err := comprehendClient.DetectEntities(context.Background(), &comprehend.DetectEntitiesInput{
        Text:         aws.String(text),
        LanguageCode: types.LanguageCodeEn,
    })
    if err != nil {
        return err
    }

    // persist each detected entity to the DynamoDB table
    for _, entity := range resp.Entities {
        item := make(map[string]ddbTypes.AttributeValue)
        item["entity_type"] = &ddbTypes.AttributeValueMemberS{Value: fmt.Sprintf("%s#%v", fileName, entity.Type)}
        item["entity_name"] = &ddbTypes.AttributeValueMemberS{Value: *entity.Text}
        item["confidence_score"] = &ddbTypes.AttributeValueMemberS{Value: fmt.Sprintf("%v", *entity.Score)}

        _, err := dynamodbClient.PutItem(context.Background(), &dynamodb.PutItemInput{
            TableName: aws.String(table),
            Item:      item,
        })
        if err != nil {
            return err
        }
    }

    return nil
}

The detectEntities function first reads the text file from the source bucket. It then invokes the DetectEntities API of the Amazon Comprehend service. The response contains the detected entities. The function then stores the entity type, name, and confidence score in the DynamoDB table.

Conclusion and Next Steps

In this post, you saw how to create a serverless solution using Amazon Comprehend. The entire infrastructure lifecycle was automated using AWS CDK. All this was done using the Go programming language, which is well-supported in AWS Lambda and AWS CDK. Here are a few things you can try out to extend this solution:

Try experimenting with other Comprehend features, such as detecting PII entities.
The entity detection here used a pre-trained model. You can also train a custom model using the Comprehend Custom Entity Recognition feature, which allows you to use images, scanned files, etc. as inputs (rather than just text files).

Happy building!