I’ve just started a new Security Engineering course created by Scott Champine through ProLUG. As a graduate of his Linux Administration course and an active contributor to the Professional Linux User Group, I felt compelled to make time for this new course—I’ve learned a great deal from his teachings in the past.
In the context of cybersecurity, this means integrating protective measures throughout the system lifecycle so that the system maintains its mission and operational effectiveness, even in the presence of adversarial threats.
Question
Describe the CIA Triad.
Answer
The CIA Triad is a core model in systems security engineering.
Confidentiality – Preventing unauthorized disclosure of system data or resources, often enforced through access control, encryption, and information flow policies.
Integrity – Ensuring that system data and operations are not altered in an unauthorized or undetected way, including protection against both accidental and intentional modification.
Availability – Ensuring reliable access to system services and resources when required, even under attack or component failure.
Question
What is the relationship between Authority, Will, and Force as they relate to security?
Answer
In systems security engineering:
Authority is derived from policy and design requirements—what the system must enforce according to mission objectives, laws, or standards.
Will represents the commitment of system stakeholders to implement and maintain security measures.
Force is the application of engineered mechanisms—technical, administrative, or procedural—that ensure security objectives are realized in practice.
Question
What are the types of controls and how do they relate to the above question?
Answer
In systems security engineering, controls are safeguards built into the system to achieve security objectives. They align with Authority, Will, and Force as follows:
Administrative Controls – Derived from organizational policy (Authority) and guide design, personnel roles, and security governance.
Technical Controls – Engineered into the system as part of architecture and software/hardware features (Force), e.g., encryption, access enforcement, secure boot.
Operational Controls – Rely on human procedures and configurations to maintain secure operations (Will and Force), such as patch management and monitoring.
Physical Controls – Provide physical protection to system components (Force), e.g., secure facilities or tamper-evident hardware.
Find a STIG or compliance requirement that you do not agree is necessary for a server or service build.
Question
What is the STIG or compliance requirement trying to do?
Answer
The compliance requirement directs organizations to install security-relevant software updates from an authoritative source within a configured time period (by default, every 24 hours).
Question
What category and type of control is it?
Answer
This STIG is an administrative control. Since it is not built into the system by default, it must be applied and managed manually.
Question
Defend why you think it is not necessary. (What types of defenses do you think you could present?)
Answer
Initially, I found it difficult to identify a STIG requirement that I disagreed with. However, after extensive review, I selected this one. I believe automated patching is not ideal, especially for production systems: patches can introduce unexpected behaviors in dependent systems, and relying on automation can foster complacency or a lack of awareness over time.
Apache Server 2.4 UNIX Server Security Technical Implementation Guide :: Version 3, Release: 2 Benchmark Date: 30 Jan 2025
Vul ID: V-214270 Rule ID: SV-214270r961683_rule STIG ID: AS24-U1-000930
Severity: CAT II Classification: Unclass Legacy IDs: V-92749; SV-102837
Group Title: SRG-APP-000456-WSR-000187
Rule Title: The Apache web server must install security-relevant software updates within the configured time period directed by an authoritative source (e.g., IAVM, CTOs, DTMs, and STIGs).
Discussion: Security flaws with software applications are discovered daily. Vendors are constantly updating and patching their products to address newly discovered security vulnerabilities. Organizations (including any contractor to the organization) are required to promptly install security-relevant software updates (e.g., patches, service packs, and hot fixes). Flaws discovered during security assessments, continuous monitoring, incident response activities, or information system error handling must also be addressed expeditiously.
The Apache web server will be configured to check for and install security-relevant software updates from an authoritative source within an identified time period from the availability of the update. By default, this time period will be every 24 hours.
Check Text: Determine the most recent patch level of the Apache Web Server 2.4 software, as posted on the Apache HTTP Server Project website. If the Apache installation is a proprietary installation supporting an application and is supported by a vendor, determine the most recent patch level of the vendor’s installation.
In a command line, type "httpd -v".
If the version is more than one version behind the most recent patch level, this is a finding.
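As a hedged sketch of that check, the installed version (reported by `httpd -v`) can be compared against the latest release using a version-aware sort. Note that the STIG flags installations more than one version behind; this simplified sketch flags any older version, and the version strings are examples, not authoritative patch levels.

```shell
# Compare an installed Apache version against the latest upstream release.
# sort -V performs a version-aware sort; if the installed version sorts
# strictly below the latest, it is behind.
is_finding() {
  installed="$1"
  latest="$2"
  if [ "$installed" != "$latest" ] && \
     [ "$(printf '%s\n%s\n' "$installed" "$latest" | sort -V | head -n1)" = "$installed" ]; then
    echo "finding"
  else
    echo "ok"
  fi
}

# In practice, the installed version would be parsed from: httpd -v
is_finding "2.4.58" "2.4.62"   # older than latest -> finding
```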
I am a generally curious person who enjoys learning new things. However, the sheer volume of information available can leave me feeling overwhelmed, distracted, and aimless.
From years of experience as an autodidact, I have honed my craft of self-directed study. Now I create a solid learning plan that keeps me on track and gives me a sense of achievement.
A learning plan is a personal roadmap that outlines what to learn, how to learn it, and when to reach certain milestones. I start with goals and work backwards from there.
Rust on embedded systems is a different challenge, as it does not use the standard library, and memory safety is not enabled by default. However, it still offers several advantages over popular languages like C or C++.
One key benefit is the Hardware Abstraction Layer (HAL), which separates hardware-specific details from the code, enabling more portable software that can compile across multiple architectures. Additionally, Cargo enhances development ergonomics by simplifying project and dependency management.
Lastly, Rust’s unified build system ensures consistent behavior across platforms, allowing code to compile seamlessly on Windows, macOS, and Linux.
What I find really interesting is the concept of mapping hardware and storing that map for API access.
HAL leverages Peripheral Access Crates (PACs), which are auto-generated Rust crates representing the registers and bitfields of a microcontroller. PACs allow safe and direct access to hardware registers while ensuring Rust’s strict type-checking and ownership rules are followed. HAL sits on top of PACs, abstracting these low-level details.
Rust embedded HALs adhere to the embedded-hal traits—a collection of interfaces defining common operations like GPIO pin control, SPI/I2C communication, timers, and ADC usage. By implementing these traits, HAL provides a uniform way to interact with hardware, regardless of the underlying platform.
HAL abstracts device-specific features into a user-friendly API. For example:
Configuring a GPIO pin involves selecting its mode (input, output, pull-up, etc.) without directly modifying hardware registers.
Communication protocols like SPI or I2C are exposed through easy-to-use Rust methods (read, write, transfer, etc.).
Cargo handles dependencies seamlessly using Cargo.toml. Developers specify libraries (called “crates”) with version constraints, and Cargo fetches and builds them automatically.
Cargo:
Ensures reproducible builds by generating a Cargo.lock file that locks dependency versions.
Community-driven ecosystem (e.g., crates.io) simplifies finding and using high-quality, maintained libraries.
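As a hedged sketch, a minimal `Cargo.toml` for an embedded project might look like this (the project name and crate versions are illustrative, not pinned recommendations):

```toml
[package]
name = "blinky"          # hypothetical project name
version = "0.1.0"
edition = "2021"

[dependencies]
# Shared embedded traits; a chip-specific PAC/HAL crate would be
# added alongside this for the actual target hardware.
embedded-hal = "1.0"
```

Running `cargo build` then fetches the declared crates and records the exact resolved versions in `Cargo.lock`.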
I’m writing this post as I plan to develop hardware projects using Rust for embedded systems. The combination of Rust and RISC-V microcontrollers is a particularly exciting intersection. In my sights are the ESP32-C3 and Raspberry Pi Pico 2, both of which I’m considering for upcoming projects. Instead of dealing with messy C, slow MicroPython, or the limitations of TinyGo, Rust allows me to create clean and performant projects—something I always strive for. Stay tuned for more updates!
As a dedicated member of the Professional Linux User Group, I gain valuable insights into essential industry tools, processes, and procedures from professional engineers who work hands-on with major infrastructure.
This evening, Michael Pesa of Lambda Labs delivered an excellent talk on best practices with Kubernetes and GitOps, shedding light on the challenges faced by traditional orchestration approaches. What intrigued me most was the discussion on Talos OS and Chainguard, particularly their use of Software Bill of Materials (SBOM). The concept centers around stripping systems down to their bare essentials, which not only reduces vulnerabilities but also improves performance.
Talos OS is particularly fascinating because it eliminates many traditional system components like SSH, systemd, glibc, package managers, or a shell. Essentially, Talos is just the Linux kernel with several Go binaries. This streamlined approach significantly reduces vulnerabilities and minimizes the attack surface. As Michael mentioned in his presentation, many vulnerabilities stem from privilege escalation, container escapes, and memory hacking. Talos mitigates most of these threats by enforcing API-driven controls instead of relying on a shell and by utilizing private key-based authentication throughout.
I am excited to experiment with these tools in my homelab, where I aim to create a modern, declarative infrastructure with ephemerality at its core.
Despite the jokes or criticisms you may have heard, Kubernetes matters a lot. Once I understood the “Why,” I became much more motivated to learn the “How.” This is how I got started with Kubernetes using my Proxmox Home-Lab and K3S.
Firstly, I would like to illuminate the “Why,” as it’s an important philosophy to grasp before diving in. I firmly believe in understanding the “Why” before the “How.” 🧠
Many moons ago 🌛, internet infrastructure relied on operating systems to run services. Unix and Linux were preferred because they are multi-user environments suitable for serving files to requesters. In fact, the first web browser was written on a NeXT workstation running the Unix-based NeXTSTEP operating system. 🖥️ This model worked for decades: more users meant more machines. However, issues arose. Machines would go down, causing cascading effects. ⚠️ Misconfigurations wreaked havoc, and machines often ran inefficiently, either wasting resources or straining hardware. 🔧
Virtualization revolutionized infrastructure by allowing computers to be divided into independent virtual machines (VMs). A VM is a fully self-contained operating system. This innovation enabled more efficient utilization of hardware, increased flexibility, and reduced downtime caused by hardware failures. 🚀
Containers brought another layer of efficiency and standardization. Unlike VMs, containers share the host operating system’s kernel but encapsulate applications and their dependencies. This reduces overhead and enables applications to run consistently across different environments. 🌍 Developers could now “build once, run anywhere,” making containers a key tool in modern infrastructure. 🛠️
As the use of containers exploded, managing them became increasingly complex. Deploying, scaling, monitoring, and maintaining hundreds or thousands of containers manually was impractical. This is where orchestration tools like Kubernetes stepped in, automating these tasks and ensuring applications are always running, balanced, and recoverable in case of failures. ✅
Understanding the “Why” is key to appreciating Kubernetes’ value. Here are some of the core reasons:
Monitoring 📊: Kubernetes provides tools and integrations to monitor your workloads, ensuring you can observe application health and performance in real time.
Logging 📝: Centralized logging in Kubernetes makes it easy to trace and debug issues across distributed systems.
Security 🔒: Kubernetes enhances security through role-based access control (RBAC), network policies, and automatic updates, reducing vulnerabilities in production systems.
Ephemerality 🌀: Kubernetes embraces the concept of ephemeral workloads, where containers can be replaced automatically if they fail, ensuring high availability.
Reproducibility 🔄: Kubernetes enables reproducible deployments by using declarative configurations, allowing you to deploy the same infrastructure consistently across environments.
By addressing these challenges, Kubernetes transforms the way infrastructure is managed and applications are deployed, making it a cornerstone of modern cloud-native computing. ☁️
John Champine, an OpenShift Engineer, delivered a compelling two-hour presentation on Kubernetes and OpenShift. His anecdotes and technical insights were especially engaging, offering both rich historical context drawn from his personal experience and intricate details about shared resource management. 🖥️⚙️
I heavily utilize Proxmox VE to build out simulated production environments where I can practice various administrative and engineering tasks. In my homelab, I installed K3S and Talos to create a typical dev/testing/production environment. 🌐 One particularly unique workflow I used involved building custom Podman containers—yes, Podman! 🐋
It’s not widely known that podman kube generate can create a Kubernetes manifest from a running container or pod, which podman play kube (or a cluster) can then consume. With this method, I could prototype and build out containers, functionally test them, and then publish them for declarative deployment. This approach felt incredibly slick to me, as bugs were ironed out during the process, and the final deployment was straightforward. ✅🛠️
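As a sketch of this workflow, `podman kube generate <pod> > pod.yaml` emits a Kubernetes manifest roughly like the following (the pod name, image, and port are hypothetical, and real output includes more fields):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: localhost/mywebapp:latest
    ports:
    - containerPort: 8080
```

The generated file can then be applied to a cluster with `kubectl apply -f pod.yaml` or replayed locally with `podman play kube pod.yaml`.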
Most of my work was done with K3S, a Rancher-based distro designed for low-resource environments. However, I also experimented with Talos OS, setting up multiple virtual machines in a configuration resembling a multi-machine/node environment—with a sprinkle of jank to keep things interesting. 🤖✨
This hands-on approach allowed me to deepen my understanding of Kubernetes while also refining workflows that integrate containerization and orchestration. 🚀
From these experiences, I have developed a deep respect for Kubernetes. I see it as the operating system of the internet—an innovation that will inspire other similar systems. 🌐 While newer technologies like MicroVMs and hybrid container/VM architectures are emerging, I believe they can be easily incorporated into orchestration schemes like Kubernetes. 🤖🛠️
Given this perspective, I think Kubernetes will remain relevant for a long time, much like Unix/Linux. 🐧 It simply makes sense given the strenuous demands of the modern internet and the ever-growing number of attacks and incidents. Such a resilient system enables greater efficiency, enhanced security, and improved situational awareness for everyone. 🔒🚀
Incident response is a structured approach to identifying, managing, and resolving unexpected events such as security breaches, system failures, or misconfigurations. It aims to minimize disruption, mitigate damage, and restore normal operations while implementing lessons learned to prevent future incidents.
Responding to incidents is stressful because it can involve many stakeholders and little time. This week we exercised our skills by live-debugging on a remote host in front of our peers. The problems all related to failure modes and misconfiguration, and the exercise was rewarding: as always, I learned a lot and built some confidence.
Yesterday, our close-knit group completed an intensive 16-week hands-on course in Enterprise Linux Administration, culminating in a live incident response session.
Over the course of hundreds of hours, I pushed myself to go above and beyond in my studies and responsibilities. Along the way, I formed strong connections with like-minded peers, navigating the modern educational landscape of YouTube, Discord, and KillerCoda.
I am truly grateful to have stumbled upon this seemingly random community and to have experienced the structured, effective teaching methods of Scott Champine (Het Tanis), an experienced and traditional educator.
To understand the environment, we must first understand the platform. Discord is a communication platform that combines text, voice, and video chat, designed to create communities where people can interact in real-time. What makes it unique is its seamless integration of customizable servers, topic-specific channels, and robust tools for both casual conversation and collaborative work.
Working on Discord harbored a comfortable sense of passive interaction. Unlike Zoom, Skype, or similar video communication platforms, Discord allows people to come and go as they please, supports multiple presenters, and keeps an open voice chat, replicating a real-world meeting more closely.
This allowed for impromptu discussions and presentations, greatly improving the learning experience.
Early in the course, I applied my leadership skills by organizing a formal schedule for our study group meetings. These sessions covered course assignments in detail while also exploring related topics through collaborative, interactive projects.
The format was casual and engaging. I would share my screen to walk through scenarios while the group discussed the subject matter. Others also shared their screens, demonstrating tips and tricks in unison.
One of the most effective tools I introduced was a shared note using Etherpad. Similar to Google Docs, Etherpad allows multiple people to edit a document simultaneously. However, it stands out by enabling access without requiring sign-in credentials, making it easy to share with anyone.
These activities relied heavily on trust, as it would have been easy for someone to disrupt the sessions. My leadership skills were frequently tested by off-topic individuals or disruptive participants, but such issues were usually short-lived.
Coming into the course, I already had a solid understanding of Linux, backed by a few years of experience. Additionally, I had completed RWXRob’s (Rob Muhlestein’s) Beginner Boost DevOps course a year prior.
What set this course apart was its group-learning dynamic. During Rob’s course, I worked alone, building projects and debugging through hard-fought, self-directed methods like reading documentation, brute-forcing solutions, and referencing forums. In contrast, group work brought added motivation, inspiration, and a collaborative approach to problem-solving. It helped eliminate mundane, off-topic roadblocks, allowing us to focus on core learning and progress more efficiently.
Through the study group and community discussions, I’ve developed a strong connection with the ProLUG community and feel confident that I can rely on the server for discussions, questions, and troubleshooting. In the near future, I plan to give back by supporting future coursework and helping new learners navigate their journey.
I’m deeply grateful to Scott Champine (Het Tanis) for offering this free course and dedicating so much of his time to it. I’m equally thankful to the server members who joined the study group and dove headfirst into the intricacies of systems.
Systems engineering troubleshooting involves diagnosing and resolving complex issues within interconnected systems to ensure seamless operation and optimal performance. It requires a methodical approach to identify root causes, integrate solutions, and maintain system functionality while addressing both technical and process-related challenges.
Your management is all fired up about implementing some Six Sigma processes around the company. You decide to familiarize yourself and get some basic understanding to pass along to your team.
5S is a Japanese Lean approach to organizing a workspace: by making a process more effective and efficient, it becomes easier to identify and expunge muda (waste). 5S relies on visual cues and a clean work area to enhance efficiency, reduce accidents, and standardize workflows to reduce defects. The method is based on five steps:
Identify and categorize common troubleshooting problems such as typos, illogical configurations, or vulnerabilities to establish clarity and prioritize issues. (Seiri)
Organize and catalog processes and procedures for addressing both routine and uncommon problem scenarios for quick access and consistency. (Seiton)
Validate and test all processes and procedures to ensure they function effectively and reliably. (Seiso)
Promote team familiarity by regularly practicing and drilling procedures, similar to incident response training, to build confidence and efficiency. (Seiketsu)
Apply the processes and procedures in real-world scenarios, evaluate their effectiveness, make necessary adjustments, and document improvements for future use. (Shitsuke)
By applying the 5S methodology to troubleshooting, the team can develop a shared understanding of how to consistently address issues, identify system failure points, and create opportunities for incremental improvement, fostering a sense of flow and efficiency.
Inputs – anything that enters the process, or is required to enter the process, to drive the creation of an output.
Outputs – the service or product created by the process.
Events – predefined criteria or actions that cause a process to begin working.
Tasks – the heart of the process; a unit of action within it. Decisions may be made during, or in support of, tasks.
The four layers of a process—inputs, outputs, events, and tasks—function similarly to how a computer program uses functions to process variables and produce results. However, in a Six Sigma process, these elements are more dynamic, encompassing both virtual and physical components, as well as steps that may be driven by either human actions or automated systems. This flexibility allows Six Sigma to address complex workflows that combine diverse inputs and tasks to achieve consistent and efficient outputs.
Looking at our operation as a series of processes with layers like this can help us identify, refine, and standardize processes into Standard Operating Procedures (SOPs).
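To make the program analogy concrete, here is a toy sketch in shell; the event name and the task itself are invented purely for illustration:

```shell
# Toy model of the four layers: an event triggers the process,
# a task transforms the input, and the result is the output.
process_order() {
  input="$1"                                      # input: what enters the process
  task_result=$(echo "$input" | tr 'a-z' 'A-Z')   # task: the unit of action
  echo "shipped:$task_result"                     # output: what the process produces
}

# event: a predefined criterion that starts the process
event="order_received"
if [ "$event" = "order_received" ]; then
  process_order "widget"
fi
```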
The phrase “high water mark” originates from marking a riverbank to indicate the highest water level reached during a season. This mark serves as a warning, signaling potential danger if the water rises beyond it in the future.
In the context of systems, the high water mark represents historically safe operational loads. If metrics indicate that this threshold has been exceeded, it should alert administrators to a potential issue.
For example, if the high water mark for daily memory usage was 14/18 GiB of RAM, and we observe the system suddenly using 16/16 GiB, this warrants attention as a potential problem requiring further investigation.
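A minimal sketch of such an alert, assuming usage figures in GiB (on a real system they would come from something like `free -g` rather than hard-coded values):

```shell
# Flag memory usage that exceeds a historical high water mark.
check_hwm() {
  used="$1"; hwm="$2"
  if [ "$used" -gt "$hwm" ]; then
    echo "ALERT: usage ${used}GiB exceeds high water mark ${hwm}GiB"
  else
    echo "OK"
  fi
}

check_hwm 16 14   # the scenario above: 16 GiB used vs a 14 GiB mark
```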
Control limits are essential tools for monitoring the stability of a process over time. The upper and lower control limits define the normal range within which a process output should remain when the process is operating correctly. If the output exceeds these boundaries, it signals a potential issue, indicating that the process may be out of control and requiring investigation or troubleshooting.
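As a sketch, control limits are commonly set at the mean plus or minus three standard deviations of historical output; the sample data below is made up for illustration:

```shell
# Derive 3-sigma control limits from sample measurements with awk,
# then classify new observations against them.
data="4 5 6 5 4 6 5"

# Compute LCL and UCL as mean +/- 3 * (population) standard deviation.
set -- $(echo "$data" | awk '{
  for (i = 1; i <= NF; i++) { sum += $i; sumsq += $i * $i }
  mean = sum / NF
  sd = sqrt(sumsq / NF - mean * mean)
  printf "%.4f %.4f", mean - 3 * sd, mean + 3 * sd
}')
lcl=$1; ucl=$2

classify() {
  awk -v x="$1" -v lcl="$lcl" -v ucl="$ucl" 'BEGIN {
    if (x + 0 < lcl + 0 || x + 0 > ucl + 0) print "out of control"
    else print "in control"
  }'
}

classify 5    # within the limits
classify 9    # beyond the upper control limit
```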
Incident: An unplanned event that disrupts normal operations.
Problem: The underlying cause of one or more incidents.
FMEA (Failure Mode and Effects Analysis): A method for identifying and prioritizing potential failure points in a process.
Six Sigma: A data-driven methodology focused on improving processes by reducing defects and variability.
TQM (Total Quality Management): A management approach emphasizing continuous improvement and customer satisfaction across all organizational processes.
Post Mortem: A retrospective analysis of an event to identify successes and areas for improvement.
Scientific Method: A systematic process of forming hypotheses, testing them, and analyzing results to draw conclusions.
Iterative: A repetitive approach to refining a process or solution through successive cycles.
Discrete Data: Data that can only take specific, distinct values.
Ordinal Data: Data with a meaningful order but no consistent interval (e.g., satisfaction ratings).
Nominal (Binary/Attribute): Categorical data without order, such as “yes/no” or “male/female.”
Continuous Data: Data that can take any value within a range, such as temperature or time.
Risk Priority Number (RPN): A score in FMEA used to prioritize risks, calculated as Severity × Occurrence × Detection.
5 Whys: A technique for identifying the root cause of a problem by repeatedly asking “why.”
Fishbone Diagram (Ishikawa): A visual tool used to identify and categorize potential causes of a problem.
Fault Tree Analysis (FTA): A deductive analysis method used to identify the root causes of system failures.
PDCA (Plan-Do-Check-Act): A cyclical process for continuous improvement in workflows or systems.
SIPOC: A high-level process map identifying Suppliers, Inputs, Processes, Outputs, and Customers.
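The RPN calculation above is simple multiplication; here is a one-line sketch with illustrative scores:

```shell
# RPN = Severity x Occurrence x Detection, each typically scored 1-10.
rpn() { echo $(( $1 * $2 * $3 )); }

rpn 7 5 6   # severity 7, occurrence 5, detection 6 -> RPN 210
```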
Ansible is an open-source automation tool used for configuration management, application deployment, and IT orchestration, enabling tasks to be executed on multiple systems simultaneously without the need for agents. It uses simple YAML-based playbooks and SSH for communication, making it efficient and easy to learn for managing infrastructure.
INI. YAML seems to be the clear choice, as it allows for a more declarative inventory; however, while working in the study group, it was easier to edit INI without making indentation errors.
I can add quite a few interesting things to an inventory file to make it more useful. Once the file reaches a certain size, it is better to break it into separate, nested files for things like hosts, host variables, production, staging, etc.
Some notable features that I think make inventories pretty powerful are:
You have been noticing drift on your server configurations, so you want a way to generate a report on them every day to validate the configurations are the same.
We have 3 playbooks in the root of the directory. These playbooks utilize roles defined in the roles subdirectory.
Playbook 01 gathers facts about NFS using roles defined in a subdirectory called roles.
Playbook 02 gathers data from a target system using roles defined in a subdirectory called data-gather.
Playbook 03 updates and installs using roles defined in a subdirectory called packages_update/tasks & packages_install/tasks.
These playbooks incorporate roles based on specific conditions, executing the tasks defined within each role. When a role is included, the playbook inherits all its contents.
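Tying this back to the drift scenario: a daily configuration report could be sketched with a playbook along these lines (hedged; the `web` inventory group, the file paths, and the report location are all assumptions):

```yaml
# Hypothetical drift-report playbook: checksum configs of interest on each
# host and copy the results back for comparison against a known-good baseline.
- hosts: web
  become: true
  tasks:
    - name: Hash configuration files of interest
      ansible.builtin.shell: "sha256sum /etc/httpd/conf/httpd.conf /etc/ssh/sshd_config"
      register: config_hashes
      changed_when: false

    - name: Write the report locally, one file per host
      ansible.builtin.copy:
        content: "{{ config_hashes.stdout }}"
        dest: "/tmp/drift-report-{{ inventory_hostname }}.txt"
      delegate_to: localhost
```

Scheduled from cron or a systemd timer, the collected reports can then be diffed against the baseline each morning.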
ansible servers -i hosts -u inmate -k -m shell -a "ls -l /tmp/somefile"
Pull down a GitHub repo:
git clone https://github.com/het-tanis/HPC_Deploy.git
cd HPC_Deploy
What do you see in here?
What do you need to learn about more to deploy some of these tools?
Can you execute some of these, why or why not?
Linux system hardening involves securing the system by reducing its attack surface through measures such as disabling unnecessary services, enforcing access controls, applying security patches, and using tools like OpenSCAP, STIG compliance frameworks, or the OSCAP Scanner. These tools help automate security audits, enforce compliance standards, and identify vulnerabilities to enhance system security.
Your security team comes to you with a discrepancy between the production security baseline and something that is running on one of your servers in production. There are 5 servers in a web cluster and only one of them is showing this behavior. They want you to account for why something is different.
I am going to assume that I am new to the system in general and have only surface-level knowledge from fellow staff. I am also assuming we are working with a Red Hat-based system.
If I do see something distinctly different, I would employ a more sophisticated approach with difference checking.
Given that everything is a structured file, I can write the output from a working system and from the goose 🪿 to files and run diff against them.
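A minimal sketch of that workflow, using invented file contents; diff exits non-zero when the two captures differ:

```shell
# Capture the config of interest from the known-good host and the outlier
# (contents here are illustrative), then diff them.
printf 'PermitRootLogin no\nMaxAuthTries 4\n' > /tmp/baseline.conf
printf 'PermitRootLogin yes\nMaxAuthTries 4\n' > /tmp/goose.conf

if diff -u /tmp/baseline.conf /tmp/goose.conf > /tmp/drift.txt; then
  result="no drift"
else
  result="drift detected"
fi
echo "$result"
```

The saved unified diff (/tmp/drift.txt) then pinpoints exactly which settings diverge.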
Your team has been giving you more and more engineering responsibilities. You are being asked to build out the next set of servers to integrate into the development environment. Your team is going from RHEL 8 to Rocky 9.4.
During the analysis and optimization phase, I would start a playbook with information gathered from previous phases.
I would build and run the playbook against VM templates until satisfied.
Given the prior phases, my Playbook would be robust and capable of the transition.
However, I would ensure a robust backup and rollback plan in the case something fails.
I would have a separate playbook built to validate performance against what I observed during my VM experimentation.
Though the environment may differ from that of the VM, I would still be able to discern performance characteristics and notice any outlier differences.
When gathering a picture of my current security baseline, I can use tools like dmesg and ss to see what possible attack surface I may have.
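For example, listening sockets can be counted from ss output; the sample below is canned so the parsing is reproducible (on a live system, you would pipe `ss -tlnp` instead):

```shell
# Canned ss-style output, stood in for a live `ss -tlnp` capture.
sample='State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     0.0.0.0:22          0.0.0.0:*
LISTEN  0       511     0.0.0.0:80          0.0.0.0:*
ESTAB   0       0       10.0.0.5:22         10.0.0.9:51514'

# Each LISTEN row is a service exposed to the network, i.e. attack surface.
listeners=$(echo "$sample" | awk '$1 == "LISTEN"' | wc -l)
echo "listening sockets: $listeners"
```

Tracking this count over time gives a simple baseline: a new listener appearing is a change worth investigating.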