Tech Blog

That Unix file is not deleted 👺

Removing a file in Unix does not actually erase its contents from the disk. Deleting a file merely removes a directory entry, reducing the number of hard links pointing to the inode; once that count reaches 0 (and no process still holds the file open), the data blocks are only marked free for reuse. The only way to actually destroy the information is to overwrite it. That will happen over time as the disk fills, but anyone who wants a file gone in a timely fashion should overwrite it explicitly or wipe the drive's free space.
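You can watch the link counting happen with a few coreutils commands (file names here are just examples):

```shell
# Create a file and a second hard link to the same inode
echo "secret data" > report.txt
ln report.txt backup.txt

stat -c %h report.txt   # link count is now 2
rm report.txt           # removes one name; the data is still on disk
stat -c %h backup.txt   # link count is back to 1

# To actually destroy the contents, overwrite before unlinking.
# (Note: shred is less effective on copy-on-write file systems like Btrfs/ZFS.)
shred -u backup.txt     # overwrites the blocks, then removes the file
```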

Studying for the LPIC-1 Certification 📖

Intro 👋

I am intent on becoming a Linux Professional Institute certified administrator. It has been a goal of mine for a while now. However, goals are just aspirations; reaching one like this requires a structured, systematic approach. As a lifelong learner, I am always improving my study methods to be more efficient with my time and energy. As we all know, life can get in the way of our goals, so we must find methods that work even when conditions are not ideal for studying. For instance, I break up my study sessions into smaller, bite-sized chunks that I can do when time permits. I take much more comprehensive notes than I did in the past, and I work on rote memorization through the use of flash cards, something I really wasn't into before.

Creating a Work Breakdown Structure

The first thing I do when starting to study a new topic is to create a simple structure from a high-level view. A handful of bullet points can outline the shape of the knowledge. For the LPIC, most of this had already been done, since the curriculum for the exam is published openly.

Hands-on practice

For each item the course covers, I set up a lab and run through all of the commands by actually typing them in and seeing what kinds of errors or typos I run into. The LPIC has a huge list of commands to remember, which makes this the hardest part for me.

Taking comprehensive notes

I recently started using Logseq, a Markdown-based note-taking application that allows for interlinking of notes. This has really upped my game when it comes to keeping track of notes and referring back to them. In the past, I would just take notes for the task at hand and rarely refer back to them; note-taking was purely a memory-retention aid.

Flash Cards

Another recent addition to my study arsenal is Anki, a smart flashcard application that I use for rote memorization. The application is open source and many flashcard decks are freely shared, so I was up and running in a short period of time with a large deck of LPIC flashcards.

Udemy / Youtube

Though video courses are a fairly low-bandwidth information source, I still like to follow chapterized courses, especially on Udemy. On a sale day I can grab a comprehensive, chapterized video course for less than 20 dollars, a steal in comparison to a college course.

Proxmox for the win 🏆

Intro 👋

Proxmox is a free, open-source virtualization host built on top of Debian. It can run both virtual machines (VMs) and Linux containers, offering a wide array of features. 🚀

I’m always learning new systems and software. Before using Proxmox, I was repeatedly installing and reinstalling Unix systems on an old laptop. This helped me understand installation, configuration, and a host of other things, but it was very tedious and inefficient. That laptop was underpowered, so running multiple VMs was out of the question. I needed a multicore x86 machine with ample memory to achieve this.

Wanting to take my systems engineering learning more seriously, I decided to build an x86 machine dedicated to learning without the fear of messing things up. While researching virtualization software, I discovered Proxmox.

Installation 🛠️

Installing Proxmox is straightforward, especially for someone who has installed many major distros on bare-metal systems. The process involves downloading the latest ISO, creating a boot disk using Balena Etcher, and then following a series of prompts to choose the target disk and locale, just like a typical Debian installation.
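If you prefer the command line to Balena Etcher, plain dd works just as well. This is a sketch: the ISO filename and /dev/sdX are placeholders, and dd will happily destroy the wrong disk, so verify the device with lsblk first.

```shell
lsblk                                    # identify the USB stick, e.g. /dev/sdX
# Write the installer image raw to the stick (all data on it is lost)
sudo dd if=proxmox-ve.iso of=/dev/sdX bs=4M status=progress conv=fsync
```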

Accessing Proxmox 🌐

Once installed, Proxmox operates as a headless network machine. If you plug in a monitor, you’ll only see a black screen showing the host address, nothing more. The way to interact with Proxmox is through a browser on a device connected to the same local network.

For example, I open a browser on my MacBook and navigate to the host’s IP address on port 8006 (the Proxmox web UI listens on https://<host-ip>:8006), where I am greeted by the login screen. You would have set up your username and password during installation, so just log in from there. 👍

PC Tower

The Dashboard 📊

Proxmox provides an easy-to-use dashboard that shows system load and memory usage at a glance. Creating a virtual machine is as simple as clicking “Create New VM” and uploading a suitable ISO. The real learning comes when you automate the process and manage multiple machines.

Proxmox Dashboard

Learning Platform 🎓

Proxmox gives you the freedom to create, break, and destroy VMs, allowing you to learn new things without sweating the small stuff. Through it, I’ve gained a wealth of systems engineering knowledge by completing small projects outside of work.

I like to create machine templates that can be reused for automated provisioning using tools like Ansible and Terraform. This simulates setting up clusters of machines that need to be pre-configured and communicate with each other.

I also create cloud-init images in qcow2 format to build templates with randomized SSH keys and uninitialized hostnames, much like how Azure, AWS, or Google Cloud provisions fresh instances.
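As a concrete sketch, the template workflow on the Proxmox CLI looks roughly like this. The VM ID 9000, the Ubuntu cloud image, and the "local-lvm" storage name are assumptions; adjust them for your host.

```shell
# Download a cloud image and turn it into a reusable Proxmox template
wget https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img

qm create 9000 --name ubuntu-template --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0
qm importdisk 9000 noble-server-cloudimg-amd64.img local-lvm
qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
qm set 9000 --ide2 local-lvm:cloudinit     # cloud-init config drive
qm set 9000 --boot order=scsi0 --serial0 socket --vga serial0
qm template 9000                           # freeze as a template

# Each clone gets its own cloud-init identity (hostname, SSH keys) on first boot
qm clone 9000 101 --name lab-node-01 --full
```

From here, Ansible or Terraform can clone and configure whole clusters from the one template.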

Through Proxmox, I’ve gone through installation and configuration procedures for large, complex systems like Kubernetes. This allows me to gain valuable experience with these cumbersome tools before working on similar setups in production environments. 🖥️

In Summary 📝

If you’re serious about learning systems engineering and working with multiple machines, I highly recommend setting up a Proxmox machine. Even an older PC can suffice, though some limitations may apply; figuring those out is part of the research and learning process! 😄

Linux File Systems Overview 💾

1. ext4 💾 (Linux, BSD)

Intro

ext4 (Extended File System version 4) is the default file system for many Linux distributions.

https://en.wikipedia.org/wiki/Ext4
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/storage_administration_guide/ch-ext4#ch-ext4

Technical Features: 🔍

  • Journaling
  • Extent-Based Allocation
  • Delayed Allocation
  • Persistent Pre-allocation
  • Multi-block Allocation
  • Online Resizing
  • 64-bit File System Support
  • Directory Indexing with HTree
  • Defragmentation
  • Backward Compatibility with ext2/ext3
  • Barriers for Data Integrity
  • Large File Support (up to 16 TiB)
  • Metadata Checksumming (optional)
  • Quotas

Advantages 👍

  • Mature and Stable: ext4 is a well-tested and widely-used file system with a long history of stability.
  • Performance: It offers good performance for most workloads, especially for general-purpose usage.
  • Backward Compatibility: Supports ext3 and ext2 file systems, making it easy to upgrade.
  • Journaling: Provides a journaling feature that helps to prevent data corruption in case of a crash.
  • Wide Support: Supported by almost all Linux distributions and has a large community.

Downsides 👎

  • Limited Scalability: While adequate for most users, ext4 doesn’t scale as well as newer file systems for very large volumes and large numbers of files.
  • Lack of Advanced Features: ext4 lacks features like snapshotting and built-in data integrity checks (e.g., checksums).

Technical Details 🔍

  • Maximum File Size: 16 TiB
  • Maximum Volume Size: 1 EiB

Distro Usage

  • ext4 is the most widely used Linux file system; the BSDs can also access ext4, though their support is more limited.
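A quick, root-free way to play with ext4 and resize2fs is a file-backed image (e2fsprogs assumed; sizes are arbitrary):

```shell
img=$(mktemp)
truncate -s 64M "$img"
mkfs.ext4 -q -F "$img"          # -F: the target is a regular file, not a device

# Grow the backing file, then grow the file system into the new space
truncate -s 128M "$img"
e2fsck -fp "$img" >/dev/null    # resize2fs wants a freshly checked fs
resize2fs "$img"                # the same command grows a mounted ext4 fs online
rm "$img"
```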

2. XFS 💾 (Linux, BSD)

Intro

XFS is a high-performance file system designed for parallel I/O operations, often used in enterprise environments.

https://en.wikipedia.org/wiki/XFS
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-xfs

Technical Features: 🔍

  • Extent-Based Allocation
  • Journaling (Metadata Journaling)
  • Delayed Allocation
  • Persistent Pre-allocation
  • Online Resizing (grow only)
  • Dynamic Inode Allocation
  • B+ Tree Directory Structure (for fast lookups)
  • Direct I/O Support
  • Data Striping for Performance
  • Advanced Metadata Management
  • Large File and Volume Support (up to 8 EiB)
  • Online Defragmentation
  • Quotas and Project Quotas
  • Realtime Subvolume for Real-Time I/O

Advantages 👍

  • High Performance: Optimized for large files and supports high-performance parallel I/O, making it ideal for environments with large data sets.
  • Scalability: Scales well with large volumes and large numbers of files; the on-disk format supports file systems up to 8 EiB.
  • Journaling: Uses journaling to help prevent data corruption.
  • Online Resizing: Supports online resizing of file systems (only grow).

Downsides 👎

  • Complexity: XFS is more complex to manage compared to ext4.
  • Limited Snapshot Support: Has limited support for snapshots compared to Btrfs and OpenZFS.
  • Potential Data Loss on Power Failure: In certain configurations, XFS may be more susceptible to data loss in the event of a sudden power loss.

Technical Details 🔍

  • Maximum File Size: 8 EiB
  • Maximum Volume Size: 8 EiB

Distro Usage

XFS has been in the Linux kernel since 2001. It is the default file system for RHEL.
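The grow-only resize looks like this in practice. This is a sketch: /dev/sdb1 and the mount point are examples, and the commands need root and xfsprogs.

```shell
mkfs.xfs /dev/sdb1            # format the partition
mount /dev/sdb1 /mnt/data
df -h /mnt/data

# ...later, after enlarging the underlying partition or LVM volume:
xfs_growfs /mnt/data          # grows the mounted fs online; there is no shrink
```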

3. Btrfs 💾 (Linux)

Intro

Btrfs (B-tree File System) is a modern, copy-on-write file system designed for Linux that offers advanced features like snapshots, RAID support, self-healing, and efficient storage management, making it suitable for scalable and reliable data storage.

https://en.wikipedia.org/wiki/Btrfs
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/storage_administration_guide/ch-btrfs
https://docs.kernel.org/filesystems/btrfs.html

Technical Features: 🔍

  • Journaling
  • Extent-Based Allocation
  • Persistent Pre-allocation
  • Delayed Allocation
  • Multi-block Allocation
  • Stripe-aware Allocation
  • Resizable with btrfs filesystem resize (not resize2fs, which is ext-only)
  • Copy-on-Write B-tree Balancing Algorithm (a different design from XFS’s B+ trees)
  • Copy-on-Write (COW)
  • Snapshots and Clones
  • Built-in RAID Support
  • Data and Metadata Checksumming
  • Self-Healing
  • Dynamic Subvolumes
  • Online Resizing
  • Compression (LZO, ZLIB, ZSTD)
  • Deduplication

Advantages 👍

  • Snapshot Support: Provides built-in support for snapshots, allowing for quick backups and rollbacks.
  • Data Integrity: Includes checksumming of data and metadata, which helps to ensure data integrity.
  • Self-Healing: With RAID support, Btrfs can automatically repair corrupted data.
  • Dynamic Storage: Allows for the dynamic addition and removal of storage devices.

Downsides 👎

  • Stability: Btrfs is considered less mature than ext4 or XFS, particularly for certain features like RAID 5/6.
  • Performance: May not perform as well as XFS or ext4 in certain high-performance scenarios, particularly with heavy random writes.
  • Complexity: The advanced features of Btrfs come with increased complexity.

Technical Details 🔍

  • Maximum File Size: 16 EiB
  • Maximum Volume Size: 16 EiB
  • Better on SSDs: Btrfs is well-suited for flash/solid-state storage because of TRIM support and CoW, which reduces write amplification.

Distro Usage

Btrfs has been in the mainline Linux kernel since 2009. It is the default file system for openSUSE/SLES and for Fedora Workstation.
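A sketch of the snapshot and online-resize workflow (device name and paths are examples; root and btrfs-progs required):

```shell
mkfs.btrfs /dev/sdb1
mount /dev/sdb1 /mnt/pool
btrfs subvolume create /mnt/pool/data

# Snapshots are instant copy-on-write copies of a subvolume
btrfs subvolume snapshot /mnt/pool/data /mnt/pool/data-snap      # writable
btrfs subvolume snapshot -r /mnt/pool/data /mnt/pool/data-backup # read-only

# Online resizing works in both directions, unlike XFS
btrfs filesystem resize -10G /mnt/pool    # shrink while mounted
btrfs filesystem resize max /mnt/pool     # grow to fill the device
```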

4. OpenZFS 💾 (Unix)

Intro

OpenZFS is an advanced file system and volume manager that originated from Sun Microsystems’ ZFS and is now maintained by the OpenZFS project.

https://en.wikipedia.org/wiki/OpenZFS
https://openzfs.org/wiki/Main_Page

Technical Features: 🔍

  • Copy-on-Write (COW)
  • Snapshots and Clones
  • Pooled Storage (ZFS Storage Pools)
  • Dynamic Striping
  • Built-in RAID Support (RAID-Z1, RAID-Z2, RAID-Z3)
  • Data and Metadata Checksumming
  • Self-Healing
  • Deduplication
  • Compression (LZ4, GZIP, ZLE, etc.)
  • Online Resizing
  • Dynamic Block Size
  • End-to-End Data Integrity
  • ZFS Datasets (File Systems and Volumes)
  • Adaptive Replacement Cache (ARC)
  • Transparent Data Encryption
  • ZFS Send/Receive for Backup and Replication

Advantages 👍

  • Data Integrity: Uses end-to-end checksums for all data, ensuring high data integrity.
  • Snapshots and Clones: Supports efficient, low-overhead snapshots and clones, useful for backups and development.
  • RAID-Z Support: Offers advanced RAID options (RAID-Z1, RAID-Z2, RAID-Z3), providing redundancy and fault tolerance.
  • Compression: Built-in compression can save space and improve performance in certain workloads.
  • Scalability: Designed to handle very large data sets and scales well with both size and number of files.

Downsides 👎

  • Resource Intensive: Can be resource-intensive, particularly in terms of memory usage.
  • Complexity: The advanced features and flexibility of OpenZFS come with a steep learning curve.
  • Portability: While available on many platforms, it is not as natively supported in Linux as ext4 or XFS.
  • Licensing: OpenZFS is licensed under CDDL, which is incompatible with the GPL.

Technical Details 🔍

  • Maximum File Size: 16 EiB
  • Maximum Volume Size: 256 ZiB (theoretical)

Distro Usage

OpenZFS is not available in the mainline Linux kernel; because of its CDDL licensing, it ships as a third-party kernel module instead. It works on Linux, the BSDs, and macOS.
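A sketch of the basic pool and snapshot workflow (requires root and the OpenZFS module; file-backed vdevs are fine for practice):

```shell
# Two file-backed "disks" so we can practice without spare hardware
truncate -s 1G /vdev0.img /vdev1.img
zpool create tank mirror /vdev0.img /vdev1.img   # mirrored pool

zfs create tank/home
zfs set compression=lz4 tank/home                # transparent compression

zfs snapshot tank/home@before-upgrade            # read-only point in time
zfs clone tank/home@before-upgrade tank/home-rw  # writable clone of it
zfs send tank/home@before-upgrade | zfs receive tank/backup   # replication
```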

5. HAMMER2 💾 (DragonFly BSD)

Intro

Hammer2 is a modern, advanced file system designed for high-performance and scalable storage solutions, particularly in clustered environments. It features robust capabilities such as copy-on-write, data deduplication, and built-in snapshots, providing high data integrity, efficient storage management, and instant crash recovery.

https://en.wikipedia.org/wiki/HAMMER2

Technical Features: 🔍

  • Clustered File System Support
  • Snapshot and Cloning Support
  • Copy-on-Write (COW)
  • Data Deduplication
  • Data Compression (LZ4, ZLIB)
  • Data and Metadata Checksumming
  • Multi-Volume Support
  • Instant Crash Recovery
  • Fine-Grained Locking (for SMP scalability)
  • RAID Support (1, 1+0)
  • Thin Provisioning
  • Asynchronous Bulk-Freeing
  • Large Directory Support
  • Built-in Data Integrity and Self-Healing

Advantages 👍

  • High Performance: Optimized for high-performance and scalable storage solutions.
  • Data Integrity: Incorporates checksumming and self-healing features to maintain data integrity.
  • Efficient Storage Management: Offers advanced features like data deduplication and compression to manage storage efficiently.
  • Scalability: Designed to handle large volumes of data and support clustered environments.

Downsides 👎

  • Complexity: The advanced features and configuration options can introduce complexity.
  • Maturity: As a newer file system, it may have fewer tools and less mature support compared to more established file systems.
  • Limited Adoption: Less commonly used than other file systems, which may affect community support and documentation.

Technical Details 🔍

  • Maximum File Size: Not explicitly defined, but supports very large files.
  • Maximum Volume Size: Not explicitly defined, but designed for large-scale storage.

Distro Usage

  • DragonFly BSD: The primary platform where Hammer2 is used and supported.
  • Limited Availability: Not available in mainstream Linux distributions; primarily associated with DragonFly BSD.

Key Concepts / Glossary 🔑

Snapshots 📸

  • Snapshots are read-only copies of a file system at a specific point in time, allowing users to save the state of the file system for backup and recovery purposes. They are efficient and consume minimal space, as only the differences between the current state and the snapshot are stored.

Clones vs. Snapshots 📸🧬

  • Snapshots: Read-only copies of the file system at a specific time.
  • Clones: Writable copies of snapshots that can be modified independently.

RAID-Z Levels ⛓️

  • RAID-Z1: Single parity; can tolerate the loss of one disk.
  • RAID-Z2: Double parity; can tolerate the loss of two disks.
  • RAID-Z3: Triple parity; can tolerate the loss of three disks.

RAID 5 and RAID 6 ⛓️

  • RAID 5: Stripes data across disks with single parity; can tolerate the loss of one disk.
  • RAID 6: Stripes data across disks with double parity; can tolerate the loss of two disks.

Issues with RAID 5/6 in Btrfs

Btrfs’s implementation of RAID 5/6 is considered unstable due to issues like the write hole problem, making it less reliable for production use. Data integrity may be compromised, leading to potential data loss.

CDDL License 🪪

The Common Development and Distribution License (CDDL) is an open-source license created by Sun Microsystems. It is incompatible with the GPL, which can complicate integration with Linux.

Btrfs Self-Healing ❤️‍🩹

Self-Healing in Btrfs works by verifying data against checksums and repairing any detected corruption using redundant data stored on other disks in a RAID configuration.

Dynamic Storage 🧱

Dynamic Storage refers to the ability to manage multiple storage devices within a single file system, allowing for on-the-fly addition and removal of devices, with the file system automatically balancing data across them.

Online Resizing 🗺️

Online Resizing allows the resizing of a file system while it is mounted and in use. XFS supports growing the file system online, while Btrfs supports both growing and shrinking.

B-Trees ⚖️

A B-tree is a self-balancing tree data structure that maintains sorted data and allows efficient insertion, deletion, and search operations. B-trees are used in file systems like Btrfs to manage metadata and data blocks.

Extent-Based Allocation 👠

Extent-based allocation is a method used by modern file systems to manage data storage efficiently. Instead of tracking individual fixed-size blocks, the file system groups contiguous blocks into larger units called extents.

Persistent Pre-allocation 🎟️

This technique reserves a specific amount of disk space for a file in advance, ensuring that the allocated space remains available, which helps in reducing fragmentation and guaranteeing storage for large files.
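File systems expose this through the fallocate(2) interface; a quick demonstration (util-linux assumed, file name is an example):

```shell
# Reserve 100 MiB up front; the blocks are allocated immediately
fallocate -l 100M scratch.bin
ls -lh scratch.bin     # apparent size: 100M
du -h scratch.bin      # on-disk usage is also ~100M, unlike a sparse file
```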

Delayed Allocation ⏱️

Delayed allocation defers the assignment of specific disk blocks to file data until the data is flushed to disk, optimizing the allocation process and reducing fragmentation by allowing the file system to make better decisions about where to place data.

Multi-block Allocation ⋔

Multi-block allocation allows a file system to allocate multiple contiguous blocks at once, rather than individually, improving performance and reducing fragmentation, especially for large files.

Stripe-aware Allocation 🧠

Stripe-aware allocation is used in RAID configurations to ensure that data is distributed evenly across all disks in the array, optimizing performance by aligning data placement with the underlying stripe size of the RAID setup.

Fine-Grained Locking (for SMP Scalability) 🚀

Fine-grained locking applies locks at a granular level, allowing multiple processors to concurrently access different parts of the file system, enhancing performance and scalability in multi-core environments.

RAID 1+0 🖇️

RAID support includes configurations such as RAID 1 for data mirroring and RAID 1+0 for combining mirroring with striping to provide both redundancy and improved performance.

Thin Provisioning 🔮

Thin provisioning allocates disk space on-demand rather than reserving it all upfront, optimizing storage utilization by only using the space actually required by data.

Asynchronous Bulk-Freeing 🗑️

Asynchronous bulk-freeing performs large-scale space reclamation in the background, allowing the file system to manage deletions efficiently without impacting overall performance.

Large Directory Support 🏢

Large directory support enables efficient management of directories with a vast number of entries, using optimized data structures to ensure fast performance for directory operations.

My WebDev Preferences 👍

Intro

This blog will likely be dominated by Linux talk, as that is what I spend most of my time with. However, I do full-stack web development on occasion. In fact, I have worked as a professional web designer/developer on and off, doing simple full-stack projects in a range of languages. Recently, I have been stepping up my game with the intention of creating web applications with great GUIs, not just information pages. I think that with mature JavaScript frameworks like Next.js and React, plus WebAssembly (WASM), we are going to see high-performance desktop applications transition to being web apps. This potential has me excited about learning more sophisticated tools.

Go

Go is a dead-simple, systems-focused language created by GOATed Unix contributors. When I started picking up Go a few years back, it was not really popular: web development was dominated by Node.js on the backend, and there were no job postings for the skill. However, I tend to have good instincts and decided to eschew Node in favor of Go. I am very happy with that decision, as Go is clear, strongly typed, easy to set up, and fast as hell. You can classify me as a Go enjoyer and back-end deployer. The standard library now ships an HTTP server and an improved request multiplexer (with method and wildcard routing since Go 1.22), so it is my go-to for serving static and dynamic content.

Tailwind

I really regret not checking out Tailwind CSS earlier; it is so nice. I totally had the wrong idea, thinking it was just another annoying framework that overcomplicates things, and I was sorely wrong. It works really well in combination with Go templates for creating beautiful, responsive layouts. There is not much to a simple Tailwind setup, just a single call-out in the page header. When things need to be more expansive, a config file can specify the rules. I think I will be using a lot more Tailwind in the future.

React

I am not a major fan of JS because I am a fan of simplicity; alas, it is the language of the Web, so I must tango with its wild syntax. Most web experiences require reactive interaction and content delivery. I am not locked into React, as there are many similar frameworks, but I am saying that something like it is a must in one’s stack.

VSCode

Having a customizable, proper IDE is important for getting things done. As much as I like learning Vim, Helix, or even Sublime, I know that no matter the level of configuration, they won’t come close to a proper IDE that lints, detects bugs, and has a large ecosystem of plug-ins along with comfy features like workspaces and a built-in problem console. If you just want to make stuff, reduce your cognitive overhead and use a user-friendly IDE.