Know your hardware



This article is about understanding the different x86 hardware available in the market today. I have wanted to do this for a long time, to understand the different processors, motherboards, and architectures available.

I will start with Intel, as it is one of the largest manufacturers of x86 processors.

As of this writing, Intel has released its 7th-generation processors, named Kaby Lake. Before we start understanding these processors, we need to understand what you see in a typical x86 Intel microprocessor. So this part of the article deals with the basics of x86 processors and the other hardware available in an x86 system.

In a typical old motherboard you will see this:


Typical motherboard layout

The main components in the above image are:

CPU: Executes instructions.

Northbridge: Transfers data between memory and the CPU through an interconnect called the front-side bus.

Southbridge: An I/O hub to which USB, PCI, and hard disks are connected.

The Northbridge is a hub that transfers data between the Southbridge and RAM, and between RAM and the CPU. It is connected to the CPU through a bus (a bundle of wires) called the front-side bus, and it regulates traffic between the CPU and RAM. In some systems the video card is also connected to the Northbridge.

The Southbridge is an I/O hub to which USB, PCI, and hard disks are connected. The Northbridge transfers data and instructions between the Southbridge and both RAM and the CPU.

Data residing on the hard disk must at some point be loaded into RAM, moved from RAM to the CPU for execution, and written back to RAM and then to the hard disk. The same applies to the other devices attached to the PCI slots, USB, and the graphics card. All of this traffic between these devices, RAM, and the CPU is controlled by the Northbridge, which makes it very important.

Looking at the diagram again, the only way for data to flow into the CPU is through the front-side bus, and there is only one of them. So the bus between the Northbridge and Southbridge, the front-side bus (FSB), and the link between the Northbridge and RAM all carry massive amounts of data and can become heavily congested.

So if you have a high-end graphics card in the graphics slot (connected to the Northbridge) and a USB pen drive (connected through the Southbridge), do not expect your USB transfers to run at full speed. Understanding this architecture and its bottlenecks is very important.

Next comes the CPU, the most expensive part of the whole PC and the place where all instructions are executed. What I want to concentrate on is its basic functionality. At a fundamental level it has the following components:

1. Program counter: Contains the address of the next instruction to be executed.

2. Instruction decoder: Decodes each instruction: what the instruction means, where its operands lie (in CPU registers or in memory, and if in memory, where), and what operation must be performed: multiplication, addition, subtraction, or division.

3. Arithmetic logic unit (ALU): Executes the decoded instruction and puts the result on the bus to be stored in RAM.

4. General-purpose registers: Where the CPU can store data temporarily. The number and size of these registers vary between architectures.

So let's say we have an instruction at some location in RAM (say address X) and the program counter is currently pointing to address X. The following sequence occurs:

  • The CPU fetches the instruction from address X
  • Once the instruction is fetched, it is decoded
  • Then it is executed
  • The result is put back on the bus to be stored in RAM
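The four steps above can be sketched as a toy fetch-decode-execute loop in Python; the three-field instruction format and the register names are invented for illustration and bear no relation to real x86 encoding:

```python
# Toy fetch-decode-execute loop. "RAM" holds simple
# (operation, destination, source) tuples -- an invented format,
# far simpler than real x86 instruction encoding.
ram = {
    0: ("LOAD", "r0", 7),     # r0 = 7
    1: ("LOAD", "r1", 5),     # r1 = 5
    2: ("ADD",  "r0", "r1"),  # r0 = r0 + r1
    3: ("HALT", None, None),
}

def run(pc=0):
    regs = {"r0": 0, "r1": 0}
    while True:
        instr = ram[pc]          # 1. fetch from the address in the program counter
        op, dst, src = instr     # 2. decode: operation and operand locations
        pc += 1                  # advance the program counter
        if op == "LOAD":         # 3. execute
            regs[dst] = src
        elif op == "ADD":
            regs[dst] = regs[dst] + regs[src]
        elif op == "HALT":
            return regs          # 4. results written back

print(run())
```

In a real CPU the "write back" step goes over the bus to RAM (or a cache); here the register dict plays both roles.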

While the CPU is fetching an instruction, the other parts of the CPU, i.e. the instruction decoder and the arithmetic logic unit, are idle. To make effective use of the CPU there is a concept called pipelining: while the CPU is fetching instructions, it makes sure there are always enough instructions for the decoder to decode and enough for the ALU to execute.

Note: The above is a very simplified explanation of CPU pipelining. There is a lot more to it; you can find more on the internet.

From this basic explanation we can expand the CPU model further:


  1. Since instructions are in memory (RAM), they need to be fetched from memory before they can be executed. This fetching happens via the FSB and Northbridge, which run much slower than the CPU, so it takes a lot of time. Most modern processors therefore have caches, where instructions fetched from memory are stored locally, so that when an instruction needs to be executed it is fetched from the cache instead of memory.
    •  The other reason to have caches is that RAM is very slow compared to the speed at which the CPU executes instructions. The total time spent waiting for instructions to be fetched from RAM is high, wasting a lot of power and making the processor slow.
    •  These caches inside the CPU are layered. First come the CPU registers (eax, rax, etc.), next the L1 cache, then the L2 cache, which is somewhat bigger than L1, and finally the L3 cache, which is bigger than L2 but not by a lot (mostly on the order of a few megabytes)
  2. Instructions fetched from the cache are queued
  3. From the queue, instructions are decoded
  4. The decoded instructions are executed.

The basic need for a cache is that the CPU must be able to fetch data and instructions quickly, so it needs a memory it can access very fast: hence the cache. Another reason is that the time taken to execute an instruction is much less than the time taken to fetch it, so it is important to keep the fetch time as low as possible.

Some terms with regard to the CPU cache:

  • Cache hit: if the information the CPU requested is available in the cache, it is called a cache hit
  • Cache miss: if the information the CPU requested is not in the cache and has to be fetched from RAM, it is called a cache miss
  • Snoop: when the cache watches the bus (address lines) for transactions, it is said to snoop
  • Snarf: when the cache takes data from the data lines (bus), it is said to have snarfed the data
  • Dirty data: data that has been modified in the cache but not yet in main memory
  • Stale data: data that has been modified in main memory but not yet in the cache

When the cache is reading data, there are two ways it can do this:

  • Look Aside
  • Look Through

When the cache is writing data, there are two ways to do this:

  • Write Back
  • Write Through

Look aside: When the CPU wants to fetch data, both the cache and main memory (RAM) see the request on the bus at the same time. If the information is available in the cache it's a HIT, else it's a MISS.

Look through: The cache gets access to the bus before RAM; if the information is available in the cache it's a HIT, else it's a MISS. The disadvantage of this policy is that while the bus is being used to read memory, the CPU has no access to the cache and has to wait until the bus is freed.

Write back: When the processor needs to write something, it first writes to the cache. At this point the CPU can continue with other tasks; the cache will update main memory later.

Write through: In this method the processor writes to both the cache and main memory.
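The difference between the two write policies can be sketched in Python; the one-level "cache" and dict-based "memory" below are invented stand-ins for illustration, not a model of real hardware:

```python
# Illustrative write-back vs write-through. "memory" stands in for RAM;
# the cache maps address -> (value, dirty_bit).
memory = {0x10: 0}
cache = {}

def write_through(addr, value):
    # The processor writes to both the cache and main memory.
    cache[addr] = (value, False)
    memory[addr] = value

def write_back(addr, value):
    # The processor writes to the cache only and marks the line dirty;
    # main memory is updated later, when the line is flushed.
    cache[addr] = (value, True)

def flush(addr):
    value, dirty = cache[addr]
    if dirty:                  # dirty data: modified in cache, not in RAM
        memory[addr] = value
        cache[addr] = (value, False)

write_back(0x10, 42)
print(memory[0x10])   # 0 -- main memory is still stale
flush(0x10)
print(memory[0x10])   # 42 -- the cache has now updated main memory
```

The window in which memory still holds 0 is exactly the "dirty data" situation defined above; write-through avoids it at the cost of a memory write on every store.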

Components of Cache.

There are 3 components to a cache, namely:

  1. SRAM: Static RAM, the memory block that holds the data.
  2. Tag RAM: A small piece of SRAM that holds the main-memory addresses of the data stored in the cache.
  3. Cache controller: The brains behind the cache. It is responsible for:
    • Performing snoops and snarfs
    • Updating the SRAM and tag RAM
    • Implementing the write policy
    • Determining if a memory request is cacheable
    • Checking whether a request to the cache is a hit or a miss

Organization of cache:

To fully understand cache organization, two terms need to be understood first:

  • cache page
  • cache line

Main memory (RAM) is divided into equal pieces called cache pages. The size of a page depends on the size of the cache.

Each cache page is broken into smaller pieces called cache lines. Each line can store 4 to 64 bytes. During a data transfer, a whole line is read or written.


Methods of Cache organization:

  1. Fully associative: Any line in main memory can be stored at any line in the cache. In this method cache pages are not used, only lines.
    • Advantage: a memory location can be stored at any line in the cache
    • Disadvantage: searching through the cache lines is complex
  2. Direct-mapped: Main memory is divided into cache pages, each the same size as the cache. Line 0 of any page in main memory can only be stored at line 0 of the cache, line 1 at line 1, and so on.
  3. Set associative: A combination of the fully associative and direct-mapped schemes. The SRAM part of the cache is divided into equal parts (2 or 4) called cache ways. The size of a cache page equals the size of a cache way, and each cache way behaves like a direct-mapped cache.
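Assuming, purely for illustration, a 32-byte cache line and a 4 KB direct-mapped cache, the "line 0 of any page goes to line 0 of the cache" rule reduces to simple address arithmetic:

```python
LINE_SIZE  = 32        # bytes per cache line (assumed for illustration)
CACHE_SIZE = 4096      # total cache size = one "cache page" of main memory
NUM_LINES  = CACHE_SIZE // LINE_SIZE   # 128 lines

def direct_map(address):
    """Split an address into the fields a direct-mapped cache uses."""
    offset = address % LINE_SIZE                  # byte within the line
    line   = (address // LINE_SIZE) % NUM_LINES   # the one line it may occupy
    tag    = address // CACHE_SIZE                # which main-memory page it is from
    return tag, line, offset

# Two addresses exactly one cache page (4 KB) apart land on the same
# cache line with different tags -- they evict each other in turn.
print(direct_map(0x0040))   # (0, 2, 0)
print(direct_map(0x1040))   # (1, 2, 0)
```

The tag is what the Tag RAM described earlier stores: it is how the controller tells which page's line currently occupies the slot.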

In further articles I will delve more into cache implementation on processors, its operating modes, etc.



Eflags Registers

This article talks about the Intel IA-32 eflags register and some interesting things I found out while studying these flags. It uses the GNU debugger (gdb) to show the status of the eflags register.

First, the theory about the eflags register:

The eflags register in an IA-32 processor stores various flags corresponding to the result of the last instruction executed.

Not all instructions affect the eflags register; mov, bswap, and xchg do not, but instructions like inc (increment), add (addition), mul, and div do.

Before we go further into eflags, there are a few points to remember:

  • We cannot examine the whole eflags register directly
  • There is no instruction that can be used to modify the whole register directly.
  • There are some instructions that can be used to modify certain bits of the register, but they are beyond the scope of this article.

We will be looking at some of the flags of the register using simple examples:

  1. Carry Flag
  • Holds the final carry-out produced while computing the result of the last flag-modifying instruction.
  • When adding two numbers, the carry flag contains the carry-out of the most significant bit.
  • Example: adding 253 and 4. For this example we will use the al register, which is the lower 8 bits of the EAX register.

    General Purpose Registers

I chose this example specifically to view the carry flag. Since our number is less than 255 we will use the lower 8 bits of the eax register, which is al, and will add 4 to 253. Below is the sample code.

Assembly language program in AT&T syntax, adding the two numbers

We assemble the above code using the GNU assembler and link it with the GNU linker.



We will use the GNU debugger (gdb) to view the contents of the registers.

Gnu Debugger


We will set a breakpoint at line 4 and run the program. Type "n" to execute line 4.

Set break point and run the program


Type "info registers" at the gdb prompt to view the current values in the registers.

info registers


As we can see from the above figure, gdb shows register al as -3 instead of 253. The bit pattern, 0xfd, is the same; gdb is simply interpreting the 8-bit value as a signed number in the range -128 to 127 rather than 0 to 255.

Type "n" (next) to execute line 5 of the program, which adds 4 to register al.

CF as seen in gdb


When we add 4 to 0xfd (253, or -3 when viewed as signed), the full result is 0x101, which does not fit in 8 bits. The low 8 bits, 0x01, stay in al, and the carry out of the most significant bit sets the Carry Flag (CF). We can see from the above figure that eflags shows CF set, as expected.

To check only the eflags register, we can type "info reg eflags" at the gdb prompt.

2. Zero Flag

  • Zero flag is set to 1 if the result of the last flag-modifying instruction is 0


Adding a negative and a positive number


In the above code we set al to 0xfd, which is -3, and then add +3 to it. So when the processor executes line 5 the resulting value is 0, and the processor sets ZF in the eflags register. We can view this when we run the above program through gdb.
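Both examples can be checked with a small Python model of an 8-bit add, where CF is the carry out of the most significant bit and ZF reflects a zero result (a sketch of just these two flags, not of the full eflags register):

```python
def add8(a, b):
    """8-bit add: returns (result left in al, carry flag, zero flag)."""
    full = (a & 0xFF) + (b & 0xFF)   # 9-bit intermediate sum
    al = full & 0xFF                 # only the low 8 bits fit in the register
    cf = 1 if full > 0xFF else 0     # carry out of the most significant bit
    zf = 1 if al == 0 else 0         # zero flag: the result is 0
    return al, cf, zf

print(add8(0xFD, 4))   # (1, 1, 0): 253 + 4 wraps to 1 and sets CF
print(add8(0xFD, 3))   # (0, 1, 1): -3 + 3 gives 0, setting ZF (and CF)
```

Note that the second example sets both flags: the wrap past 0xFF raises CF and the zero result raises ZF, matching what gdb shows for the two programs.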

I will cover the rest of the eflags in next article.

Using gdb layout when debugging Assembly Language Programs



In my quest to learn programming, I have started my initial steps with assembly language programming (ALP), and I have been on this endeavour for quite some time.

This post is not about ALP itself, though, but about an important gdb option called layout, which helps newbies learning ALP a lot. Before I explain this option, consider the program below:



The above program calculates the sum of two values (17 and 15) using a sum function. The output is saved in the ebx register. Let us first assemble and link the program. We assemble the source with the GNU assembler, using the -gstabs option so we can debug the assembly code through gdb.

From man as:
-gstabs: Generate stabs debugging information for each assembler line. This may help debugging assembler code, if the debugger can handle it.
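The program's stack-based flow (push the two arguments, call sum, pop them into registers inside the function, add) can be mirrored in a small Python sketch; the register names and the return-address marker here are just labels for illustration, not real machine state:

```python
# Toy model of the stack discipline the program uses. A Python list
# stands in for the hardware stack; "registers" are dict entries.
stack = []

def sum_func():
    # Inside sum: set the return address aside, pop the two arguments
    # into "registers", add them, then restore the return address.
    ret = stack.pop()
    regs = {"ecx": stack.pop(), "edx": stack.pop()}
    regs["ebx"] = regs["ecx"] + regs["edx"]   # result saved in "ebx"
    stack.append(ret)
    return regs["ebx"]

stack.append(15)            # pushl $0xf  -- one argument
stack.append(17)            # pushl $0x11 -- the other argument
stack.append("ret-addr")    # "call sum" pushes the return address
print(sum_func())           # 32
```

Stepping through the real program in gdb, below, shows the same movements happening in ESP, ecx, edx, and ebx.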

GNU Assembler


From Figure-2 we can see that the program "mysum" runs successfully. Let us run it through gdb. What we want to know is the following:

i) The values of the registers while the code is being executed, at each step
ii) Most importantly, the code and the registers at the same time

To accomplish these goals gdb provides a text user interface (TUI), built on the curses library, that can show the source file. This feature is not limited to the source file: it can also show the assembly code (asm) and the registers (regs). In our case we want to view the assembly code and the registers at the same time, so we can watch the registers change as our asm code executes.



We first start by invoking the program mysum with gdb and setting a breakpoint at the _start function:



After setting the breakpoint, run the program by typing "run" (or just "r") at the gdb prompt; the program will stop at the first breakpoint, which in our case is the _start function. At this point let's invoke the asm layout by typing "layout asm" at the gdb prompt.

The asm layout looks like this:

layout asm


In Figure-5 we can see our asm code in more detail: the address where each instruction resides, and the instruction itself. Our program counter starts at 0x8048054, the start of our program. From here we will keep stepping through the code and viewing the register values.

To load the register layout, type "layout regs" at the gdb prompt and gdb will automatically split the TUI to show both the asm code and the registers, as shown below:



In Figure-6 the instruction to be executed is highlighted, and we can also see the values of the registers. We step through the code by typing the "step" command (or "s" for short) at the gdb prompt, which executes the current instruction and highlights the one to be executed next:



In Figure-7 we can see that our program counter now points to the next instruction, at address 0x8048056, which pushes 0xf (15) onto the stack. The register layout also shows EIP pointing to the code to be executed and the current value of the ESP register.

As we keep stepping through the code (use "s" at the gdb prompt) and enter the sum function, we can watch the base pointer register (EBP) and see it being saved with the value of the ESP register:



Our code has passed the values 17 and 15 on the stack, and in the sum function we copy them into the general purpose registers ecx and edx. Figure-9 shows that the ecx and edx registers have been loaded with 17 and 15, as written in the code.



Keep stepping through the code; when it reaches the end of the sum function, where we leave by popping the stack, we can see in the register layout how the stack pointer is restored before returning to _start:



Once back in the _start function, the sum of 17 and 15 is stored in the ebx register; we then load 1 into the eax register and raise the interrupt that invokes the exit system call. The output of the program, i.e. the sum of 17 and 15, can be viewed by checking the exit status, which is the value in the ebx register.



I hope the above information would be useful for newbies while debugging assembly language code.

Note: "layout regs" does not yet work on gdb version gdb-7.2-51.el6.i686 on RHEL6; it crashes gdb. Fedora 15 and the latest rawhide have the fix, so later versions of gdb on RHEL6 will hopefully have it too.

Authenticating using polkit to access libvirt in Fedora 18



Starting with Fedora-18 there have been some noticeable changes to polkit. Policy Kit grants unprivileged applications, or users in this case, access to certain privileged operations. I generally use systems with SELinux enabled and also confine my users. Since most of my job involves testing various applications, I keep creating a lot of VMs (RHEL5, RHEL6), and virt-manager is my preferred application for this.

Recently I was assigned new Intel hardware with hardware virtualization enabled and a 1 TB hard disk, so I installed Fedora-18 on it to create VMs. My requirement is to be able to install VMs as a non-root user, and a confined user at that.

  • Create a user
        $ useradd test  
  • Map this user to staff_u selinux user
        $ semanage login -a -s staff_u test
        Login Name           SELinux User         MLS/MCS Range        Service
        __default__          user_u               s0                   *
        ceres                sysadm_u             s0-s0:c0.c1023       *
        juno                 staff_u              s0                   *
        root                 root                 s0-s0:c0.c1023       *
        system_u             system_u             s0-s0:c0.c1023       *
        test                 staff_u              s0-s0:c0.c1023       *
  • login as test user and connect to libvirt socket using virsh
        [mniranja@mniranja mar20]$ ssh test@
        test@'s password: 
        Last login: Wed Mar 20 00:20:13 2013 from localhost
        [test@dhcp201-167 ~]$ id -Z
  • Connect to libvirt socket
        [test@dhcp201-167 ~]$ virsh -c qemu:///system
        error: authentication failed: Authorization requires authentication but no agent is available.
        error: failed to connect to the hypervisor

As you can see above, the connection is not allowed. In earlier versions of Fedora you could use Policy Kit to create an authorization rule for connecting to the libvirt socket; refer to the libvirt documentation. This method is also called Policy Kit Local Authority. So on a Fedora-16 system I had the following rule:

        [root@reserved 50-local.d]# cat 50-org.example-libvirt-remote-access.pkla 
        [Remote libvirt SSH access]
        Identity=unix-group:virt
        Action=org.libvirt.unix.manage
        ResultAny=yes
        ResultInactive=yes
        ResultActive=yes

The above allows users of the group "virt" to access libvirt and manage it through the Policy Kit action "org.libvirt.unix.manage". The rule is placed in the file 50-org.example-libvirt-remote-access.pkla under the directory /etc/polkit-1/localauthority/50-local.d.
I hoped the same would work on Fedora-18, but it doesn't: Policy Kit Local Authority has been removed entirely. Instead, all custom Policy Kit rules should be placed under the /etc/polkit-1/rules.d/ directory, and the rule syntax has changed to JavaScript. Refer to DavidZ's blog for more information about the change.

On Fedora-18 I managed to do the same by adding the following rule file, 10.virt.rules, created under the /etc/polkit-1/rules.d directory:

        [root@dhcp201-167 rules.d]# cat 10.virt.rules 
        polkit.addRule(function(action, subject) {
            polkit.log("action=" + action);
            polkit.log("subject=" + subject);
            var now = new Date();
            polkit.log("now=" + now);
            if ((action.id == "org.libvirt.unix.manage" ||
                 action.id == "org.libvirt.unix.monitor") &&
                subject.isInGroup("virt")) {
                return polkit.Result.YES;
            }
            return null;
        });

Thanks to Gilbert. As you can see, the above allows the polkit actions "org.libvirt.unix.manage" and "org.libvirt.unix.monitor" for all users of the group "virt".

  • Restart polkit service
        $ systemctl restart polkit.service
  • Add the user test to group virt
        $ usermod -aG virt test
  • login as test user and connect to libvirt using virsh
        [test@dhcp201-167 ~]$ id -Z
        [test@dhcp201-167 ~]$ id
        uid=1002(test) gid=1003(test) groups=1003(test),1001(virt) context=staff_u:staff_r:staff_t:s0-s0:c0.c1023
        [test@dhcp201-167 ~]$ virsh -c qemu:///system
        Welcome to virsh, the virtualization interactive terminal.
        Type:  'help' for help with commands
           'quit' to quit
  • Check the logs using journalctl
        [root@dhcp201-167 ~]# journalctl -xn
        -- Logs begin at Tue 2013-03-19 22:54:05 EDT, end at Wed 2013-03-20 00:43:25 EDT. --
        Mar 20 00:43:02 kernel: usb 1-1.3: Product: USB Optical Mouse
        Mar 20 00:43:02 kernel: usb 1-1.3: Manufacturer: PixArt
        Mar 20 00:43:02 kernel: input: PixArt USB Optical Mouse as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.3/1-1.3:1.0/input/in
        Mar 20 00:43:02 kernel: hid-generic 0003:0461:4E22.006D: input,hidraw0: USB HID v1.11 Mouse [PixArt USB Optical Mouse] on usb
        Mar 20 00:43:18 sshd[3722]: Accepted password for test from port 53789 ssh2
        Mar 20 00:43:18 systemd-logind[596]: New session 18 of user test.
        -- Subject: A new session 18 has been created for user test
        -- Defined-By: systemd
        -- Support:
        -- Documentation:
        -- Documentation:
        -- A new session with the ID 18 has been created for the user test.
        -- The leading process of the session is 3722.
        Mar 20 00:43:18 sshd[3722]: pam_unix(sshd:session): session opened for user test by (uid=0)
        Mar 20 00:43:25 polkitd[1688]: /etc/polkit-1/rules.d/10.virt.rules:2: action=[Action id='org.libvirt.unix.manage']
        Mar 20 00:43:25 polkitd[1688]: /etc/polkit-1/rules.d/10.virt.rules:3: subject=[Subject pid=3791 user='test' groups=test,virt,
        Mar 20 00:43:25 polkitd[1688]: /etc/polkit-1/rules.d/10.virt.rules:5: now=Wed Mar 20 2013 00:43:25 GMT-0400 (EDT)

Introduction to CTDB Cluster



Why CTDB ?

  • Traditionally, clustering involves a SAN connected to n nodes. The storage can be accessed only by the nodes participating in the cluster, and as the need for more storage and more users grows, the cluster becomes the limiting factor: it cannot grow arbitrarily.
  • So we need a file system that can be accessed by an arbitrary number of clients, not restricted to the systems participating in the cluster. One answer to this problem is a distributed file system.
  • We need to distribute the existing shared storage using network protocols like NFS and CIFS. With Samba and CTDB we can achieve this goal of distributing the shared file system using the CIFS protocol.
  • CTDB was originally developed specifically as cluster enhancement software; it contains the high-availability and load-balancing features that make file services like Samba, NFS, and FTP cluster-able.

Basic Infrastructure of CTDB

  • Storage is attached to the nodes participating in the cluster through FC or iSCSI
  • A shared file system that supports POSIX fcntl locks, for example:
      • IBM General Parallel File system (GPFS)
      • Global File system (GFS)
      • GNU Cluster File system (Gluster)
      • Sun’s Lustre
      • OCFS2

Basics of CIFS File system

  • CIFS (Common Internet File System) is a standard remote file system access protocol for use over a network, enabling groups of users to connect and share documents
  • CIFS is an open, cross-platform protocol based on SMB (Server Message Block), the native file-sharing protocol of the Windows operating system. On RHEL it is implemented by Samba
  • CIFS runs over TCP/IP

Basics of Samba

  • Samba provides file and print services for all clients using the SMB/CIFS protocols
  • Apart from file and print services, it also handles authentication and authorization, name resolution, and service announcement
  • File and print services are provided by the smbd daemon
  • Name resolution and browsing are provided by the nmbd daemon
  • The configuration file is /etc/samba/smb.conf

TDB (Trivial Database)

  • Samba keeps track of all the information needed to serve clients in a series of *.tdb files
  • They are located in /var/lib/samba or /var/cache/samba
  • Some of the TDB files are persistent
  • TDB files are very small, similar to Berkeley database files
  • They allow multiple simultaneous writers

Example TDB Files:

  • account_policy.tdb: NT account policy settings, such as password expiration
  • brlock.tdb: byte-range locks
  • connections.tdb: share connections (used to enforce max connections, etc.)
  • messages.tdb: Samba messaging system

What Does CTDB do ?

  • CTDB (Clustered Trivial DataBase) is a very thin and fast database developed to make Samba clusterable.
  • What CTDB does is make it possible for Samba to run on several different hosts in the network and serve the same data at the same time.
  • This means Samba becomes a clustered service: all nodes are active and export the Samba shares for read-write operations at the same time, making it highly available.
  • To do this we need a method of communication (IPC) between the Samba daemons running on the different nodes, and some persistent data (TDB files) must be shared. Some of the information that must be shared:
  • User information
  • For Samba acting as a member server of a domain, the domain SID
  • The user mapping tables: mappings of Unix UIDs and GIDs to Windows users and groups
  • The active SMB sessions and connections
  • Locking information, such as byte-range locks granted exclusively to users for access to a particular file, has to be shared between all the nodes. These are Windows locks, i.e. when multiple Windows/Samba clients access files, these locks are granted by the smbd daemon, so it makes sense to share them between the smbd daemons on the different nodes

Sample diagram of how CTDB messages are shared between 2 nodes of a CTDB cluster:

Below is the list of TDB files that are shared between the nodes of a CTDB cluster:

  • SMB Sessions (sessionid.tdb)
  • share connections (connections.tdb)
  • share modes (locking.tdb)
  • byte range locks (brlock.tdb)
  • user database (passdb.tdb)
  • domain Join Information (secrets.tdb)
  • id mapping tables (winbind_idmap.tdb)
  • registry (registry.tdb)

Requirements to configure CTDB cluster on RHEL6

    • GFS Packages
    • HA Packages
    • ctdb, samba
    • ctdb-tools

Configuring samba to use CTDB

  • We require 2 separate networks: an internal network through which the CTDB daemons communicate, and a public network through which the cluster offers services like Samba, NFS, etc.
  • Install samba and CTDB Packages

$ yum install samba ctdb tdb-tools

  • Configure /etc/samba/smb.conf to make Samba cluster-aware by adding the lines below to the [global] section of smb.conf:

clustering = yes
idmap backend = tdb2

  • CTDB Cluster configuration

/etc/sysconfig/ctdb is the primary configuration file and contains the startup parameters for ctdb. The important parameters are:

CTDB_NODES
This parameter specifies the file that needs to be created and should contain the list of private IP addresses that the CTDB daemons will use in the cluster. It should be a private, non-routable subnet that is only used for cluster traffic. This file must be the same on all nodes in the cluster.
Contents of /etc/ctdb/nodes:


CTDB_RECOVERY_LOCK
This parameter specifies the lock file that the CTDB daemons use to arbitrate which node is acting as recovery master. This file must be held on shared storage so that all CTDB daemons in the cluster access and lock the same file.


CTDB_PUBLIC_ADDRESSES
This parameter specifies the name of the file containing the list of public addresses that a particular node can host. While running, the CTDB cluster will assign each public address that exists in the entire cluster to one node, which will host that public address. These are the addresses that the smbd daemons and other services bind to and that clients use to connect to the cluster.

Example 3 node cluster:

Content of /etc/ctdb/public_addresses: eth0 eth0 eth0

Configure one DNS A record (one name) with multiple IP addresses and let round-robin DNS distribute the clients across the nodes of the cluster.

The CTDB cluster utilizes IP takeover techniques to ensure that as long as at least one node in the cluster is available, all the public IP addresses will always be available to clients.

/etc/ctdb/events.d is a collection of scripts that CTDB calls when certain events occur, allowing site-specific tasks to be performed.

  • Start the CTDB daemon and let ctdb start the smbd daemon; the Samba daemons should not be started by the init process:

#chkconfig ctdb on
#chkconfig smb off
#chkconfig nmb off

  • Start the ctdb daemon

# service ctdb start

Example Diagram of a 3 Node CTDB Cluster:

How does CTDB work

  • On each node the CTDB daemon "ctdbd" is running; instead of writing directly to the TDB databases, Samba talks to its local ctdbd
  • ctdbd negotiates the metadata for the TDBs over the network
  • For actual read and write operations, local copies are maintained on fast local storage
  • There are 2 kinds of TDB files: persistent and normal
  • Persistent TDB files must always be up to date, and each node always has an up-to-date copy. These TDB files are kept locally (LTDB) on local storage, not on the shared storage, so read and write operations are faster
  • When a node wants to write to a persistent TDB, it locks the whole database, performs its read and write operations, and the transaction commit is finally distributed to all nodes as well as written locally
  • Normal TDB files are maintained temporarily. The idea is that each node does not have to know all the records of a database; it is sufficient to know the records that affect its own client connections, so when a node goes down it is acceptable to lose those records
  • Each node carries certain roles:
    • DMASTER (data master)
      • Holds the current, authoritative copy of a record
      • Moves around as nodes write to a record
    • LMASTER (location master)
      • Knows the location of the DMASTER
      • Knows where the record is stored
  • Only one node has the current authoritative copy of a record, i.e. is the data master. To modify a record, a node performs the following steps:
      • Step 1: Get a lock on the record in the TDB
      • Step 2: Check if we are the data master: if we are DMASTER for this record, operate on the record and unlock it when finished
      • Step 3: If we are not DMASTER for this record, unlock the record
      • Step 4: Send a request to the local CTDB daemon asking for the record to be migrated to this node
      • Step 5: Once we get a reply from the local ctdb daemon that the record is now locally available, go to Step 1
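The five steps above can be sketched as a toy loop in Python; the Record class, the migrate helper, and the node ids are invented for illustration (real CTDB does all of this inside ctdbd):

```python
# Toy model of the record-access protocol described above. A record
# lives on exactly one data master (DMASTER); any other node must have
# the record migrated to it before it may operate on the record.
class Record:
    def __init__(self, value, dmaster):
        self.value = value
        self.dmaster = dmaster   # node id holding the authoritative copy

def migrate(record, node_id):
    # Stand-in for asking the local ctdbd to pull the record to this node.
    record.dmaster = node_id

def update(record, node_id, new_value):
    attempts = 0
    while True:
        attempts += 1
        # Step 1: lock the record (implicit in this single-threaded sketch).
        # Step 2: check whether we are the data master for this record.
        if record.dmaster == node_id:
            record.value = new_value   # operate on the record, then unlock
            return attempts
        # Steps 3-4: not DMASTER -- unlock and request a migration.
        migrate(record, node_id)
        # Step 5: the record is now local; go back to Step 1 and retry.

rec = Record(value="old", dmaster=0)
print(update(rec, node_id=2, new_value="new"))   # 2: one migration, then the write
print(rec.dmaster)                               # 2
```

The retry loop is why the DMASTER role "moves around as nodes write to a record": every write from a non-DMASTER node first pulls the record over.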


  • CTDB assigns IP addresses from the pool (CTDB_PUBLIC_ADDRESSES) to the healthy nodes
  • When a node goes down, its IP is moved to another node
  • The client reconnects to the new node using tickle ACKs, which works as follows:
    • The node goes down
    • The client doesn't yet know that the IP has moved
    • The new node sends a TCP ACK with sequence number 0 to the client
    • The client sends a correct ACK back to the new node
    • The new node resets the connection using RST
    • The client re-establishes the connection to the new node
  • The recovery master performs recovery: it collects the most recent copy of each record from all nodes and becomes the data master.
  • The recovery master is determined by an election process: the RECOVERY_LOCK file acts as arbitrator, and the nodes compete to get a lock (POSIX fcntl byte-range) on that file.
  • If the recovery master node goes away, the role has to be assigned to a new node.

Commands to manage CTDB

$ ctdb status: provides basic information about the cluster and the status of the nodes

$ ctdb ping: tries to ping each of the CTDB daemons in the cluster

$ ctdb ip: prints the current status of the public IP addresses and which physical node is currently serving each IP

$ onnode: runs commands on ctdb nodes, for example:

$onnode all pidof ctdbd
$onnode all netstat -tn | grep 4379

CTDB Status Messages:

“ctdb status” reports the node status. There are 5 possible states:

  • OK: this node is fully functional
  • DISCONNECTED: this node could not be reached over the network and is currently not participating in the cluster
  • UNHEALTHY: the ctdbd daemon is running but a service managed by ctdbd has failed
  • BANNED: the node failed too many recovery attempts and is banned from participating in the cluster for “RecoveryBanPeriod” seconds
  • STOPPED: a stopped node does not host any public IP address and is not part of the cluster
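When collecting status from several clusters it can help to tabulate saved “ctdb status” output. A toy parser follows; the sample text is invented but mimics the usual “pnn:N ip STATE” node lines.

```python
# Toy parser for "ctdb status" node lines; the sample output is invented
# to mimic the usual "pnn:N ip STATE" format.
sample = """\
pnn:0 10.0.0.1 OK (THIS NODE)
pnn:1 10.0.0.2 UNHEALTHY
pnn:2 10.0.0.3 DISCONNECTED
"""

states = {}
for line in sample.splitlines():
    if line.startswith("pnn:"):
        pnn, ip, state = line.split(None, 2)
        states[ip] = state.split()[0]   # drop the "(THIS NODE)" suffix

print(states)
# {'10.0.0.1': 'OK', '10.0.0.2': 'UNHEALTHY', '10.0.0.3': 'DISCONNECTED'}
```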


Troubleshooting CTDB:

  • CTDB log file: /var/log/log.ctdb
  • Output of the “ctdb status” and “onnode” commands is helpful
  • /var/log/samba contains logs related to the smbd daemon
  • If needed, a tcpdump on port 4379 can be taken; wireshark can identify the CTDB protocol and display various CTDB messages
  • testparm output, to check whether clustering is enabled


Man Pages:

man ctdb
man ctdbd
man onnode



Q) When CTDB is itself a cluster, why do we require HA packages like cman to be installed?

CTDB will not work without Red Hat Cluster Suite: CTDB requires GFS, and GFS in turn
requires cman to start dlm_controld and gfs_controld. So cman is a prerequisite for CTDB.

Q) How does CTDB solve the split-brain problem?

This problem doesn’t arise in the first place, because CTDB is an all-active setup, not an active/passive setup where a passive node suddenly becomes active.

Q) How to identify which node is actually serving, i.e. which node is the data master?

The node holding the IP the client is connected to is the data master: from the pool of public addresses, whichever IP the client connects to, the node that IP is assigned to becomes the data master (DMASTER).

Q) How to identify which node is the recovery master (RMASTER)?

The node which holds the lock file. The lock file is stored on the shared filesystem.

Using Openssl on RHEL6 in FIPS-140 mode and generating Certificates.



For a long time I have been trying to understand the FIPS-140 certification and its effects. Today I finally got to configure a RHEL6 system in FIPS mode and use openssl commands. Before we go and play with it, a brief intro on what FIPS and OpenSSL are.

The FIPS-140 standard specifies the security requirements for a cryptographic module utilized within a security system protecting sensitive information in computer and telecommunication systems. The US National Institute of Standards and Technology (NIST) publishes the FIPS series of standards for the implementation of cryptographic modules. The Cryptographic Module Validation Program (CMVP) validates cryptographic modules against Federal Information Processing Standard (FIPS) 140-2 and other cryptography-based standards.

FIPS 140-2 is primarily of interest to U.S., Canadian, and UK government agencies which have formal policies requiring use of FIPS 140 validated cryptographic software.

Products that have received a NIST/CSE validation are listed on the Cryptographic Module Validation List.

OpenSSL is open source software implementing the SSLv2/v3 and TLS protocols, and it also provides general-purpose crypto libraries (libcrypto, libssl, etc.).

The intention of this article is to show how FIPS mode should be enabled on RHEL6 and how to use approved ciphers with openssl.

Before we start using openssl with FIPS-approved security functions, the operating system has to be brought into FIPS mode: the initramfs must be rebuilt with FIPS support, and prelinking must be undone on all libraries. I have enumerated the steps below.

Below are the steps to put a RHEL6 system in FIPS mode and use openssl with FIPS-approved security functions.

Disable prelinking

Change the line "PRELINKING=yes" to "PRELINKING=no" in /etc/sysconfig/prelink

For libraries that were already prelinked, the prelinking should be undone on all system files using the following command:

$ prelink -u -a

The initramfs should be regenerated with FIPS support; for that, install the dracut-fips package, which makes dracut include the fips module when the initramfs is rebuilt

$ yum install dracut-fips

Edit /etc/grub.conf, add fips=1 to the end of the “kernel” line, and reboot the system

kernel /vmlinuz-2.6.32-131.0.15.el6.x86_64 ro root=/dev/mapper/myvg-rootvol rd_LVM_LV=myvg/rootvol rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto fips=1 

For generating certificates, openssl should be used only with a specific set of approved security functions. For the list of Approved Security Functions that can be used, refer to NIST.

In brief, the below algorithms can be used for signing, hashing and encryption:

  • Symmetric Key (AES, TDEA and EES)
  • Asymmetric Key (DSS – DSA, RSA and ECDSA)
  • Secure Hash Standard (SHS) (SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512)
  • Message Authentication (Triple-DES, AES and SHS)
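The approved/unapproved split is easy to illustrate with Python’s hashlib, which is backed by OpenSSL. The digest below is the well-known SHA-256 of the empty string; on a FIPS-enabled build, constructing an MD5 object raises an error, mirroring the openssl md5 failure shown next.

```python
import hashlib

# SHA-256 is part of the FIPS-approved Secure Hash Standard (SHS).
approved = hashlib.sha256(b"").hexdigest()
print(approved)
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# MD5 is NOT an approved security function; on a FIPS-enabled build
# hashlib.md5() fails, just as "openssl md5" does in FIPS mode.
try:
    print("md5 allowed (non-FIPS build):", hashlib.md5(b"").hexdigest())
except ValueError as err:
    print("md5 rejected (FIPS build):", err)
```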

To check whether openssl is operating in FIPS mode, issue the following:

$ openssl md5 somefile

The above should fail, as MD5 is not a FIPS-approved hash standard.

$ openssl sha1 somefile

The above will work, as SHA-1 is a FIPS-approved hash standard.

Let’s generate a self-signed CA certificate.

1. Generate the key

$ openssl genrsa  1024 > dhcp210-11.key

2. Convert the key to PKCS8 Format

The encryption used by the genrsa command cannot be used in FIPS mode, as it uses MD5 to convert the password to a key. We have to either write the key unencrypted (no -des3 option) and then convert it using the ‘openssl pkcs8’ command if we need it encrypted, or generate the key with the -newkey option of the openssl req command, which already writes it encrypted in the PKCS8 format.

$ openssl pkcs8 -in dhcp210-11.key -topk8 -out dhcp210-11-enc.key -v1 PBE-SHA1-3DES

3. Create a self-signed CA certificate.

$ openssl req -new -x509 -key dhcp210-11-enc.key -out dhcp210-11.crt -days 366

Or skip steps 1 and 2 and generate the key in place (-newkey option), which writes the private key encrypted in the PKCS8 format:

$ openssl req -new -x509 -newkey rsa:1024 -out dhcp210-11.crt -days 365




Renewing self signed CA Certs using certutil


, , ,

This is a how-to article on renewing self-signed CA certs using certutil commands. To create self-signed certificate authorities and other certificates, refer to the Mozilla documentation.

Just as normal user or server certificates expire, CA certs also expire after a certain period, and one needs to know how to renew them.

Since this how-to is based on Mozilla NSS, I will explain with an example NSS database in which a CA and user certs were created using certutil commands.

$certutil -L -d /etc/pki/testca

Certificate Nickname    Trust Attributes
testca                  CTu,u,u
www                     u,u,u

testca is the CA certificate and www is a user cert.

$certutil -L -d /etc/pki/testca -n testca | head -n 15
        Version: 3 (0x2)
        Serial Number: 0 (0x0)
        Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
        Issuer: "CN=rootca0,,C=US"
            Not Before: Tue Nov 01 02:29:56 2011
            Not After : Thu Dec 01 02:29:56 2011
        Subject: "CN=rootca0,,C=US" 

To view the private keys, issue the below command:

$ certutil -K -d /etc/pki/testca
certutil: Checking token "NSS Certificate DB" in slot "NSS User Private Key and Certificate Services"
Enter Password or Pin for "NSS Certificate DB":
< 0> rsa 2caa8cf41a5fc803902034710f59c296326cdcc8 NSS Certificate DB:testca
< 1> rsa 99059e9f59b710edcee11d4bd32fd97977bc121e NSS Certificate DB:www

From the above output you can see the nickname of the private key used by testca.

The procedure to renew the testca Certificate is:

1. Create a certificate request using the same Private key

2. Get it signed by the Old CA

3. Add the newly signed CA certificate to the NSS database

Creating a Certificate request using the same Private key:

$certutil -d . -R -k "NSS Certificate DB:testca" -s "CN=rootca0,,C=US" -a -o rootca.req

Brief explanation of the command options:

-R: Create a certificate-request file that can be submitted to a Certificate Authority (CA) for processing into a finished certificate. Output defaults to standard out unless you use the -o output-file argument.
-s: Subject of the certificate (use the same subject as the earlier CA)
-m: Serial number
-v: Period in months for which the certificate will be valid

Sign the Certificate Request

$certutil -C -d . -c "testca" -a -i rootca.req -t "CT,," -o cacert.crt  -m 0 -v 12

Add the Certificate to NSS database:

 $certutil -A -d . -n "testca" -a -i cacert.crt -t "CT,,"

List the CA cert to check the validity period

$certutil -L -d . -n testca

As you can see, it lists both the certificates (old and new). Remove the -a option from the above command to see the pretty-printed output:

 Version: 3 (0x2)
 Serial Number:
 Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
 Issuer: "CN=rootca0,,C=US"
 Not Before: Tue Nov 01 03:17:32 2011
 Not After : Thu Nov 01 03:17:32 2012
 Subject: "CN=rootca0,,C=US"
 Subject Public Key Info:
 Public Key Algorithm: PKCS #1 RSA Encryption
 RSA Public Key:
 Exponent: 65537 (0x10001)
 Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
 Fingerprint (MD5):
 Fingerprint (SHA1):
 Certificate Trust Flags:
 SSL Flags:
 Valid CA
 Trusted CA
 Trusted Client CA
 Email Flags:
 Object Signing Flags:
 Version: 3 (0x2)
 Serial Number: 0 (0x0)
 Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
 Issuer: "CN=rootca0,,C=US"
 Not Before: Tue Nov 01 02:29:56 2011
 Not After : Thu Dec 01 02:29:56 2011
 Subject: "CN=rootca0,,C=US"
 Subject Public Key Info:
 Public Key Algorithm: PKCS #1 RSA Encryption
 RSA Public Key:
 Exponent: 65537 (0x10001)
 Signed Extensions:
 Name: Certificate Basic Constraints
 Data: Is a CA with no maximum path length.
 Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
 Fingerprint (MD5):
 Fingerprint (SHA1):
 Certificate Trust Flags:
 SSL Flags:
 Valid CA
 Trusted CA
 Trusted Client CA
 Email Flags:
 Object Signing Flags:

Validate the user certificates:

$ certutil -V -d . -u C -n www
certutil: certificate is valid
$ certutil -V -d . -u C -n testca
certutil: certificate is valid