Saturday, December 31, 2011

How to generate CA certificate for server & client communication?

■ Requirement : Generate CA certificate for server & client communication.
■ OS Environment : Linux
■ Application : openssl 
■ Implementation Steps :

1. Create the certificate authority (CA):

$ mkdir -p /etc/newcerts
$ cd /etc/newcerts
$ openssl genrsa 2048 > ca-key.pem
$ openssl req -new -x509 -nodes -days 1000 -key ca-key.pem > ca-cert.pem

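NOTE: The last command prompts for the details of the certificate issuer (country, organization, common name, etc.), and short values are enough. Give the CA a Common Name that differs from the Common Names you use later for the server and client certificates. If they are the same, certificate verification fails.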

2. Create the server certificate, signed by the CA above:

$ openssl req -newkey rsa:2048 -days 1000 -nodes -keyout server-key.pem > server-req.pem
$ openssl x509 -req -in server-req.pem -days 1000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 > server-cert.pem

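NOTE: Because of -nodes, the private key is not encrypted. After the subject details, the first command also asks for "A challenge password" and "An optional company name". Leave both empty by pressing Enter twice.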

3. Create the client certificate the same way as the server certificate:

$ openssl req -newkey rsa:2048 -days 1000 -nodes -keyout client-key.pem > client-req.pem
$ openssl x509 -req -in client-req.pem -days 1000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 02 > client-cert.pem

NOTE: When prompted, provide the details of the client owner who will connect to the server. Each certificate signed by the same CA needs its own serial number, so the client certificate uses 02 instead of 01. The client presents client-cert.pem when it connects and the server checks it against ca-cert.pem. In the same way, the client can check server-cert.pem against ca-cert.pem. Once both sides are verified, the connection is encrypted.
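The three steps above can be checked end to end with a small script. This is a sketch, assuming bash and the openssl command line tool. The -subj values (Example-CA, example-server, example-client) and the scratch directory are placeholders so that nothing prompts, and make_demo_pki is a hypothetical helper name. The sketch also applies the two details that most often break this setup: the CA's Common Name differs from the server's and client's, and each signed certificate gets its own serial number. openssl verify then checks both certificates against ca-cert.pem.

```shell
#!/bin/bash
# Sketch: recreate the CA, server and client files from steps 1-3 without
# prompts, then verify both certificates against the CA.
# The subjects (Example-CA, example-server, example-client) are placeholders.
make_demo_pki() (
    set -e
    mkdir -p "$1"
    cd "$1"

    # Step 1: CA private key and self-signed CA certificate
    openssl genrsa 2048 > ca-key.pem 2>/dev/null
    openssl req -new -x509 -nodes -days 1000 -key ca-key.pem \
        -subj "/CN=Example-CA" > ca-cert.pem

    # Steps 2-3: key + signing request, then sign it with the CA.
    # The CNs differ from the CA's CN and every certificate gets its own serial.
    serial=1
    for name in server client; do
        openssl req -newkey rsa:2048 -nodes -keyout "$name-key.pem" \
            -subj "/CN=example-$name" > "$name-req.pem" 2>/dev/null
        openssl x509 -req -in "$name-req.pem" -days 1000 \
            -CA ca-cert.pem -CAkey ca-key.pem -set_serial "$serial" \
            > "$name-cert.pem" 2>/dev/null
        serial=$((serial + 1))
    done

    # Prints "server-cert.pem: OK" and "client-cert.pem: OK" on success
    openssl verify -CAfile ca-cert.pem server-cert.pem client-cert.pem
)
```

Usage: $ make_demo_pki /tmp/pki-test
The function body runs in a subshell, so the cd does not change the caller's working directory.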

Friday, December 30, 2011

How to install mysql server and configure SSL with it on linux?

■ Requirement: Install mysql-server & configure SSL for secure communication
■ OS Environment : Linux
■ Application : 

  • perl-DBD-MySQL-3.0007-2.el5
  • perl-DBI-1.52-2.el5
  • mysql-server-5.0.77-4.el5_6.6
  • mysql-5.0.77-4.el5_6.6
  • openssl

■ Symptoms encountered : 

  •  ERROR 2026 (HY000): SSL connection error

■  Implementation Steps :

1. Install all of the above packages:

$ yum install mysql mysql-server openssl perl-DBD-MySQL perl-DBI -y

OR, if the RPM files were downloaded manually:

$ rpm -ivh  

2. Start the mysql service:

$ service mysqld start

4. Change the mysql root password (set to 'mysql' here; choose a stronger one on a real server):

$ /usr/bin/mysqladmin -u root password 'mysql'

5. Create a directory for the SSL files of the mysql server and of the client (the client is whoever will access the server):

$ mkdir -p /etc/mysql/newcerts
$ chown -R mysql:mysql /etc/mysql/newcerts


6. Create the certificate authority:

$ cd /etc/mysql/newcerts
$ openssl genrsa 2048 > ca-key.pem
$ openssl req -new -x509 -nodes -days 1000 -key ca-key.pem > ca-cert.pem


7. Create the server certificate, signed by the CA above:

$ openssl req -newkey rsa:2048 -days 1000 -nodes -keyout server-key.pem > server-req.pem
$ openssl x509 -req -in server-req.pem -days 1000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 > server-cert.pem


8. Create the client certificate the same way as the server certificate, but with its own serial number:

$ openssl req -newkey rsa:2048 -days 1000 -nodes -keyout client-key.pem > client-req.pem
$ openssl x509 -req -in client-req.pem -days 1000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 02 > client-cert.pem


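9. Make sure the following entries are present in /etc/my.cnf:

[mysqld]

datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
old_passwords=1
ssl
ssl-ca=/etc/mysql/newcerts/ca-cert.pem
ssl-cert=/etc/mysql/newcerts/server-cert.pem
ssl-key=/etc/mysql/newcerts/server-key.pem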
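Before restarting mysqld, a quick check can confirm that [mysqld] names the CA, server certificate and key files (ssl-ca, ssl-cert, ssl-key), which is where the ssl_ca, ssl_cert and ssl_key values in the verification output come from, and that those files are readable. This is a minimal bash sketch, assuming "option=value" lines with no spaces around "=" and the dash spelling. MySQL also accepts underscores (ssl_ca, ...) and spaces around "=", which this sketch does not handle. check_mysqld_ssl is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report the ssl-ca, ssl-cert and ssl-key values in the [mysqld]
# section of a my.cnf-style file and whether each file is readable.
# Assumes "option=value" lines with no spaces around "=".
check_mysqld_ssl() {
    local cnf=${1:-/etc/my.cnf} opt path status=0
    for opt in ssl-ca ssl-cert ssl-key; do
        # Only look at lines between "[mysqld]" and the next section header
        path=$(awk -F'=' -v opt="$opt" '
            /^\[/ { in_mysqld = ($0 == "[mysqld]") }
            in_mysqld && $1 == opt { print $2 }' "$cnf")
        if [ -z "$path" ]; then
            echo "$opt: not set"
            status=1
        elif [ ! -r "$path" ]; then
            echo "$opt: $path (not readable)"
            status=1
        else
            echo "$opt: $path"
        fi
    done
    return "$status"
}
```

Usage: $ check_mysqld_ssl /etc/my.cnf
It returns non-zero if any of the three options is missing or points to an unreadable file.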


10. Restart mysqld and grant a mysql user access that requires SSL:

$ service mysqld restart
$ mysql -u root -p
mysql> GRANT ALL ON *.* TO 'mysql'@'%' IDENTIFIED BY 'mysql' REQUIRE SSL;

11. Verification / Testing :

$ cd /etc/mysql/newcerts

$ mysql --ssl-ca=/etc/mysql/newcerts/ca-cert.pem --ssl-key=/etc/mysql/newcerts/client-key.pem --ssl-cert=/etc/mysql/newcerts/client-cert.pem -u root -p -v -v -v

Enter password:

(enter the root password set in step 4, mysql in this example)

Output will look like below :

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.0.77 Source distribution
Reading history-file /root/.mysql_history
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

 mysql> show variables like '%%ssl%%';

--------------
show variables like '%%ssl%%'
--------------


+---------------+-------------------------------------+
| Variable_name | Value                               |
+---------------+-------------------------------------+
| have_openssl  | YES                                 |
| have_ssl      | YES                                 |
| ssl_ca        | /etc/mysql/newcerts/ca-cert.pem     |
| ssl_capath    |                                     |
| ssl_cert      | /etc/mysql/newcerts/server-cert.pem |
| ssl_cipher    |                                     |
| ssl_key       | /etc/mysql/newcerts/server-key.pem  |
+---------------+-------------------------------------+
7 rows in set (0.01 sec)


mysql> SHOW STATUS LIKE 'Ssl_cipher';
--------------
SHOW STATUS LIKE 'Ssl_cipher'
--------------
+---------------+--------------------+
| Variable_name | Value              |
+---------------+--------------------+
| Ssl_cipher    | DHE-RSA-AES256-SHA |   << Confirmed
+---------------+--------------------+
1 row in set (0.00 sec)

mysql> quit
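The same check can be run non-interactively, for example from a monitoring script. This is a sketch, assuming the mysql client's batch options -N (skip column names) and -B (tab-separated output). ssl_cipher_in_use is a hypothetical helper name. It prints the negotiated cipher and returns non-zero when Ssl_cipher is empty, meaning the session is not encrypted.

```shell
#!/bin/bash
# Sketch: read "Ssl_cipher<TAB>value" from mysql -N -B output on stdin,
# print the cipher, and fail when it is empty (connection not encrypted).
ssl_cipher_in_use() {
    awk -F'\t' '$1 == "Ssl_cipher" && $2 != "" { print $2; found = 1 }
                END { exit !found }'
}
```

Usage:
$ mysql --ssl-ca=/etc/mysql/newcerts/ca-cert.pem --ssl-key=/etc/mysql/newcerts/client-key.pem --ssl-cert=/etc/mysql/newcerts/client-cert.pem -u root -p -N -B -e "SHOW STATUS LIKE 'Ssl_cipher'" | ssl_cipher_in_use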

Tuesday, November 29, 2011

How to configure rndc key with chrooted bind on linux?


■ Requirement : Configure rndc key with chrooted bind
■ OS Environment : Linux, RHEL 6.2, Centos
■ Implementation Steps :

1. Edit /var/named/chroot/etc/rndc.conf and add the following lines:

options {
default-server 127.0.0.1;
default-key "rndckey";
};

server 127.0.0.1 {
key "rndckey";
};

key "rndckey" {
algorithm "hmac-md5";
secret "secret key will be placed here";
};

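2. Generate the key in the chroot etc directory:

$ cd /var/named/chroot/etc/
$ dnssec-keygen -r /dev/urandom -a HMAC-MD5 -b 256 -n HOST rndc

This creates two files, Krndc.+157+NNNNN.key and Krndc.+157+NNNNN.private (NNNNN is the key tag).

3. Copy the key from the "Key:" line of the .private file and put it on the "secret" line of /var/named/chroot/etc/rndc.conf.
4. Create a soft link so that rndc finds the configuration at /etc/rndc.conf:

$ ln -s /var/named/chroot/etc/rndc.conf /etc/rndc.conf

5. Restart named:

$ service named restart

6. Verification:

$ rndc status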

Output will look like :

version: 9.7.3-P3-RedHat-9.7.3-2.el6_1.P3.2
CPUs found: 1
worker threads: 1
number of zones: 20
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
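Copying the secret by hand is easy to get wrong, so the key-copying step above can be scripted. This is a bash sketch, assuming the .private file contains a "Key:" line (the layout dnssec-keygen writes for HMAC keys) and using the same rndckey name and hmac-md5 algorithm as the rndc.conf shown earlier. write_rndc_conf is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: take the "Key:" value from a dnssec-keygen .private file and
# write an rndc.conf with the layout shown above.
write_rndc_conf() {
    local private_file=$1 out=$2 secret
    secret=$(awk '$1 == "Key:" { print $2 }' "$private_file")
    if [ -z "$secret" ]; then
        echo "no Key: line in $private_file" >&2
        return 1
    fi
    cat > "$out" <<EOF
options {
        default-server 127.0.0.1;
        default-key "rndckey";
};

server 127.0.0.1 {
        key "rndckey";
};

key "rndckey" {
        algorithm "hmac-md5";
        secret "$secret";
};
EOF
}
```

Usage (the glob must match exactly one .private file):
$ write_rndc_conf /var/named/chroot/etc/Krndc.+157+*.private /var/named/chroot/etc/rndc.conf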

Tuesday, November 15, 2011

How to rotate sudo log?

■ Requirement : Rotate sudo log messages
■ OS Environment : Linux, RHEL, Centos
■ Assumption : 

  •    sudo log file = /var/log/sudolog
  •    log retention = 90 days

■ Implementation Steps :

1. Edit /etc/sudoers (preferably with visudo) and add the following lines:

Defaults !syslog
Defaults logfile = /var/log/sudolog

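2. Rotate this log file:

Create /etc/logrotate.d/sudolog with the following content. "daily" combined with "rotate 90" keeps 90 days of logs, which matches the retention assumption above (a size-based rule would not). No postrotate signal is needed, because sudo opens the log file itself each time it logs:

/var/log/sudolog {
daily
rotate 90
missingok
}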

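3. Check the configuration with a dry run. A syslogd restart is not needed, because with "Defaults !syslog" sudo writes /var/log/sudolog directly:

$ logrotate -d /etc/logrotate.d/sudolog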

Friday, September 23, 2011

Details about SUID, SGID and Sticky bit permission on linux os

■ Requirement : Define suid, sgid & sticky bit
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

1. SUID or setuid:
         Set user ID on execution. If the setuid bit is set on an executable file, the process started from it runs with the rights of the file's owner instead of the rights of the user who executed it (for example /usr/bin/passwd, which is owned by root).

2. SGID or setgid:

        Set group ID on execution. Same as above, but the process runs with the rights of the file's group. On a directory, it means that a new file created in the directory inherits the group of the directory (and not the primary group of the user who created the file).

3. Sticky bit:
       It was originally used to make a program "stick" in memory after it finished, but that usage is obsolete. Its current meaning is system dependent. On Linux it is used on directories to stop users from deleting or renaming files that belong to other users, even though they have write access to the directory (for example /tmp).

4. Numeric representation :

Octal digit   Binary value   Meaning

0             000            setuid, setgid, sticky bits are cleared
1             001            sticky bit is set
2             010            setgid bit is set
3             011            setgid and sticky bits are set
4             100            setuid bit is set
5             101            setuid and sticky bits are set
6             110            setuid and setgid bits are set
7             111            setuid, setgid, sticky bits are set

Examples: a file with mode 2644 has the setgid bit set (-rw-r-Sr--).
A directory with mode 2755 has the setgid bit set (drwxr-sr-x).

5. Textual representation :

SUID: if set, the "x" in the owner permissions is replaced by "s" when the owner has execute permission, or by "S" otherwise. Examples:
-rws------ both owner execute and SUID are set
-r-S------ SUID is set, but owner execute is not set

SGID: if set, the "x" in the group permissions is replaced by "s" when the group has execute permission, or by "S" otherwise. Examples:

-rwxrws--- both group execute and SGID are set
-rwxr-S--- SGID is set, but group execute is not set

Sticky: if set, the "x" in the others permissions is replaced by "t" when others have execute permission, or by "T" otherwise. Examples:
-rwxrwxrwt both others execute and sticky bit are set
-rwxrwxr-T sticky bit is set, but others execute is not set

drwxrwxrwt - Sticky Bits - chmod 1777
drwsrwxrwx - SUID set - chmod 4777
drwxrwsrwx - SGID set - chmod 2777
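To see how the octal digit maps to the s/S/t letters, the following bash sketch applies some of the modes above to throwaway files and prints them with GNU stat (%A is the ls-style string, %a the octal mode). The function and file names are only for illustration.

```shell
#!/bin/bash
# Sketch: apply example modes to throwaway files and show how the
# special-permission digit appears in ls-style output (GNU stat required).
special_bits_demo() (
    cd "$1" || exit 1
    touch file
    mkdir -p shared public
    chmod 2644 file      # setgid, group has no execute -> "S"
    chmod 2755 shared    # setgid on a directory, group can execute -> "s"
    chmod 1777 public    # sticky bit, others can execute -> "t"
    # %A = ls-style permissions, %a = octal mode, %n = name
    stat -c '%A %a %n' file shared public
)
```

Usage: $ special_bits_demo "$(mktemp -d)"
This prints "-rw-r-Sr-- 2644 file", "drwxr-sr-x 2755 shared" and "drwxrwxrwt 1777 public".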

What are the CPU states found in "top" output?

■ Requirement : Describe CPU states found in output of "top" command
■ OS Environment : Linux, RHEL, Centos
■ Resolution  : 

Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

# us -> User CPU time: The time the CPU has spent running users’ processes that are not niced.
# sy -> System CPU time: The time the CPU has spent running the kernel and its processes.
# ni -> Nice CPU time: The time the CPU has spent running users’ processes that have been niced.
# id -> Idle: The time the CPU has spent idle.
# wa -> iowait: The time the CPU has been idle while waiting for I/O to complete.
# hi -> Hardware IRQ: The time the CPU has spent servicing hardware interrupts.
# si -> Software interrupts: The time the CPU has spent servicing software interrupts.
# st -> Steal time: The time stolen from this virtual machine by the hypervisor.
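These percentages can be derived from the cumulative counters on the "cpu" line of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, in that order): take two samples and divide each counter's increase by the total increase. This is a bash/awk sketch of that calculation. cpu_states is a hypothetical helper, and it ignores the guest columns that newer kernels append.

```shell
#!/bin/bash
# Sketch: compute top-style CPU percentages from two "cpu" lines of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal [...]
cpu_states() {
    awk -v first="$1" -v second="$2" 'BEGIN {
        split(first, a)
        split(second, b)
        n = split("us ni sy id wa hi si st", name, " ")
        total = 0
        for (i = 1; i <= n; i++) {
            delta[i] = b[i + 1] - a[i + 1]   # a[1] and b[1] are the word "cpu"
            total += delta[i]
        }
        for (i = 1; i <= n; i++)
            printf "%s=%.1f%s", name[i], (total > 0 ? 100 * delta[i] / total : 0), (i < n ? " " : "\n")
    }'
}
```

Usage:
$ A=$(grep '^cpu ' /proc/stat); sleep 1; B=$(grep '^cpu ' /proc/stat); cpu_states "$A" "$B"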

Wednesday, September 21, 2011

How to check details of an rpm package which is not installed yet?

■ Requirement : Check details of rpm package
■ OS Environment : Linux, RHEL, Centos
■ Resolution  : 

$ rpm -qpil  

To check details of installed package :

$ rpm -qi  


Check dependencies of a package file:

$ rpm -qp --requires  

Saturday, September 17, 2011

How to access windows share from Linux machine

■ Requirement : Access windows share directory from linux system
■ OS Environment : windows, Linux, RHEL, Centos
■ Implementation Steps : 

1. Mount windows share using cifs file system : 

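$ mount -t cifs //WIN_IP/WIN_SHARE /mnt -o credentials=/root/.smbpasswd

Note: replace WIN_IP and WIN_SHARE (win_ip and winshare below) with the Windows server's IP address and share name. Use forward slashes, because the shell strips unquoted backslashes. To mount the share at boot, put the following entry in /etc/fstab:

//win_ip/winshare /mnt cifs credentials=/root/.smbpasswd 0 0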

Details :

win_ip = Windows server name or IP address
winshare = shared directory on Windows
/mnt = mount point on Linux
/root/.smbpasswd = file containing the login credentials for the Windows share
cifs = file system type
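The credentials file holds one setting per line. This is a bash sketch that writes it, assuming the username=, password= and optional domain= keys described in the mount.cifs man page. write_cifs_credentials is a hypothetical helper, and the values in the usage line are placeholders. The file is created with mode 600 because it contains a plain-text password.

```shell
#!/bin/bash
# Sketch: write a mount.cifs credentials file readable only by its owner.
write_cifs_credentials() {
    local file=$1 user=$2 pass=$3 domain=${4:-}
    # Create the file with no group/other access before writing the password
    ( umask 077 && : > "$file" ) || return 1
    {
        echo "username=$user"
        echo "password=$pass"
        if [ -n "$domain" ]; then
            echo "domain=$domain"
        fi
    } > "$file"
    chmod 600 "$file"
}
```

Usage: $ write_cifs_credentials /root/.smbpasswd win_user 'win_password' workgroup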

2. Manual Verification :

$ smbclient -L //win_ip -U workgroup/win_user

3. Debugging Steps for cifs :

1) "dmesg -c" (print and clear the kernel message buffer)
2) "echo 7 > /proc/fs/cifs/cifsFYI" (enable cifs informational/debug messages)
3) try the mount again and examine the output of "dmesg"
4) capture the network traffic with tcpdump.

Friday, September 16, 2011

How do I determine if my x86-compatible Intel system is multi-processor, multi-core or supports hyperthreading?

■ Requirement : Check whether the processor is multi-core or supports hyperthreading (HT)
■ OS Environment : Linux, RHEL, Centos
■ Prerequisites : 

Physical ID (Physical processor or socket ID):

       The physical id value is a number assigned to each processor socket. The number of unique physical id values on a system tells you the number of CPU sockets that are in use. All logical processors (cores or hyperthreaded images) contained within the same physical processor will share the same physical id value.

Siblings (i.e. logical processors per physical processor):
       The siblings value tells you how many logical processors are provided by each physical processor.

Core ID (Core ID value) :

        The core id values are numbers assigned to each physical processor core. Systems with hyperthreading will see duplications in this value as each hyperthreaded image is part of a physical core. Under Red Hat Enterprise Linux 5, these numbers are an index within a particular CPU socket so duplications will also occur in multi-socket systems. Under Red Hat Enterprise Linux 4, which uses APIC IDs to assign core id values, these numbers are not reused between sockets so any duplications seen will be due solely to hyperthreading.

CPU cores (number of physical cores per physical processor): The cpu cores value tells you how many physical cores are provided by each physical processor.

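Threads (each core can run at most 2 threads on Intel processors with hyperthreading):
Threads per core = siblings / cpu cores. If siblings is twice the cpu cores value, hyperthreading is enabled.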

■ Resolution :

$ cat /proc/cpuinfo
$ dmidecode -t processor
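The fields described above can be combined into a quick summary. This is a minimal bash/awk sketch, assuming the usual "name : value" layout of /proc/cpuinfo on x86. cpu_topology is a hypothetical helper. It counts unique physical id values (sockets) and processor entries (logical CPUs), and reports hyperthreading when siblings is greater than cpu cores. On virtual machines or kernels that omit physical id, the socket count will be 0.

```shell
#!/bin/bash
# Sketch: summarise sockets, cores and hyperthreading from a cpuinfo file.
cpu_topology() {
    awk -F': *' '
        /^processor/   { logical++ }
        /^physical id/ { sockets[$2] = 1 }
        /^siblings/    { siblings = $2 + 0 }
        /^cpu cores/   { cores = $2 + 0 }
        END {
            n = 0
            for (s in sockets) n++
            printf "sockets=%d cores_per_socket=%d logical_per_socket=%d logical_total=%d\n", n, cores, siblings, logical
            print ((cores > 0 && siblings > cores) ? "hyperthreading=yes" : "hyperthreading=no")
        }' "${1:-/proc/cpuinfo}"
}
```

Usage: $ cpu_topology   (reads /proc/cpuinfo by default)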

How to check whether the currently running kernel is tainted (contaminated) or not?

■ Requirement : check whether the currently running kernel is tainted (contaminated) or not
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

The Linux kernel maintains a "taint state" which is included in kernel error messages. The taint state indicates whether something has happened to the running kernel that affects whether a kernel error or hang can be troubleshot effectively by analysing the kernel source code. Some of the taint flags relate to whether the information provided by the kernel in an error message can be considered trustworthy.

1. Read the current taint value with the following command:

$ cat /proc/sys/kernel/tainted
536870912

Use the following to decipher the taint value :

Non-zero if the kernel has been tainted. Numeric values, which can be ORed together:

1 - A module with a non-GPL license has been loaded, this includes modules with no license. Set by modutils >= 2.4.9 and module-init-tools.
2 - A module was force loaded by insmod -f. Set by modutils >= 2.4.9 and module-init-tools.
4 - Unsafe SMP processors: SMP with CPUs not designed for SMP.
8 - A module was forcibly unloaded from the system by rmmod -f.
16 - A hardware machine check error occurred on the system.
32 - A bad page was discovered on the system.
64 - The user has asked that the system be marked "tainted". This could be because they are running software that directly modifies the hardware, or for other reasons.
128 - The system has died.
256 - The ACPI DSDT has been overridden with one supplied by the user instead of using the one provided by the hardware.
512 - A kernel warning has occurred.
1024 - A module from drivers/staging was loaded.
268435456 - Unsupported hardware
536870912 - Technology Preview code was loaded
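Because the value is a sum of the bits listed above, it can be decoded by testing each bit. This is a bash sketch that knows only the bits in this list (newer kernels define more). decode_taint is a hypothetical helper. With no argument it reads /proc/sys/kernel/tainted.

```shell
#!/bin/bash
# Sketch: decode a kernel taint value into the bits listed above.
decode_taint() {
    local value=${1:-$(cat /proc/sys/kernel/tainted)} i
    # Bit values and short meanings taken from the list above
    local bits=(1 2 4 8 16 32 64 128 256 512 1024 268435456 536870912)
    local names=(
        "non-GPL (or unlicensed) module loaded"
        "module force-loaded"
        "unsafe SMP processors"
        "module force-unloaded"
        "machine check error"
        "bad page found"
        "tainted on user request"
        "system has died"
        "ACPI DSDT overridden"
        "kernel warning occurred"
        "staging driver loaded"
        "unsupported hardware (RHEL)"
        "Technology Preview code loaded (RHEL)"
    )
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
        return 0
    fi
    for i in "${!bits[@]}"; do
        if (( value & bits[i] )); then
            echo "${bits[i]} - ${names[i]}"
        fi
    done
}
```

Usage: $ decode_taint 536870912
This prints "536870912 - Technology Preview code loaded (RHEL)".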

The taint status of the kernel not only indicates whether or not the kernel has been tainted but also indicates what type(s) of event caused the kernel to be marked as tainted. This information is encoded through single-character flags in the string following "Tainted:" in a kernel error message.

* P: Proprietary module has been loaded, i.e. a module that is not licensed under the GNU General Public License (GPL) or a compatible license. This may indicate that source code for this module is not available to the Linux kernel developers.
* G: The opposite of P: the kernel has been tainted (for a reason indicated by a different flag), but all modules loaded into it were licensed under the GPL or a license compatible with the GPL.
* F: Module has been forcibly loaded using the force option "-f" of insmod or modprobe, which caused a sanity check of the versioning information from the module (if present) to be skipped.
* S: SMP with CPUs not designed for SMP. The Linux kernel is running with Symmetric MultiProcessor support (SMP), but the CPUs in the system are not designed or certified for SMP use.
* R: User forced a module unload. A module which was in use or was not designed to be removed has been forcefully removed from the running kernel using the force option "-f" of rmmod.
* M: System experienced a machine check exception. A Machine Check Exception (MCE) has been raised while the kernel was running. MCEs are triggered by the hardware to indicate a hardware related problem, for example the CPU's temperature exceeding a threshold or a memory bank signaling an uncorrectable error.
* B: System has hit bad_page, indicating a corruption of the virtual memory subsystem, possibly caused by malfunctioning RAM or cache memory.
* U: Userspace-defined naughtiness.
* D: Kernel has oopsed before
* A: ACPI table overridden.
* W: Taint on warning.
* C: modules from drivers/staging are loaded.
* I: Working around severe firmware bug.

The taint flags above are implemented in the standard Linux kernel and indicate the information provided in kernel error messages is not necessarily to be trusted. Additionally, the following flags are used by the RHEL kernel:

* H: Hardware is unsupported.
* T: Technology Preview code is loaded.

How to find out which process is using swap space?

■ Requirement :  Find out process which consumes swap space
■ OS Environment : Linux, RHEL, Centos
■ Implementation Steps : 

1. To sort the running processes by their swap usage, use top:

$ top

Then press capital "o" (i.e. "O"), then "p", and press Enter. The processes are now sorted by their swap usage.

2. Use a script:
The following bash script reads each process's swap usage from /proc/PID/smaps (run it as root to see all processes).

#!/bin/bash
# Get current swap usage (in kB) for all running processes
OVERALL=0
for DIR in /proc/[0-9]*; do
    PID=${DIR#/proc/}
    PROGNAME=$(ps -p "$PID" -o comm= 2>/dev/null)
    # Add up only the "Swap:" lines; newer kernels also print "SwapPss:"
    SUM=$(awk '/^Swap:/ { sum += $2 } END { print sum + 0 }' "$DIR/smaps" 2>/dev/null)
    SUM=${SUM:-0}
    echo "PID=$PID - Swap used: $SUM kB - ($PROGNAME)"
    OVERALL=$((OVERALL + SUM))
done
echo "Overall swap used: $OVERALL kB"

Wednesday, September 14, 2011

How to check firmware version of ethernet device?

■ Requirement : Check firmware version of NIC
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

$ ethtool -i eth0

Saturday, September 10, 2011

Concept about Linux Page Cache and pdflush

■ Requirement :  Explanation on page cache & pdflush
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

Concept about Linux Page Cache and pdflush :

          When data is written, Linux first caches it in an area of memory called the page cache. The size of this cache can be checked with the free, vmstat or top commands, and in more detail in /proc/meminfo.

        As pages are written, the size of the "Dirty" section will increase. Once writes to disk have begun, you'll see the "Writeback" figure go up until the write is finished. It can be very hard to actually catch the Writeback value going high, as its value is very transient and only increases during the brief period when I/O is queued but not yet written.

pdflush (A kernel thread) :

           Linux usually writes data out of the page cache using a process called pdflush. At any moment, between 2 and 8 pdflush threads are running on the system. You can monitor how many are active by looking at /proc/sys/vm/nr_pdflush_threads. Whenever all existing pdflush threads are busy for at least one second, an additional pdflush daemon is spawned. The new ones try to write back data to device queues that are not congested, aiming to have each device that's active get its own thread flushing data to that device. Each time a second has passed without any pdflush activity, one of the threads is removed. There are tunables for adjusting the minimum and maximum number of pdflush processes, but it's very rare they need to be adjusted. On 2.6.32 and later kernels (including RHEL 6), pdflush is replaced by per-device flusher threads named flush-MAJOR:MINOR, but the dirty_* tunables below still control when data is written.

Tune pdflush :

Exactly what each pdflush thread does is controlled by a series of parameters in /proc/sys/vm:

1. /proc/sys/vm/dirty_writeback_centisecs (default 500): In hundredths of a second, this is how often pdflush wakes up to write data to disk. The default wakes up the two (or more) active threads every five seconds.

2. /proc/sys/vm/dirty_expire_centisecs (default 3000): In hundredths of a second, how long data can be in the page cache before it's considered expired and must be written at the next opportunity. Note that this default is very long: a full 30 seconds. That means that under normal circumstances, unless you write enough to trigger the other pdflush method, Linux won't actually commit anything you write until 30 seconds later.

3. /proc/sys/vm/dirty_background_ratio (default 10): Maximum percentage of active memory that can be filled with dirty pages before pdflush begins to write them.

Note that some kernel versions may internally put a lower bound on this value at 5%. Also note that the percentage applies to active memory, not total memory. For example, on a 4GB system where the active figure is about 2.5GB, the default of 10% means the system actually begins writing when the total of Dirty pages is slightly less than 250MB, not the 400MB you would expect from the total memory figure.

4. /proc/sys/vm/dirty_ratio (default 40): Maximum percentage of total memory that can be filled with dirty pages before processes are forced to write dirty buffers themselves during their time slice instead of being allowed to do more writes.

Note that all processes are blocked for writes when this happens, not just the one that filled the write buffers. This can cause what is perceived as an unfair behavior where one "write-hog" process can block all I/O on the system. The classic way to trigger this behavior is to execute a script that does "dd if=/dev/zero of=hog" and watch what happens.

For example, run this in one terminal:

$ dd if=/dev/zero of=hog

and watch the Dirty and Writeback values in another terminal:

$ watch cat /proc/meminfo

When does pdflush write?

       Data written to disk will sit in memory until either a) they're more than 30 seconds old, or b) the dirty pages have consumed more than 10% of the active, working memory.

Tuning Recommendations for write-heavy operations :

Important : The usual issue that people who are writing heavily encounter is that Linux buffers too much information at once, in its attempt to improve efficiency. This is particularly troublesome for operations that require synchronizing the file-system using system calls like fsync. If there is a lot of data in the buffer cache when this call is made, the system can FREEZE for quite some time to process the sync.

dirty_background_ratio: Primary tunable to adjust, probably downward. If your goal is to reduce the amount of data Linux keeps cached in memory, so that it writes it more consistently to the disk rather than in a batch, lowering dirty_background_ratio is the most effective way to do that. It is more likely the default is too large in situations where the system has large amounts of memory and/or slow physical I/O.

dirty_ratio: Secondary tunable to adjust only for some workloads. Applications that can cope with their writes being blocked altogether might benefit from substantially lowering this value. Be aware that the system-wide write blocking described above becomes easier to hit as dirty_ratio is reduced below its default.

dirty_expire_centisecs: Test lowering, but not to extremely low levels. Lowering it shortens how long pages sit dirty in memory, but it will considerably slow average I/O speed, because writing many small batches is much less efficient. This is particularly true on systems with slow physical I/O to disk. Because of the way the dirty page writing mechanism works, trying to lower this value to be very quick (less than a few seconds) is unlikely to work well: constantly trying to write dirty pages out will just trigger the I/O congestion code more frequently.

dirty_writeback_centisecs: Leave alone. The timing of pdflush threads set by this parameter is so complicated by rules in the kernel code for things like write congestion that adjusting this tunable is unlikely to cause any real effect. It's generally advisable to keep it at the default so that this internal timing tuning matches the frequency at which pdflush runs.
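Before tuning, it helps to see the current values together. This is a bash sketch that prints the four tunables discussed above and converts the centisecs values to seconds. show_dirty_tunables is a hypothetical helper, and its directory argument exists only so it can be pointed at a copy of /proc/sys/vm.

```shell
#!/bin/bash
# Sketch: print the dirty-page writeback tunables discussed above.
show_dirty_tunables() {
    local vm=${1:-/proc/sys/vm} expire writeback
    echo "dirty_background_ratio = $(cat "$vm/dirty_background_ratio")%"
    echo "dirty_ratio = $(cat "$vm/dirty_ratio")%"
    expire=$(cat "$vm/dirty_expire_centisecs")
    writeback=$(cat "$vm/dirty_writeback_centisecs")
    # centisecs are hundredths of a second
    echo "dirty_expire_centisecs = $expire ($((expire / 100)) s)"
    echo "dirty_writeback_centisecs = $writeback ($((writeback / 100)) s)"
}
```

A value can then be changed at runtime with sysctl, for example (5 is only an illustrative value):

$ sysctl -w vm.dirty_background_ratio=5

and made persistent by adding "vm.dirty_background_ratio = 5" to /etc/sysctl.conf.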

Statistical data :


$ free
             total       used       free     shared    buffers     cached
Mem:       4040360    4012200      28160          0     176628    3571348
-/+ buffers/cache:     264224    3776136
Swap:      4200956      12184    4188772
$

In this example the total amount of physical memory is 4040360 KB. 264224 KB are used by processes and 3776136 KB are free for other applications. Don't get confused by the first line, which shows that only 28160 KB are free. Using available memory for buffers (file system metadata) and cache (pages with actual contents of files or block devices) helps the system to run faster, because disk information is already in memory, which saves I/O.
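The "-/+ buffers/cache" line is simple arithmetic on the Mem: line: used minus buffers minus cached (4012200 - 176628 - 3571348 = 264224), and free plus buffers plus cached (28160 + 176628 + 3571348 = 3776136). This is a bash/awk sketch that does the same calculation from /proc/meminfo. mem_summary is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: reproduce the "-/+ buffers/cache" figures of free from meminfo.
mem_summary() {
    awk '
        /^MemTotal:/ { total = $2 }
        /^MemFree:/  { free = $2 }
        /^Buffers:/  { buffers = $2 }
        /^Cached:/   { cached = $2 }    # anchored, so SwapCached is not matched
        END {
            used = total - free - buffers - cached
            printf "used by processes: %d kB, free for applications: %d kB\n", used, free + buffers + cached
        }' "${1:-/proc/meminfo}"
}
```

Usage: $ mem_summary   (reads /proc/meminfo by default)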

Swap memory : Disk space used as an extension of RAM. When memory runs short, the kernel moves rarely used anonymous pages (process memory that is not backed by a file) out to swap. Dirty pages of files are written back to the files themselves, not to swap.

Value can be viewed by :

grep SwapTotal /proc/meminfo
cat /proc/swaps
free


Shared Memory : A part of RAM which is used for sharing by processes. Shared memory allows processes to access common structures and data by placing them in shared memory segments. It's the fastest form of Interprocess Communication (IPC) available since no kernel involvement occurs when data is passed between the processes. In fact, data does not need to be copied between the processes.

Check shared memory settings : ipcs -lm
See all shared memory segments : ipcs -m
Details of segment : ipcs -m -i
Remove segment : ipcrm shm

Check semaphore value : ipcs -ls

Change its value : echo 250 32000 100 128 > /proc/sys/kernel/sem

Buffer cache : The part of the page cache that holds block device data, mostly file system metadata (the "buffers" column of free).

IO Request Queue Parameters:

nr_requests : This file sets the depth of the request queue. nr_requests sets the maximum number of disk I/O requests that can be queued up. The default value for this is dependent on the selected scheduler.

read_ahead_kb : This file sets the size of read-aheads, in kilobytes. The I/O subsystem enables read-ahead once it detects sequential disk block access. This file sets the amount of data to be “pre-fetched” for an application and cached in memory to improve read response time.

The tunable variables for the cfq scheduler are set in files found under /sys/block//queue/iosched/. These files are:

quantum : Total number of requests to be moved from internal queues to the dispatch queue in each cycle.

queued : Maximum number of requests allowed per internal queue.

Prioritizing I/O Bandwidth for Specific Processes : When the cfq scheduler is used, you can adjust the I/O throughput for a specific process using ionice. ionice allows you to assign any of the following scheduling classes to a program:

• idle (lowest priority)
• best effort (default priority)
• real-time (highest priority)

For more information about ionice, scheduling classes, and scheduling priorities, refer to man ionice.

Deadline scheduler : The deadline scheduler aims to keep latency low, which is ideal for real-time workloads. On servers that receive numerous small requests, the deadline scheduler can help by reducing resource management overhead. This is achieved by ensuring that an application has a relatively low number of outstanding requests at any one time. The tunable variables for the deadline scheduler are set in files found under /sys/block//queue/iosched/. These files are:

read_expire : The amount of time (in milliseconds) before each read I/O request expires. Since read requests are generally more important than write requests, this is the primary tunable option for the deadline scheduler.

write_expire : The amount of time (in milliseconds) before each write I/O request expires.

fifo_batch : When a request expires, it is moved to a "dispatch" queue for immediate servicing. These expired requests are moved by batch. fifo_batch specifies how many requests are included in each batch.

writes_starved : Determines the priority of reads over writes. writes_starved specifies how many read requests should be moved to the dispatch queue before any write requests are moved.

front_merges : In some instances, a request that enters the deadline scheduler may be contiguous to another request in that queue. When this occurs, the new request is normally merged to the back of the queue.

front_merges controls whether such requests should be merged to the front of the queue instead. To enable this, set front_merges to 1. front_merges is disabled by default (i.e. set to 0).


Anticipatory Scheduler: The tunable variables for the anticipatory scheduler are set in files found under /sys/block//queue/iosched/. These files are:

read_expire :
The amount of time (in milliseconds) before each read I/O request expires. Once a read or write request expires, it is serviced immediately, regardless of its targeted block device. This tuning option is similar to the read_expire option of the deadline scheduler. Read requests are generally more important than write requests; as such, it is advisable to give read_expire a shorter expiration time. In most cases, this is half of write_expire. For example, if write_expire is set at 248, it is advisable to set read_expire to 124.

write_expire : The amount of time (in milliseconds) before each write I/O request expires.

read_batch_expire : The amount of time (in milliseconds) that the I/O subsystem should spend servicing a batch of read requests before servicing pending write batches (if there are any). read_batch_expire is typically set as a multiple of read_expire.

write_batch_expire : The amount of time (in milliseconds) that the I/O subsystem should spend servicing a batch of write requests before servicing pending read batches.

antic_expire : The amount of time (in milliseconds) to wait for an application to issue another I/O request before moving on to a new request.

What is I/O Scheduler for a Hard Disk on linux?

■ Requirement : Concept of IO scheduler
■ OS Environment : Linux, RHEL, Centos
■ Resolution  : 

         The 2.6 Linux kernel includes selectable I/O schedulers. They control the way the kernel commits reads and writes to disks. The intention of providing different schedulers is to allow better optimisation for different classes of workload.

Why does kernel need IO scheduler?

        Without an I/O scheduler, the kernel would basically just issue each request to disk in the order that it received them. This could result in massive hard disk thrashing: if one process was reading from one part of the disk, and one writing to another, the heads would have to seek back and forth across the disk for every operation. The scheduler’s main goal is to optimise disk access times.

An I/O scheduler can use the following techniques to improve performance:

a) Request merging : The scheduler merges adjacent requests together to reduce disk seeking.
b) Elevator : The scheduler orders requests based on their physical location on the block device, and it basically tries to seek in one direction as much as possible.
c) Prioritisation : The scheduler has complete control over how it prioritises requests, and can do so in a number of ways.

All I/O schedulers should also take into account resource starvation, to ensure requests eventually do get serviced!

How to view Current Disk scheduler ?

Assuming that we have a disk name /dev/sda, type :

$ cat /sys/block/{DEVICE-NAME}/queue/scheduler
$ cat /sys/block/sda/queue/scheduler

Sample output:

noop anticipatory deadline [cfq]

Here the scheduler in use is cfq (the active scheduler is shown in square brackets).
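With several disks, the bracketed (active) scheduler can be listed for each of them at once. This is a bash sketch. list_schedulers and active_scheduler are hypothetical helpers, and the directory argument defaults to /sys/block so the function can also be tried against a copy.

```shell
#!/bin/bash
# Sketch: print the active I/O scheduler of every block device.
# active_scheduler: extract the bracketed name from the scheduler file text
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' <<< "$1"
}

list_schedulers() {
    local sys=${1:-/sys/block} f dev
    for f in "$sys"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#"$sys"/}     # e.g. "sda/queue/scheduler"
        dev=${dev%%/*}       # e.g. "sda"
        echo "$dev: $(active_scheduler "$(cat "$f")")"
    done
}
```

Usage: $ list_schedulers
For the sample output above, this prints "sda: cfq".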

How to set I/O Scheduler For A Hard Disk ?

To set a specific scheduler, simply type the command as follows:

$ echo {SCHEDULER-NAME} > /sys/block/{DEVICE-NAME}/queue/scheduler
For example, to set the noop scheduler on /dev/sda, enter:
$ echo noop > /sys/block/sda/queue/scheduler

OR

Edit /boot/grub/grub.conf and add "elevator=noop" (or any other available scheduler) to the kernel line. This sets the scheduler for all disks at boot.

There are currently 4 available IO schedulers :

* No-op Scheduler
* Anticipatory IO Scheduler (AS)
* Deadline Scheduler
* Complete Fair Queueing Scheduler (CFQ)

A) No-op Scheduler : This scheduler only implements request merging.

B) Anticipatory IO Scheduler : The anticipatory scheduler is the default scheduler in older 2.6 kernels; if you've not specified one, this is the one that will be loaded. It implements request merging, a one-way elevator, read and write request batching, and attempts some anticipatory reads by holding off a bit after a read batch if it thinks a user is going to ask for more data. It tries to optimise for physical disks by avoiding head movements if possible. One downside to this is that it can give highly erratic performance on database or storage systems.

C) Deadline Scheduler : The deadline scheduler implements request merging, a one-way elevator, and imposes a deadline on all operations to prevent resource starvation. Because writes return instantly within Linux, with the actual data being held in cache, the deadline scheduler will also prefer readers – as long as the deadline for a write request hasn't passed. The kernel docs suggest this is the preferred scheduler for database systems, especially if you have TCQ aware disks, or any system with high disk performance.

D) Complete Fair Queueing Scheduler (CFQ) : The complete fair queueing scheduler implements both request merging and the elevator, and attempts to give all users of a particular device the same number of IO requests over a particular time interval. This should make it more efficient for multiuser systems. It seems that Novell SLES sets cfq as the scheduler by default, as does the latest Ubuntu release. As of the 2.6.18 kernel, this is the default scheduler in kernel.org releases. RHEL 6 also uses CFQ as its default scheduler.

Changing Schedulers :

The most reliable way to change schedulers is to set the kernel option “elevator” at boot time. You can set it to one of “as”, “cfq”, “deadline” or “noop” to select the appropriate scheduler, for example: elevator=cfq

It seems that under more recent 2.6 kernels (2.6.11, possibly earlier), you can change the scheduler at runtime by echoing the name of the scheduler into /sys/block/$devicename/queue/scheduler, where the device name is the basename of the block device, e.g. “sda” for /dev/sda.

Document referred : /usr/src/linux/Documentation/block/switching-sched.txt

Wednesday, September 7, 2011

How does sendmail work?

■ Requirement : How sendmail works
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

Outbound email :


1. The MUA passes the email to sendmail, which creates two files in the /var/spool/mqueue (mail queue) directory to hold the message while sendmail processes it.
2. To create a unique filename for a particular piece of email, sendmail generates a random string and uses that string in filenames pertaining to the email.
3. The sendmail daemon stores the body of the message in a file named df (data file) followed by the generated string.
4. It stores the headers and other information in a file named qf (queue file) followed by the generated string.
5. If a delivery error occurs, sendmail creates a temporary copy of the message, which it stores in a file whose name starts with tf (temporary file), and logs errors in a file whose name starts with xf.
6. Once an email has been sent successfully, sendmail removes all files pertaining to that email from /var/spool/mqueue.
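The naming scheme above can be illustrated with a small Python sketch that groups the files in the queue directory by the generated string (the queue ID). The listing is hypothetical; on a real system you would pass os.listdir("/var/spool/mqueue") instead. Any file whose name does not start with one of the four prefixes described above is simply ignored.

```python
from collections import defaultdict

# Queue-file prefixes described in the steps above.
PREFIXES = {
    "df": "data file (message body)",
    "qf": "queue file (headers and other information)",
    "tf": "temporary file",
    "xf": "error log (transcript) file",
}


def group_queue_files(filenames):
    """Group queue-directory file names by the generated string (queue ID)."""
    groups = defaultdict(list)
    for name in filenames:
        prefix, queue_id = name[:2], name[2:]
        if prefix in PREFIXES and queue_id:   # ignore anything else
            groups[queue_id].append(prefix)
    return {qid: sorted(prefixes) for qid, prefixes in groups.items()}


# Hypothetical listing of /var/spool/mqueue.
listing = ["dfi23GPXvm007224", "qfi23GPXvm007224", "xfi23GPXvm007225", "README"]
for qid, prefixes in sorted(group_queue_files(listing).items()):
    print(qid, [PREFIXES[p] for p in prefixes])
```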

Incoming email :

1. By default, the MDA stores incoming messages in users' files in the mail spool directory, /var/spool/mail, in mbox format. Within this directory, each user has a mail file named with the user's username. Mail remains in these files until it is collected, typically by an MUA. Once an MUA collects the mail from the mail spool, the MUA stores the mail as directed by the user, usually in the user's home directory hierarchy.

mbox versus maildir :

1. The mbox format stores all messages for a user in a single file. To prevent corruption, the file must be locked while a process is adding messages to or deleting messages from the file; you cannot delete a message at the same time the MTA is adding messages. A competing format, maildir, stores each message in a separate file. This format does not use locks, allowing an MUA to read and delete messages at the same time as new mail is delivered. In addition, the maildir format is better able to handle larger mailboxes.

Mail logs :

# cat /var/log/maillog
...
Mar 3 16:25:33 MACHINENAME sendmail[7225]: i23GPXvm007224:
to=, ctladdr=
(0/0), delay=00:00:00, xdelay=00:00:00, mailer=local, pri=30514,
dsn=2.0.0, stat=Sent


Each log entry starts with a timestamp, the name of the system sending the email, the name of the mail server (sendmail), and a unique identification number. The address of the recipient follows the to= label and the address of the sender follows ctladdr=. Additional fields provide the name of the mailer and the time it took to send the message. If a message is sent correctly, the stat= label is followed by Sent.
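The fields described above can be pulled out of a log entry with a regular expression. This Python sketch assumes the standard syslog prefix shown in the sample (month, day, time, host, program[pid]) followed by the queue ID and comma-separated key=value pairs. The example line uses made-up addresses where the sample above has empty to= and ctladdr= values.

```python
import re

# syslog prefix ("Mar 3 16:25:33 host sendmail[pid]:"), then the queue ID,
# then the comma-separated key=value pairs that sendmail logs.
LOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<program>\w+)\[(?P<pid>\d+)\]:\s+(?P<queue_id>\w+):\s+(?P<rest>.*)$"
)
FIELD_RE = re.compile(r"(\w+)=([^,]*)")


def parse_maillog_line(line):
    """Return the prefix fields plus a dict of key=value pairs, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    rest = entry.pop("rest")
    entry["fields"] = {key: value.strip() for key, value in FIELD_RE.findall(rest)}
    return entry


# Made-up addresses; the sample in the text has them stripped.
line = ("Mar 3 16:25:33 MACHINENAME sendmail[7225]: i23GPXvm007224: "
        "to=<alex@example.com>, ctladdr=<root@example.com> (0/0), delay=00:00:00, "
        "xdelay=00:00:00, mailer=local, pri=30514, dsn=2.0.0, stat=Sent")
entry = parse_maillog_line(line)
print(entry["queue_id"], entry["fields"]["to"], entry["fields"]["stat"])
# i23GPXvm007224 <alex@example.com> Sent
```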

Aliases and Forwarding :

Three files can forward email: .forward, aliases (discussed next), and virtusertable. Table 20-1 compares the three files.
Table 20-1. Comparison of forwarding techniques

                            | .forward      | aliases                                      | virtusertable
Controlled by               | non-root user | root                                         | root
Forwards email addressed to | non-root user | any real or virtual user on the local system | any real or virtual user on any domain recognized by sendmail
Order of precedence         | third         | second                                       | first

/etc/aliases

Most of the time when you send email, it goes to a specific person; the recipient, user@system , maps to a specific, real user on the specified system. Sometimes you may want email to go to a class of users and not to a specific recipient. Examples of classes of users include postmaster , webmaster , root , and tech_support . Different users may receive this email at different times or the email may be answered by a group of users. You can use the /etc/aliases file to map inbound addresses to local users, files, commands, and remote addresses.

Each line in /etc/aliases contains the name of a local pseudouser, followed by a colon, whitespace, and a comma-separated list of destinations. The default installation includes a number of aliases that redirect messages for certain pseudousers to root. These have the form

system: root


Sending messages to the root account is a good way of making them easy to review. However, because root's email is rarely checked, you may want to send copies to a real user. The following line forwards mail sent to abuse on the local system to root and alex:

abuse: root, alex


You can create simple mailing lists with this type of alias. For example, the following alias sends copies of all email sent to admin on the local system to several users, including Zach, who is on a different system:

admin: sam, helen, mark, zach@love.com


You can direct email to a file by specifying an absolute pathname in place of a destination address. The following alias, which is quite popular among less conscientious system administrators, redirects email sent to complaints to /dev/null where they disappear:

complaints: /dev/null


You can also send email to the standard input of a command by preceding the command with a pipe character (|). This technique is commonly used with mailing list software such as Mailman. For each list it maintains, Mailman has entries, such as the following entry for mylist, in the aliases file:

mylist: "|/usr/lib/mailman/mail/mailman post mylist"
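The alias format described above (name, colon, comma-separated destinations) can be parsed with a short Python sketch like the one below. It is a simplification of what newaliases does: it handles comments, continuation lines that start with whitespace, and a single quoted destination such as the Mailman pipe, but not a mix of quoted and unquoted destinations on one line.

```python
def parse_aliases(text):
    """Parse /etc/aliases-style text into {alias: [destinations]}."""
    aliases = {}
    last = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue                      # blank line or comment
        if raw[0] in " \t" and last is not None:
            # A line starting with whitespace continues the previous alias.
            aliases[last].extend(d.strip() for d in raw.split(",") if d.strip())
            continue
        name, _, rest = raw.partition(":")
        rest = rest.strip()
        if rest.startswith('"'):
            dests = [rest.strip('"')]     # e.g. "|command args" kept whole
        else:
            dests = [d.strip() for d in rest.split(",") if d.strip()]
        last = name.strip()
        aliases[last] = dests
    return aliases


sample = """
system: root
abuse: root, alex
admin: sam, helen, mark, zach@love.com
complaints: /dev/null
mylist: "|/usr/lib/mailman/mail/mailman post mylist"
"""
table = parse_aliases(sample)
print(table["admin"])   # ['sam', 'helen', 'mark', 'zach@love.com']
print(table["mylist"])  # ['|/usr/lib/mailman/mail/mailman post mylist']
```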


newaliases

After you edit /etc/aliases, you must either run newaliases as root or restart sendmail to recreate the aliases.db file that sendmail reads.

praliases

You can use praliases to list aliases currently loaded by sendmail:

# /usr/sbin/praliases | head -5
postmaster:root
daemon:root
adm:root
lp:root
shutdown:root


~/.forward

Systemwide aliases are useful in many cases, but non root users cannot make or change them. Sometimes you may want to forward your own mail: Maybe you want mail from several systems to go to one address or perhaps you just want to forward your mail while you are working at another office for a week. The ~/.forward file allows ordinary users to forward their email.

Lines in a .forward file are the same as the right column of the aliases file explained previously: Destinations are listed one per line and can be a local user, a remote email address, a filename, or a command preceded by a pipe character (|).

Mail that you forward does not go to your local mailbox. If you want to forward mail and keep a copy in your local mailbox, you must specify your local username preceded by a backslash to prevent an infinite loop. The following example sends Sam's email to himself on the local system and on the system at xyz.com:

$ cat ~sam/.forward
sams@xyz.com
\sam


Related Programs

sendmail

The sendmail package includes several programs. The primary program, sendmail, reads from standard input and sends an email to the recipient specified by its argument. You can use sendmail from the command line to check that the mail delivery system is working and to email the output of scripts.

mailq

The mailq utility displays the status of the outgoing mail queue and normally reports there are no messages in the queue. Messages in the queue usually indicate a problem with the local or remote sendmail configuration or a network problem.

# /usr/bin/mailq
/var/spool/mqueue is empty
Total requests: 0


mailstats

The mailstats utility reports on the number and sizes of messages sendmail has sent and received since the date it displays on the first line:

# /usr/sbin/mailstats
Statistics from Sat Dec 24 16:02:34 2005
M msgsfr bytes_from msgsto bytes_to msgsrej msgsdis Mailer
0 0 0K 17181 103904K 0 0 prog
4 368386 4216614K 136456 1568314K 20616 0 esmtp
9 226151 26101362K 479025 12776528K 4590 0 local
============================================================
T 594537 30317976K 632662 14448746K 25206 0
C 694638 499700 146185


In the preceding output, each mailer is identified by the first column, which displays the mailer number, and by the last column, which displays the name of the mailer. The second through fifth columns display the number and total sizes of messages sent and received by the mailer. The sixth and seventh columns display the number of messages rejected and discarded respectively. The row that starts with T lists the column totals, and the row that starts with C lists the number of TCP connections.

Setting Up a Backup Server

You can set up a backup mail server to hold email when the primary mail server experiences problems. For maximum coverage, the backup server should be on a different connection to the Internet from the primary server.

Setting up a backup server is easy. Just remove the leading dnl from the following line in the backup mail server's sendmail.mc file:

dnl FEATURE(`relay_based_on_MX')dnl


DNS MX records specify where email for a domain should be sent. You can have multiple MX records for a domain, each pointing to a different mail server. When a domain has multiple MX records, each record usually has a different priority; the priority is specified by a preference number (commonly values such as 10, 20, 30), where lower numbers specify higher priorities.

When attempting to deliver email, an MTA first tries to deliver email to the highest-priority server. If that delivery attempt fails, it tries to deliver to a lower-priority server. If you activate the relay_based_on_MX feature and point a low-priority MX record at a secondary mail server, the mail server will accept email for the domain. The mail server will then forward email to the server identified by the highest-priority MX record for the domain when that server becomes available.
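The selection order described above can be modelled in a few lines of Python. This is a sketch only: the host names are hypothetical, is_up stands in for an actual SMTP connection attempt, and ties are broken by host name for determinism, whereas real MTAs are expected to pick randomly among records with equal preference.

```python
def delivery_order(mx_records):
    """mx_records: list of (preference, host). Lower preference = higher priority."""
    # Ties are broken by host name here only to keep the output deterministic.
    return [host for _, host in sorted(mx_records)]


def deliver(mx_records, is_up):
    """Try servers from highest to lowest priority; return the first that accepts."""
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None   # nothing reachable: the sending MTA keeps the mail queued


# Hypothetical domain with a primary (10) and a backup (20) mail server.
records = [(20, "backup.example.com"), (10, "mail.example.com")]
print(delivery_order(records))  # ['mail.example.com', 'backup.example.com']
print(deliver(records, is_up=lambda host: host != "mail.example.com"))  # backup.example.com
```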


Other Files in /etc/mail :

The /etc/mail directory holds most of the files that control sendmail. This section discusses three of those files: mailertable, access, and virtusertable.
mailertable : Forwards Email from One Domain to Another

When you run a mail server, you may want to send mail destined for one domain to a different location. The sendmail daemon uses the /etc/mail/mailertable file for this purpose. Each line in mailertable holds the name of a domain and a destination mailer separated by whitespace; when sendmail receives email for the specified domain, it forwards it to the mailer specified on the same line. Red Hat enables this feature by default: Put an entry in the mailertable file and restart sendmail to use it.

The following line in mailertable forwards email sent to lolipop.com to the mailer at xyz.com:

$ cat /etc/mail/mailertable
lolipop.com smtp:[xyz.com]


The square brackets in the example instruct sendmail not to use MX records but rather to send email directly to the SMTP server. Without the brackets, email could enter an infinite loop.

A period in front of a domain name acts as a wildcard and causes the name to match any domain that ends in the specified name. For example, .tcorp.com matches sales.tcorp.com, mktg.tcorp.com, and so on.

The sendmail init script regenerates mailertable.db from mailertable each time you run it, as when you restart sendmail.
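The lookup rules for mailertable described above (exact domain first, then leading-dot wildcards) can be sketched in Python as follows. This is a toy model rather than sendmail's own map code; the .tcorp.com entry and its relay.example.net destination are hypothetical.

```python
def mailertable_lookup(table, domain):
    """Return the mailer for a recipient domain, or None.

    An exact entry wins; otherwise leading-dot entries are tried from the
    most specific parent domain to the least specific one."""
    domain = domain.lower()
    if domain in table:
        return table[domain]
    labels = domain.split(".")
    for i in range(1, len(labels)):
        wildcard = "." + ".".join(labels[i:])   # e.g. ".tcorp.com"
        if wildcard in table:
            return table[wildcard]
    return None


mailertable = {
    "lolipop.com": "smtp:[xyz.com]",           # entry from the example above
    ".tcorp.com": "smtp:[relay.example.net]",  # hypothetical wildcard entry
}
print(mailertable_lookup(mailertable, "lolipop.com"))      # smtp:[xyz.com]
print(mailertable_lookup(mailertable, "sales.tcorp.com"))  # smtp:[relay.example.net]
```

Note that in this model .tcorp.com does not match the bare domain tcorp.com itself; that would need its own entry.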
access : Sets Up a Relay Host

On a LAN, you may want to set up a single server to process outbound mail, keeping local mail inside the network. A system that processes outbound mail for other systems is called a relay host. The /etc/mail/access file specifies which systems the local server relays email for. As configured by Red Hat, this file lists only the local system:

$ cat /etc/mail/access
...
# by default we allow relaying from localhost...
localhost.localdomain RELAY
localhost RELAY
127.0.0.1 RELAY


You can add systems to the list in access by adding an IP address followed by whitespace and the word RELAY. The following line adds the 192.168. subnet to the list of hosts that the local system relays mail for:

192.168. RELAY


The sendmail init script regenerates access.db from access each time you run it, as when you restart sendmail.
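A rough Python model of the RELAY check shows why a partial address such as 192.168. covers a whole subnet. It is a sketch under assumptions: it only handles IP-address keys (not host names), treats the trailing dot as optional, and lets the most specific matching entry decide, with anything other than RELAY (for example a hypothetical REJECT entry) refusing the relay.

```python
def relay_allowed(access, client_ip):
    """Toy RELAY check for IP-address keys in /etc/mail/access.

    The full address is tried first, then it is shortened one octet at a
    time, so '192.168.' (trailing dot optional) covers 192.168.x.y. The
    most specific entry found decides."""
    entries = {key.rstrip("."): action for key, action in access.items()}
    octets = client_ip.split(".")
    for n in range(len(octets), 0, -1):
        action = entries.get(".".join(octets[:n]))
        if action is not None:
            return action == "RELAY"
    return False


access = {
    "localhost.localdomain": "RELAY",
    "localhost": "RELAY",
    "127.0.0.1": "RELAY",
    "192.168.": "RELAY",
}
print(relay_allowed(access, "192.168.10.5"))  # True
print(relay_allowed(access, "10.0.0.1"))      # False
```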
virtusertable : Serves Email to Multiple Domains

When the DNS MX records are set up properly, a single system can serve email to multiple domains. On a system that serves mail to many domains, you need a way to sort the incoming mail so that it goes to the right places. The virtusertable file can forward inbound email addressed to different domains (aliases cannot do this).

As sendmail is configured by Red Hat, virtusertable is enabled. You need to put forwarding instructions in the /etc/mail/virtusertable file and restart sendmail to serve the specified domains. The virtusertable file is similar to the aliases file, except the left column contains full email addresses, not just local ones. Each line in virtusertable starts with the address that the email was sent to, followed by whitespace and the address sendmail will forward the email to. As with aliases, the destination can be a local user, an email address, a file, or a pipe symbol (|) followed by a command.

The following line from virtusertable forwards mail addressed to zach@xyz.com to zcs, a local user:

zach@xyz.com zcs


You can also forward email for a user to a remote email address:

sams@xyz.com sams@lolipop.com


You can forward all email destined for a domain to another domain without specifying each user individually. To forward email for every user at xyz.com to lolipop.com, specify @xyz.com as the first address on the line. When sendmail forwards email, it replaces the %1 in the destination address with the name of the recipient. The next line forwards all email addressed to xyz.com to lolipop.com, keeping the original recipients' names:

@xyz.com %1@lolipop.com


Finally, you can specify that email intended for a specific user should be rejected by using the error namespace in the destination. The next example bounces email addressed to spam@lolipop.com with the message 5.7.0:550 Invalid address:

spam@lolipop.com error:5.7.0:550 Invalid address
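The lookup order used in these virtusertable examples (exact address first, then a catch-all @domain entry with %1 substitution) can be sketched in Python. This is an illustrative model built from the example lines above, not sendmail's implementation; it ignores case folding and other virtusertable features.

```python
def virtuser_lookup(table, address):
    """Exact address first, then a catch-all '@domain' entry in which
    %1 is replaced by the original user name."""
    user, _, domain = address.partition("@")
    if address in table:
        return table[address]
    catch_all = table.get("@" + domain)
    if catch_all is not None:
        return catch_all.replace("%1", user)
    return None


virtusertable = {
    "zach@xyz.com": "zcs",
    "sams@xyz.com": "sams@lolipop.com",
    "@xyz.com": "%1@lolipop.com",
    "spam@lolipop.com": "error:5.7.0:550 Invalid address",
}
for rcpt in ("zach@xyz.com", "helen@xyz.com", "spam@lolipop.com", "bob@example.org"):
    print(rcpt, "->", virtuser_lookup(virtusertable, rcpt))
```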

How to send mail through a relay server (another mail server) using sendmail?

■ Requirement : Send mail through another (relay) mail server using sendmail.
■ OS Environment : Linux, RHEL, Centos
■ Implementation Steps : 

1. Edit /etc/mail/sendmail.mc and add this line :

define(`SMART_HOST',`[smarthost.example.net]')dnl

2. Rebuild sendmail.cf :

$ m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf

3. Restart sendmail :

$ /etc/rc.d/init.d/sendmail restart

4. Now send a mail and check the maillog. The log will show the relay name.

sendmail without DNS :

There are a number of steps required to successfully use sendmail when there is limited or no DNS.

1. Make sure the relay host's name is resolvable, either by /etc/hosts or DNS, or alternatively specify its IP address.
2. Set the relay host in /etc/mail/sendmail.mc, i.e. define(`SMART_HOST',`name.of.smart.host')dnl
3. Since the system has limited resolving capabilities, accept email for unknown domains by using a line in /etc/mail/sendmail.mc of the form
FEATURE(accept_unresolvable_domains)dnl
4. Make sure that the ServiceSwitchFile (by default /etc/mail/service.switch) has content similar to:

aliases files
hosts files

5. Set the submission agent to ignore DNS. Use a line in /etc/mail/submit.mc of the form

define(`confDIRECT_SUBMISSION_MODIFIERS',`C')

6. Use a line in /etc/mail/submit.mc of the form

FEATURE(accept_unresolvable_domains)dnl

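7. Rebuild the configuration files (sendmail.mc was also changed in steps 2 and 3) :

$ m4 /etc/mail/submit.mc > /etc/mail/submit.cf
$ m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf

8. Restart sendmail :

$ service sendmail restart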

How to install and configure sendmail?

■ Requirement : Install & configure sendmail
■ OS Environment : Linux(RHEL, Centos)
■ Implementation Steps : 

1. Install sendmail  :

$ yum install 'sendmail*'
$ yum install 'm4*'

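2. Edit /etc/mail/sendmail.mc to choose the address the daemon binds to. By default, the following line binds sendmail to the loopback IP (127.0.0.1) only; the server's IP can be used instead. To accept mail on all interfaces, comment the line out with dnl as shown :

dnl # DAEMON_OPTIONS(`Port=smtp,Addr=127.0.0.1, Name=MTA')dnl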

3. Execute the following command :

$ m4 /etc/mail/sendmail.mc > /etc/mail/sendmail.cf

4. Edit /etc/hosts.allow and add the following line :

sendmail: ALL

5. Set DAEMON to yes in /etc/sysconfig/sendmail

DAEMON=yes
QUEUE=1h

6. Enable the service at boot and start it :

$ chkconfig sendmail on
$ service sendmail start

Testing :

1. Check whether port 25 is listening  :

$ netstat -tulpn | grep ':25 '
$ telnet localhost 25
$ telnet IP 25

Note : Replace IP with the server's IP address in the command above.

2.  Send a test mail :

$ echo test | mail -s test-subject -v  


Note : sendmail configuration files are inside /etc/mail/. Log messages are written to /var/log/maillog.

Tuesday, September 6, 2011

What is arp?

■ Requirement : Details on arp
■ OS Environment : Linux, RHEL, Centos
■ Resolution: 

What is arp?

Ans : arp is a command that manipulates the kernel's ARP cache in various ways. Its primary options clear an address-mapping entry and set one up manually. For debugging purposes, arp can also dump the complete ARP cache.
Note : This program is obsolete. Use ip neighbor as its replacement.

Add entry of another machine's IP and MAC address :

#arp -s
arp -i eth0 -s 10.65.211.133 00:16:3e:74:8d:85 pub

View the arp cache :

#arp -n
#arp -v

Delete arp cache entry :

#arp -d

The cache is stored in the /proc/net/arp file.

Note : Each complete entry in the ARP cache will be marked with the C flag. Permanent entries are marked with M and published entries have the P flag.
files :

/proc/net/arp,
/etc/networks
/etc/hosts
/etc/ethers
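The C, M and P flags mentioned in the note correspond to bits in the Flags column of /proc/net/arp. Below is a minimal Python sketch that parses that file. The bit values come from the kernel header linux/if_arp.h (ATF_COM, ATF_PERM, ATF_PUBL), and the sample text uses made-up addresses; on a live system you would pass open("/proc/net/arp").read() instead.

```python
# Flag bits from the kernel header <linux/if_arp.h>.
ATF_COM = 0x02   # completed entry  -> C in arp output
ATF_PERM = 0x04  # permanent entry  -> M
ATF_PUBL = 0x08  # published entry  -> P


def parse_proc_net_arp(text):
    """Parse /proc/net/arp text into a list of dicts."""
    entries = []
    for line in text.splitlines()[1:]:   # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        bits = int(flags, 16)            # e.g. "0x6" -> 6
        entries.append({
            "ip": ip,
            "mac": mac,
            "device": device,
            "complete": bool(bits & ATF_COM),
            "permanent": bool(bits & ATF_PERM),
            "published": bool(bits & ATF_PUBL),
        })
    return entries


# Made-up sample in the /proc/net/arp layout.
SAMPLE = """IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.1.50     0x1         0x6         00:16:3e:74:8d:85     *        eth0
192.168.1.99     0x1         0x0         00:00:00:00:00:00     *        eth0
"""
for e in parse_proc_net_arp(SAMPLE):
    print(e["ip"], e["mac"], "C" if e["complete"] else "-", "M" if e["permanent"] else "-")
```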

How to update the ARP cache automatically using arping?

arping - we can use this command to send ARP REQUEST to a neighbour host.

$ arping -I
arping -c 1 -I eth0 10.220.227.52



How arp works?

Ans : In an Ethernet environment, ARP is used to map an IP address to a MAC address. ARP dynamically binds the IP address (the logical address) to the correct MAC address. Before IP unicast packets can be sent, ARP discovers the MAC address used by the Ethernet interface on which the destination IP address is configured. Hosts that use ARP maintain a cache of discovered Internet-to-Ethernet address mappings to minimize the number of ARP broadcast messages. To keep the cache from growing too large, an entry is removed if it is not used within a certain period of time. Before sending a packet, the host looks in its cache for an Internet-to-Ethernet address mapping. If the mapping is not found, the host sends an ARP request.

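An ARP request (such as the one arping sends) is broadcast only on the local network segment; routers do not forward it. If the destination IP address is on the local subnet, the host that owns it replies with its MAC address, and the sender keeps this IP-to-MAC mapping in its cache. If the destination is on another subnet, the sender instead resolves the MAC address of its gateway (router) and sends the packet there; the router then does its own ARP resolution on the next network segment, and so on until the packet reaches the destination's subnet.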
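The decision described above (resolve the destination itself or the gateway) can be sketched with Python's ipaddress module. The function name and example addresses are hypothetical, and the sketch assumes a single interface with one default gateway.

```python
import ipaddress


def arp_target(local_if, dest, gateway):
    """Return the IP address the sender must resolve with ARP to reach dest.

    local_if -- the sender's address with prefix length, e.g. '192.168.1.10/24'
    gateway  -- the default gateway on that subnet
    """
    subnet = ipaddress.ip_interface(local_if).network
    if ipaddress.ip_address(dest) in subnet:
        return dest      # same subnet: ask for the destination's own MAC
    return gateway       # other subnet: ask for the router's MAC instead


print(arp_target("192.168.1.10/24", "192.168.1.77", "192.168.1.1"))  # 192.168.1.77
print(arp_target("192.168.1.10/24", "8.8.8.8", "192.168.1.1"))       # 192.168.1.1
```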

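How to set the ARP timeout?

$ arp timeout 8000

Note : "arp timeout" is a router CLI command (Cisco IOS style, value in seconds), not an option of the Linux arp utility. On Linux, neighbour-cache aging is tuned through per-interface sysctl entries such as /proc/sys/net/ipv4/neigh/eth0/base_reachable_time_ms and /proc/sys/net/ipv4/neigh/eth0/gc_stale_time.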

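How to clear arp?

$ clear arp

Note : "clear arp" is likewise a router CLI command. On Linux, the ARP cache can be flushed with :

# ip neigh flush all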

Why use MAC address validation?

MAC address validation is a verification process performed on each incoming packet to prevent spoofing on IP Ethernet-based interfaces, including bridged Ethernet interfaces. When an incoming packet arrives on a layer 2 interface, the validation table is used to compare the packet's source IP address with its MAC address. If the MAC address and IP address match, the packet is forwarded; if they do not match, the packet is dropped.

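How to enable ARP (MAC address) validation?

$ arp validate

Note : this is also a router CLI command; the Linux arp utility has no validate option.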

Monday, September 5, 2011

What is the magic SysRq key?

■ Requirement : Details on magic SysRq
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

It is a 'magical' key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up.

Enable the magic SysRq key : 

1. Build the kernel with CONFIG_MAGIC_SYSRQ=y (the kernel config file is kept under /boot).
2. Enable it at runtime by writing a value to /proc/sys/kernel/sysrq : echo value > /proc/sys/kernel/sysrq

value =
0 - disable sysrq completely
1 - enable all functions of sysrq
2 - enable control of console logging level
4 - enable control of keyboard (SAK, unraw)
8 - enable debugging dumps of processes etc.
16 - enable sync command
32 - enable remount read-only
64 - enable signalling of processes (term, kill, oom-kill)
128 - allow reboot/poweroff
256 - allow nicing of all RT tasks

3. How do I use the magic SysRq key?

On x86 - You press the key combo 'ALT-SysRq-'. Note - Some
keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is
also known as the 'Print Screen' key. Also some keyboards cannot
handle so many keys being pressed at the same time, so you might
have better luck with "press Alt", "press SysRq", "release SysRq",
"press ", release everything.

From command prompt :

echo command > /proc/sysrq-trigger

command =
'b' - Will immediately reboot the system without syncing or unmounting
your disks.

'c' - Will perform a system crash by a NULL pointer dereference.
A crashdump will be taken if configured.

'd' - Shows all locks that are held.

'e' - Send a SIGTERM to all processes, except for init.

'f' - Will call oom_kill to kill a memory hog process.

'g' - Used by kgdb (kernel debugger)

'h' - Will display help (actually any other key than those listed
here will display help, but 'h' is easy to remember :-)

'i' - Send a SIGKILL to all processes, except for init.

'j' - Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.

'k' - Secure Access Key (SAK) Kills all programs on the current virtual
console. NOTE: See important comments below in SAK section.

'l' - Shows a stack backtrace for all active CPUs.

'm' - Will dump current memory info to your console.

'n' - Used to make RT tasks nice-able

'o' - Will shut your system off (if configured and supported).

'p' - Will dump the current registers and flags to your console.

'q' - Will dump per CPU lists of all armed hrtimers (but NOT regular
timer_list timers) and detailed information about all
clockevent devices.

'r' - Turns off keyboard raw mode and sets it to XLATE.

's' - Will attempt to sync all mounted filesystems.

't' - Will dump a list of current tasks and their information to your
console.

'u' - Will attempt to remount all mounted filesystems read-only.

'v' - Forcefully restores framebuffer console
'v' - Causes ETM buffer dump [ARM-specific]

'w' - Dumps tasks that are in uninterruptible (blocked) state.

'x' - Used by xmon interface on ppc/powerpc platforms.

'y' - Show global CPU Registers [SPARC-64 specific]

'z' - Dump the ftrace buffer

'0'-'9' - Sets the console log level, controlling which kernel messages
will be printed to your console. ('0', for example would make
it so that only emergency messages like PANICs or OOPSes would
make it to your console.)

Source : Linux kernel documentation (Documentation/sysrq.txt).
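As the value list in step 2 shows, any value other than 0 or 1 is a bitmask, so several functions are enabled by adding their values together. This small Python sketch decodes such a value; the descriptions are copied from that list, and the function name is illustrative.

```python
# Bit values and meanings copied from the list in step 2.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Translate a /proc/sys/kernel/sysrq value into what it enables."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions"]
    return [desc for bit, desc in sorted(SYSRQ_BITS.items()) if value & bit]


print(16 + 32 + 128)       # 176: sync, remount read-only, reboot/poweroff
print(decode_sysrq(176))   # ['sync command', 'remount read-only', 'reboot/poweroff']
```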

Sunday, September 4, 2011

How ACL & MASK work in linux?

■ Requirement : Details on ACL & MASK
■ OS Environment : Linux, RHEL, Centos
■ Resolution : 

When a directory has a default ACL, the umask has no effect on files and directories created inside it. However, the mode requested by the creating program still takes precedence at the kernel level. When we create a file, programs such as touch pass mode 0666 to the *open* system call; when we create a directory, mkdir passes mode 0777 to the *mkdir* system call. Without a default ACL, the kernel then removes the umask bits from that mode. With a default ACL, the new object inherits the default ACL instead, and its owner, mask, and other entries are limited by the requested mode. The effective permission of each named entry is in turn limited by the mask. So a sub-directory can inherit full permissions (including execute) from its parent, but a file can't: with a requested mode of 0666 it never gets execute permission. If a program passed mode 777, 766, or 776 when creating a file, the file could get execute permission for user, group, and other (u+g+o), for the user only (u), or for user and group (u+g), respectively. I am going to describe these along with some examples here :

1. Case :1

Suppose we have a paranoid user who doesn't want anybody else to read his files, ever. He has set his umask to 077. Here's what we see in that case:

$ umask 077; strace -eopen touch testfile 2>&1 | tail -1; ls -l testfile

open("testfile", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
-rw-------. 1 root root 0 Sep 4 15:25 testfile

Here *touch* doesn't care what the umask is. It just calls open with the desired permissions of 0666, and the kernel applies the umask. Our umask in this case is 0077, or ---rwxrwx, so those are the permissions we cross out. All that's left are the rw- for the owner; the group and other permissions are all taken away, and we have rw------- (0600).

2. Case :2

The same concepts apply to directories. The only real difference is that directories are created with execute permissions by default (0777 instead of 0666). Let's take a look at this:

$ umask 022; strace -emkdir mkdir testdir; ls -ld testdir
mkdir("testdir", 0777) = 0
drwxr-xr-x. 2 root root 4096 Sep 4 15:26 testdir


            There are a few new things in this example, so let's take them one at a time. The first is that we used the mkdir command, which then used the mkdir system call to the kernel. So we told strace to show us just that system call. Next, we see that mkdir (the command) told the kernel to mkdir (the system call) this directory with mode 0777 (which would be rwxrwxrwx). But the kernel took away the umask's bits, so we ended up with rwxr-xr-x (0755).

3. Case :3 (Applying default ACL)

Let's apply a default ACL to see how files and directories get their permissions.

$  strace -s 128 -fvTttto luv setfacl -m d:u:tgfurnish:rwx,u:tgfurnish:rwx hello
$ getfacl --all-effective hello
# file: hello
# owner: root
# group: root
user::rwx
user:tgfurnish:rwx #effective:rwx
group::r-x #effective:r-x
mask::rwx
other::r-x
default:user::rwx
default:user:tgfurnish:rwx #effective:rwx
default:group::r-x #effective:r-x
default:mask::rwx
default:other::r-x


Let's create a file inside hello directory :

$ strace -s 128 -fvTttto luvly touch hello/hii
$ getfacl --all-effective hello/hii
# file: hello/hii
# owner: root
# group: root
user::rw-
user:tgfurnish:rwx #effective:rw-
group::r-x #effective:r--
mask::rw-
other::r--

$ less luvly |grep open |tail -1
8721 1315131118.682518 open("hello/hii", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 <0.000184>

$ umask
0022

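Here again we see that touch passes mode "0666" to the kernel. We might expect the file's "mask" (and therefore its "effective" permissions) to be "rwx", as it is on the directory, but it is only rw-: the requested mode 0666 has no execute bits, so execute is removed from the owner, mask, and other entries. Note also that the umask (0022) has no effect here, because the parent directory has a default ACL; otherwise group write would have been removed from the mask. The setfacl command above did not set a mask explicitly; setfacl calculated mask::rwx on its own. So, let's make a summary :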

1. A file created with the usual mode 0666 never gets execute permission (in the mask or the effective permissions), whatever we use (i.e. ACL, umask, or the mask in the ACL).
2. A directory can get execute permission (this depends on how we set the mask field).
3. If we want to give execute permission to a file that is under ACL control, we have to set it manually with the "chmod" command. We can implement this in a shell script and run it as a cron job.
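The behaviour seen in Case 3 can be reproduced with a small Python model of default-ACL inheritance: the new object starts from the parent's default ACL, its owner, mask and other entries are ANDed with the mode the program requested, and named entries plus the owning group are then limited by the mask. This is a sketch that assumes the default ACL has a mask entry (as in the example above); the function names are illustrative and it is not the kernel's code.

```python
def perms(bits):
    """Render a 3-bit value as rwx text, e.g. 6 -> 'rw-'."""
    return "".join(c if bits & b else "-" for c, b in zip("rwx", (4, 2, 1)))


def inherit(default_acl, requested_mode):
    """Model how a new file or directory gets its ACL from the parent's
    default ACL. Entries are 3-bit values; 'user', 'group', 'mask' and
    'other' are required, named entries look like 'user:tgfurnish'."""
    acl = dict(default_acl)
    acl["user"] &= (requested_mode >> 6) & 7   # owner limited by mode's user bits
    acl["mask"] &= (requested_mode >> 3) & 7   # mask limited by mode's group bits
    acl["other"] &= requested_mode & 7         # other limited by mode's other bits
    # Named entries and the owning group are limited by the mask.
    effective = {k: v & acl["mask"] for k, v in acl.items()
                 if k not in ("user", "mask", "other")}
    return acl, effective


# Default ACL of the 'hello' directory from Case 3.
hello_default = {"user": 7, "user:tgfurnish": 7, "group": 5, "mask": 7, "other": 5}

file_acl, file_eff = inherit(hello_default, 0o666)   # touch hello/hii
for name in ("user", "user:tgfurnish", "group", "mask", "other"):
    note = f"  #effective:{perms(file_eff[name])}" if name in file_eff else ""
    print(name, perms(file_acl[name]) + note)
```

The printed values match the getfacl output for hello/hii above; calling inherit(hello_default, 0o777), as mkdir would, leaves the mask at rwx.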

How does a Linux system set the permissions of files and directories when it uses the default umask?

Ans : When a file or directory is created, the program passes mode 0666 to the open system call for a file and mode 0777 to the mkdir system call for a directory. These are the default values. The kernel then calculates the final permissions from the umask with a bitwise AND NOT operation (mode AND NOT umask). I shall describe how the permissions are set. With these modes, a file never gets execute permission, but a directory does.

Bash and other console programs use 666 for files and 777 for directories. To confirm this, I analysed one umask value and calculated the exact permissions of a file and a directory.

Let's say we set umask 0007 at the console.

Analysis for FILE : here umask=0007 (set it with: # umask 007) :

Note : "Resultant permissions are calculated via the bitwise AND of the unary complement of the argument (using bitwise NOT) and the permissions specified by the program. Bash uses 666 for files, and 777 for directories. Remember that permission to execute a directory means being able to list it."

Example :

666             = 110 110 110   (bash uses 666 for files)
007             = 000 000 111   (umask)
NOT 007         = 111 111 000   (umask bits reversed)
666 AND NOT 007 = 110 110 000 = 660 = rw-rw----
                  rwx rwx rwx

Analysis for DIR : here umask=0007. Bash and console programs use 666 for files but 777 for directories, so the directory gets 770, as calculated here :

777             = 111 111 111   (bash uses 777 for directories)
007             = 000 000 111   (umask)
NOT 007         = 111 111 000   (umask bits reversed)
777 AND NOT 007 = 111 111 000 = 770 = drwxrwx---
                  rwx rwx rwx

Testing :

[root@vm46 log]# umask 0007
[root@vm46 log]# mkdir test123
[root@vm46 log]# touch hello
[root@vm46 log]# ls -ld test123
drwxrwx--- 2 root root 4096 Aug 31 20:31 test123
[root@vm46 log]# ls -al hello
-rw-rw---- 1 root root 0 Aug 31 20:31 hello
[root@vm46 log]# umask
0007
[root@vm46 log]#

So, example shows directory got drwxrwx--- and file got -rw-rw---- . This confirms above logic analysis.
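The same calculation can be checked for several umask values with a short Python sketch (the function names are illustrative). It applies mode AND NOT umask to the 666 and 777 modes discussed above, including the 002 and 022 defaults mentioned for normal users and root.

```python
def apply_umask(requested_mode, umask):
    """Permissions of a new object: requested mode AND (NOT umask)."""
    return requested_mode & ~umask & 0o777


def symbolic(mode, is_dir=False):
    """Render a mode the way ls -l does, e.g. 0o660 -> '-rw-rw----'."""
    out = "d" if is_dir else "-"
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 7
        out += "".join(c if bits & b else "-" for c, b in zip("rwx", (4, 2, 1)))
    return out


for umask in (0o007, 0o002, 0o022):
    f = apply_umask(0o666, umask)   # bash/touch request 666 for files
    d = apply_umask(0o777, umask)   # mkdir requests 777 for directories
    print(f"umask {umask:04o}: file {f:o} {symbolic(f)}  dir {d:o} {symbolic(d, True)}")
# umask 0007: file 660 -rw-rw----  dir 770 drwxrwx---
# umask 0002: file 664 -rw-rw-r--  dir 775 drwxrwxr-x
# umask 0022: file 644 -rw-r--r--  dir 755 drwxr-xr-x
```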

Thursday, September 1, 2011

What is the "WCHAN" attribute in "ps -alwww" output on Linux?

■ Requirement : WCHAN value in output of  "ps -alwww" 
■ OS Environment : Linux, RHEL, Centos
■ Resolution  : 

        WCHAN : Name of the kernel function in which the process is sleeping, a "-" if the process is running, or a "*" if the process is multi-threaded and ps is not displaying threads.

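Common wait-channel names are listed below. Note that this list comes from FreeBSD (it refers to FreeBSD source files such as uipc_sockbuf.c and vfs_bio.c); on Linux, WCHAN shows the name of the kernel function itself, so the names you see will differ.

Process kernel states (wchan = wait channel in ps -l) :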

biord: block on io read.
futex: [Linux emulation] process is waiting until a futex is released (see fast userspace mutex)
getblk: get block (seems to be generated often by tar)
nanoslp: process is sleeping for some number of nanoseconds (see nanosleep(2))
pause: process is waiting for a signal (see pause(3))
pcmwrv: waiting for audio samples to be played
piperd: read(2) from a pipe
pipewr: write(2) to a pipe
physrd: reading from a HDD
runnable: process is ready to run on the CPU
running: currently on CPU
sbwait: wait for socket to return data (see uipc_sockbuf.c)
swread: read in from swap
stopev: process is stopped because of a debugging event (see sys_process.c; relates to ptrace(2))
ttyout: write(2) to a tty
ttyin: read(2) from a tty
ucond: a process is blocked until a pthreads mutex is released
vnread: part of the pager (see vnode_pager.c)
wait: wait(2) for a child process
wdrain: write drain. On a device mounted with the async option (or soft-updates) wait until all the previous writes have been completed. (see vfs_bio.c)
zombie: a process died but its parent did not wait(2) for it.

There are other syscalls that are similar to the ones mentioned above (such as readv(2) instead of read(2), and waitpid(2) instead of wait(2)) which will end up with the same wchans.

Meaning of ps STATUS : PROCESS STATE CODES

Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
D Uninterruptible sleep (usually IO)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reaped by its parent.

For BSD formats and when the stat keyword is used, additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for real-time and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+ is in the foreground process group
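The state letters can also be read without ps. A minimal sketch, assuming Linux and /proc: the third field of /proc/PID/stat is the same one-letter state code that ps prints in its S/STAT column. proc_state and describe_state are hypothetical helper names, and only the main codes above are decoded.

```shell
#!/bin/bash
# Sketch for Linux: read the one-letter state code (the S/STAT column)
# from /proc/PID/stat and describe it. Both helper names are hypothetical.
proc_state() {
    line=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
    # Field 2 is "(comm)" and may itself contain spaces or ")", so strip
    # everything up to the LAST ") "; the next field is the state letter.
    rest=${line##*") "}
    printf '%s\n' "${rest%% *}"
}

describe_state() {
    case $1 in
        D) echo "uninterruptible sleep (usually IO)" ;;
        R) echo "running or runnable (on run queue)" ;;
        S) echo "interruptible sleep (waiting for an event)" ;;
        T|t) echo "stopped or being traced" ;;
        Z) echo "defunct (zombie), not yet reaped by its parent" ;;
        X) echo "dead" ;;
        *) echo "other state: $1" ;;
    esac
}

# Usage: a sleeping process should be in state S.
sleep 30 &
pid=$!
sleep 0.2
state=$(proc_state "$pid")
echo "$pid is in state $state: $(describe_state "$state")"
kill "$pid"
```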

umask concept

umask concept :

When a user creates a file or directory under Linux or UNIX, it is created with a default set of permissions. In most cases the system defaults may be open or relaxed for file-sharing purposes. For example, if a text file has 666 permissions, it grants read and write permission to everyone. Similarly, a directory with 777 permissions grants read, write, and execute permission to everyone.

Default umask Value

The user file-creation mode mask (umask) is used to determine the permissions of newly created files and directories. It can be used to control the default permissions for new files. It is usually shown as a four-digit octal number. A umask can be set or expressed using:

* Symbolic values
* Octal values

Procedure to set up a default umask:

Edit /etc/profile (system-wide, as root) or ~/.bashrc (per user):

# vi /etc/profile   or   $ vi ~/.bashrc

and add the line:

umask 022

The default umask 002 is used for normal users. With this mask, default directory permissions are 775 and default file permissions are 664.
The default umask for the root user is 022, which results in default directory permissions of 755 and default file permissions of 644.
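These defaults follow directly from the umask. A small sketch, assuming the common case where programs request 666 for new files and 777 for new directories; default_modes is a hypothetical helper that applies the bitwise AND with the complemented mask.

```shell
#!/bin/bash
# Sketch: compute the default modes a umask yields, assuming programs
# request 666 for new files and 777 for new directories (the usual case).
# default_modes is a hypothetical helper name.
default_modes() {
    # "0$1" makes the shell read the mask as octal (e.g. 0022).
    printf 'dir=%03o file=%03o\n' $(( 0777 & ~0$1 )) $(( 0666 & ~0$1 ))
}

default_modes 002   # normal user: dir=775 file=664
default_modes 022   # root:        dir=755 file=644
```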

Symbolic umasks :

A umask set to u=rwx,g=rwx,o= will result in new files having the modes -rw-rw----, and new directories having the modes drwxrwx---, if the creating programs specify the typical modes.

Symbolic umask example

In bash:

$ umask u=rwx,g=rwx,o=
$ umask
0007
$ mkdir fu
$ touch bar
$ ls -l
drwxrwx--- 2 dave dave 512 Sep 1 20:59 fu
-rw-rw---- 1 dave dave 0 Sep 1 20:59 bar
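To try a symbolic umask without changing your current shell, set it inside a subshell. A minimal sketch in bash; umask -S prints the current mask in symbolic form.

```shell
#!/bin/bash
# Sketch: set a symbolic umask inside a subshell, so the calling shell's
# own umask is left unchanged, and print it back in both notations.
(
    umask u=rwx,g=rwx,o=
    umask      # octal form
    umask -S   # symbolic form
)
```

In bash this prints 0007 followed by u=rwx,g=rwx,o=, matching the example above.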

Octal umasks :

Resultant permissions are calculated via the bitwise AND of the unary complement of the argument (using bitwise NOT) and the permissions specified by the program. Bash uses 666 for files, and 777 for directories. Remember that execute permission on a directory means being able to enter (traverse) it; listing its contents requires read permission.

The octal digits of a umask and the permissions they mask out are:

0 – none (i.e. all permissions specified are preserved)
1 – execute only
2 – write only
3 – write and execute
4 – read only
5 – read and execute
6 – read and write
7 – read, write and execute (i.e. no permissions are preserved)
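Each umask digit is just three bits (r=4, w=2, x=1), so the table can be reproduced by testing those bits. A sketch; masked_out is a hypothetical helper that prints the bits a digit removes in ls-style rwx form.

```shell
#!/bin/bash
# Sketch: show which of r, w, x a single octal umask digit masks out,
# matching the table above. masked_out is a hypothetical helper name.
masked_out() {
    d=$1
    r=-; w=-; x=-
    [ $(( d & 4 )) -ne 0 ] && r=r   # 4 = read bit
    [ $(( d & 2 )) -ne 0 ] && w=w   # 2 = write bit
    [ $(( d & 1 )) -ne 0 ] && x=x   # 1 = execute bit
    printf '%s%s%s\n' "$r" "$w" "$x"
}

for d in 0 1 2 3 4 5 6 7; do
    echo "$d masks out: $(masked_out "$d")"
done
```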

A common umask value is 022, which masks out the write permission for the group and others and so ensures that new files are writable only by the owner (i.e. the user who created them). In bash:

$ umask 0022
$ mkdir xdir
$ touch xfile
$ ls -l
drwxr-xr-x 2 dave dave 512 Aug 18 20:59 xdir
-rw-r--r-- 1 dave dave 0 Aug 18 20:59 xfile

Using the above mask, octal 0 doesn't prevent any user bits from being set, the first octal 2 prevents the group write bit from being set, and the second octal 2 prevents the write bit from being set for others. Execute and read bits are not affected.
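You can also let the kernel do the calculation and read back the result. A sketch assuming GNU coreutils (stat -c '%a' prints the octal mode) and a writable temporary directory from mktemp -d; modes_under_umask is a hypothetical helper, and the umask is changed only inside a subshell.

```shell
#!/bin/bash
# Sketch: let the kernel apply a umask and read back the resulting modes.
# Assumes GNU coreutils (stat -c '%a'); modes_under_umask is hypothetical.
modes_under_umask() {
    tmp=$(mktemp -d) || return 1
    (
        umask "$1"              # only affects this subshell
        mkdir "$tmp/xdir"       # mkdir requests 777
        touch "$tmp/xfile"      # touch requests 666
    )
    # Print "<dir mode> <file mode>".
    printf '%s %s\n' "$(stat -c '%a' "$tmp/xdir")" "$(stat -c '%a' "$tmp/xfile")"
    rm -rf "$tmp"
}

modes_under_umask 0022   # directory 755, file 644, as in the ls -l output above
```

Passing 0027 instead should give 750 640.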

Another common value is 002, which leaves the write permission for the file's group enabled. This can be used for files in shared workspaces, where several users work with the same files.

Calculating resultant permissions example :
With the umask value of 027 (octal; intended to prohibit non-group members from accessing files and directories), any new file will be created with the permissions 640 since:

666 AND NOT(027) = 640 (octal), symbolically rw-r-----

and any new directory will have permissions 750 since:

777 AND NOT(027) = 750 (octal), symbolically rwxr-x---

Early UNIX systems were often used by relatively small groups of close colleagues who found it convenient to have most files readable and writable by everyone. PWB/UNIX evolved in a computer center environment to serve hundreds of users from different organizations. Its developers had combed through the commands to make key file creation modes more restrictive, especially for cases exposing security holes, but this was not a general solution. The addition of umask (around 1978) allowed sites, groups, and individuals to choose their own defaults. Small close groups might choose 000, computer centers 022, security-conscious groups 077 or 066 for access to sub-directories under private directories.

---------------------

But, How Do I Calculate umasks?

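The resulting permissions are calculated as the bitwise AND of the mode requested by the program and the bitwise NOT (unary complement) of the umask. The table below shows, for each octal umask digit, the permissions left over when it is applied to rwx (7), as for a directory:

* Octal value : Permission left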
* 0 : read, write and execute
* 1 : read and write
* 2 : read and execute
* 3 : read only
* 4 : write and execute
* 5 : write only
* 6 : execute only
* 7 : no permissions

Now, you can use the above table to work out the resulting permissions. For example, if umask is set to 077, the permissions of a new directory are calculated as follows (a new file gets rw- for the owner instead, because programs request 666 for files):

Digit   Applies to   Permission left
0       Owner        read, write and execute
7       Group        no permissions
7       Others       no permissions

To set umask 077 and test it, type the following commands at the shell prompt:
$ umask 077
$ mkdir dir1
$ touch file
$ ls -ld dir1 file

Sample output:

drwx------ 2 vivek vivek 4096 2011-03-04 02:05 dir1
-rw------- 1 vivek vivek 0 2011-03-04 02:05 file
------------------

Effective permission :

Octal numbers and permissions :

You can use octal numbers to represent a mode/permission:

* r: 4
* w: 2
* x: 1

For example, for the file owner you can use octal mode as follows. Read, write and execute (full) permission on a file in octal is
r+w+x = 4+2+1 = 7

Only read and write permission on a file in octal is
r+w = 4+2+0 = 6

Only read and execute permission on a file in octal is
r+x = 4+0+1 = 5

Use the above method to calculate permissions for the group and others. Let us say you wish to give full permission to the owner, read & execute permission to the group, and read-only permission to others; then you calculate the permission as follows:

User   = r+w+x = 4+2+1 = 7
Group  = r+x   = 4+0+1 = 5
Others = r     = 4+0+0 = 4

Effective permission is 754.
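The same r=4, w=2, x=1 rule converts an ls-style permission string to octal. A sketch; to_octal is a hypothetical helper that only handles plain r, w and x (special characters such as s or t are not decoded).

```shell
#!/bin/bash
# Sketch: convert a 9-character symbolic mode (ls -l output without the
# leading file-type character) to octal, using r=4, w=2, x=1 per triplet.
# to_octal is a hypothetical helper name.
to_octal() {
    perms=$1
    out=""
    for who in 0 1 2; do                 # owner, group, others
        triplet=$(printf '%s' "$perms" | cut -c$(( who * 3 + 1 ))-$(( who * 3 + 3 )))
        n=0
        case $triplet in r??) n=$(( n + 4 )) ;; esac
        case $triplet in ?w?) n=$(( n + 2 )) ;; esac
        case $triplet in ??x) n=$(( n + 1 )) ;; esac
        out="$out$n"
    done
    printf '%s\n' "$out"
}

to_octal rwxr-xr--   # owner rwx, group r-x, others r-- gives 754
```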

===================
Octal masking in more detail :

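0022 = leading 0, then:
  owner  (U): 0 = ---  nothing masked
  group  (G): 2 = -w-  write masked out
  others (O): 2 = -w-  write masked out

Each group is rwx with bit values 421; masking out a bit means removing that permission.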

$ umask 0022
$ mkdir xdir
$ touch xfile
$ ls -l
drwxr-xr-x 2 dave dave 512 Aug 18 20:59 xdir
-rw-r--r-- 1 dave dave 0 Aug 18 20:59 xfile

A symbolic umask works the same way: it names the permissions that new directories and files may keep (files still do not get x, because programs request 666 for them):

$ umask u=rwx,g=rwx,o=
$ umask
0007
$ mkdir fu
$ touch bar
$ ls -l
drwxrwx--- 2 dave dave 512 Sep 1 20:59 fu
-rw-rw---- 1 dave dave 0 Sep 1 20:59 bar

Detailed explanation :

FILE: Here umask=0007. Bash and console tools request 666 for files and 777 for directories, so the file gets 660, calculated as follows.

Note: "Resultant permissions are calculated via the bitwise AND of the unary complement of the argument (using bitwise NOT) and the permissions specified by the program. Bash uses 666 for files, and 777 for directories." Remember that execute permission on a directory means being able to enter (traverse) it; listing its contents requires read permission.

Example :

666             = 110 110 110
007             = 000 000 111
NOT 007         = 111 111 000   (every bit of the mask reversed)
666 AND NOT 007 = 110 110 000 = 660 = rw-rw----
                  rwx rwx rwx

For DIR: Here umask=0007 and directories are requested with 777, so the directory gets 770, calculated as follows:

777             = 111 111 111
007             = 000 000 111
NOT 007         = 111 111 000   (every bit of the mask reversed)
777 AND NOT 007 = 111 111 000 = 770 = drwxrwx---
                  rwx rwx rwx
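The bit-level calculation above can be reproduced in the shell, which is a handy way to check other masks. A sketch; to_bits is a hypothetical helper that prints an octal permission value as three groups of three bits.

```shell
#!/bin/bash
# Sketch: print a permission value as three groups of three bits and
# repeat the NOT/AND steps for files (666) and directories (777).
# to_bits is a hypothetical helper name.
to_bits() {
    v=$(( 0$1 ))        # leading 0: read the argument as octal
    bits=""
    for i in 8 7 6 5 4 3 2 1 0; do
        bits="$bits$(( (v >> i) & 1 ))"
        # a space after each group of three bits, except the last
        if [ "$i" -eq 6 ] || [ "$i" -eq 3 ]; then
            bits="$bits "
        fi
    done
    printf '%s\n' "$bits"
}

mask=007
for base in 666 777; do
    result=$(printf '%03o' $(( 0$base & ~0$mask & 0777 )))
    echo "$base     = $(to_bits "$base")"
    echo "NOT $mask = $(to_bits "$(printf '%03o' $(( ~0$mask & 0777 )))")"
    echo "AND     = $(to_bits "$result") = $result"
done
```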

Testing :

$ umask u=rwx,g=rwx,o=
$ umask
0007
$ mkdir fu
$ touch bar
$ ls -l
drwxrwx--- 2 dave dave 512 Sep 1 20:59 fu
-rw-rw---- 1 dave dave 0 Sep 1 20:59 bar