First of all; I’ll assume that if your reading this post that you already know what Chef is and how it can benefit your organization. If not, then you can read all about it over here. To the cloud! …err class, I mean.
I was lucky enough to take part in the Chef fundamentals 2-day training in NYC as part of a company-sponsored training event. It consisted of two 8 hour days –in my case, the training took place on a Thursday and Friday with a separate Chef hack-day on Saturday…ok so maybe three days if you count the hack day where we got to work on our own projects. In this post, I’ll attempt to answer some basic questions others might have about the chef training.
How was it presented? – The classes were largely instructor lead with a class size of about 15-20 people (your mileage may vary) and two instructors. Each class consisted of instructor-student review/lecture of course materials presented via power point, as well as a hands-on lab portion to accompany each section. We were provided with PDFs of the course material (so we could follow along) and a lab guide (also in PDF) which covered the in-class exercises. We were told what was required to set up our own chef workstation environment before attending the training. However, at the time of this post both virtual workstations and virtual server instances were provided to us for the purpose of completing the lab exercises.
What did it cover? – The lecture started with the very basics about what chef truly is and how it can be used in staging and production environments. Imagine a concise yet interactive version of the chef wiki being streamed directly to your brain over the course of a couple days. It started with the basics and steadily progressed to more complicated topics. Make no mistake, the opscode guys are thorough and at times it might feel like you are experiencing information overload. What do you expect? These classes are condensed cram sessions.
Who would benefit? – The short answer –just about anyone who is interested in system automation. Though I already had some basic working knowledge of chef, I found it pretty valuable to be able to bring my questions directly to the pros. Reviewing the course materials also forced me to go over some areas I had skipped over during my own research and practice.
I recently came across a Cpanel server (CentOS 5) upon which mysqld refused to start after /var was at 100%. After tailing the mysql error log in the default /var/lib/mysql/HOSTNAME.err it was no surprise to find that the mysql user table had been marked as crashed. On RHEL/CentOS servers, you cannot simply add the “innodb_force_recovery = 1″ (or whatever recovery level…2,3,4,5,6) to the /etc/my.cnf and do the regular service mysql start. You’ll have to edit the my.cnf to enable recovery and start mysql from the command line and not by the init script/service command. Only once you have mysql started on the command line can you run your repair on the mysql user table. Here is quick run-down with commands and queries to run on your Cpanel server;
[root@HOSTNAME mysql]# tail -f /var/lib/mysql/HOSTNAME.err
110108 10:37:45 InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 1 3749263016.
InnoDB: Doing recovery: scanned up to log sequence number 1 3749263050
InnoDB: Last MySQL binlog file position 0 79, file name ./host89-bin.000005
110108 10:37:45 InnoDB: Flushing modified pages from the buffer pool...
110108 10:37:45 InnoDB: Started; log sequence number 1 3749263050
InnoDB: !!! innodb_force_recovery is set to 1 !!!
110108 10:37:45 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'user' is marked as crashed and should be repaired
110108 10:37:45 mysqld ended
Now, go ahead and enable innodb forced recovery by opening your /etc/my.cnf in your favorite text editor and make sure you have something like this:
innodb_force_recovery = 1
Start mysql from the command line after enabling innodb forced recovery:
[root@HOSTNAME ~]#/usr/sbin/mysqld --skip-grant-tables --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/HOSTNAME.pid --skip-external-locking --port=3306 --socket=/var/lib/mysql/mysql.sock
Finally, it’s time to get back in to mysql and get your life back… or mysql user table at least!
[root@HOSTNAME ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 173 to server version: 4.1.22-standard-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> use mysql;
mysql> check table user;
| Table | Op | Msg_type | Msg_text |
| mysql.user | check | warning | Table is marked as crashed |
| mysql.user | check | warning | 6 clients are using or haven't closed the table properly |
| mysql.user | check | error | Record at pos: 24992 is not remove-marked |
| mysql.user | check | error | record delete-link-chain corrupted |
| mysql.user | check | error | Corrupt |
5 rows in set (0.02 sec)
mysql> repair table user;
| Table | Op | Msg_type | Msg_text |
| mysql.user | repair | warning | Number of rows changed from 1384 to 1385 |
| mysql.user | repair | status | OK |
2 rows in set (0.48 sec)
mysql> check table user;
| Table | Op | Msg_type | Msg_text |
| mysql.user | check | status | OK |
1 row in set (0.01 sec)
Now, don’t forget to REMOVE the innodb_force_recovery line from your my.cnf you added earlier! After that, just start mysql as you normally would.
[root@HOSTNAME mysql]# service mysql start
Today I found a server using reiserfs, but to my dismay there weren’t any patched kernels available that address the MCAST_MSFILTER Compat mode security vulnerability. I compiled my own and I’m making them available.
I’m making 64bit versions of the 2.6.18-194.11.4.el5 kernel for CentOSplus (version 5) available below:
Control Panel -> Email -> Add/Remove/Manage Accounts -> Show Disk Space Used
When there are inconsistencies with Cpanel’s email disk space usage, you should verify the actual disk space being utilized by:
du -sh /home/username/mail/userdomain.com/user/
tail -1 /home/username/mail/domain.com/user/maildirsize
(2) Compare the utilized disk space reported by the first du command with the bytes shown in the user’s maildirsize file. If the totals reported are different delete or rename the maildirsize file (/home/username/mail/domain.com/user/maildirsize). Next, logout and then back in again within the user’s control panel and go to Control Panel -> Email -> Add/Remove/Manage Accounts -> Show Disk Space Used
gzip of /var/log/messages (reading+writing after compression)
How many times have you wished there was an easy way to view the progress of a gzip? a grep? redirected file read/writes? file copies? There are so many situations where the addition of a simple progress bar would make the user experience 1 million times more bearable. Several years ago I had read about pipe viewer, or pv for short. This is truly the answer to my progress bar prayers. I cannot believe I had forgotten about such a nifty and useful tool. It’s been in existence so long, chances are good that you forgot about it too. I’ll just jump right in…
The project page describes it as;
…a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.
It’s hard to believe that Andrew Wood first produced this linux software gem waaay back in 2002! Its been around so long, I had trouble finding a 64bit version of it…. like it really matters. Anyway, if anyone cares; I rebuilt the RPM for x86_64 and have it available here at dlaube.com.
If you are using the RPMforge repository, you can simply yum install pv. If your fine with just RPM;
# i386 RPM for those not using RPMforge
rpm -ivh pv-1.1.4-1.i386.rpm
# 64bit RPM for those not using RPMforge
Official mirror: here AND Local mirror for i386 here | 64bit here.
I asked myself this question after reviewing the validity period for the RHCE. See my How long is the RHCE valid post. Back then I discovered the RHCE is valid for 2 full releases after the release on which the exam was taken. The answer to one question only created another! I wanted to know exactly how many years “2 full releases” equated to in human years.
After scouring Google quite a few times, I came to realize that there isn’t any cut and dry answer as to how long a Red Hat Enterprise Linux (RHEL) release is actively developed. Sure older versions are still sort of supported once a new major release comes out, but this quest is more about getting a feel for RedHat’s release time-line. The BIG question ultimately led to some digging around through RedHat press releases over here and of course; wikipedia. After gathering the data, I produced a quick and dirty chart comparing the different RHEL versions and their release dates by number months. Hope this helps those with the same questions floating through their heads.
Red Hat RHEL lifetime
RHEL 5: 3/1/2007 – present - 36 months as of 4/2010 – [ 6 releases (so far) ]
RHEL 4: 2/14/2005 – 5/18/2009 - 51 months – [ 4 releases ]
RHEL 3: 10/23/2003 – 6/15/2007 - 44 months – [ 5 releases ]
Armed with this information, one could guess that RHEL 5 will have a life expectancy of about 50 months. If that’s the case, then as of this post, we have just about 14 months left until RHEL 6! If anyone has any further insight, please post a comment.
How long is the RHCE “good” for? – It seems like this is the million dollar question everyone wants answered –including myself. Some people say the RHCE is only good for a single major release of Red Hat Enterprise Linux (RHEL), others say its a year or two without providing anything substantial to back such claims. However, according to RedHat, the RHCE is valid for 2 full releases after the release on which the exam was taken.
This is the legalese you have probably already seen at RedHat;
The validity period for all RHCEs and RHCTs is pegged to the release of the Enterprise product commercially available at the time certification was earned. RHCE and RHCT certifications are considered current until Red Hat retires exams of the release following the version on which your certification was earned. For example, certificates earned on Red Hat Enterprise Linux 3 will be current until August 31, 2007, the last date on which Red Hat Enterprise Linux 4 exams will be offered. Note that Red Hat Enterprise Linux 5 was released in March, months before the final retirement of the version 4 exams.
To provide further clarification for earlier versions, Red Hat Enterprise Linux 4 will remain current until Red Hat Enterprise Linux 5 exams are retired, several months after the release of Red Hat Enterprise Linux 6.
Now this raises another interesting question; What’s the life expectancy of a RHEL release? In my next blog post, I attempt to answer this question and provide an easy to read bar graph illustrating RedHat releases of RHEL3, RHEL4 and RHEL5. I also offer a time-line for each of the RHEL releases.
The default qmail toaster install wont give you much in terms of log data past a 20-30 minutes on a production mail server. It quickly becomes advantageous to increase the duration of log data. You will want to set the “s” and “n” options in multilog to accomplish this.
Open up your /service/qmail-smtpd/log/run file and change;
exec /usr/local/bin/setuidgid qmaill /usr/local/bin/multilog t /var/log/qmail/smtpd
#The "s16777215" option sets the size of the log file and the "n80" denotes the number of log file to keep on hand.
exec /usr/local/bin/setuidgid qmaill /usr/local/bin/multilog t s16777215 n80 /var/log/qmail/smtpd
Then you will need to restart multilog via supervise with the following command:
svc -h /service/qmail-send/log
When is it a good time to check to see if a hard drive is failing? Well, when your console is full of IO/seek errors, I’d say that is a pretty good time! Hah.
According to research conducted by Google, a document entitled Failure Trends in a Large Disk Drive Population states that the manufacturer, particular model and vintage plays a role, but does not provide failure statistics on model and manufacturers. Most drives were run at 45C or less.
From the SMART data, scan errors, reallocations, offline reallocations and probational counts had a significant correlation with failure probability, whereas seek errors, calibration retries and spin retries had little significance.
Soooo…. you want to look at the Raw_Read_Error_Rate, Seek_Error_Rate and Reallocated_Sector_Ct information from smartctl.
[root@SOMESERVER ~]# smartctl --all /dev/sdb | grep Error
Error logging capability: (0x01) Error logging supported.
1 Raw_Read_Error_Rate 0x000f 117 100 006 Pre-fail Always - 166491825
7 Seek_Error_Rate 0x000f 090 060 030 Pre-fail Always - 999290467
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
[root@SOMESERVER ~]# smartctl --all /dev/sdb| grep Reallocated_Sector_Ct5
Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
In regards to Reallocated_Sector_Ct, the normalized values (current=100, worst=100) indicate the drive is in perfect condition (higher is better, and looking at the overall report it appears that 100 is “best”). The threshold value (36) just indicates how low the normalized value would have to drop before the manufacturer would consider the drive to be in a “Pre-fail” condition.
If you run “smartctl –all /dev/sdb | grep Error” again and notice that Raw_Read_Error_Rate and Seek_Error_Rate keep incrementing AND Reallocated_Sector_Ct is greater than 0, its pretty safe to say that you have a ticking time-bomb on your hands. You should consider replacing those drives as soon as possible.
Recently noticed a bunch of servers (CentOS 5.2 and CentOS 4.8) with SSH fingerprint mismatches. After poking around, it appears that they had been compromised. chkrootkit and RKhunter found nothing but the suspicious entry in /dev (see below).
WARNING: DSA key found for host ourserver.domain.com
DSA key fingerprint f4:27:d0:32:6b:4c:9f:6e:52:6f:49:dd:19:54:c2:f1.
+–[ DSA 1024]—-+
| . o..|
| * . .E|
| + B . . .o|
| S * o …|
| . o = . |
| . o + |
| o . |
The authenticity of host ‘ourserver.domain.com (10.0.0.2)’ can’t be established
but keys of different type are already known for this host.
RSA key fingerprint is 4c:5f:95:3e:52:08:6c:cd:0f:f8:37:38:3c:dd:bf:56.
Are you sure you want to continue connecting (yes/no)? yes
/usr/local/sbin/sshd2 — captures login credentials for both root and non-root logins over SSH/SCP/sftp etc and logs to /dev/sax
/bin/suidshell — is exactly that, a shell with the suid bit set that instantly gives root access to any unprivileged user!
/dev/sax — stores plaintext username and password for root and non-root accounts