October 17

My Impressions of Opscode Chef Training

First of all; I’ll assume that if your reading this post that you already know what Chef is and how it can benefit your organization. If not, then you can read all about it over here. To the cloud! …err class, I mean.

I was lucky enough to take part in the Chef fundamentals 2-day training in NYC as part of a company-sponsored training event. It consisted of two 8 hour days –in my case, the training took place on a Thursday and Friday with a separate Chef hack-day on Saturday…ok so maybe three days if you count the hack day where we got to work on our own projects. In this post, I’ll attempt to answer some basic questions others might have about the chef training.

How was it presented? – The classes were largely instructor lead with a class size of about 15-20 people (your mileage may vary) and two instructors. Each class consisted of instructor-student review/lecture of course materials presented via power point, as well as a hands-on lab portion to accompany each section. We were provided with PDFs of the course material (so we could follow along) and a lab guide (also in PDF) which covered the in-class exercises. We were told what was required to set up our own chef workstation environment before attending the training. However, at the time of this post both virtual workstations and virtual server instances were provided to us for the purpose of completing the lab exercises.

What did it cover? – The lecture started with the very basics about what chef truly is and how it can be used in staging and production environments. Imagine a concise yet interactive version of the chef wiki being streamed directly to your brain over the course of a couple days. It started with the basics and steadily progressed to more complicated topics. Make no mistake, the opscode guys are thorough and at times it might feel like you are experiencing information overload. What do you expect? These classes are condensed cram sessions.

Who would benefit? – The short answer –just about anyone who is interested in system automation. Though I already had some basic working knowledge of chef, I found it pretty valuable to be able to bring my questions directly to the pros. Reviewing the course materials also forced me to go over some areas I had skipped over during my own research and practice.

Category: Chef, Linux | LEAVE A COMMENT
January 8

mysql Innodb – Table ‘user’ is marked as crashed and should be repaired

I recently came across a Cpanel server (CentOS 5) upon which mysqld refused to start after /var was at 100%. After tailing the mysql error log in the default /var/lib/mysql/HOSTNAME.err it was no surprise to find that the mysql user table had been marked as crashed. On RHEL/CentOS servers, you cannot simply add the “innodb_force_recovery = 1″ (or whatever recovery level…2,3,4,5,6) to the /etc/my.cnf and do the regular service mysql start. You’ll have to edit the my.cnf to enable recovery and start mysql from the command line and not by the init script/service command. Only once you have mysql started on the command line can you run your repair on the mysql user table. Here is quick run-down with commands and queries to run on your Cpanel server;

[root@HOSTNAME mysql]# tail -f /var/lib/mysql/HOSTNAME.err
110108 10:37:45  InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 1 3749263016.
InnoDB: Doing recovery: scanned up to log sequence number 1 3749263050
InnoDB: Last MySQL binlog file position 0 79, file name ./host89-bin.000005
110108 10:37:45  InnoDB: Flushing modified pages from the buffer pool...
110108 10:37:45  InnoDB: Started; log sequence number 1 3749263050
InnoDB: !!! innodb_force_recovery is set to 1 !!!
110108 10:37:45 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'user' is marked as crashed and should be repaired
110108 10:37:45  mysqld ended

Now, go ahead and enable innodb forced recovery by opening your /etc/my.cnf in your favorite text editor and make sure you have something like this:

innodb_force_recovery = 1

Start mysql from the command line after enabling innodb forced recovery:

[root@HOSTNAME ~]#/usr/sbin/mysqld --skip-grant-tables --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/HOSTNAME.pid --skip-external-locking --port=3306 --socket=/var/lib/mysql/mysql.sock

Finally, it’s time to get back in to mysql and get your life back… or mysql user table at least!

[root@HOSTNAME ~]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 173 to server version: 4.1.22-standard-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use mysql;
mysql> check table user;
| Table      | Op    | Msg_type | Msg_text                                                 |
| mysql.user | check | warning  | Table is marked as crashed                               |
| mysql.user | check | warning  | 6 clients are using or haven't closed the table properly |
| mysql.user | check | error    | Record at pos: 24992 is not remove-marked                |
| mysql.user | check | error    | record delete-link-chain corrupted                       |
| mysql.user | check | error    | Corrupt                                                  |
5 rows in set (0.02 sec)

mysql> repair table user;
| Table      | Op     | Msg_type | Msg_text                                 |
| mysql.user | repair | warning  | Number of rows changed from 1384 to 1385 |
| mysql.user | repair | status   | OK                                       |
2 rows in set (0.48 sec)

mysql> check table user;
| Table      | Op    | Msg_type | Msg_text |
| mysql.user | check | status   | OK       |
1 row in set (0.01 sec)


Now, don’t forget to REMOVE the innodb_force_recovery line from your my.cnf you added earlier! After that, just start mysql as you normally would.

[root@HOSTNAME mysql]# service mysql start
Category: Cpanel, Hosting, Linux | 4 Comments
September 27

CentOSplus Kernel RPMS

Today I found a server using reiserfs, but to my dismay there weren’t any patched kernels available that address the MCAST_MSFILTER Compat mode security vulnerability. I compiled my own and I’m making them available.

I’m making 64bit versions of the 2.6.18-194.11.4.el5 kernel for CentOSplus (version 5) available below:


…MORE here

May 7

Linux command line progress bars

progress bar linux command

gzip of /var/log/messages (reading+writing after compression)

How many times have you wished there was an easy way to view the progress of a gzip? a grep? redirected file read/writes? file copies? There are so many situations where the addition of a simple progress bar would make the user experience 1 million times more bearable. Several years ago I had read about pipe viewer, or pv for short. This is truly the answer to my progress bar prayers. I cannot believe I had forgotten about such a nifty and useful tool. It’s been in existence so long, chances are good that you forgot about it too. I’ll just jump right in…

The project page describes it as;

…a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

It’s hard to believe that Andrew Wood first produced this linux software gem waaay back in 2002! Its been around so long, I had trouble finding a 64bit version of it…. like it really matters. Anyway, if anyone cares; I rebuilt the RPM for x86_64 and have it available here at dlaube.com.

If you are using the RPMforge repository, you can simply yum install pv. If your fine with just RPM;

# i386 RPM for those not using RPMforge
wget http://pipeviewer.googlecode.com/files/pv-1.1.4-1.i386.rpm
rpm -ivh pv-1.1.4-1.i386.rpm

# 64bit RPM for those not using RPMforge
wget http://www.dlaube.com/wp-content/uploads/2010/05/pv-1.1.4-1.x86_64.rpm

Official mirror: here AND Local mirror for i386 here | 64bit here.

Category: Linux | 9 Comments
April 27

What’s the life expectancy of a RHEL release?

I asked myself this question after reviewing the validity period for the RHCE. See my How long is the RHCE valid post. Back then I discovered the RHCE is valid for 2 full releases after the release on which the exam was taken. The answer to one question only created another! I wanted to know exactly how many years “2 full releases” equated to in human years.

After scouring Google quite a few times, I came to realize that there isn’t any cut and dry answer as to how long a Red Hat Enterprise Linux (RHEL) release is actively developed. Sure older versions are still sort of supported once a new major release comes out, but this quest is more about getting a feel for RedHat’s release time-line. The BIG question ultimately led to some digging around through RedHat press releases over here and of course; wikipedia. After gathering the data, I produced a quick and dirty chart comparing the different RHEL versions and their release dates by number months. Hope this helps those with the same questions floating through their heads.

Red Hat Enterprise Linux lifetime

Red Hat RHEL lifetime

RHEL 5: 3/1/2007 – present - 36 months as of 4/2010 – [ 6 releases (so far) ]
RHEL 4: 2/14/2005 – 5/18/2009 - 51 months – [ 4 releases ]
RHEL 3: 10/23/2003 – 6/15/2007 - 44 months – [ 5 releases ]

Armed with this information, one could guess that RHEL 5 will have a life expectancy of about 50 months. If that’s the case, then as of this post, we have just about 14 months left until RHEL 6! If anyone has any further insight, please post a comment.

Category: Linux | 1 Comment
April 24

How long is the RHCE valid?

How long is the RHCE “good” for? – It seems like this is the million dollar question everyone wants answered –including myself. Some people say the RHCE is only good for a single major release of Red Hat Enterprise Linux (RHEL), others say its a year or two without providing anything substantial to back such claims. However, according to RedHat, the RHCE is valid for 2 full releases after the release on which the exam was taken.

This is the legalese you have probably already seen at RedHat;

The validity period for all RHCEs and RHCTs is pegged to the release of the Enterprise product commercially available at the time certification was earned. RHCE and RHCT certifications are considered current until Red Hat retires exams of the release following the version on which your certification was earned. For example, certificates earned on Red Hat Enterprise Linux 3 will be current until August 31, 2007, the last date on which Red Hat Enterprise Linux 4 exams will be offered. Note that Red Hat Enterprise Linux 5 was released in March, months before the final retirement of the version 4 exams.

To provide further clarification for earlier versions, Red Hat Enterprise Linux 4 will remain current until Red Hat Enterprise Linux 5 exams are retired, several months after the release of Red Hat Enterprise Linux 6.

Now this raises another interesting question; What’s the life expectancy of a RHEL release? In my next blog post, I attempt to answer this question and provide an easy to read bar graph illustrating RedHat releases of RHEL3, RHEL4 and RHEL5. I also offer a time-line for each of the RHEL releases.

Category: Linux | 1 Comment
April 12

How to determine if a SATA drive is failing.

When is it a good time to check to see if a hard drive is failing? Well, when your console is full of IO/seek errors, I’d say that is a pretty good time! Hah.

According to research conducted by Google, a document entitled Failure Trends in a Large Disk Drive Population states that the manufacturer, particular model and vintage plays a role, but does not provide failure statistics on model and manufacturers. Most drives were run at 45C or less.

From the SMART data, scan errors, reallocations, offline reallocations and probational counts had a significant correlation with failure probability, whereas seek errors, calibration retries and spin retries had little significance.

Soooo…. you want to look at the Raw_Read_Error_Rate, Seek_Error_Rate and Reallocated_Sector_Ct information from smartctl.

[root@SOMESERVER ~]# smartctl --all /dev/sdb | grep Error
Error logging capability:        (0x01)	Error logging supported.
  1 Raw_Read_Error_Rate     0x000f   117   100   006    Pre-fail  Always       -       166491825
  7 Seek_Error_Rate         0x000f   090   060   030    Pre-fail  Always       -       999290467
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
SMART Error Log Version: 1
[root@SOMESERVER ~]# smartctl --all /dev/sdb| grep Reallocated_Sector_Ct5
Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

In regards to Reallocated_Sector_Ct, the normalized values (current=100, worst=100) indicate the drive is in  perfect condition (higher is better, and looking at the overall report it appears that 100 is “best”). The threshold value (36) just indicates how low the normalized value would have to drop before the manufacturer would consider the drive to be in a “Pre-fail” condition.

If you run “smartctl –all /dev/sdb | grep Error” again and notice that Raw_Read_Error_Rate and Seek_Error_Rate keep incrementing AND Reallocated_Sector_Ct is greater than 0, its pretty safe to say that you have a ticking time-bomb on your hands. You should consider replacing those drives as soon as possible.

April 4

Unknown linux rootkit?

Recently noticed a bunch of servers (CentOS 5.2 and CentOS 4.8) with SSH fingerprint mismatches. After poking around, it appears that they had been compromised. chkrootkit and RKhunter found nothing but the suspicious entry in /dev (see below).

WARNING: DSA key found for host ourserver.domain.com
in /Users/username/.ssh/known_hosts:578
DSA key fingerprint f4:27:d0:32:6b:4c:9f:6e:52:6f:49:dd:19:54:c2:f1.
+–[ DSA 1024]—-+
| ..o+|
| . o..|
| * . .E|
| + B . . .o|
| S * o …|
| . o = . |
| . o + |
| o . |
| |

The authenticity of host ‘ourserver.domain.com (’ can’t be established
but keys of different type are already known for this host.
RSA key fingerprint is 4c:5f:95:3e:52:08:6c:cd:0f:f8:37:38:3c:dd:bf:56.
Are you sure you want to continue connecting (yes/no)? yes

Affected files:

    /usr/local/sbin/sshd2 — captures login credentials for both root and non-root logins over SSH/SCP/sftp etc and logs to /dev/sax

    /bin/suidshell — is exactly that, a shell with the suid bit set that instantly gives root access to any unprivileged user!

    /dev/sax — stores plaintext username and password for root and non-root accounts

March 24

pure-ftpd Can’t change directory to /var/ftp/

The problem arises when a user attempts to make an anonymous FTP connection to Cpanel user’s account who has already enabled anonymous FTP connections in their control panel. However, pure-ftpd drops you with the error “421 Can’t change directory to /var/ftp/”.

workstation:~ user$ ftp testing.com
Connected to testing.com.
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 3 of 50 allowed.
220-Local time is now 11:02. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 15 minutes of inactivity.
Name (testing.com:user): anonymous
421 Can't change directory to /var/ftp/ [/]
ftp: Login failed.

The solution(s):
1. Use anonymous@domain.com and any password instead of just anonymous
2. Assign the Cpanel user a dedicated IP address where FTP logins with just “anonymous” will work.

Category: Cpanel, Hosting, Linux | 2 Comments