How to force fsck?

2007-12-25 11:45:00

Hello,

Last week I asked how to force the system to require an
interactive fsck at the next reboot, as part of a break/fix
hardware test for prospective sysadmins. (The full text of the
original message is at the end.)

In the end, mgmt decided that the test would not be
necessary (whew), so I am now able to offer you this
summary. Note that I did not try any of these suggestions,
as the test was cancelled.

My thanks to the following people for their speedy replies.

   Kris (Unixboy)
   Darren Dunham
   Kevin Sheehan {Consulting Poster Child}
   Michael Stapleton
   Seferino Gardner
   Rich Jankowski
   Thad MacMillan
   Dan Lowe
   Dan Lorenzini
   Annette Lee
   Jerry Lu
   Gary Jenson
   David B. Harrington
   Dan Brown
   Toby A. Rider
   Brett Lymn
   Ian MacPhedran

Here are some of the utilities mentioned that can be used to
corrupt a filesystem:

   fsdb(1M)
   unlink(1M)
   clri(1M)
   dd(1M)

Here are their suggestions and comments:

 1. You can destroy the primary superblock on a partition then run
    "fsck" to restore it from a backup superblock (block 32 is
    the traditional first choice):
      i. newfs /dev/rdsk/c?t?d?s? (create a new file system on a
         slice if you don't wanna use an existing partition)
     ii. fsck /dev/rdsk/c?t?d?s?
    iii. dd if=/kernel/genunix of=/dev/rdsk/c?t?d?s? count=32 bs=512
         (destroy the primary superblock on this partition)
     iv. fsck /dev/rdsk/c?t?d?s? (you'll see errors now)
      v. fsck -o b=32 /dev/rdsk/c?t?d?s?
         (repair the superblock from block 32)
    If you reboot the machine right after step iv, you can force
    the fsck to run.

 2. Just change the reference to your raw device in
    /etc/vfstab. This will cause the fsck to fail, as the device
    won't exist, and this will drop you to the shell level. This
    is a much safer method than intentionally corrupting a
    filesystem. Whoever you interview will have to figure out
    the correct device and eventually fsck the correct one. Good
    way to test a junior person on fsck, format and basic file
    system principles.

 3. To do it in a repeatable way, look at fsdb(1M) and try your
    hand at manually manipulating the filesystem. Great
    opportunity, when you actually *want* to mess up the FS!

 4. If you're looking for a real munge, grab 'fastfs', put your
    /usr filesystem into the fast state and make changes.
    Create/delete many files in a directory and pull the plug.
    It's not as repeatable, but you'll sure have the filesystem
    in a very unclean state when it comes back.

 5. I would create a toy file system on another partition first
    (a bit safer) and then hit either a directory or inode part of
    a cylinder group with (e.g.)

    dd if=/dev/zero of=/dev/rdsk/c0t0d0s4

    This messes up the FS, but it's not very graceful stuff like
    duplicate inodes, etc.

 6. Use the unlink command to cause lost files. It removes the
    directory entry but does not delete the file.

 7. Make a directory somewhere in /usr then make another directory
    in the one you just created. Get the inode for the first
    directory (ls -il) then run clri(1M) on that inode. Make sure
    this is on a test system, as there's always the possibility
    that it'll really clobber /usr. You probably will still have
    to just turn the power off to make sure the FS is dirty so
    the fsck will happen on boot.

 8. Tell me if I'm off base here, but can't you just make an rc file
    to do this before /usr is mounted? The following files may be
    places to put something like that: /etc/rcS.d/S30rootusr.sh,
    /etc/rcS.d/S40standardmounts.sh
    [It's not really what I was after. I wanted there to be a
    problem to fix, which requires the admin to run fsck, not to
    have fsck run automatically at reboot time. -- Mike]

 9. If it's ugly you want, jerking on the power cord and then
    jerking on it again in the middle of reboot ought to do the
    trick. Of course, controlled mayhem is probably available as
    a package from Sun. (hehehehe)

10. You might go to your hardware guys and see if they have a bad
    hard drive somewhere that you can put in the test machine, and
    let them play with it that way. Of course, they should never
    be able to successfully fsck that drive, but that's real world.

11. Repartition the drive, overlap something on the /usr partition.
    Newfs the new partition or use it for swap. Guaranteed to hose
    whatever data is there. Don't ask me how I know :-)

12. Perhaps bonus points for someone who can repair a system with
    a broken dynamic library system without resorting to booting
    from CD - another machine available to copy stuff from helps.
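
For suggestions 1 and 5, it may help to rehearse the dd arithmetic
on a scratch file before pointing it at a raw slice. Note that the dd
in suggestion 5, with no count, keeps writing zeros until the slice
is full. What follows is only a sketch under stated assumptions: the
image path, the 64 KiB size and the sector 64 offset are made up for
illustration and do not correspond to any real UFS layout.

```shell
#!/bin/sh
# Practice run against a scratch image file, not a real disk slice.
# The path and sizes below are hypothetical, chosen for illustration.
img="${TMPDIR:-/tmp}/scratch_fs.$$"

# Build a 128-sector (64 KiB) image filled with the byte 'x'.
dd if=/dev/zero bs=512 count=128 2>/dev/null | tr '\000' 'x' > "$img"

# Zero 8 sectors starting at sector 64. seek= picks the starting
# sector, count= bounds the damage, and conv=notrunc keeps dd from
# truncating a regular file (a raw device is never truncated).
dd if=/dev/zero of="$img" bs=512 seek=64 count=8 conv=notrunc 2>/dev/null

# Measure the result: total size, and how many bytes are no longer 'x'.
total=$(wc -c < "$img" | tr -d ' ')
zeroed=$(tr -d 'x' < "$img" | wc -c | tr -d ' ')
echo "image size: $total bytes, bytes zeroed: $zeroed"

rm -f "$img"
```

Run as is, this should report a 65536-byte image with 4096 bytes
zeroed (8 sectors of 512 bytes). The same seek/count idea is what
would confine the damage to one region of a real slice.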

    

       

   

Plus some comments on the validity of the test in general:

- This is kind of a difficult test, how are you going to base
  pass/fail? What if you trash one machine to the point of it
  being unrecoverable, and another where fsck recovers without
  incident? Then you'll be hiring people based on which machine
  they got and not their skills.

- If you have to have a hands-on test for prospective employees,
  why not give them a machine and have them bring up networking?
  Or change the subnet/router/nameserver/nfs and have them
  reconfigure by hand. Do something like move the libraries, so
  they have to use the static binaries to recover the filesystem.
  This would probably give you a better idea of their
  problem-solving skills.

- Wow -- this sounds like THE acid test for wanna-bes. If
  somebody had shown me just a glimpse 20 years ago of all the
  nasty things that might (and did) happen, I would have probably
  chosen another vocation.

- I have always considered fsck to be a simple task interactively,
  provided you remember to umount the drive. And I had a 36GB hard
  drive I needed to fsck, where I found out about the '-y'
  parameter (after 5 minutes of hitting y, Enter).

Several people asked for copies of the complete test once I was
done with it. Here it is. You'll notice where I have incorporated
some of the above suggestions into it. I didn't come up with a more
thorough test, as it was cancelled on me.

------------- begin -------------

Screen is blank
  - output device is ttya instead of screen. admin needs to
    fix eeprom setting.

System doesn't boot up, tries to boot from net.
  - usually this means that the eeprom setting "boot-device"
    is set to "net" rather than "disk". Instead, what happens
    sometimes after an error is that the setting "diag-switch?"
    is set to true rather than the default "false", which
    means it looks in the setting "diag-device" to decide where
    to boot from. "diag-device" is usually set to "net".
    To fix, need to change the eeprom setting "diag-switch?"
    from true to false.

System can't find boot block
  - often happens after restoring the root filesystem, and
    forgetting to install the bootblock on the disk. Admin
    needs to boot from cdrom, and run the "installboot"
    command on the appropriate disk. I can simulate this
    condition by doing a dump and restore.

/usr needs an fsck
  - the system can't boot up if it can't mount /usr, which it
    can't do if there are errors in the filesystem. I can
    manually corrupt the filesystem so it requires an fsck.
    I want to ensure that /usr is separate from / so that it
    doesn't conflict with the previous test.
  - we can be really sneaky and also have the system try to mount
    the wrong partition. This happens with a typo in the
    vfstab file. Will need to use the "format" command to
    find the correct filesystem to mount.

system hangs after a reboot
  - bad entry in the /etc/system file. set maxusers=0.
    Admin will have to boot from cdrom to fix.

can't log in as root -- no shell
  - this is a common problem when people improperly change
    root's shell from /sbin/sh. Need to boot from cdrom
    to fix.

user cannot log in as root from any terminal
  - caused by a blank CONSOLE= setting in the
    /etc/default/login file, boot -s to fix.

convert a system from complete standalone to fully networked.
  - given the list of required info, such as NIS domainname,
    NIS server name and IP, interface names and IPs, subnet
    mask(s), default router, DNS domain name and name servers,
    have the admin bring up networking from the ground up.

------------- end -------------
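
The boot-from-net item in the test boils down to a small decision
rule, which can be sketched as below. This is an illustration only:
boot_source is a hypothetical helper, not a Solaris command, and it
models just the rule as described above (diag-switch? set to true
makes the PROM consult diag-device instead of boot-device).

```shell
#!/bin/sh
# Sketch of the boot-source rule described in the test above.
# boot_source is a hypothetical helper, not a Solaris command.
# Arguments: value of diag-switch?, boot-device, diag-device.
boot_source() {
    if [ "$1" = "true" ]; then
        echo "$3"    # diag-switch? is true: diag-device is used
    else
        echo "$2"    # diag-switch? is false: boot-device is used
    fi
}

# Typical values from the description: boot-device=disk, diag-device=net.
after_error=$(boot_source true disk net)
after_fix=$(boot_source false disk net)
echo "diag-switch?=true  -> boots from $after_error"
echo "diag-switch?=false -> boots from $after_fix"
```

On a real machine the fix itself would presumably be something
like eeprom "diag-switch?=false" as root, or setenv diag-switch? false
at the ok prompt. Check the exact syntax against your PROM version.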

Finally, here is my original query:

--- On Dec. 15, 1999, Mike van der Velden <mvanderv@yahoo.com> wrote:

> Hello,
>
> I have to design a small "hands-on" test for some prospective
> system administrators. This test will include troubleshooting
> boot-up problems among other things.
>
> One thing I want to do is corrupt the /usr filesystem enough that
> an interactive fsck is required. I know that just powering off a
> system without a proper shutdown will require an fsck, but this
> usually happens automatically at the next bootup and fsck is able
> to fix it without much fuss.
>
> I want some fuss. Simply removing or renaming certain files will
> undoubtedly impair the bootup process, but it doesn't corrupt the
> filesystem in any way. I want the prospects to have to run fsck.
>
> Does anyone have a command that I can use to accomplish this? How
> can I create duplicate inodes or lost files? Or maybe you know of
> a bug in the OS that will trigger something like this.
>
> Or, perhaps I'm way off base, and I'm better off testing something
> else. Let me know.
>
> Thanks in advance for all your comments. I'll summarize after the
> tests are complete, just in case any of the prospects read this
> list as well. :)
>
> Thanks in advance.
> Mike van der Velden

Mike van der Velden

Insurance Corporation of British Columbia

=====

9 days to Y2k. 345 days until the new millennium.
