2011-06-21

Updating to rawhide (part deux)

Continuing from yesterday's travails of having a desktop running rawhide.

  1. Let the system cool down for a couple of hours and try to boot again.
  2. Download and run the Lenovo diagnostics for the system. Everything passes except for the battery and printer port (turned off in BIOS). Both were known issues. Fans and memory say they are ok.
  3. Boot into rescue mode. The rpm database seems so hosed that even the standard chicken-foot fix of rm /var/lib/rpm/__*; rpm --rebuilddb does not repair it. Decide that trying to recover the database at this point is not worth it.
  4. Nuke from orbit (it's the only way to be sure). Make tar file backups of /home (but forget to back up /etc/ssh and other files) and then reinstall.
  5. Decide that installing Fedora 15 i686 makes more sense than x86_64. The box only has 2 gigabytes of RAM, so there is not much to gain from 64-bit.
  6. Do a custom install of the system and afterwards install the various fonts, eclipse, javascript, and emacs packages I want. Do not install every game this time, just the few I know will get played.
  7. Put the system into runlevel 3. This gets past any yum update issues with rawhide that kill X, etc. (OK, more chicken-foot voodoo, but I have been burned too many times.) Plus, being in runlevel 3 seems to keep the system a lot cooler, which I figure might have been a problem also.
  8. Do a yum --disablerepo="*" --enablerepo="rawhide" --skip-broken update. After 3 minutes of dep-checking, there are 1651 packages out of 3536 installed packages to update.
  9. After about 2 hours of downloading and updating, the box finishes. Commands work and it's time to see what breaks.
  10. Found the first breakage: yum. The first time I ran it I got a traceback; doing a yum clean all removed that issue. Doing a yum update again, though, just gave me a long list of packages that supposedly needed to be updated but conflicted with the packages I had just updated.
  11. Time to reboot and see if I have something workable. systemd and plymouthd seemed to spend a long time trying to shut the box down. (I could watch this from a shell I had on the box via ssh, until after 4 minutes it finally killed the sshd daemon.)
  12. Go for a walk, come back, and reboot the system the old-fashioned way. The system boots into plymouth and stops halfway through building the icon. Oook. Get out the hammer and boot the system into rescue mode again.
  13. Remove quiet rhgb from the kernel line in grub.conf. Turn off hiddenmenu so I can type in options if needed. Make the timeout longer than 0 seconds.
  14. Boot my first 3.0.0 kernel and watch the boot output like God intended. See a long pause because I ran rescue mode on the system and a restorecon is needed. See a string of errors from the bind-mounted directories. Watch a looooong relabel happen. Go for another walk.
  15. Found what the freeze was: a watchdog reset in the kernel, which crashes it at "Starting Configure read-only root support". The part of the oops I can get onto the screen has to do with modprobe.
  16. OK, let's see if booting into single gets past this. If it does, we can work around it with an update afterwards... nope. The crash happens before single comes into play.
  17. Now it's time to try the pre-3.0-rc3 kernel. The box does boot, but gdm dies for some reason, leaving just the login background of the doves and the mechanical bird. Logging into a console lets me switch the system to init 3. Oh boy, we can go old school with startx.
  18. Next bug: startx doesn't. Ooooh, GNOME doesn't like things. Looking at the log file, I see a lot of javascript exceptions: "FIXME: Only supporting fixed size ARRAYs".
  19. yum update still doesn't work, but in new and weird ways:
    Error: Package: ghc-tar-0.3.1.0-11.fc16.i686 (rawhide)
               Requires: libHSold-locale-1.0.0.2-ghc7.0.2.so
               Removing: ghc-old-locale-1.0.0.2-16.3.fc15.i686 (@updates/15)
                   libHSold-locale-1.0.0.2-ghc7.0.2.so
               Updated By: ghc-old-locale-1.0.0.2-26.fc16.i686 (rawhide)
                   Not found
    
    But it is there, so I am confused. Adding --skip-broken doesn't fix the issue either.
  20. OK, let's go ask Seth Vidal. And yes, somehow asking the expert makes it work when he asks me to repeat the steps I had already done. We have a system with a console, and if I reach back into the hinterland of knowledge from before gdm and such, there was .xinitrc. Set that to
    exec startxfce4
    
    and I have a working X session after a startx.
  21. Change the default so the box reboots into runlevel3.target, and it's time for bed. Wait: a reboot sits around trying to stop syslogd.
    Unit systemd-kmsg-syslogd.service entered failed state.
    systemd-kmsg-syslogd.service: main process exited, code=exited, status=218
    
    Over and over again. It looks like the reboot is killing syslogd, and systemd thinks it should still be up, so it starts it again only to kill it again... or something like that. Either way, it is why a reboot needs a power-off instead of just working.
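
Step 13 was done by hand in an editor. For anyone repeating it, here is a minimal sketch of the same three grub.conf changes as a shell function. It assumes a legacy GRUB (0.97) config in the Fedora 15 style, with a hiddenmenu line, timeout=0, and rhgb quiet on the kernel lines, plus GNU sed for in-place editing. The function name unhide_grub and the 10-second timeout are made up for illustration.

```shell
# Hypothetical helper (not from the original post): make a legacy GRUB 0.97
# config show its menu and boot verbosely. Assumes Fedora 15 style entries.
unhide_grub() {
    conf="$1"
    # Keep a copy next to the original so the change is easy to undo.
    cp -p "$conf" "$conf.bak" || return 1
    # GNU sed -i edits the file in place:
    #  - comment out hiddenmenu so the menu is shown,
    #  - raise a zero timeout to 10 seconds (arbitrary choice),
    #  - drop rhgb and quiet from every kernel line.
    sed -i \
        -e 's/^hiddenmenu/#hiddenmenu/' \
        -e 's/^timeout=0$/timeout=10/' \
        -e '/^[[:space:]]*kernel /s/ rhgb//g' \
        -e '/^[[:space:]]*kernel /s/ quiet//g' \
        "$conf"
}
```

Run it as root against /boot/grub/grub.conf itself rather than the /etc/grub.conf symlink, since GNU sed -i replaces a symlink with a regular file unless given --follow-symlinks.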

Well, after a lot of steps, I have a system that boots. I relearned a lot of tech support skills, and various things like rhgb being bad for testing systems when they don't work :). Now to work with Lennart on why syslogd is whacked, and with others on why kernel-3.0rc3 kills my box.
