ASA Website And Bulletin Board Down
LoBuck
There is a problem with the server for the ASA website and Bulletin Board causing them to be down. Since ISDRATRT.org is located on the server, it is down as well. Attempts are being made to get the sites back up ASAP. Thanks for your patience.
Glamisbound
We're going on a week now, crack the whip!!
socaldmax
What's up with that?
Markie_Mark
Did Jason get pissed off and sabotage it?....oh well no worries that site is a thing of the past anyways..
socaldmax
Speaking of things of the past...



"Deepsand will be back season of 2006-2007"



how is this coming along?
Crusty
/\/\ ROFL
jhitesma
Jason's hands have been tied because while the server was still up....it wasn't letting any accounts log in.

The delay is mainly because, with no way to get in remotely, we had to get someone physically over to the server to replace the failed drive and bring things back up. And the people who are close to the server were all in the dunes.

Thankfully I just got off the phone with Greg Gorman who's at the hosting facility now. He was able to bring it back up and it looks like nothing was lost....but it's back down right now while he works on transferring everything to a new drive and rebuilds the RAID. So it's going to be a few more hours but it should be back up today sometime.

Yarder
QUOTE(socaldmax @ Jan 4 2008, 02:06 PM) *
Speaking of things of the past...



"Deepsand will be back season of 2006-2007"



how is this coming along?


laugh1.gif
Double G
For those of you who care.....

We have a dual-CPU system running Debian Linux, with a ton of RAM. It has a MegaRAID controller that manages up to 6 SATA drives in any configuration you want.

The system is currently configured with 3 drives. One is an 80GB drive that holds the OS and is the boot drive; it is configured as an "array" even though it is only one disk. The other two are 250GB drives configured as a second, "mirrored" array (they are exact duplicates of each other) that holds all of the websites.

Of course, the single-string disk is the one that failed... the RAID controller senses a failure and locks the drive out. Usually that is a good thing when you are running a mirror, but since there wasn't a second disk, everything stopped.

When I told the controller to bring the disk back online, the server booted fine. But we decided to swap out the drive for a new 160GB one, and it will take about 10hr for the RAID controller to rebuild and migrate to it.

Once that is up and running I'll add a 2nd 160GB drive (maybe 2 more) and configure it to at least mirror. Then hopefully we won't have this kind of issue again.

Greg
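
For anyone who wants to picture the layout described above, here is a minimal Python sketch. The class and field names are made up for illustration and have nothing to do with the actual MegaRAID firmware or tools. The only rules modeled are the ones from the post: the controller locks out a failed disk, a mirror keeps running on its surviving copy, and a single-disk "array" has nothing to fall back on.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """Toy model of a RAID array (hypothetical, not the MegaRAID API)."""
    name: str
    disks_gb: list      # size of each member disk in GB
    mirrored: bool      # True = RAID 1 style mirror
    offline: set = field(default_factory=set)  # indexes of locked-out disks

    def fail_disk(self, index: int) -> None:
        # The controller locks a failed disk out of the array.
        self.offline.add(index)

    def is_available(self) -> bool:
        online = len(self.disks_gb) - len(self.offline)
        if self.mirrored:
            # A mirror keeps running as long as one copy survives.
            return online >= 1
        # Without redundancy every member disk is required.
        return online == len(self.disks_gb)


boot = Array("boot/OS", [80], mirrored=False)
sites = Array("websites", [250, 250], mirrored=True)

boot.fail_disk(0)    # what happened here: the lone OS disk was locked out
sites.fail_disk(1)   # hypothetical: one half of the mirror fails

print(boot.is_available())   # False
print(sites.is_available())  # True
```

The websites array would have shrugged off the same failure, which is why mirroring the boot disk is the planned fix.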
Cookie
QUOTE(socaldmax @ Jan 4 2008, 01:06 PM) *
Speaking of things of the past...



"Deepsand will be back season of 2006-2007"



how is this coming along?



icon_hot.gif tongue.gif
HozaykwAIRvo
Thanks for the heads up Greg 25cheers.gif
jhitesma
See, that's why I make it a rule not to give time estimates....I'm always wrong on them!

Double G
Even mine is a SWAG - Super Wild Ass Guess....started the repair and after 1hr it was 6% done....10hr may not be even close!

Greg
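
The arithmetic behind that doubt, as a small Python sketch. It assumes the rebuild keeps the same pace the whole way through, which real controllers do not promise, and the function name is made up for illustration.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: assumes the rebuild keeps its current pace."""
    if percent_done <= 0:
        raise ValueError("need some progress to extrapolate from")
    return elapsed_hours * 100.0 / percent_done


# Figures from the post above: 6% done after 1 hour.
total = estimate_total_hours(elapsed_hours=1.0, percent_done=6.0)
remaining = total - 1.0
print(f"total ~{total:.1f} h, remaining ~{remaining:.1f} h")
# prints: total ~16.7 h, remaining ~15.7 h
```

So at that pace the 10hr guess would be short by more than six hours.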
GRANT@FUNCO
QUOTE(gman @ Jan 4 2008, 03:21 PM) *
WHATCHU TALKIN BOUT WILLIS !!!
KingGlamis
QUOTE(GRANT@FUNCO @ Jan 4 2008, 06:33 PM) *
WHATCHU TALKIN BOUT WILLIS !!!


I'll translate for you Grant...

Basically one shock was out of nitrogen, the turbo seized up, the shifter needed to be adjusted and the HIDs had to be rewired. But with the right mechanic on the job all of that was an easy fix, just time consuming. icon_biggrin.gif
APHANTOMDUCK
QUOTE(gman @ Jan 4 2008, 03:21 PM) *
The more I learn about web servers and the like, the more I find out what I don't know.

This being said Greg - why are you using Debian Linux vs the Microsoft Server product?
responder
why did it go down.... did the guy in the "blue funco" crash it???






that was kinda funny.... right?
socaldmax
The short version is... Linux works, Microsoft sux ballz.
Greg Hall
QUOTE(gman @ Jan 4 2008, 03:21 PM) *
Thanks for the report Greg! Sounds like fun......
http404
QUOTE(APHANTOMDUCK @ Jan 4 2008, 07:59 PM) *
This being said Greg - why are you using Debian Linux vs the Microsoft Server product?



Good one!
APHANTOMDUCK
I really was not attempting to be a smart ass with this question.

I'm attempting to learn more about web sites and the interworking of them and admire what information Greg has on the subject.
QueenGlamis
QUOTE(socaldmax @ Jan 4 2008, 02:06 PM) *
Speaking of things of the past...



"Deepsand will be back season of 2006-2007"



how is this coming along?


rotf.gif OWNED! moof.gif
jhitesma
QUOTE(APHANTOMDUCK @ Jan 5 2008, 09:14 AM) *
I really was not attempting to be a smart ass with this question.

I'm attempting to learn more about web sites and the interworking of them and admire what information Greg has on the subject.



Not trying to be a smart ass back but...

In this situation the question has about as much to do with the problem as asking someone who just broke their tranny what brand of gas they're running. The problem was hardware failure; the choice of OS really doesn't make any kind of difference. It was the drive and the controller (which work the same under Linux or Windows) that caused things to crash and burn. The delay in fixing it also isn't OS related - it's just a matter of some lapses of communication coupled with key people being out of reach at a bad time that has caused it to take this long to fix. Thankfully we now have more information about the services available at the current co-lo facility and in the future this much down time should be easy to avoid.

That out of the way. I'll say as a full time professional web guy since before most people even knew what the internet was I can't think of any reason for hosting a website on windows. Why?

1) Ease of remote maintenance. I can log in and solve most problems on a unix server from a command line before I can even get a screen up on a windows remote server connection. Sometimes command lines are more efficient. I love a graphical desktop on my desktop - but on my servers the low overhead of a console just can't be beat for saving me time - and time is after all money. Plus many changes to a windows server that would require a reboot can be done on a unix server without rebooting. Which brings me to....

2) Uptime. I can't remember the last time I had to reboot one of my unix servers. The only time they go down is for hardware upgrades (and with hot swappable drives and power supplies a lot of times even that can be avoided) or when the power goes out. I don't power down my windows machines at night...but at least once a week they either crash or need to be rebooted because of memory leaks.

3) Software. I'm a HUGE fan of the "LAMP" stack (Linux/Apache/MySQL/PHP). The M and P can be replaced (Perl, Python, Java, Ruby or any of a number of other server side scripting languages for the P, and PostgreSQL, Oracle or a number of other DB servers for the M) but Linux and Apache are a dynamite combo. The flexibility and ease of configuration blows windows out of the water for the kind of work I do. Apache, MySQL and PHP all have windows versions as well....but they're native to unix and the windows versions are not nearly as stable or supported. I've deployed a number of sites on WAMP setups (or even worse WIMP using IIS instead of Apache...and/or MsSql instead of MySQL...and in one case <shudder>access</shudder>)

4) Cost. When you're buying from a hosting company the cost difference isn't as noticeable since you're paying for service on top of everything else. But windows for servers is not cheap and until very recently (last week) had limits on how many connections it could deal with at one time without upgrading to a more expensive version (or running multiple servers.) The hidden cost to a lot of people is development cost. Under unix pretty much all the tools you could dream of (compilers, IDE's, editors, scripting languages, graphics libraries, network libraries....) are free and top-notch. Under windows there are some questionable but affordable tools, some barely usable free tools, and some nice but incredibly expensive high end tools. Most of the server side stuff on windows (running under IIS) is pretty much designed to make it easier if you have the high end tools to help sell tools.


That's 4 quickies off the top of my head, trying not to get too geeky or detailed for the casual reader icon_wink.gif I could go on around a campfire of nerds for hours if I had to...but I'm really not in the mood right now...our annual Christmas tree bonfire is tonight (the anniversary of Amy and I meeting) and I should get some work done first.

Short version - having been building websites since before there were hosting options on windows (heck there wasn't even a stock TCP/IP stack for windows when I started building sites!) it still feels to me like Windows is playing catch-up and not doing a very good job of it, offering a limited set of functionality and a higher cost of entry with no real gains.

FWIW - quite often I end up having to make low level changes to things to help support my clients' needs. This isn't the kind of thing someone with a copy of FrontPage making a simple site would be up against. Or even someone installing off the shelf packages like a CMS, BBS, Gallery, mailing list or other commonly available software packages. I do things like write those tools or build custom tools (one recent tool had to talk to an outdated SCADA system and the BOR's admittedly poorly designed database so a local water district could build their own feeds to their employees out in the field for their wireless devices as well as available in another form from any available web browser.) So for the work I do having access to free professional quality development tools and compilers with no restrictions is a HUGE benefit. But if all you're looking to do is to install a CMS package or BBS then go with what the package creator(s) recommend and support the best. I won't even get into a public debate of one CMS or BBS vs. another - that's holy war territory that makes radical islamists look as fierce as quakers.

Greg Hall
QUOTE(jhitesma @ Jan 5 2008, 12:19 PM) *
And that was Jason's short answer!!!!
socaldmax
QUOTE(Greg Hall @ Jan 5 2008, 02:32 PM) *
And that was Jason's short answer!!!!




laughing.gif


Yeah, but it was good, and I read it all. laughing.gif


It's occurred to me that as cheap as drives are, perhaps more and newer drives are in order for the server. You know, planning ahead rather than chasing the problem.
Greg Hall
QUOTE(socaldmax @ Jan 5 2008, 02:39 PM) *
QUOTE(Greg Hall @ Jan 5 2008, 02:32 PM) *
And that was Jason's short answer!!!!




laughing.gif


Yeah, but it was good, and I read it all. laughing.gif


It's occurred to me that as cheap as drives are, perhaps more and newer drives are in order for the server. You know, planning ahead rather than chasing the problem.


It looks like Greg is doing just that.....I certainly hope it works out as planned.
JDMeister
Hot spares

Both hardware and software implementations may support the use of hot spare drives, a pre-installed drive which is used to immediately (and automatically) replace a drive that has failed, by rebuilding the array onto that empty drive. This reduces the mean time to repair period during which a second drive failure in the same RAID redundancy group can result in loss of data, though it doesn't eliminate it completely; array rebuilds still take time, especially on active systems. It also prevents data loss when multiple drives fail in a short period of time, as is common when all drives in an array have undergone similar use patterns, and experience wear-out failures. This can be especially troublesome when multiple drives in a RAID set are from the same manufacturer batch.
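
To put rough numbers on that repair window, here is a back-of-the-envelope Python sketch. It assumes drive failures are independent with a constant failure rate, and the MTBF figure is an assumed placeholder, not a spec for any real drive. The 10-hour rebuild and the week of waiting are borrowed from earlier in this thread. As the paragraph above points out, drives from the same batch tend to fail together, so the real risk is likely higher than this simple model suggests.

```python
import math


def p_failure_within(hours: float, mtbf_hours: float) -> float:
    """Chance one drive fails within `hours` (constant-rate exponential model)."""
    return -math.expm1(-hours / mtbf_hours)


ASSUMED_MTBF_HOURS = 500_000    # assumption for illustration only
REBUILD_HOURS = 10              # rebuild estimate quoted earlier in the thread
WAIT_FOR_HUMAN_HOURS = 7 * 24   # roughly the week it took to get hands on the box

# With a hot spare the rebuild starts at once; without one, the array
# stays degraded until someone shows up, then the rebuild still has to run.
with_spare = p_failure_within(REBUILD_HOURS, ASSUMED_MTBF_HOURS)
without_spare = p_failure_within(
    WAIT_FOR_HUMAN_HOURS + REBUILD_HOURS, ASSUMED_MTBF_HOURS
)

print(f"exposure with hot spare:    {with_spare:.6%}")
print(f"exposure without hot spare: {without_spare:.6%}")
print(f"ratio: {without_spare / with_spare:.1f}x")
```

Whatever MTBF you plug in, the exposure grows roughly in proportion to how long the array sits degraded, which is the whole argument for a hot spare.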

Linux RAID Link

There are a number of different RAID levels:


# Level 0 -- Striped Disk Array without Fault Tolerance: Provides data striping (spreading out blocks of each file across multiple disk drives) but no redundancy. This improves performance but does not deliver fault tolerance. If one drive fails then all data in the array is lost.

# Level 1 -- Mirroring and Duplexing: Provides disk mirroring. Level 1 provides twice the read transaction rate of single disks and the same write transaction rate as single disks.

# Level 2 -- Error-Correcting Coding: Not a typical implementation and rarely used, Level 2 stripes data at the bit level rather than the block level.

# Level 3 -- Bit-Interleaved Parity: Provides byte-level striping with a dedicated parity disk. Level 3, which cannot service simultaneous multiple requests, also is rarely used.

# Level 4 -- Dedicated Parity Drive: A commonly used implementation of RAID, Level 4 provides block-level striping (like Level 0) with a parity disk. If a data disk fails, the parity data is used to create a replacement disk. A disadvantage to Level 4 is that the parity disk can create write bottlenecks.

# Level 5 -- Block Interleaved Distributed Parity: Provides data striping at the block level, with the parity (error correction) information also striped across all disks. This results in excellent performance and good fault tolerance. Level 5 is one of the most popular implementations of RAID.

# Level 6 -- Independent Data Disks with Double Parity: Provides block-level striping with two sets of parity data distributed across all disks, so the array survives two simultaneous drive failures.

# Level 0+1 – A Mirror of Stripes: Not one of the original RAID levels, two RAID 0 stripes are created, and a RAID 1 mirror is created over them. Used for both replicating and sharing data among disks.

# Level 10 – A Stripe of Mirrors: Not one of the original RAID levels, multiple RAID 1 mirrors are created, and a RAID 0 stripe is created over these.

# Level 7: A trademark of Storage Computer Corporation that adds caching to Levels 3 or 4.

# RAID S: EMC Corporation's proprietary striped parity RAID system used in its Symmetrix storage systems.
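
The "parity data is used to create a replacement disk" part of the parity levels above comes down to XOR. A minimal Python sketch of the idea follows. The block contents and helper name are made up, and a real controller works on whole stripes in hardware rather than four-byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" holding one small block each (made-up contents).
data = [b"ASA!", b"dune", b"sand"]
parity = xor_blocks(data)   # what the parity disk would store

# Pretend disk 1 died: rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])   # True
```

XOR-ing everything that survived cancels out the surviving data and leaves the missing block, which is why single parity can cover exactly one lost disk and no more.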

yay.gif
APHANTOMDUCK
Jason, I fully understood the hardware failure and accepted Greg's explanation of the same.

In my learning curve of this stuff I always see a lot of debate about what product to use. The primary reason I hear about Linux (an open source version of Unix) is the ability to use the open source (code) for development instead of not having that option with the Microsoft product(s).

I understand that there are several approaches for server technology under Linux and little problems with many of the same. A couple of folks I know who work for Microsoft in the corporate customer server technical services division (for lack of a better description) explain to me Linux (Unix) products have some good strong technology but have their own set of issues on par with some of the Microsoft server products.

At our business, we own two web servers in anticipation of our ecommerce web site launch. When I talked with the consultants who help us (two very highly regarded pros who deal with corporate America on a daily basis and one owns quite a large hosting company and helps for free) they both explained that while some other products are used in the big picture, "the world talks Bill".

I looked closely at the following web site before making a decision on server products. This web site was of great help.
But when one gets down to the business of making a profit - Microsoft seems to have won the war in corporate America.

Learning this stuff is sure tough!

[Edited to add new information]
MichaelAZ
Maybe ask Markie Mark how much he wants for his hardware since it isn't doing anything dunno.gif
MWBbanshee
QUOTE(MichaelAZ @ Jan 5 2008, 05:20 PM) *
Maybe ask Markie Mark how much he wants for his hardware since it isn't doing anything dunno.gif

Damn MAZ, most people don't slam a door that hard.
Glamisbound
Vanna, spin me a letter...I'm sure Woody is dying to post about the Monster Ride. Heck, they don't even know I blew up my tranny right after we parted ways after our 76 mile dune run.
Double G
Jason pretty much answered the "why Linux" question. In general, Linux is much more stable and scalable and especially cost-effective for cash-conserving organizations such as the ASA. Most of the bigger websites use some kind of Unix system simply because they work lots better than Windows/IIS.

Basically the failure was the "C:" drive for you windows folks, it's where the Linux operating system is and is typically called the "root" drive or "/". We have all of the websites and their data on the "D:" drive (that's the mirrored dual-disk RAID) except for the MySQL database (might be a good idea to move it...).

The controller that the disk is plugged into recognizes a failure and immediately pulls the disk off-line. Like I said before, this is *usually* a good idea _IF_ you have either a) a hot-spare or b) a mirrored drive. Since that disk did not, things started to go wacky. This looks like it was an intermittent failure because the disk came right back up when I enabled it. However, we need to replace it soon.

When we bought that server (2-3?) years ago disks of the size we chose were still very pricey. We took a chance on the boot drive being single-string. Everything else about the server is high-reliability: dual power supplies, hot-swappable drives, dual CPU, dual fans, etc.

I have asked Scott to get us another 250GB drive for the data RAID (it already is dual-redundant, this will give us a 'hot spare') and another 2 160GB drives for the boot. Then we can have full hot-swap, fail-safe RAID for everything.

One of those things, you just have to work through it. If I had known it was down before my week-long trip to the dunes I could have configured the drive back on-line in a few minutes...but to do that we have to get someone to lay hands on the box so it could not be done remotely.

Greg
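
A quick Python sketch of what that shopping list would buy. The helper name is made up, and it assumes the boot array ends up as a plain two-disk 160GB mirror, which the post does not spell out. The point it illustrates is that a mirror's usable space is only the size of its smallest member, and a hot spare adds safety rather than space.

```python
def describe_mirror(member_sizes_gb, hot_spares_gb=()):
    """Summarize a RAID 1 mirror (illustrative helper, not a controller API)."""
    if len(member_sizes_gb) < 2:
        raise ValueError("a mirror needs at least two disks")
    usable = min(member_sizes_gb)   # the smallest member sets the limit
    return {
        "usable_gb": usable,
        "purchased_gb": sum(member_sizes_gb) + sum(hot_spares_gb),
        # every member holds a full copy, so all but one can fail
        "failures_survived": len(member_sizes_gb) - 1,
        # a spare can only take over if it is big enough to hold a copy
        "auto_rebuild": any(size >= usable for size in hot_spares_gb),
    }


boot = describe_mirror([160, 160])                       # assumed layout
data = describe_mirror([250, 250], hot_spares_gb=[250])  # mirror + hot spare

print(boot)
print(data)
```

Under those assumptions the boot mirror gives 160GB usable out of 320GB bought, and the data array gives 250GB usable out of 750GB bought, with the spare ready to rebuild automatically.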
Markie_Mark
All this fun at my expense and I had no idea...good one but not very original...
Glamisbound
Down again. Any word???
mnimud
QUOTE (Glamisbound @ Apr 30 2008, 08:57 AM) *
Down again. Any word???


What's the 10 code for being Fed up? I'm 5149 I tell ya!
Robbie
QUOTE (mnimud @ Apr 30 2008, 11:42 AM) *
QUOTE (Glamisbound @ Apr 30 2008, 08:57 AM) *
Down again. Any word???


What's the 10 code for being Fed up? I'm 5149 I tell ya!


laugh1.gif That's a good one, hadn't heard it before
Glamisbound
Are you Code 5 Sloppy?
mnimud
Hells ya I'm Code 5!!! Can't stop checking on it. It can't be down forever. I'm fricken bored AND I got calendar page stuff to get work'n on.
jhitesma
The server is still running. But there's been another disk error so none of the website is accessible.

We're working on restoring it but due to other commitments and limitations on who can do what and when they're available it's taking longer than we would like.

mnimud
5149.5
Glamisbound
Thanks Jason.

Let's pass some time.

Here's a code test for Jon:

1. HBD
2. TC
3. OIS
4. T4
5. Code 8
6. 10-16
jhitesma
Latest word is that the drive controller went bad in the server. They're trying to get a new controller in there so we can find out what the status of the actual drives is next.

Let's just all be glad that GD and the ASA have never both gone down at the same time. There could be riots icon_biggrin.gif
APHANTOMDUCK
That figures, a $ 200.00 part fails and the whole day is ruined.

Jason, is this something that can be tested from time to time? We have three servers and I'd hate to have the problems you folks are having now.
jhitesma
QUOTE (APHANTOMDUCK @ May 1 2008, 01:24 PM) *
That figures, a $ 200.00 part fails and the whole day is ruined.

Jason, is this something that can be tested from time to time? We have three servers and I'd hate to have the problems you folks are having now.


Well, latest news is it may just be a drive and not the controller after all.

Most enterprise level drives and controllers do have built in diagnostics and can usually give warning when a drive is showing signs of impending failure...but not all failures give warnings. Since we just upgraded the drives back in December it's really odd that they'd die again this soon - but that's kind of how drives are usually. They'll either die in the first 6 months or they'll last for years. Slow failures like bearings wearing out or sectors getting corrupted are easy to detect and most modern OSes will alert you that there is trouble coming up. But there's always the possibility of sudden catastrophic failure that is completely undetectable. That's why we run most of the system on a RAID where data is written simultaneously to two separate drives so if one fails we have a hot backup.
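
As one example of that kind of periodic check, here is a Python sketch that reads the overall health verdict out of a `smartctl -H` report from the smartmontools package. The sample text is hand-written for illustration, not captured from the ASA server, and the function name is made up. Drives sitting behind a hardware RAID controller often need extra smartctl options that are not shown here.

```python
import re

# Health lines as printed for ATA/SATA drives and for SCSI/SAS drives.
_ATA = re.compile(r"SMART overall-health self-assessment test result:\s*(\w+)")
_SCSI = re.compile(r"SMART Health Status:\s*(\w+)")


def smart_health_ok(report: str):
    """Return True/False for a pass/fail verdict, or None if none was found."""
    match = _ATA.search(report)
    if match:
        return match.group(1) == "PASSED"
    match = _SCSI.search(report)
    if match:
        return match.group(1) == "OK"
    return None


# Hand-written sample; a real report would come from running
# `smartctl -H /dev/sda` (device name is an assumption) and capturing stdout.
sample = (
    "=== START OF READ SMART DATA SECTION ===\n"
    "SMART overall-health self-assessment test result: PASSED\n"
)

print(smart_health_ok(sample))                  # True
print(smart_health_ok("no health line here"))   # None
```

A passing verdict only rules out the slow failures the drive itself can see; it says nothing about the sudden kind.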

Unfortunately there was still one drive in the system that wasn't set up that way - and that's the one that crashed. We had planned on moving it to a mirrored setup but hadn't had time to deal with it yet. I don't know exactly what failed just yet but since we weren't getting any warnings and the drives all checked out great just 6 months ago I'm guessing it was a sudden catastrophic failure. We did have some power issues with the datacenter in the weeks just prior to the crash which caused a number of unexpected shut-downs...and in general servers don't like that kind of thing, they prefer to have warning before the power is removed so drives can be parked and caches can be dumped. Thankfully due to modern journaled file systems and RAID controllers unexpected power failures don't usually result in loss of data like they used to. But it's still not very good for the hardware to have it suddenly lose power without warning. So there's a good chance that the power issues at the datacenter contributed to the sudden demise of this drive. But again I don't have enough first hand data to say for sure since I'm not there dealing with the problem this time.

The situation the ASA is in isn't quite what most people have with hosting due to a number of reasons. Due to volunteers offering free and low cost hosting the ASA ended up buying their own server and owning their own hardware since at the time it was more cost effective than buying space/bandwidth on an ISP owned server. The hosting market has changed since then and there are now lots of options where you can lease the hardware and get just as good of a deal - and the ASA has even changed hosts since the deal but since we already owned the server we kept it instead of moving to a leased platform where someone else deals with the hardware issues. We're already paid up for a while with the current host so we won't be moving anything right away...but once the server is back up we are having discussions about looking into other hosting arrangements where the ISP provides more management of the hardware so we won't have to deal with trying to drag volunteers into the datacenter to fix things when stuff like this goes down. Times change and what was the best option a year ago isn't always still the best option today...and the ASA's situation goes back to decisions that had to be made quickly several years back due to issues with the ISP they were with at the time. It's not a situation that very many people who host sites find themselves in so I wouldn't worry too much. Unless you own your own hardware and don't have a management contract on it you probably don't have much if anything to worry about.


socaldmax
I suppose it's a good thing that isn't a mission critical site.

Murphy's Law guaranteed that that drive would fail, since it was the one that wasn't mirrored.
jhitesma
QUOTE (socaldmax @ May 1 2008, 10:07 PM) *
I suppose it's a good thing that isn't a mission critical site.

Murphy's Law guaranteed that that drive would fail, since it was the one that wasn't mirrored.


Yep. And plans were to mirror it. Just no one had time to do it yet.

HozaykwAIRvo
QUOTE (jhitesma @ May 1 2008, 10:42 AM) *
Let's just all be glad that GD and the ASA have never both gone down at the same time. There could be riots icon_biggrin.gif



NO DOUBT! scared.gif
jhitesma
Latest update:

It appears all the data on the drives is still intact - however the drive controller is having serious issues. As a result an entirely new server has been rush ordered and should be here early this week. Once the data from the old drives is copied over and verified the site should be back up soon.
Glamisbound
Checking daily. Thanks for the updates Jason.
jhitesma
New server arrived today. Scott got the OS loaded along with the DB and Webserver software and has it configured so he can drop it off at the co-lo (hosting company) tomorrow. There's still the issue of transferring our data from the old drives...but we could be back up fairly soon now.