No one wants downtime. In this age of Web 2.0 (and beyond), the way people use internet-based applications has been redefined. Users engage with them in far greater volume than ever before, which makes it even harder for infrastructure teams to keep servers in full health. Server uptime is often one of the key factors in customer retention. Mashups built on top of open, API-based services depend on the health of their data sources. For example, if Twitter is down, a Twitter mashup cannot work unless it has its own previously stored tweets. For complex, line-of-business and mission-critical systems such as financial applications, availability is vital to the success of the business. Whether the application is served from on-premise servers or third-party datacenters, numerous things can go wrong and directly impact availability.
1. Web or database server gets corrupted
No surprise there. It can happen. Anytime! There are various reasons why it may occur. You may have automatic Windows Updates turned on, and it installed new patches that do not work well with your software. You may have installed new software, new hardware or a driver for that new hardware; your firewall may be weak; your antivirus or antispyware may have failed; a disk may have failed; and so on. Security threats are constantly evolving and increasingly sophisticated, so if you do not regularly install service packs and fixes, that too may result in a server crash. The fix may be reinstalling Windows and setting up every other application you need, which can lead to a really long downtime.
2. Hard disk failure
The single most important piece of hardware in your server is probably the hard disk. You may have static resources cached in many places, such as RAM or a Content Delivery Network. However, dynamic queries, such as displaying a list of items from the inventory or getting the list of currently logged-on users, involve the database, and databases persist their data to the hard disk. File and database operations depend on the hard disk more than other operations do. Large and complex queries, or even excessive calls to simpler queries and file operations, may cause a relatively cheap hard disk to overheat and then shut itself down or get damaged. Hard disk failures are often hard to fix, costly and a big threat to application stability unless regular, proper backups are taken or effective replication is in place.
3. Database replication is not easy
If your primary database server goes offline, you need a standby server that can serve requests almost instantly with complete and current data. Replication propagates each update the primary server has just made to the slaves. A slave then acknowledges the update, which allows the sender to ripple the same update through subsequent slaves. While distributed databases are not technically difficult to implement within a local datacenter, they pay a performance cost compared with a single server, because the machines have to talk to each other over the network. On top of that, to guard against natural disasters, terrorist attacks or similar events bringing your datacenter down, you may want datacenters distributed across different continents, where the network distance between replicas is far greater.
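To make the propagation-and-acknowledgment idea a little more concrete, here is a minimal in-memory sketch in Python. It assumes one reading of the flow described above: each node applies the update, passes it to the next slave in a chain, and the acknowledgment travels back only once the rest of the chain has it. The `Replica` class and `build_chain` helper are hypothetical; this is a toy simulation, not how any particular database implements replication.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One node in a replication chain (hypothetical in-memory stand-in, not a real DBMS)."""
    name: str
    data: dict = field(default_factory=dict)
    next_replica: Optional["Replica"] = None
    online: bool = True

    def apply(self, key: str, value: str) -> bool:
        """Apply an update locally, forward it down the chain, and return True
        only if this node and every node after it acknowledged the update."""
        if not self.online:
            return False  # an offline node cannot acknowledge anything
        self.data[key] = value
        if self.next_replica is None:
            return True  # tail of the chain: acknowledge straight away
        # Acknowledge only after the subsequent slaves have the update too.
        return self.next_replica.apply(key, value)


def build_chain(names: list[str]) -> Replica:
    """Link replicas in order; names[0] becomes the primary."""
    nodes = [Replica(n) for n in names]
    for current, following in zip(nodes, nodes[1:]):
        current.next_replica = following
    return nodes[0]


if __name__ == "__main__":
    primary = build_chain(["primary", "slave-1", "slave-2"])
    print(primary.apply("order:42", "shipped"))  # every node acknowledged
    primary.next_replica.next_replica.online = False  # the last slave goes down
    print(primary.apply("order:43", "pending"))  # no full acknowledgment now
```

Note what happens in the second call: the primary and the first slave already hold the new value, but the missing acknowledgment tells the primary that the chain is no longer fully current. Deciding what to do in that situation (retry, wait, or fail the write) is exactly where real replication gets hard.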
4. Power outages and internet cable cuts
The information superhighways are densely interconnected. When a user hits your website, the request travels across several hops around the world to reach your server. Some of those hops may not have backup connectivity, so when they are down, your site may become unavailable to various parts of the world. That said, most reputable hosting providers connect to good backbones, which use intelligent routing to reduce the number of hops and hence the chance of downtime. Power outages happen for many reasons, and their frequency varies from country to country. It is even harder for an on-premise server environment to stay online without backup power generators. Both power and internet outages may occur only a few times a year.
5. Server monitoring is painful
There are so many things to watch regularly to keep your servers in top health. You have to keep an eye on how they perform under extreme load, how many requests they serve per second over time, request execution time, disk read/write time, memory usage, CPU usage and so on. Although there are tools that can do this job for you, you still have to go through the logs and reports to find out what went wrong and when, and to figure out the possible causes.
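As a rough idea of what "going through the logs" boils down to, here is a small Python sketch that reduces request logs to a few health indicators: requests per second, the 95th-percentile execution time and a count of slow requests. The `RequestLog` record shape and the 500 ms "slow" threshold are assumptions for illustration; real monitoring tools read their own log formats and let you pick your own thresholds.

```python
from dataclasses import dataclass


@dataclass
class RequestLog:
    """Hypothetical log record: when a request finished and how long it took."""
    timestamp: float    # seconds since epoch
    duration_ms: float  # request execution time in milliseconds


def summarize(logs: list[RequestLog], slow_ms: float = 500.0) -> dict:
    """Reduce raw request logs to requests/second, p95 latency and slow-request count."""
    if not logs:
        return {"requests_per_second": 0.0, "p95_ms": 0.0, "slow_requests": 0}
    start = min(r.timestamp for r in logs)
    end = max(r.timestamp for r in logs)
    span = max(end - start, 1.0)  # treat very short bursts as one second
    durations = sorted(r.duration_ms for r in logs)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * len(durations) + 99) // 100
    return {
        "requests_per_second": len(logs) / span,
        "p95_ms": durations[rank - 1],
        "slow_requests": sum(1 for d in durations if d > slow_ms),
    }


if __name__ == "__main__":
    # Ten requests over nine seconds; one of them was slow.
    sample = [RequestLog(float(t), 100.0) for t in range(9)]
    sample.append(RequestLog(9.0, 900.0))
    print(summarize(sample))
```

A summary like this only tells you that something looks wrong; working out why still means digging into the individual log entries around the slow requests.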
6. Architecting the network is challenging
The network architecture of your internet application has to be done well. To scale out, you need to add more servers to the system. A network architecture that handles scaling, balances load across the servers in harmony, keeps network latency to a minimum and supports replication requires network experts and visionary architects. Network latency directly affects user experience, so to keep it low, you may need datacenters established across the continents. A good architecture must leave the door open for that too. A bad network architecture not only keeps you from scaling your business flawlessly and giving users a good experience, but troubleshooting it can also cause you sleepless nights.
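Load balancing is one of the pieces that has to fit into such an architecture. The simplest scheme is round robin: hand requests to each server in turn and skip any server that is known to be down. The Python sketch below shows just that idea; the server names are hypothetical, and a production load balancer would also run its own health checks, weight servers by capacity and account for network latency.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend servers in turn, skipping any marked unhealthy (illustrative only)."""

    def __init__(self, servers: list[str]):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self) -> str:
        # Look at each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
    print([lb.next_server() for _ in range(3)])  # each server once, in order
    lb.mark_down("web-2")
    print([lb.next_server() for _ in range(3)])  # web-2 is skipped
```

Round robin assumes all servers are roughly equal and close to the user; once you have datacenters on different continents, you also need to route users to a nearby location, which is where the architectural decisions above come back in.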
7. Scaling is inconvenient
We programmers and architects do not always write code and design systems with scalability in mind. Even when we do, the user base often grows so far beyond expectations that the whole architecture has to be redesigned, or the product attracts fewer users than anticipated, so most of the IT infrastructure sits underused and wastes operating cost. Either way, scaling is a difficult decision to make. Whether we want to scale up, down or out, hundreds of other technical and business criteria have to be taken into consideration. Scaling does not happen right away, because you have to hold several meetings with the IT department, business strategists, policy makers and others in your company to come to an agreement.
8. More attention goes to the infrastructure layer
The casualties that take place in the infrastructure layer distract the whole company, whether you are a developer or a business executive, and leave you less attention for your actual work. You have a demo for a potential customer, and your demo server stops working right before it. "This server is down." "That server is up, but that functionality is not working." "We are blocked waiting for background services to run." "This server has the wrong URL redirection." These are the most common sentences you hear in the morning and throughout the day in a company that relies on IT.
In this post, I have walked through 8 key pain points of traditional server systems. I hope this gives you a glimpse of a few of the challenges of keeping a living piece of software running.

Pingback: Tanzim Saqib » The ‘Web of Pain’ for Those... | .NET, Architecture, Cloud Computing and Database | Syngu