LinuxQuestions.org


hazel 05-28-2017 01:15 AM

Another IT disaster! You don't even need malware, just incompetence.
 
The whole of British Airways closed down yesterday when their IT systems crashed. It will take days to clear up the mess.

This time there was no sign of malware. It was just a systems crash. Employees of the company say that the IT department has been starved of funds and important maintenance work has been contracted out.

syg00 05-28-2017 03:02 AM

What chance they decided to use the long weekend for some work in the datacentre?
I've worked for a vendor that had customers (BIG banks) that decided to do major upgrades over the "quiet" time between Christmas and New Year.
Sometimes (not always) the customers didn't even notice .... ;)

Turbocapitalist 05-28-2017 04:07 AM

Quote:

Originally Posted by hazel (Post 5716110)
It was just a systems crash.

A systems crash or were they trying to run production services on top of M$?

I'm pretty jaded about these kinds of things, and when I see that they have avoided mentioning specifics it usually turns out that M$ products were the software culprits. We'll have to wait and see if more information is ever released. Remember the LSE.

Though if it is the case they have deployed M$ products instead of production-ready systems, then the real culprits are the managers that brought in the M$ staff and told them to deploy M$ products in a production environment. So we might not ever find out what really happened, especially if it was M$ related.

An anecdote: I recall one bank that went down hard for weeks. It was actually months before they could really resume normal operations. During the first weeks many could not transfer money into or out of their accounts. That applied to businesses as well as individual private customers. On occasion, it was possible to access other accounts and transfer money in or out, but not your own. It turned out to be 100% an M$ problem, mainly Sharepoint. However, the media loyally reported on a 'computer glitch' and 'IT problems' instead. When pressed, they repeated the talking point "IBM is responsible for the network" though the problem was unrelated to the network.

Soadyheid 05-28-2017 10:59 AM

Quote:

What chance they decided to use the long weekend for some work in the datacentre?
Er... unlike banks, which go quiet on holiday (Bank Holiday?) weekends, airlines' volume of business increases!

The press was quoting some sort of power failure? Huh? I really can't believe that in this day and age a company as global as British Airways doesn't have at least a second DR Datacentre, if not more, fed by multiple power sources and backed up by diesel generators!

I'd reckon they're still investigating and the media are just putting out statements which they think Joe Public will accept as the cause.

Play Bonny!

:hattip:

syg00 05-28-2017 06:33 PM

Neither do banks - hence my use of double quotes.

Turbocapitalist 05-29-2017 06:56 AM

It looks like there are several different kinds of stupid involved, above and beyond simply outsourcing core, mission-critical functions.

Circumstantial evidence is starting to pile up. Here are some relevant items from last year:

1a. BA faces IT staff protest over offshoring
1b. Fears for up to 800 UK jobs as British Airways hires Indian firm Tata Consultancy Services to provide IT services

2. TCS Recognized as a Global Leader in Microsoft Enterprise Applications Implementation by IDC MarketScape

In all likelihood, M$ has flown out several teams of their highest paid PR flunkies to help draft some more misleading announcements to be published soon, maybe even to hide the BA board of directors' involvement. I'd guess no major decisions, like those leading up to the disaster, could be made without the board members' knowledge and approval.

Jjanel 05-29-2017 07:49 AM

RHEL? https://www.redhat.com/en/about/pres...cloud-platform
Quote:

The Group’s work with Red Hat started in 2003, with British Airways selecting Red Hat Enterprise Linux as its Linux-based platform to support core operational functions, including flight and passenger management. International Airlines Group was formed in 2011 following the merger of British Airways and Iberia and has since acquired Aer Lingus and Vueling. The core commitment to an open source technology backbone remained, with Red Hat Virtualization now running on more than 600 physical servers and supporting more than 7,500 virtual machines.
On August 9, 2007, an electrician accidentally blew out the power to the datacenter where cisco.com had its webservers, with no dynamic web content failover implemented. https://blogs.cisco.com/news/final_u...iscocom_outage

DavidMcCann 05-29-2017 12:37 PM

The latest from Reuters:
Quote:

BA Chief Executive Alex Cruz said the root of the problem … had been a power surge … It was so strong it also rendered the back-up systems ineffective.
This sort of thing is hardly uncommon: Lufthansa, Air France, and Delta have had similar, but smaller, crises in the last year.

The staff protests were about the out-sourcing of a data-centre, which is obviously unconnected. Knowing the GMB (General, Municipal, and Boiler-makers!) union, I can sympathise with BA not wanting them handling their customers! How's this for a conspiracy theory: GMB member sabotage?

Soadyheid 05-29-2017 05:16 PM

Just a few overall thoughts...

One of the problems being generated by globalisation is the acquisition and merger of large companies (airlines, banks, etc.) into even larger ones. This involves attempts to rationalise their associated IT infrastructures (including staff, don't we know it!) to lower costs and increase profits. Invariably the rationalisation will involve some legacy equipment which defies upgrade attempts as it supports some invaluable application which was written in a language no longer taught and was only understood by programmers now retired or long dead.

The equipment itself, which has chugged along merrily for decades, has been superseded by newer and faster kit and becomes harder, and more costly, to maintain. (Just finding spare parts for the irreplaceable 1960s ZXQ3000-1 surface-mounted-yogurt-maker is infinitely time consuming and pretty much pointless, even using eBay!)

Trying to "bolt" together the differing networks and make the resulting conglomeration of ill matched hardware and software function as the proverbial "Well oiled Machine" is a gigantic task which may now be beyond the actual human resources available. There seems to be a distinct lack of programmers, analysts and other IT staff in the world to go round!

Just to make things worse, we in the UK have managed to add to this problem by voting to leave the EU, which is going to add to the resource depletion, as untangling all the legal, trade, health, security and financial treaties, etc., to enable us to "Brexit" is going to be a nightmare. And the politicians say it'll take only two years! Two years my a*se! It'll be nearer ten, and that's probably an underestimation.

Perhaps we should be more worried than we appear about the future, technology failures and our dependencies?

This problem was foreseen back in 1909 by E. M. Forster in The Machine Stops.

I'm now going to lie down quietly in a darkened room... ;)

Play Bonny!

:hattip:

hazel 05-30-2017 01:35 AM

Quote:

Originally Posted by Soadyheid (Post 5716690)
Invariably the rationalisation will involve some legacy equipment which defies upgrade attempts as it supports some invaluable application which was written in a language no longer taught and was only understood by programmers now retired or long dead.

I love it! There is a place somewhere in the US (I've forgotten the name) where they emulate all kinds of obsolete hardware and software. Apparently there's an awful lot of data stored on tape in obsolete formats which can no longer be read on modern machines.

sundialsvcs 05-30-2017 07:00 AM

I consider it most likely that this was a small-scale experimental attack on this business, at a most-critical time in its business year, and that it was a successful one.

Like many other corporations, BA "outsourced" its most-essential operation – its IT operation – to a "third-world third-party," with no more motivation than: "it would allow thousands of 'expensive British people' to be replaced by 'cheap third-world people,' hence 'more profits.'"

So, working very quietly from their out-of-the-country 'cheap' data centers, these 'cheap' out-of-the-country workers brought British Airways to its knees.

They could have – and still could – bring airliners crashing to the ground ... the full-scale equivalent of a terrorist hijacking ... from thousands of miles away, just by tampering with digital records. And, no one from Britain would know, because they'd all been laid off "to save money." :mad:

No one is watching the chicken-house. This attack was completely successful.

By steadfastly ignoring the human factor of IT ... by pretending somehow that we can trust machines and thus hire "anybody, anywhere (as long as they're 'cheap')" to run them, and that we can locate them "anywhere in our 'happy little cloud' (as long as labor and electrical power are cheap there)" ... we have set ourselves up worldwide for "devastating 'Acts of War.'"

We naively assume that every person anywhere is totally trustworthy (as long as s/he is 'cheap'), and then we profess to be 'mystified' when events like these occur. "What fools these Mortals be!" Do we not know that there are dark-hearted people out there who possess powers of imagination?

Soadyheid 05-30-2017 11:24 AM

Hmmm... Interesting theory, probably the same off-shored IT people who brought down the financial institutions in 2007 - 2008.

As airline disruptions go, I prefer this method, no IT involved! :D

Play Bonny!

:hattip:

hazel 06-01-2017 02:03 AM

The latest version on the BBC News this morning:

1) There was a power failure.
2) The backup power system didn't work so they had to reboot the servers.
3) The people who did the reboot screwed up. This could have been because they had outsourced their expertise to Tata as per Turbocapitalist.

Soadyheid 06-01-2017 06:06 AM

Quote:

1) There was a power failure.
In both Data Centres? Any global company worth its salt must have at least two Data Centres, generally quite a distance apart (in case a BA plane crashes into one of them?).

Quote:

2) The backup power system didn't work so they had to reboot the servers.
I wonder how often they fail over to DR systems to check they work? Not often enough probably. Sometimes critical maintenance procedures like this just get risk assessed and somebody takes the gamble. Not worth it!

I've been in a Data Centre when the power failed: head stuck in a server rack, pitch black, scary as hell. Then I noticed all the little red lights and the screaming audio warnings from the UPSes. Yup, it was way back when all the servers were clustered, no VMs, and each box had its own UPS. Luckily the power came back before the ten minutes of UPS autonomy expired. Changed days!

Play Bonny!

:hattip:

hazel 06-01-2017 06:37 AM

It's an old story. People religiously do backups and store them away, then one day they get a data crash and need to restore from backup. And whaddyaknow! The backups are corrupt, probably have been for a long time. Because no one bothered to test them.

I bet they don't often test backup generators either.
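
For what it's worth, a restore test doesn't have to be elaborate. Here is a minimal Python sketch, assuming the backups are tar archives and that a SHA-256 digest was recorded when each one was written. The function names are made up for illustration, and this only proves the archive can be read end to end, not that the data inside is what you wanted.

```python
import hashlib
import tarfile
import zlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_restorable(archive: Path, expected_sha256: str) -> bool:
    """Compare the archive with its recorded digest, then read every member."""
    if sha256_of(archive) != expected_sha256:
        return False  # archive changed or rotted since the digest was recorded
    try:
        with tarfile.open(archive) as tar:
            for member in tar:
                if member.isfile():
                    # Reading the whole member forces decompression, so a
                    # truncated or damaged archive fails here.
                    with tar.extractfile(member) as fh:
                        while fh.read(1 << 20):
                            pass
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
    return True
```

Run something like this from cron against the newest archive and raise an alarm when it returns False. A full restore onto a scratch machine is still the only real test.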

sundialsvcs 06-01-2017 08:56 AM

Quote:

Originally Posted by hazel (Post 5717822)
It's an old story. People religiously do backups and store them away, then one day they get a data crash and need to restore from backup. And whaddyaknow! The backups are corrupt, probably have been for a long time. Because no one bothered to test them.

British Airways?! I hardly think so.

No, this was neither incompetence nor a power failure. If anyone looks closely, they will find it to be an act of sabotage. But they might not publicly disclose it and they might not even want to look.

One day soon, our industry will become regulated, like most other industries (electrical, mechanical engineering, construction, low-voltage lighting :rolleyes:) that affect people's daily lives. You will have to have a government-issued license to work with software, and there will be multiple levels (apprentice, journeyman, master) as there are in most professions.

We will stop treating "people" as merely "workers" when we recognize just what people in IT can (and, if unsupervised, will) do. We will stop looking so rosily at "the happy little Cloud" as though it somehow didn't matter where in the world your data centers are.

Yes, our "take" on our industry will abruptly become much more mature. It's only a matter of time, and of just how many disruptive events like this one we will tolerate before we do something meaningful about it.

AnanthaP 06-03-2017 12:26 AM

In James Bond language, the government puts a D-notice
( https://en.wikipedia.org/wiki/DSMA-N...United_Kingdom ) on the affair since BA is TBF (too big to fail) so that we can all speculate and in the meanwhile, BA can go back to sleep without publishing the conclusions.

Next something for the conspirationistas (world domination perhaps)..HA HA
Tata consultancy is part of TATA SONS among whose other non Indian companies are CORUS (since divested back to British Government), JLR and Tetley Tea (largest tea company in UK and Canada and big in USA).

PS : Old money - TATAs were supported by ABN Amro, Rothschilds, and Deutsche Bank in their successful price war bid for buying CORUS.

OK

AnanthaP 06-03-2017 01:04 AM

Quote:

I love it! There is a place somewhere in the US (I've forgotten the name) where they emulate all kinds of obsolete hardware and software. ..
HA HA.

Isn't it the US military that still has some missile launch systems on obsolete systems?

OK

Turbocapitalist 06-03-2017 01:10 AM

Quote:

Originally Posted by AnanthaP (Post 5718561)
Isn't it the US military that still has some missile launch systems on obsolete systems?

No. They use the right tools for the job from what I read about it.

BA, in contrast, is using stuff proven the hard way not to work. Yet BA is already steering the investigation away from the equipment itself. It'll drag on until most in the public have forgotten and the press has moved on, then there will be some quiet written equivalents of mumbling, probably about staff. Even if it really was an issue of having a single point of failure that could be taken out by a single power supply, an implausible situation, they are still unlikely to look into the human factors that arranged such a single point of failure.

hazel 06-03-2017 01:29 AM

There's an urban myth about a hospital intensive care ward where one bed was reputed to be unlucky. Whoever was put into that bed died. Finally they put in CCTV to try and find out what was going on. A cleaner came in at 3:00 in the morning, unplugged one of the IC machines, plugged in her hoover and swept the ward. Then she unplugged the hoover, plugged in the IC machine again and left.

chrism01 06-09-2017 02:37 AM

Not entirely unbelievable; I've actually worked in an office where that was happening ...

dave@burn-it.co.uk 06-09-2017 05:57 AM

So have I.
It took us months to work out why the machine was going down at about 3AM every night when there was supposed to be no-one there.
This was at a (supposedly) highly secure installation that was working on military and cutting edge stuff.


All times are GMT -5.