Reliability of Computer Systems and Networks

Fault Tolerance, Analysis, and Design

Author: Martin L. Shooman

Publisher: John Wiley & Sons

ISBN: 0471464066

Category: Technology & Engineering

Page: 552

View: 4906

With computers becoming embedded as controllers in everything from network servers to the routing of subway schedules to NASA missions, there is a critical need to ensure that systems continue to function even when a component fails. In this book, bestselling author Martin Shooman draws on his expertise in reliability engineering and software engineering to provide a complete and authoritative look at fault tolerant computing. He clearly explains all fundamentals, including how to use redundant elements in system design to ensure the reliability of computer systems and networks. Market: Systems and Networking Engineers, Computer Programmers, IT Professionals.

Computing System Reliability

Models and Analysis

Author: Min Xie,Kim-Leng Poh,Yuan-Shun Dai

Publisher: Springer Science & Business Media

ISBN: 0306486369

Category: Mathematics

Page: 293

View: 1292

Computing systems are of growing importance because of their wide use in many areas including those in safety-critical systems. This book describes the basic models and approaches to the reliability analysis of such systems. An extensive review is provided and models are categorized into different types. Some Markov models are extended to the analysis of specific computing systems, covering topics such as combined software and hardware, imperfect debugging processes, failure correlation, multi-state systems, and heterogeneous subsystems. One aim of the presentation is that, given the sound analysis and simplicity of the approaches, Markov models can be applied more readily to computing system reliability.

Computer System Reliability

Safety and Usability

Author: B.S. Dhillon

Publisher: CRC Press

ISBN: 1466573139

Category: Computers

Page: 249

View: 8754

Computer systems have become an important element of the world economy, with billions of dollars spent each year on development, manufacture, operation, and maintenance. Combining coverage of computer system reliability, safety, usability, and other related topics into a single volume, Computer System Reliability: Safety and Usability eliminates the need to consult many different and diverse sources in the hunt for the information required to design better computer systems. After presenting introductory aspects of computer system reliability such as safety, usability-related facts and figures, terms and definitions, and sources for obtaining useful information on computer system reliability, safety, and usability, the book: reviews mathematical concepts considered useful to understanding subsequent chapters; presents various introductory aspects of reliability, safety, and usability and computer system reliability basics; covers software reliability assessment and improvement methods; discusses important aspects of software quality and human error and software bugs in computer systems; highlights software safety and Internet reliability; details important aspects of software usability, including the need for considering usability during the software development phase, the software usability engineering process, software usability inspection methods, software usability test methods, and guidelines for conducting software usability testing; elucidates web usability facts and figures, common design errors, web page design, tools for evaluating web usability, and questions to evaluate website message communication effectiveness; and examines important aspects of computer system life cycle costing. Written by systems reliability expert B.S. Dhillon, the book is accessible to all levels of readership, making it useful to beginners and seasoned professionals alike.
Reflecting practical trends in computer engineering, especially in the area of software, Dhillon emphasizes the importance of usability in software systems and expands reliability to web usability and management. It provides methods for designing systems with increased reliability, safety, and usability.

Computing System Reliability: Models and Analysis

Author: Min Xie,Yuan-Shun Dai,Kim-Leng Poh

Publisher: Springer Science & Business Media

ISBN: 030648496X

Category: Computers

Page: 293

View: 1605

Reliability in Computer System Design

Author: B. S. Dhillon

Publisher: Intellect Books

ISBN: 9780893914127

Category: Computers

Page: 282

View: 5483

This volume covers wide areas of interest such as life cycle costing, microcomputers, common-cause failures and space computers. Every effort is made to present difficult material with the aid of an example along with its solution. The material covered is summarized at the end of each chapter. The information is written in a format that allows readers to learn and better understand the philosophy of reliability in computer system design. At the same time, it tests their comprehension through listed exercises.

Reliable Computer Systems

Design and Evaluation

Author: Daniel Siewiorek,Robert Swarz

Publisher: Digital Press

ISBN: 1483297438

Category: Computers

Page: 908

View: 5752

Enhance your hardware/software reliability. Enhancement of system reliability has been a major concern of computer users and designers, and this major revision of the 1982 classic meets users' continuing need for practical information on this pressing topic. Included are case studies of reliable systems from manufacturers such as Tandem, Stratus, IBM, and Digital, as well as coverage of special systems such as the Galileo Orbiter fault protection system and AT&T telephone switching processors.

Design for Reliability

Information and Computer-Based Systems

Author: Eric Bauer

Publisher: John Wiley & Sons

ISBN: 9781118075081

Category: Computers

Page: 325

View: 2753

System reliability, availability and robustness are often not well understood by system architects, engineers and developers. They often don't understand what drives customers' availability expectations, how to frame verifiable availability/robustness requirements, how to manage and budget availability/robustness, how to methodically architect and design systems that meet robustness requirements, and so on. The book takes a very pragmatic approach of framing reliability and robustness as a functional aspect of a system so that architects, designers, developers and testers can address it as a concrete, functional attribute of a system, rather than an abstract, non-functional notion.

Performance and Reliability Analysis of Computer Systems

An Example-Based Approach Using the SHARPE Software Package

Author: Robin A. Sahner,Kishor Trivedi,Antonio Puliafito

Publisher: Springer Science & Business Media

ISBN: 1461523672

Category: Computers

Page: 404

View: 4764

Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package provides a variety of probabilistic, discrete-state models used to assess the reliability and performance of computer and communication systems. The models included are combinatorial reliability models (reliability block diagrams, fault trees and reliability graphs), directed acyclic task precedence graphs, Markov and semi-Markov models (including Markov reward models), product-form queueing networks and generalized stochastic Petri nets. A practical approach to system modeling is followed; all of the examples described are solved and analyzed using the SHARPE tool. In structuring the book, the authors have been careful to provide the reader with a methodological approach to analytical modeling techniques. These techniques are not seen as alternatives but rather as an integral part of a single process of assessment which, by hierarchically combining results from different kinds of models, makes it possible to use state-space methods for those parts of a system that require them and non-state-space methods for the more well-behaved parts of the system. The SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) package is the 'toolchest' that allows the authors to specify stochastic models easily and solve them quickly, adopting model hierarchies and very efficient solution techniques. All the models described in the book are specified and solved using the SHARPE language; its syntax is described and the source code of almost all the examples discussed is provided. Audience: Suitable for use in advanced level courses covering reliability and performance of computer and communications systems and by researchers and practicing engineers whose work involves modeling of system performance and reliability.

Beyond Redundancy

How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-Based Systems

Author: Eric Bauer,Randee Adams,Daniel Eustace

Publisher: John Wiley & Sons

ISBN: 9781118104934

Category: Computers

Page: 336

View: 557

Enterprises make significant investments in geographically redundant systems to mitigate the very unlikely risk of a natural or man-made disaster rendering their primary site inaccessible or destroying it completely. While geographic redundancy has obvious benefits for disaster recovery, it is far less obvious what benefit georedundancy offers for more common hardware, software, and human failures. Beyond Redundancy provides both a theoretical and practical treatment of the feasible and likely benefits from geographic redundancy for both service availability and service reliability. The book is organized into three sections: Basics provides the necessary background on georedundancy and service availability; Modeling and Analysis of Redundancy gives the technical and mathematical details of service availability modeling of georedundant configurations; and Recommendations offers specific recommendations on architecture, requirements, design, testing, and analysis of georedundant configurations. A complete georedundant case study is included to illustrate the recommendations. The book considers both georedundant systems and georedundant solutions. The text also provides a general discussion about the capital expense/operating expense tradeoff that frames system redundancy and georedundancy. These added features make Beyond Redundancy an invaluable resource for network/system planners, IS/IT personnel, system architects, system engineers, developers, testers, and disaster recovery/business continuity consultants and planners.

Reliable Computer Systems

Collected Papers of the Newcastle Reliability Project

Author: Santosh Shrivastava

Publisher: Springer Science & Business Media

ISBN: 3642824706

Category: Computers

Page: 580

View: 7454

A research project to investigate the design and construction of reliable computing systems was initiated by B. Randell at the University of Newcastle upon Tyne in 1972. In over ten years of research on system reliability, a substantial number of papers have been produced by the members of this project. These papers have appeared in a variety of journals and conference proceedings and it is hoped that this book will prove to be a convenient reference volume for research workers active in this important area. In selecting papers published by past and present members of this project, I have used the following criteria: a paper is selected if it is concerned with fault tolerance and is not a review paper and was published before 1983. I have used these criteria (with only one or two exceptions!) in order to present a collection of papers with a common theme and, at the same time, to limit the size of the book to a reasonable length. The papers have been grouped into seven chapters. The first chapter introduces fundamental concepts of fault tolerance and ends with the earliest Newcastle paper on reliability. The project perhaps became well known after the invention of recovery blocks - a simple yet effective means of incorporating fault tolerance in software. The second chapter contains papers on recovery blocks, starting with the paper which first introduced the concept.

System Reliability Theory

Models and Statistical Methods

Author: Arnljot Høyland,Marvin Rausand

Publisher: John Wiley & Sons

ISBN: 0470317744

Category: Technology & Engineering

Page: 536

View: 6524

A comprehensive introduction to reliability analysis. The first section provides a thorough but elementary prologue to reliability theory. The latter half comprises more advanced analytical tools including Markov processes, renewal theory, life data analysis, accelerated life testing and Bayesian reliability analysis. Features numerous worked examples. Each chapter concludes with a selection of problems plus additional material on applications.

Reliability Modeling With Computer And Maintenance Applications

Author: Nakamura Syouji,Qian Cun Hua,Nakagawa Toshio

ISBN: 9813224517

Category: Technology & Engineering

Page: 396

View: 5578

The development of reliability and maintenance theory and its applications has become a major concern of engineers and managers who design and produce highly reliable systems. This book covers ongoing research topics in computer systems, reliability analysis, reliability applications, and maintenance policies, and serves as a reference guidebook for those engaged in systems design, whether students, technicians, or research engineers.

Systems Reliability and Failure Prevention

Author: Herbert Hecht

Publisher: Artech House

ISBN: 9781580537957

Category: Technology & Engineering

Page: 230

View: 9729

This timely resource offers you a comprehensive, unified treatment of the techniques and practice of systems reliability and failure prevention, without the use of advanced mathematics. Featuring numerous, in-depth real-world examples, the book distills the author's many years of practical experience in designing and testing critical systems. The book helps you set reliability requirements for a new product, monitor compliance with these requirements during development and later life cycle phases, account for software failures in an integrated reliability assessment, and allocate a fixed reliability improvement budget to guide decisions by cost considerations and trade-offs.

Computer Systems Reliability

Author: Tom Anderson,Brian Randell

Publisher: CUP Archive

ISBN: 9780521227674

Category: Computers

Page: 482

View: 1955

Mathematical Models for Systems Reliability

Author: Benjamin Epstein,Ishay Weissman

Publisher: CRC Press

ISBN: 9781420080834

Category: Mathematics

Page: 272

View: 7913

Evolved from the lectures of a recognized pioneer in developing the theory of reliability, Mathematical Models for Systems Reliability provides a rigorous treatment of the required probability background for understanding reliability theory. This classroom-tested text begins by discussing the Poisson process and its associated probability laws. It then uses a number of stochastic models to provide a framework for life length distributions and presents formal rules for computing the reliability of nonrepairable systems that possess commonly occurring structures. The next two chapters explore the stochastic behavior over time of one- and two-unit repairable systems. After covering general continuous-time Markov chains, pure birth and death processes, and transitions and rates diagrams, the authors consider first passage-time problems in the context of systems reliability. The final chapters show how certain techniques can be applied to a variety of reliability problems. Illustrating the models and methods with a host of examples, this book offers a sound introduction to mathematical probabilistic models and lucidly explores how they are used in systems reliability problems.

Robust Communications Software

Extreme Availability, Reliability and Scalability for Carrier-Grade Systems

Author: Greg Utas

Publisher: John Wiley & Sons

ISBN: 0470011785

Category: Technology & Engineering

Page: 352

View: 4652

Learn how to design scalable, robust software for cutting-edge communications products. Carrier-grade software must satisfy the stringent quality requirements of network operators whose systems provide mission-critical communications services. This book describes proven carrier-grade software techniques used in flagship products designed by industry leaders such as Lucent, Nortel, and Ericsson. In the age of 24/7, software robustness is a competitive advantage. This authoritative guide for software engineers, managers, and testers of products that face carrier-grade requirements helps you to develop state-of-the-art software that will give you an edge in today's marketplace. Robust Communications Software: Extreme Availability, Reliability and Scalability for Carrier-Grade Systems offers advice on choosing the right technologies for building reliable software; incorporates real-world examples and design rationales when describing how to construct robust, embedded software for communications systems; presents a comprehensive set of carrier-grade design patterns that help you to meet extreme availability, reliability, scalability, and capacity requirements; gives advice on how to protect against and recover from software faults; and discusses system installation, operability, maintenance, and on-site debugging.

Safety of Computer Control Systems

Proceedings of the IFAC Workshop, Stuttgart, Federal Republic of Germany, 16-18 May 1979

Author: R. Lauber

Publisher: Elsevier

ISBN: 1483153754

Category: Technology & Engineering

Page: 230

View: 9255

Safety of Computer Control Systems is a collection of papers from the Proceedings of the IFAC Workshop, held in Stuttgart, Germany on May 16-18, 1979. This book discusses the inherent problems in the hardware and software application of computerized control to automated systems safeguarding human life, property, and the environment. The papers discuss more specific concerns, such as railway systems, aircraft landing systems, nuclear power stations, chemical reactors, elevators, and cranes. The book also describes the safety and reliability of complex industrial computer systems together with an example showing the application of computers in power plants. One paper presents guidelines in documenting safety related computer systems that will help various parties who are involved in their purchase and operation. Another paper discusses how to detect failures in microcomputer systems such as memory violations and invalid operation code detectors. This book then concludes by discussing the necessity of inspecting process computers used in nuclear power plants, especially when computers are used in reactor protection, control rod, and authentication of log-in systems. This collection can be of interest for students of programming, process-computer analysts, heads of computer technology departments and institutions, and lecturers in industrial computer programming and design.

Engineering Systems Reliability, Safety, and Maintenance

An Integrated Approach

Author: B.S. Dhillon

Publisher: CRC Press

ISBN: 1351662708

Category: Technology & Engineering

Page: 298

View: 446

Today, engineering systems are an important element of the world economy and each year billions of dollars are spent to develop, manufacture, operate, and maintain various types of engineering systems around the globe. Many of these systems are highly sophisticated and contain millions of parts. For example, a Boeing jumbo 747 is made up of approximately 4.5 million parts including fasteners. Needless to say, reliability, safety, and maintenance of systems such as this have become more important than ever before. Global competition and other factors are forcing manufacturers to produce highly reliable, safe, and maintainable engineering products. Therefore, there is a definite need for the reliability, safety, and maintenance professionals to work closely during design and other phases. Engineering Systems Reliability, Safety, and Maintenance: An Integrated Approach eliminates the need to consult many different and diverse sources in the hunt for the information required to design better engineering systems.