Michael Nygard on Building Resilient Systems

news2024/2/27 15:22:58

Original article at InfoQ.

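  • Feature-complete software and production-ready software are not the same thing. Developers often do not know what production conditions look like, so they fail to account for how the software will behave there. For example, the load on Server A and Server B may be 1:1 in the development environment but 20:1 in production, in which case Server B may run into trouble. Developers tend to overlook this simply because they are unaware of it.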
Hi, my name is Ryan Slobojan. I'm here with Michael Nygard. Michael, what's the difference between feature complete software and production ready software? Is there a difference?
There is definitely a difference. In fact, I think to some extent, the only possible answer for that question is Mu. You have to unask the question. Feature completeness really tells us nothing at all about a software's ability to survive the real world of production. Feature complete tells us that it has passed QA, which means that, by and large, when I click this button, that label gets activated, or when I enter a date, it's in the proper format. It says nothing at all about whether the software will handle continuous traffic from millions of users 4 weeks at a time.
  • Circuit Breaker
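  • Logs are very useful. Keep monitoring concerns as separate from the business logic as possible, because monitoring policies change often. Ideally, many of the monitoring settings can be reconfigured dynamically.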

Anywhere there is a pool, definitely track who's blocking and how often, high water, low water, and some stats about the number of times things are being checked in and out. Other kinds of health indicators: any place you've got a cache, keep track of how many items are in the cache, what the hit rate is, what the eviction rate is; any place you've got circuit breakers, keep track of how many times the circuit breakers are flipping from an open to a closed state or from closed to open, the current state of all of them, of course, and the thresholds that are configured into them. Those are all useful things to expose through a monitoring and management interface.
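As a rough illustration of the circuit-breaker indicators listed above, here is a minimal Python sketch. It is not Nygard's implementation: the class name `CircuitBreaker`, the counters `trips` and `recoveries`, the `stats()` method, and the parameter defaults are all assumptions made for this example. It only shows one way to keep the current state, the configured thresholds, and the open/closed transition counts in a form a monitoring interface could read.

```python
import time


class CircuitBreaker:
    """Hypothetical circuit breaker that records the stats named above."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock                         # injectable for testing
        self.state = "closed"
        self.failures = 0
        self.opened_at = None
        self.trips = 0        # transitions into the open state
        self.recoveries = 0   # transitions from half-open back to closed

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open; call rejected")
            # Timeout elapsed: allow one trial call through.
            self.state = "half-open"
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self._clock()
            self.trips += 1

    def _on_success(self):
        if self.state == "half-open":
            self.recoveries += 1
        self.state = "closed"
        self.failures = 0

    def stats(self):
        """Snapshot suitable for a monitoring or management endpoint."""
        return {
            "state": self.state,
            "failure_threshold": self.failure_threshold,
            "reset_timeout": self.reset_timeout,
            "trips": self.trips,
            "recoveries": self.recoveries,
        }
```

Keeping the counters on the breaker and exposing them only through `stats()` is one way to keep the monitoring side separate from the business call being protected.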

It can also be useful to expose controls on these things - for instance, with the circuit breaker, a control to reset it; with a pool a control to change what the high water and low water mark will be. I can think of several cases where we've had an ongoing partial failure mode and we needed to go in and change the maximum number of connections in a connection pool and dial it down, so that the front end system would stop crushing the back end system. That's a very useful kind of control to have at runtime.
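The runtime control on a pool described above could look something like the following Python sketch. Everything here is hypothetical: `AdjustablePool`, `set_max_size`, and the counter names are invented for illustration, and the class only limits how many connections are checked out rather than managing real connections. The point is that the maximum can be dialed down while the system is running, and that the pool also tracks blocking and high-water statistics.

```python
import threading


class AdjustablePool:
    """Hypothetical pool limiter whose maximum can be changed at runtime."""

    def __init__(self, max_size):
        self._cond = threading.Condition()
        self._max_size = max_size
        self._in_use = 0
        self.high_water = 0   # most connections checked out at once
        self.checkouts = 0    # successful acquisitions
        self.blocked = 0      # acquisitions that found the pool exhausted

    def acquire(self, timeout=None):
        """Return True if a slot was obtained, False on timeout."""
        with self._cond:
            if self._in_use >= self._max_size:
                self.blocked += 1
            ok = self._cond.wait_for(
                lambda: self._in_use < self._max_size, timeout
            )
            if not ok:
                return False
            self._in_use += 1
            self.checkouts += 1
            self.high_water = max(self.high_water, self._in_use)
            return True

    def release(self):
        with self._cond:
            if self._in_use == 0:
                raise RuntimeError("release without matching acquire")
            self._in_use -= 1
            self._cond.notify()

    def set_max_size(self, new_max):
        """Management control: change the limit without a restart."""
        with self._cond:
            self._max_size = new_max
            # Wake waiters in case the limit was raised.
            self._cond.notify_all()
```

In this sketch, lowering the limit does not revoke connections that are already checked out; new callers simply wait until usage drops below the new maximum, which is enough to stop a front end from piling more load onto a struggling back end.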

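  • Solving certain problems during development saves a great deal in maintenance costs later. Do the math: for a site with 1 million hits per day, an extra 250 milliseconds per page request, which looks trivial, adds up to about 70 hours of additional computing time and calls for 4 more servers. Beyond the purchase and upkeep of those servers, there are licence fees and contract management, plus the staff needed to maintain the machines, who in turn need to be managed... It cascades like the butterfly effect.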

If we do that, we will make the decision differently in some cases and we'll make the decision the same way in some cases. By that I mean we'll sometimes choose to incur that ongoing operational cost, and we'll sometimes choose to spend some additional development time to avoid the ongoing operations cost. One of the examples that I use when I talk about capacity is if you're handling, say, web page requests and you have 1 million hits per day - 1 million hits per day is not all that large these days - and each one takes just an extra 250 milliseconds.

First of all, that's going to have an impact on your revenues, and companies like Google and Amazon have identified that very clearly, but secondly an extra 250 milliseconds on 1 million hits per day is about 70 hours of additional computing time, which means roughly you need 4 additional servers to handle the load. 4 additional servers draw power every month, they require administration every month, they may or may not require software licensing every month, they probably have support contracts. Once you get enough administrators, you need managers of administrators to keep the organization in check, so really, that 250 milliseconds per page that seems pretty small in development, translates into a pretty substantial ongoing operations cost.  
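The arithmetic behind the 70-hour figure can be checked with a few lines of Python. The hit count and the 250 milliseconds come from the interview; the function name `extra_capacity` and the 75% target utilization are assumptions added here, the latter only as one plausible way to get from roughly 2.9 fully busy machines to the "roughly 4" servers quoted.

```python
import math


def extra_capacity(hits_per_day, extra_seconds_per_hit, target_utilization):
    """Estimate the added compute time and servers from a per-request slowdown."""
    extra_hours_per_day = hits_per_day * extra_seconds_per_hit / 3600
    # One server supplies 24 compute-hours per day at 100% utilization.
    servers_fully_busy = extra_hours_per_day / 24
    # Assumption: servers are run below full utilization to leave headroom.
    servers_needed = math.ceil(servers_fully_busy / target_utilization)
    return {
        "extra_hours_per_day": extra_hours_per_day,
        "servers_fully_busy": servers_fully_busy,
        "servers_needed": servers_needed,
    }


result = extra_capacity(
    hits_per_day=1_000_000,
    extra_seconds_per_hit=0.250,
    target_utilization=0.75,  # hypothetical headroom target
)
print(result)
```

With these inputs, the extra time comes to 250,000 seconds, or about 69.4 hours per day, which matches the "about 70 hours" in the interview.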

Reposted from: https://www.cnblogs.com/caff/archive/2010/04/10/1708907.html

