Source Code Reading: How Storm Operates ZooKeeper (cluster.clj)

2023-09-23 06:36:06 241

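The main functions Storm uses to operate ZooKeeper are defined in the namespace backtype.storm.cluster (that is, in the file cluster.clj). backtype.storm.cluster defines two important protocols: ClusterState and StormClusterState.

A Clojure protocol can be thought of as a Java interface: it groups a set of methods. The ClusterState protocol groups the basic functions for interacting with ZooKeeper, such as getting a node's children or getting a node's data. It is defined as follows:

The ClusterState protocol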

(defprotocol ClusterState
  (set-ephemeral-node [this path data])
  (delete-node [this path])
  (create-sequential [this path data])
  ;; if node does not exist, create persistent with this data
  (set-data [this path data])
  (get-data [this path watch?])
  (get-version [this path watch?])
  (get-data-with-version [this path watch?])
  (get-children [this path watch?])
  (mkdirs [this path])
  (close [this])
  (register [this callback])
  (unregister [this id]))
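
As a minimal sketch of how a protocol and an implementation fit together, the following example declares a small protocol and implements it with reify on top of an in-memory map. NodeStore, put-node, read-node, child-names and mk-in-memory-store are hypothetical names invented for this illustration; they are not part of Storm, and the real ClusterState talks to ZooKeeper rather than to an atom.

```clojure
;; Hypothetical mini-protocol, NOT Storm's ClusterState: just enough to show
;; how defprotocol declares methods and reify supplies an implementation.
(defprotocol NodeStore
  (put-node [this path data])
  (read-node [this path])
  (child-names [this path]))

(defn mk-in-memory-store
  "Returns a NodeStore backed by an atom holding a sorted path->data map."
  []
  (let [nodes (atom (sorted-map))]
    (reify NodeStore
      (put-node [this path data]
        (swap! nodes assoc path data)
        path)
      (read-node [this path]
        (get @nodes path))
      (child-names [this path]
        (let [prefix (str path "/")]
          ;; keep only direct children: paths under prefix with no further "/"
          (vec (for [p (keys @nodes)
                     :when (.startsWith ^String p prefix)
                     :let [tail (subs p (count prefix))]
                     :when (not (.contains ^String tail "/"))]
                 tail)))))))

(def store (mk-in-memory-store))
(put-node store "/storms/topo-1" "base-1")
(put-node store "/storms/topo-2" "base-2")
(read-node store "/storms/topo-1")   ; returns "base-1"
(child-names store "/storms")        ; returns ["topo-1" "topo-2"]
```

Callers only see the protocol functions, so an in-memory object like this one could stand in for a ZooKeeper-backed one in tests.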

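The StormClusterState protocol groups the functions through which Storm itself interacts with ZooKeeper. Its functions can be seen as "compositions" of the functions in the ClusterState protocol. The StormClusterState protocol is defined as follows:

The StormClusterState protocol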

(defprotocol StormClusterState
  (assignments [this callback])
  (assignment-info [this storm-id callback])
  (assignment-info-with-version [this storm-id callback])
  (assignment-version [this storm-id callback])
  (active-storms [this])
  (storm-base [this storm-id callback])
  (get-worker-heartbeat [this storm-id node port])
  (executor-beats [this storm-id executor->node+port])
  (supervisors [this callback])
  (supervisor-info [this supervisor-id]) ;; returns nil if doesn't exist
  (setup-heartbeats! [this storm-id])
  (teardown-heartbeats! [this storm-id])
  (teardown-topology-errors! [this storm-id])
  (heartbeat-storms [this])
  (error-topologies [this])
  (worker-heartbeat! [this storm-id node port info])
  (remove-worker-heartbeat! [this storm-id node port])
  (supervisor-heartbeat! [this supervisor-id info])
  (activate-storm! [this storm-id storm-base])
  (update-storm! [this storm-id new-elems])
  (remove-storm-base! [this storm-id])
  (set-assignment! [this storm-id info])
  (remove-storm! [this storm-id])
  (report-error [this storm-id task-id node port error])
  (errors [this storm-id task-id])
  (disconnect [this]))
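
To make the idea of "composition" concrete, here is a minimal sketch of a high-level protocol whose methods are each built from a path helper, a serialization step and one low-level call. RawState, TopologyState, assignment-path*, mk-raw-state and mk-topology-state are hypothetical names for this illustration only, and pr-str with clojure.edn is an assumed stand-in for the serialization Storm actually uses.

```clojure
(require '[clojure.edn :as edn])

;; Low-level layer (hypothetical stand-in for ClusterState).
(defprotocol RawState
  (raw-set [this path data])
  (raw-get [this path]))

;; High-level layer (hypothetical stand-in for StormClusterState).
(defprotocol TopologyState
  (save-assignment! [this storm-id info])
  (load-assignment [this storm-id]))

(defn assignment-path*
  "Builds the node path for a topology's assignment (illustrative layout)."
  [storm-id]
  (str "/assignments/" storm-id))

(defn mk-raw-state
  "In-memory RawState: a path->string map held in an atom."
  []
  (let [nodes (atom {})]
    (reify RawState
      (raw-set [this path data] (swap! nodes assoc path data) nil)
      (raw-get [this path] (get @nodes path)))))

(defn mk-topology-state
  "Wraps a RawState; each method = path construction + (de)serialization
  + exactly one low-level call."
  [raw]
  (reify TopologyState
    (save-assignment! [this storm-id info]
      (raw-set raw (assignment-path* storm-id) (pr-str info)))
    (load-assignment [this storm-id]
      (when-let [s (raw-get raw (assignment-path* storm-id))]
        (edn/read-string s)))))

(def ts (mk-topology-state (mk-raw-state)))
(save-assignment! ts "topo-1" {:node "host-a" :port 6700})
(load-assignment ts "topo-1")   ; returns {:node "host-a", :port 6700}
```

The high-level layer never touches storage directly, which is the same division of labor the two Storm protocols follow.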

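Besides the two protocols ClusterState and StormClusterState, the namespace backtype.storm.cluster also defines two important functions: mk-distributed-cluster-state and mk-storm-cluster-state.

The mk-distributed-cluster-state function is shown below.

It returns an object that implements the ClusterState protocol; callers interact with ZooKeeper through this object.

The mk-distributed-cluster-state function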

(defn mk-distributed-cluster-state
  ;; conf is a map holding the configuration read from storm.yaml
  [conf]
  ;; zk is bound to a ZooKeeper client; Storm uses CuratorFramework to talk to ZooKeeper
  (let [zk (zk/mk-client conf (conf STORM-ZOOKEEPER-SERVERS) (conf STORM-ZOOKEEPER-PORT) :auth-conf conf)]
    ;; create the root directory of the Storm cluster on ZooKeeper (default: /storm)
    (zk/mkdirs zk (conf STORM-ZOOKEEPER-ROOT))
    (.close zk))
  ;; callbacks holds the registered callback functions as a map
  (let [callbacks (atom {})
        ;; active marks whether this ClusterState is still open; close sets it to false
        active (atom true)
        ;; zk is rebound to a new client that has a watcher set. When the state of the
        ;; ZooKeeper cluster changes, the server sends an event to the client, and the
        ;; client's watcher calls the functions in callbacks to handle the event.
        ;; When nimbus starts, callbacks is an empty collection, so nimbus calls no
        ;; callback on receiving an event; when a supervisor starts, callbacks have been
        ;; registered, so the supervisor calls them when the server sends an event.
        ;; mk-client is defined in zookeeper.clj; see its definition there.
        zk (zk/mk-client conf
                         (conf STORM-ZOOKEEPER-SERVERS)
                         (conf STORM-ZOOKEEPER-PORT)
                         :auth-conf conf
                         :root (conf STORM-ZOOKEEPER-ROOT)
                         ;; :watcher binds the client's default watcher function. state is
                         ;; the client's connection state, type is the event type, and path
                         ;; is the znode on which the event occurred.
                         ;; The watcher's main job is to run the functions in callbacks,
                         ;; which are added in mk-storm-cluster-state through ClusterState's
                         ;; register function.
                         :watcher (fn [state type path]
                                    (when @active
                                      (when-not (= :connected state)
                                        (log-warn "Received event " state ":" type ":" path " with disconnected Zookeeper."))
                                      (when-not (= :none type)
                                        (doseq [callback (vals @callbacks)]
                                          (callback type path))))))]
    ;; reify plays the role of Java's implements: it creates an object that
    ;; implements the ClusterState protocol
    (reify
      ClusterState
      ;; register adds a callback to callbacks, keyed by a freshly generated UUID
      (register
        [this callback]
        (let [id (uuid)]
          (swap! callbacks assoc id callback)
          id))
      ;; unregister removes the callback with the given key from callbacks
      (unregister
        [this id]
        (swap! callbacks dissoc id))
      ;; create an ephemeral node on ZooKeeper, or update its data if it already exists
      (set-ephemeral-node
        [this path data]
        (zk/mkdirs zk (parent-path path))
        (if (zk/exists zk path false)
          (try-cause
            (zk/set-data zk path data) ; should verify that it's ephemeral
            (catch KeeperException$NoNodeException e
              (log-warn-error e "Ephemeral node disappeared between checking for existing and setting data")
              (zk/create-node zk path data :ephemeral)))
          (zk/create-node zk path data :ephemeral)))
      ;; create a sequential node on ZooKeeper
      (create-sequential
        [this path data]
        (zk/create-node zk path data :sequential))
      ;; update a node's data, creating it as a persistent node if it does not exist
      (set-data
        [this path data]
        ;; note: this does not turn off any existing watches
        (if (zk/exists zk path false)
          (zk/set-data zk path data)
          (do
            (zk/mkdirs zk (parent-path path))
            (zk/create-node zk path data :persistent))))
      ;; delete the given node (recursively)
      (delete-node
        [this path]
        (zk/delete-recursive zk path))
      ;; read the given node's data. path is the node path; watch? is a boolean that
      ;; says whether to watch the node. With watch? = true, after set-data changes the
      ;; node's data the server sends an event to the client, which then calls the
      ;; default watcher given at client creation (the function bound to :watcher)
      (get-data
        [this path watch?]
        (zk/get-data zk path watch?))
      ;; like get-data, but also returns the data version, i.e. the number of times
      ;; the node's data has been modified
      (get-data-with-version
        [this path watch?]
        (zk/get-data-with-version zk path watch?))
      ;; read the given node's version; watch? means the same as in get-data
      (get-version
        [this path watch?]
        (zk/get-version zk path watch?))
      ;; list the given node's children; watch? means the same as in get-data
      (get-children
        [this path watch?]
        (zk/get-children zk path watch?))
      ;; create a node on ZooKeeper
      (mkdirs
        [this path]
        (zk/mkdirs zk path))
      ;; close the ZooKeeper client
      (close
        [this]
        (reset! active false)
        (.close zk)))))
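
The callback bookkeeping in mk-distributed-cluster-state can be studied without a ZooKeeper connection. The following is a minimal sketch, not Storm's code: mk-callback-registry is a hypothetical helper that keeps the same two atoms (callbacks and active) and exposes register, unregister, a watcher with the same fan-out logic, and close. The id is generated with java.util.UUID here, on the assumption that this matches what (uuid) produces.

```clojure
(defn mk-callback-registry
  "Returns a map of functions sharing one callbacks atom and one active flag."
  []
  (let [callbacks (atom {})
        active    (atom true)]
    {:register   (fn [callback]
                   (let [id (str (java.util.UUID/randomUUID))]
                     (swap! callbacks assoc id callback)
                     id))
     :unregister (fn [id] (swap! callbacks dissoc id) nil)
     ;; same shape as the :watcher fn: fan out unless closed or type is :none
     :watcher    (fn [state type path]
                   (when @active
                     (when-not (= :none type)
                       (doseq [callback (vals @callbacks)]
                         (callback type path)))))
     :close      (fn [] (reset! active false))}))

(def seen (atom []))
(def registry (mk-callback-registry))
(def id ((:register registry) (fn [type path] (swap! seen conj [type path]))))

((:watcher registry) :connected :node-data-changed "/assignments/topo-1")
((:watcher registry) :connected :none nil)   ; ignored: type is :none
((:unregister registry) id)
((:watcher registry) :connected :node-children-changed "/supervisors") ; no callbacks left
@seen   ; [[:node-data-changed "/assignments/topo-1"]]
```

Because the watcher only iterates over whatever is in callbacks at the time an event arrives, the ZooKeeper client never needs to know who is listening.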

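The mk-storm-cluster-state function is defined as follows:

mk-storm-cluster-state is very important: it returns an instance that implements the StormClusterState protocol, through which Storm can interact with ZooKeeper more conveniently.

The startup functions of both nimbus and supervisor call mk-storm-cluster-state. Starting nimbus and supervisor will be covered in later articles.

The mk-storm-cluster-state function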

(defn mk-storm-cluster-state
  [cluster-state-spec]
  ;; the satisfies? predicate is comparable to Java's instanceof: it checks whether
  ;; cluster-state-spec is a ClusterState instance
  (let [[solo? cluster-state] (if (satisfies? ClusterState cluster-state-spec)
                                [false cluster-state-spec]
                                [true (mk-distributed-cluster-state cluster-state-spec)])
        ;; map of topology id -> callback; when the data of /assignments/{topology-id}
        ;; changes, the client runs the callback stored for that topology id
        assignment-info-callback (atom {})
        ;; assignment-info-with-version-callback is analogous to assignment-info-callback
        assignment-info-with-version-callback (atom {})
        ;; assignment-version-callback is analogous to assignment-info-callback
        assignment-version-callback (atom {})
        ;; when the children of the /supervisors znode change, the client runs the
        ;; function held in supervisors-callback
        supervisors-callback (atom nil)
        ;; when the children of the /assignments znode change, the client runs the
        ;; function held in assignments-callback
        assignments-callback (atom nil)
        ;; when the data of the /storms/{topology-id} znode changes, the client runs the
        ;; callback stored in storm-base-callback for that topology id
        storm-base-callback (atom {})
        ;; register adds the callback (fn ...) to cluster-state's callbacks and returns
        ;; the uuid that identifies it
        state-id (register
                   cluster-state
                   ;; the callback itself: type is the event type, path is the znode
                   (fn [type path]
                     ;; subtree is the path prefix such as "assignments", "storms" or
                     ;; "supervisors"; args holds the topology id
                     (let [[subtree & args] (tokenize-path path)]
                       ;; condp is comparable to Java's switch
                       (condp = subtree
                         ;; subtree = "assignments": if args is empty, the children of
                         ;; /assignments changed, so run assignments-callback; otherwise
                         ;; the data of /assignments/{topology-id} changed, so run the
                         ;; matching entry of assignment-info-callback
                         ASSIGNMENTS-ROOT (if (empty? args)
                                            (issue-callback! assignments-callback)
                                            (issue-map-callback! assignment-info-callback (first args)))
                         ;; subtree = "supervisors": the children of /supervisors changed,
                         ;; so run supervisors-callback
                         SUPERVISORS-ROOT (issue-callback! supervisors-callback)
                         ;; subtree = "storms": the data of /storms/{topology-id} changed,
                         ;; so run the matching entry of storm-base-callback
                         STORMS-ROOT (issue-map-callback! storm-base-callback (first args))
                         ;; this should never happen
                         (exit-process! 30 "Unknown callback for subtree " subtree args)))))]
    ;; create the znodes Storm needs on ZooKeeper in order to run topologies
    (doseq [p [ASSIGNMENTS-SUBTREE STORMS-SUBTREE SUPERVISORS-SUBTREE WORKERBEATS-SUBTREE ERRORS-SUBTREE]]
      (mkdirs cluster-state p))
    ;; return an instance that implements the StormClusterState protocol
    (reify
      StormClusterState
      ;; list the children of /assignments; if callback is non-nil, store it in
      ;; assignments-callback and set a child watch on /assignments
      (assignments
        [this callback]
        (when callback
          (reset! assignments-callback callback))
        (get-children cluster-state ASSIGNMENTS-SUBTREE (not-nil? callback)))
      ;; read the data of /assignments/{storm-id}, i.e. the assignment of storm-id; if
      ;; callback is non-nil, add it to assignment-info-callback and set a data watch
      ;; on /assignments/{storm-id}
      (assignment-info
        [this storm-id callback]
        (when callback
          (swap! assignment-info-callback assoc storm-id callback))
        (maybe-deserialize (get-data cluster-state (assignment-path storm-id) (not-nil? callback))))
      ;; read the data of /assignments/{storm-id} together with its version; if callback
      ;; is non-nil, add it to assignment-info-with-version-callback and set a data watch
      ;; on /assignments/{storm-id}
      (assignment-info-with-version
        [this storm-id callback]
        (when callback
          (swap! assignment-info-with-version-callback assoc storm-id callback))
        (let [{data :data version :version}
              (get-data-with-version cluster-state (assignment-path storm-id) (not-nil? callback))]
          {:data (maybe-deserialize data)
           :version version}))
      ;; read the data version of /assignments/{storm-id}; if callback is non-nil, add it
      ;; to assignment-version-callback and set a data watch on /assignments/{storm-id}
      (assignment-version
        [this storm-id callback]
        (when callback
          (swap! assignment-version-callback assoc storm-id callback))
        (get-version cluster-state (assignment-path storm-id) (not-nil? callback)))
      ;; list the ids of the topologies running in the cluster, i.e. the children of /storms
      (active-storms
        [this]
        (get-children cluster-state STORMS-SUBTREE false))
      ;; list the ids of all topologies that have heartbeats, i.e. the children of /workerbeats
      (heartbeat-storms
        [this]
        (get-children cluster-state WORKERBEATS-SUBTREE false))
      ;; list the ids of all topologies that have errors, i.e. the children of /errors
      (error-topologies
        [this]
        (get-children cluster-state ERRORS-SUBTREE false))
      ;; read the heartbeat of one worker process of storm-id, i.e. the data of
      ;; /workerbeats/{storm-id}/{node-port}
      (get-worker-heartbeat
        [this storm-id node port]
        (-> cluster-state
            (get-data (workerbeat-path storm-id node port) false)
            maybe-deserialize))
      ;; collect the heartbeats of all assigned executors (threads) from their workers
      (executor-beats
        [this storm-id executor->node+port]
        ;; need to take executor->node+port in explicitly so that we don't run into a situation where a
        ;; long dead worker with a skewed clock overrides all the timestamps. By only checking heartbeats
        ;; with an assigned node+port, and only reading executors from that heartbeat that are actually assigned,
        ;; we avoid situations like that
        (let [node+port->executors (reverse-map executor->node+port)
              all-heartbeats (for [[[node port] executors] node+port->executors]
                               (->> (get-worker-heartbeat this storm-id node port)
                                    (convert-executor-beats executors)))]
          (apply merge all-heartbeats)))
      ;; list the children of /supervisors; if callback is non-nil, store it in
      ;; supervisors-callback and set a child watch on /supervisors
      (supervisors
        [this callback]
        (when callback
          (reset! supervisors-callback callback))
        (get-children cluster-state SUPERVISORS-SUBTREE (not-nil? callback)))
      ;; read the data of /supervisors/{supervisor-id}, i.e. the supervisor's heartbeat
      (supervisor-info
        [this supervisor-id]
        (maybe-deserialize (get-data cluster-state (supervisor-path supervisor-id) false)))
      ;; write a worker process heartbeat
      (worker-heartbeat!
        [this storm-id node port info]
        (set-data cluster-state (workerbeat-path storm-id node port) (Utils/serialize info)))
      ;; delete a worker process heartbeat
      (remove-worker-heartbeat!
        [this storm-id node port]
        (delete-node cluster-state (workerbeat-path storm-id node port)))
      ;; create the node that stores heartbeats for the topology storm-id
      (setup-heartbeats!
        [this storm-id]
        (mkdirs cluster-state (workerbeat-storm-root storm-id)))
      ;; delete the heartbeat node of the topology storm-id
      (teardown-heartbeats!
        [this storm-id]
        (try-cause
          (delete-node cluster-state (workerbeat-storm-root storm-id))
          (catch KeeperException e
            (log-warn-error e "Could not teardown heartbeats for " storm-id))))
      ;; delete the error node of the topology storm-id
      (teardown-topology-errors!
        [this storm-id]
        (try-cause
          (delete-node cluster-state (error-storm-root storm-id))
          (catch KeeperException e
            (log-warn-error e "Could not teardown errors for " storm-id))))
      ;; create an ephemeral node that stores the supervisor's heartbeat
      (supervisor-heartbeat!
        [this supervisor-id info]
        (set-ephemeral-node cluster-state (supervisor-path supervisor-id) (Utils/serialize info)))
      ;; create the /storms/{storm-id} node
      (activate-storm!
        [this storm-id storm-base]
        (set-data cluster-state (storm-path storm-id) (Utils/serialize storm-base)))
      ;; update the topology's StormBase object, i.e. update the /storms/{storm-id} node
      (update-storm!
        [this storm-id new-elems]
        ;; base is the StormBase object stored on ZooKeeper for storm-id
        (let [base (storm-base this storm-id nil)
              ;; executors is the map of component name -> parallelism
              executors (:component->executors base)
              ;; new-elems gets the merged parallelism map: update merges the new
              ;; component parallelism entries into the old map
              new-elems (update new-elems :component->executors (partial merge executors))]
          ;; update the StormBase object and write it to /storms/{storm-id}
          (set-data cluster-state (storm-path storm-id)
                    (-> base
                        (merge new-elems)
                        Utils/serialize))))
      ;; read the StormBase object of storm-id, i.e. the data of /storms/{storm-id}; if
      ;; callback is non-nil, add it to storm-base-callback and set a data watch on
      ;; /storms/{storm-id}
      (storm-base
        [this storm-id callback]
        (when callback
          (swap! storm-base-callback assoc storm-id callback))
        (maybe-deserialize (get-data cluster-state (storm-path storm-id) (not-nil? callback))))
      ;; delete the StormBase object of storm-id, i.e. delete /storms/{storm-id}
      (remove-storm-base!
        [this storm-id]
        (delete-node cluster-state (storm-path storm-id)))
      ;; update the assignment of storm-id, i.e. the data of /assignments/{storm-id}
      (set-assignment!
        [this storm-id info]
        (set-data cluster-state (assignment-path storm-id) (Utils/serialize info)))
      ;; delete the assignment of storm-id and its StormBase, i.e. delete both
      ;; /assignments/{storm-id} and /storms/{storm-id}
      (remove-storm!
        [this storm-id]
        (delete-node cluster-state (assignment-path storm-id))
        (remove-storm-base! this storm-id))
      ;; write a component's error information to ZooKeeper
      (report-error
        [this storm-id component-id node port error]
        ;; path is "/errors/{storm-id}/{component-id}"
        (let [path (error-path storm-id component-id)
              ;; data is the error record: time, stack trace, host and port
              data {:time-secs (current-time-secs) :error (stringify-error error) :host node :port port}
              ;; create the /errors/{storm-id}/{component-id} node
              _ (mkdirs cluster-state path)
              ;; create a sequential child of /errors/{storm-id}/{component-id} holding the error
              _ (create-sequential cluster-state (str path "/e") (Utils/serialize data))
              ;; to-kill is every child except the 10 with the highest sequence numbers
              to-kill (->> (get-children cluster-state path false)
                           (sort-by parse-error-path)
                           reverse
                           (drop 10))]
          ;; delete the nodes listed in to-kill
          (doseq [k to-kill]
            (delete-node cluster-state (str path "/" k)))))
      ;; read the errors recorded for the given storm-id and component-id
      (errors
        [this storm-id component-id]
        (let [path (error-path storm-id component-id)
              _ (mkdirs cluster-state path)
              children (get-children cluster-state path false)
              errors (dofor [c children]
                       (let [data (-> (get-data cluster-state (str path "/" c) false)
                                      maybe-deserialize)]
                         (when data
                           (struct TaskError (:error data) (:time-secs data) (:host data) (:port data)))))]
          (->> (filter not-nil? errors)
               (sort-by (comp - :time-secs)))))
      ;; close the connection; before closing, remove the callback from
      ;; cluster-state's callbacks
      (disconnect
        [this]
        (unregister cluster-state state-id)
        (when solo?
          (close cluster-state))))))
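
The dispatch step inside the registered callback is easy to try in isolation. The sketch below is an illustration under stated assumptions, not Storm's source: tokenize-path*, issue-callback!*, issue-map-callback!* and mk-dispatcher are hypothetical re-creations (hence the trailing asterisks), and the one-shot behavior, where a callback is cleared before it is invoked, is an assumption about how the real issue-callback! and issue-map-callback! helpers behave, since the article does not show them.

```clojure
(require '[clojure.string :as string])

(defn tokenize-path*
  "Splits \"/assignments/topo-1\" into [\"assignments\" \"topo-1\"]."
  [path]
  (vec (remove empty? (string/split path #"/"))))

(defn issue-callback!*
  "Clears the single callback held in cb-atom, then calls it if there was one."
  [cb-atom]
  (let [cb @cb-atom]
    (reset! cb-atom nil)
    (when cb (cb))))

(defn issue-map-callback!*
  "Removes the callback stored under id in cb-atom, then calls it with id."
  [cb-atom id]
  (let [cb (get @cb-atom id)]
    (swap! cb-atom dissoc id)
    (when cb (cb id))))

(defn mk-dispatcher
  "Returns a (fn [type path]) that routes an event by the first path segment."
  [assignments-callback assignment-info-callback]
  (fn [type path]
    (let [[subtree & args] (tokenize-path* path)]
      (condp = subtree
        "assignments" (if (empty? args)
                        (issue-callback!* assignments-callback)
                        (issue-map-callback!* assignment-info-callback (first args)))
        ;; default branch: unknown subtree
        (throw (ex-info "Unknown callback for subtree" {:subtree subtree}))))))

(def log (atom []))
(def assignments-cb (atom (fn [] (swap! log conj :children-changed))))
(def info-cbs (atom {"topo-1" (fn [id] (swap! log conj [:data-changed id]))}))
(def dispatch (mk-dispatcher assignments-cb info-cbs))

(dispatch :node-children-changed "/assignments")
(dispatch :node-data-changed "/assignments/topo-1")
(dispatch :node-data-changed "/assignments/topo-1") ; callback already consumed
@log   ; [:children-changed [:data-changed "topo-1"]]
```

Under that assumption a caller that wants continuous notifications has to pass its callback again on the next read, which mirrors how ZooKeeper watches themselves have to be set again.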

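The mk-client function in zookeeper.clj

mk-client creates a CuratorFramework instance and registers a CuratorListener on it. When a background operation completes or a watch is triggered, the listener's eventReceived() is executed. The watcher function called inside eventReceived is the function bound to :watcher in mk-distributed-cluster-state.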

(defnk mk-client
  [conf servers port
   :root ""
   :watcher default-watcher
   :auth-conf nil]
  (let [fk (Utils/newCurator conf servers port root (when auth-conf (ZookeeperAuthInfo. auth-conf)))]
    (.. fk
        (getCuratorListenable)
        (addListener
          (reify CuratorListener
            (^void eventReceived [this ^CuratorFramework _fk ^CuratorEvent e]
              (when (= (.getType e) CuratorEventType/WATCHED)
                (let [^WatchedEvent event (.getWatchedEvent e)]
                  (watcher (zk-keeper-states (.getState event))
                           (zk-event-types (.getType event))
                           (.getPath event))))))))
    (.start fk)
    fk))

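That concludes the source analysis of how Storm interacts with ZooKeeper. In my view the most important part is how a "watcher" is attached to the ZooKeeper client, because many Storm features, such as picking up assignment information, are built on ZooKeeper's watcher mechanism. Adding a "watcher" roughly takes the following steps:

mk-distributed-cluster-state creates a ZooKeeper client and gives it a "watcher" function through :watcher. This "watcher" simply calls the functions in the ClusterState's callbacks collection, so which functions it runs is decided by the ClusterState instance.
The ClusterState instance provides the register function to update the callbacks collection. The instance is passed to mk-storm-cluster-state, which calls register to add a function (fn [type path] ...); this function carries the entire logic of the "watcher".
What the function registered in mk-storm-cluster-state actually does is decided by the StormClusterState instance, and watches on ZooKeeper nodes are also set through the StormClusterState instance. We can therefore use a StormClusterState instance to attach a "watch" and a "callback" to any node we care about. When the node or its data changes, the ZooKeeper server sends a notification to the client, the client's "watcher" function is called, and the callback we registered is executed in turn.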
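
The steps above rely on the fact that a ZooKeeper watch is a one-time trigger: it fires once and must be set again by a later read. The following minimal sketch simulates that behavior in memory. mk-fake-zk is a hypothetical stand-in written for this illustration; it is not Curator, not Storm, and it only models data watches in a single thread.

```clojure
(defn mk-fake-zk
  "In-memory stand-in for a ZooKeeper client with one-shot data watches.
  watcher is a (fn [type path]) called when a watched path changes."
  [watcher]
  (let [data    (atom {})
        watched (atom #{})]
    {:get-data (fn [path watch?]
                 ;; reading with watch? = true arms a watch on the path
                 (when watch? (swap! watched conj path))
                 (get @data path))
     :set-data (fn [path value]
                 (swap! data assoc path value)
                 ;; a watch fires once, then stays off until a new read arms it
                 (when (contains? @watched path)
                   (swap! watched disj path)
                   (watcher :node-data-changed path))
                 nil)}))

(def events (atom []))
(def zk (mk-fake-zk (fn [type path] (swap! events conj [type path]))))

((:get-data zk) "/assignments/topo-1" true)   ; read and arm a watch
((:set-data zk) "/assignments/topo-1" "v1")   ; fires the watcher
((:set-data zk) "/assignments/topo-1" "v2")   ; no watch armed, nothing fires
@events   ; [[:node-data-changed "/assignments/topo-1"]]
```

This is why the StormClusterState read functions take a callback argument: passing a callback both stores it and arms the watch for the next change.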

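Summary

This part of the source is tightly coupled to ZooKeeper and touches many ZooKeeper concepts and features, such as data watches and child watches. For ZooKeeper's watcher mechanism, see https://www.nhooo.com/article/124295.htm. Storm does not use the ZooKeeper API directly; it uses the Curator framework, which simplifies access to ZooKeeper. For the Curator framework, see https://www.nhooo.com/article/125785.htm.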

