Before explaining Task Manager, let's introduce a few terms that it relies on.
In the graph database Nebula Graph, some long-running work is executed in the background; we call such work a Job. Storage-level commands used by DBAs fall into this category, for example running a global compaction after a data import has finished.
Nebula Graph is a distributed system, so a Job is carried out by several Storaged instances. We call the part of a Job that runs on a single Storaged a Task. Jobs are controlled by the Job Manager on Metad, and Tasks are controlled by the Task Manager on Storaged.
This article focuses on how to manage and schedule time-consuming Tasks to further improve database performance.
Problems Task Manager needs to solve
As described above, the Tasks that Task Manager controls on Storaged are the sub-tasks of the Jobs controlled by Meta. So which problems does Task Manager itself solve? In Nebula Graph, it mainly solves the following two:
- Switching the transport from HTTP to RPC (Thrift). When setting up a cluster, most users know that Storaged instances communicate over Thrift and open the Thrift ports in the firewall, but they do not realize that Nebula Graph may also need an HTTP port. We have seen community users forget to open the HTTP port many times.
- Giving Storaged the ability to schedule Tasks. This is covered in detail in the sections below.
Where Task Manager sits in Nebula Graph
Meta in the Task Manager system
In the Task Manager system, Metad (JobManager) takes a Job Request passed in from Graphd, selects the corresponding Storaged hosts, assembles Task Requests, and sends them to those Storaged instances. It is easy to see that part of this logic is stable: Meta accepts the Job Request, assembles Task Requests, sends them, and receives the Task results. What changes from Job to Job is how a Task Request is assembled and which Storaged instances it is sent to. JobManager uses the template method pattern combined with a simple factory to handle future extensions.
A Job added in the future only needs to inherit from MetaJobExecutor and implement the Prepare() and Execute() methods.
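The combination of a fixed flow and per-Job steps can be sketched as follows. This is a minimal illustration in Python for brevity (Nebula Graph itself is written in C++), not the project's code: only the names MetaJobExecutor, Prepare, and Execute come from the text above, while `CompactExecutor`, `create_executor`, `run`, and the request fields are hypothetical.

```python
from abc import ABC, abstractmethod


class MetaJobExecutor(ABC):
    """Hypothetical Python stand-in for the executor base class."""

    def __init__(self, job_id, hosts):
        self.job_id = job_id
        self.hosts = hosts      # Storaged hosts known to Metad (assumed input)
        self.requests = []      # Task Requests built by prepare()

    def run(self):
        # Template method: the stable flow lives in the base class.
        self.prepare()
        return self.execute()

    @abstractmethod
    def prepare(self):
        """Job-specific: pick target hosts and assemble Task Requests."""

    @abstractmethod
    def execute(self):
        """Job-specific: send the Task Requests and collect results."""


class CompactExecutor(MetaJobExecutor):
    """Hypothetical Job that sends one Task to every host."""

    def prepare(self):
        self.requests = [
            {"job_id": self.job_id, "task_id": i, "host": h, "cmd": "compact"}
            for i, h in enumerate(self.hosts)
        ]

    def execute(self):
        # A real executor would issue an RPC here; this sketch only echoes.
        return [(r["host"], "ok") for r in self.requests]


# Simple factory: map a Job command to its executor class.
EXECUTORS = {"compact": CompactExecutor}


def create_executor(cmd, job_id, hosts):
    try:
        return EXECUTORS[cmd](job_id, hosts)
    except KeyError:
        raise ValueError(f"unknown job command: {cmd}") from None
```

Under these assumptions, supporting a new Job means adding one subclass and one entry in `EXECUTORS`; the flow in `run` stays unchanged.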
Scheduling control in Task Manager
As mentioned earlier, the scheduling control in Task Manager has two goals:
- When system resources are sufficient, execute Tasks with as much concurrency as possible.
- When system resources are tight, keep the resources used by all running Tasks under a configured threshold.
Executing Tasks with high concurrency
Task Manager calls the threads it holds Workers. It has a real-world model: the service hall of a bank. When we go to a bank to do business, the steps are usually:
- Scenario 1: Take a number from the ticket machine at the entrance.
- Scenario 2: Find a seat in the hall and play with your phone while waiting for your number to be called.
- Scenario 3: When your number is called, go to the assigned window.
Along the way, you may also run into the following situations:
- Scenario 4: VIPs can jump the queue.
- Scenario 5: You are in the queue but, for some reason, give up on this visit.
- Scenario 6: You are in the queue and the bank closes.
Put together, these are the main requirements for Task Manager:
- Tasks execute in FIFO order. Different Tasks have different priorities, and a higher-priority Task can jump the queue.
- A user can cancel a Task that is waiting in the queue.
- Storaged may shut down at any time.
- To run with as much concurrency as possible, a Task is split into multiple SubTasks. A SubTask is the unit of work that each Worker actually executes.
- Task Manager is a globally unique instance, so thread safety must be considered.
This leads to the following implementation:
- Implementation 1: The JobId and TaskId in the Thrift structure together identify a Task. This pair is called the Task Handle.
- Implementation 2: TaskManager has a Blocking Queue that makes Task Handles wait in line for execution (the ticket machine). The Blocking Queue itself is thread-safe.
- Implementation 3: The Blocking Queue also supports priorities; higher-priority Handles are dequeued first (the VIP queue-jumping feature).
- Implementation 4: Task Manager maintains a globally unique Map whose key is the Task Handle and whose value is the Task itself (the bank hall). Nebula Graph uses Folly's Concurrent Hash Map, which is thread-safe.
- Implementation 5: When a user cancels a Task, Task Manager looks up the Task in the Map by its Handle and marks it as cancelled. The Handle in the Queue is left untouched.
- Implementation 6: If a Task is running when Storaged shuts down, the shutdown returns only after the SubTask currently being executed has finished.
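The handle, queue, map, and cancel flag described above fit together roughly as in the sketch below. It is written in Python with the standard library standing in for the C++ pieces: `queue.PriorityQueue` plays the blocking priority queue and a lock-guarded `dict` plays the concurrent map. All class and method names are hypothetical, smaller numbers are assumed to mean higher priority, and shutdown handling (Implementation 6) is left out.

```python
import itertools
import queue
import threading
from collections import namedtuple

# Implementation 1: JobId plus TaskId identify a Task.
TaskHandle = namedtuple("TaskHandle", ["job_id", "task_id"])


class Task:
    def __init__(self, handle, subtasks):
        self.handle = handle
        self.subtasks = subtasks            # list of zero-argument callables
        self.cancelled = threading.Event()  # cancel flag (Implementation 5)


class TaskManager:
    def __init__(self):
        self._queue = queue.PriorityQueue()  # thread-safe blocking queue
        self._tasks = {}                     # handle -> Task
        self._lock = threading.Lock()        # guards _tasks
        self._seq = itertools.count()        # keeps FIFO order within a priority

    def add_task(self, task, priority=1):
        with self._lock:
            self._tasks[task.handle] = task
        # Assumption: a lower number means a higher priority.
        self._queue.put((priority, next(self._seq), task.handle))

    def cancel(self, handle):
        # Only the Task in the map is marked; the queued handle stays put.
        with self._lock:
            task = self._tasks.get(handle)
        if task is None:
            return False
        task.cancelled.set()
        return True

    def run_next(self):
        """Dequeue one handle and run its SubTasks in order.

        Returns the list of SubTask results, or None if the Task was
        cancelled before all SubTasks ran.
        """
        _, _, handle = self._queue.get()
        with self._lock:
            task = self._tasks.pop(handle)
        results = []
        for sub in task.subtasks:
            if task.cancelled.is_set():
                return None
            results.append(sub())
        return results
```

In this sketch a cancelled Task is discovered only when its handle reaches the front of the queue, which matches the idea of leaving the queued handle alone. A real Worker pool would call something like `run_next` from several threads and hand SubTasks to different Workers instead of running them in a loop.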
Limiting the resources a Task can use
Staying under the threshold is simple: Workers are threads, so as long as all Workers come from a single thread pool, the maximum number of Workers is guaranteed. The hard part is distributing sub-tasks evenly across the Workers. Let's discuss the options:
Method 1: Add tasks with Round-robin
The simplest approach is to add tasks in Round-robin fashion: after a Task is split into Sub Tasks, they are appended in turn to each of the current Workers.
This can cause a problem. Suppose there are 3 Workers and 2 Tasks (Task 1 in blue, Task 2 in yellow):
Round-robin Figure 1
If the Sub Tasks of Task 2 run much faster than those of Task 1, a good parallel strategy would look like this:
Round-robin Figure 2
Plain Round-robin makes the completion time of Task 2 depend on Task 1 (see Round-robin Figure 1).
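The dependency can be reproduced with a small simulation. The sketch below is in Python and rests on stated assumptions: each Worker runs its own list of Sub Tasks strictly in order, Sub Task costs are made-up time units, and the function name is hypothetical.

```python
def round_robin_finish_times(tasks, num_workers):
    """Simulate Round-robin assignment of Sub Tasks to Workers.

    tasks: dict mapping a Task name to its list of Sub Task costs,
           in submission order (dicts keep insertion order).
    Returns a dict mapping each Task name to the time at which its
    last Sub Task finishes.
    """
    worker_clock = [0] * num_workers  # time at which each Worker is free
    finish = {}
    slot = 0
    for name, costs in tasks.items():
        for cost in costs:
            w = slot % num_workers    # next Worker in turn
            worker_clock[w] += cost   # Sub Task runs after the Worker's backlog
            finish[name] = max(finish.get(name, 0), worker_clock[w])
            slot += 1
    return finish
```

With 3 Workers, a Task 1 made of six Sub Tasks costing 10 units each, and a Task 2 made of three Sub Tasks costing 1 unit each, Task 2 finishes at time 21, although on its own it would finish at time 1.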
Method 2: A group of Workers handles one Task
To address the problem with Method 1, a dedicated set of Workers handles only a designated Task, so that Tasks no longer depend on each other. This is still not good enough, because it is hard to guarantee that every Sub Task takes roughly the same time.
If Sub Task 1 runs noticeably slower than the other Sub Tasks, a good execution strategy would not leave the remaining Workers waiting on it.
This approach still cannot avoid the "one core struggling while ten cores look on" problem.
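A small calculation makes the idle-Worker effect of Method 2 concrete. It is a Python sketch under an assumption made only for illustration: each Worker in the group receives a fixed share of Sub Tasks up front and cannot take work from another Worker. The function name and the costs are hypothetical.

```python
def dedicated_group_stats(costs_per_worker):
    """Compute finish time and idle time for a fixed Worker group.

    costs_per_worker: one list of Sub Task costs per Worker.
    Returns (makespan, idle), where idle[i] is how long Worker i
    waits for the slowest Worker to finish.
    """
    busy = [sum(costs) for costs in costs_per_worker]
    makespan = max(busy)
    idle = [makespan - b for b in busy]
    return makespan, idle
```

For a group of three Workers where one Sub Task costs 9 units and the other two cost 1 unit each, the Task takes 9 units and two of the three Workers sit idle for 8 of them.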