RAM is used to store table data and as temporary space for result sets and views. The most important factor in sizing is the total storage required by the base tables. Starting from the existing base tables, add an estimate for any additional base tables based on requirements. The following calculation gives the total base storage requirement:
(Total Base Storage) = (Storage occupied during PoC) + (Additional Base Tables)
Based on the Total Base Storage calculated above, Kinetica recommends reserving an additional 30% of RAM, on top of the Total Base Storage, for analytics processing. Kinetica recommends calculating the total system RAM as:
(Total System RAM) = (Total Base Storage) * 1.3
Furthermore, Kinetica recommends between 500GB and 1024GB (1TB) of system RAM per server, because exceeding 1TB of RAM per server can reduce the overall memory bandwidth of the cluster.
500GB <= RAM per server <= 1024GB
Consequently, once you have chosen a RAM per server size, the number of servers in the cluster must be large enough to host the total system RAM, as follows:
(Number of servers) >= (Total System RAM) / (RAM per server)
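The sizing formulas above can be sketched as a small calculation. This is only an illustration: the function name `size_cluster`, its parameters, and the sample input values are hypothetical and are not part of any Kinetica tool. The 30% allowance and the 500GB to 1024GB per-server range are taken from the text.

```python
import math


def size_cluster(poc_storage_gb, additional_tables_gb, ram_per_server_gb,
                 analytics_overhead_pct=30):
    """Return (total_base_gb, total_system_ram_gb, number_of_servers).

    Hypothetical helper that follows the formulas in this section.
    """
    # Recommended range for RAM per server, as stated in the text.
    if not 500 <= ram_per_server_gb <= 1024:
        raise ValueError("RAM per server should be between 500GB and 1024GB")

    # (Total Base Storage) = (Storage occupied during PoC) + (Additional Base Tables)
    total_base_gb = poc_storage_gb + additional_tables_gb

    # (Total System RAM) = (Total Base Storage) * 1.3
    total_system_ram_gb = total_base_gb * (100 + analytics_overhead_pct) / 100

    # (Number of servers) >= (Total System RAM) / (RAM per server)
    # Round up, since a fraction of a server cannot be deployed.
    servers = math.ceil(total_system_ram_gb / ram_per_server_gb)

    return total_base_gb, total_system_ram_gb, servers


# Example with made-up inputs: 1000GB measured during the PoC, 500GB of
# additional base tables, and 750GB of RAM per server.
base, system_ram, servers = size_cluster(1000, 500, 750)
print(base, system_ram, servers)
```

With these made-up inputs, the total system RAM comes to 1950GB, which needs three 750GB servers.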
Configuring the RAM tier limit
The RAM tier limit parameters are located in the /opt/gpudb/core/etc/gpudb.conf file. The details are described in the following section.
Keep in mind that the RAM tier limit in gpudb.conf is configured per rank. You are therefore NOT setting the RAM per server size as the limit. Instead, 70% of the RAM per server size is divided evenly among the worker ranks on the node. The configuration parameters are described below.
RAM default parameters:
- tier.ram.default.limit: maximum number of bytes of RAM that can be allocated for tiering. We recommend setting it to 70% of the total memory in the node divided by the number of worker ranks in the node. Note: a per-rank setting overrides the default setting.
- tier.ram.default.high_watermark: high watermark for the tier in percent. Default value is 90
- tier.ram.default.low_watermark: low watermark for the tier in percent. Default value is 50
RAM head rank (rank0) parameters:
- tier.ram.rank0.limit: maximum number of bytes of RAM that can be allocated for the head rank. We recommend that the head rank receive 10% of the available host memory of the head node.
RAM worker rank (rank1, 2, ... n) parameters:
- tier.ram.rank1.limit: maximum number of bytes of RAM that can be allocated for rank1. We recommend setting it to 70% of the total memory in the node divided by the number of worker ranks in the node.
- tier.ram.rank1.high_watermark: high watermark for rank1, in percent. Defaults to the value of the corresponding default parameter.
- tier.ram.rank1.low_watermark: low watermark for rank1, in percent. Defaults to the value of the corresponding default parameter.
For example, suppose a cluster of 2 nodes with a RAM per server size of 750GB and 3 worker ranks per node.
In this example, the default RAM limit parameter is calculated as 750GB * 70% / 3 = 175GB, i.e. 187904819200 bytes.
The head rank RAM limit is 10% * 750GB = 75GB, i.e. 80530636800 bytes.
In this case, the high watermark and low watermark are left at their default values.
Therefore, the gpudb.conf file is set as follows. Note that the limit values are specified in bytes. When every worker rank uses the same value as the default, the per-rank worker parameters can be omitted.
tier.ram.default.limit = 187904819200
tier.ram.default.high_watermark = 90
tier.ram.default.low_watermark = 50
tier.ram.rank0.limit = 80530636800
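The arithmetic behind these values can be reproduced with a short sketch. The helper name `ram_tier_limits` and its parameters are hypothetical. The 70% and 10% shares come from the recommendations above, and 1GB is taken as 1024^3 bytes, which is the convention the example values imply.

```python
GB = 1024 ** 3  # bytes per GB, matching 175GB = 187904819200 bytes in the example


def ram_tier_limits(ram_per_server_gb, worker_ranks_per_node,
                    worker_pct=70, head_pct=10):
    """Return (default_limit_bytes, rank0_limit_bytes).

    Hypothetical helper: the default limit is the worker share of node RAM
    split evenly across worker ranks; the head rank gets its own share.
    """
    total_bytes = ram_per_server_gb * GB
    # Integer arithmetic keeps the result an exact byte count.
    default_limit = total_bytes * worker_pct // 100 // worker_ranks_per_node
    rank0_limit = total_bytes * head_pct // 100
    return default_limit, rank0_limit


default_limit, rank0_limit = ram_tier_limits(750, 3)

# Print the limit lines in the same form as the gpudb.conf excerpt.
print(f"tier.ram.default.limit = {default_limit}")
print(f"tier.ram.rank0.limit = {rank0_limit}")
```

For a 750GB server with 3 worker ranks, this yields the same two limit values shown in the configuration excerpt.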