Dynamic Routing and Flexible Scheduling: Bella OpenAPI's Traffic Distribution Mechanism

Introduction: The Importance of Intelligent Traffic Distribution

When building an enterprise-level AI capability gateway, a core challenge is how to intelligently distribute traffic across multiple service providers, various models, and changing load conditions. This not only affects system performance and availability but also directly impacts user experience and operational costs. Bella OpenAPI has successfully addressed this challenge through its unique dynamic routing and flexible scheduling mechanism. Based on an in-depth analysis of Bella OpenAPI's source code, this article reveals the design principles and implementation details of its traffic distribution mechanism.

Multi-dimensional Routing Strategy

Through analysis of the core code in ChannelRouter.java, we find that Bella OpenAPI's routing decisions are based on comprehensive considerations across multiple dimensions:

1. Dual-entry Flexibility

Bella OpenAPI supports two routing entry methods:

Model-based: Users specify a particular AI model, and the system automatically selects an appropriate channel
Endpoint-based: Users specify a functional endpoint, and the system selects an appropriate channel based on the endpoint

This dual-entry design allows developers to choose the appropriate level of abstraction according to their needs: either focusing only on the required functionality (endpoint) or precisely specifying the desired model.

2. Multi-level Filtering Mechanism

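During routing, the route method looks up the active channels for the requested model or endpoint and, unless the request is a mock, passes them through the filter method, which applies multi-level filtering: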

public ChannelDB route(String endpoint, String model, ApikeyInfo apikeyInfo, boolean isMock) {
    // Find the active channels for the requested model or endpoint
    List<ChannelDB> channels;
    String entityCode;
    if (model != null) {
        // Look up the terminal model name before querying channels
        String terminal = modelService.fetchTerminalModelName(model);
        entityCode = terminal;
        channels = channelService.listActives(EntityConstants.MODEL, terminal);
    } else {
        entityCode = endpoint;
        channels = channelService.listActives(EntityConstants.ENDPOINT, endpoint);
    }

    // Filter (skipped for mock requests), keep the top-priority group, then pick one at random
    if (!isMock) {
        channels = filter(channels, entityCode, apikeyInfo);
    }
    channels = pickMaxPriority(channels);
    ChannelDB channel = random(channels);
    return isMock ? mockChannel(channel) : channel;
}

This filtering mechanism includes several key dimensions:

Visibility control: Private channels are only available to specific accounts
Security level matching: Matching based on data flow direction (registered, internal, domestic, overseas) and user security level
Channel health status: Automatically excluding currently unavailable channels
Trial limitations: Setting special traffic control rules for trial accounts
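The filter method itself is not shown in this article. As a rough illustration of how three of the dimensions above could combine, here is a minimal sketch. The Channel and ApiKey classes, their fields, and the numeric ordering of data levels are all assumptions made for this example, not Bella OpenAPI's actual types, and trial limits are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; not Bella OpenAPI's real ChannelDB or ApikeyInfo.
class FilterSketch {

    static class Channel {
        final String code;
        final boolean isPrivate;   // visibility flag
        final String ownerCode;    // owning account for a private channel, null otherwise
        final int dataLevel;       // assumed ordering: 0 = registered ... 3 = overseas
        final boolean available;   // health flag supplied by monitoring

        Channel(String code, boolean isPrivate, String ownerCode, int dataLevel, boolean available) {
            this.code = code;
            this.isPrivate = isPrivate;
            this.ownerCode = ownerCode;
            this.dataLevel = dataLevel;
            this.available = available;
        }
    }

    static class ApiKey {
        final String ownerCode;
        final int maxDataLevel;    // assumed: highest data level this key is cleared for

        ApiKey(String ownerCode, int maxDataLevel) {
            this.ownerCode = ownerCode;
            this.maxDataLevel = maxDataLevel;
        }
    }

    // Keeps a channel only if it passes the visibility, security-level, and health checks.
    static List<Channel> filter(List<Channel> channels, ApiKey key) {
        List<Channel> result = new ArrayList<>();
        for (Channel c : channels) {
            boolean visible = !c.isPrivate || key.ownerCode.equals(c.ownerCode);
            boolean levelOk = c.dataLevel <= key.maxDataLevel;
            if (visible && levelOk && c.available) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ApiKey key = new ApiKey("team-a", 2);
        List<Channel> kept = filter(Arrays.asList(
                new Channel("public-domestic", false, null, 2, true),
                new Channel("private-other-team", true, "team-b", 1, true),
                new Channel("public-overseas", false, null, 3, true)), key);
        for (Channel c : kept) {
            System.out.println(c.code);
        }
    }
}
```

Each check is an independent predicate, so adding a further dimension in this sketch would mean adding one more boolean to the condition.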

3. Priority Strategy

After filtering out available channels, the system selects the highest priority channels through the pickMaxPriority method:

private List<ChannelDB> pickMaxPriority(List<ChannelDB> channels) {
    List<ChannelDB> highest = new ArrayList<>();
    // Start from the lowest possible rank: public visibility, low priority
    String curVisibility = EntityConstants.PUBLIC;
    String curPriority = EntityConstants.LOW;
    for (ChannelDB channel : channels) {
        String priority = channel.getPriority();
        // A blank visibility is treated as public
        String visibility = StringUtils.isNotBlank(channel.getVisibility())
                ? channel.getVisibility() : EntityConstants.PUBLIC;
        int compare = compare(priority, curPriority, visibility, curVisibility);
        if (compare > 0) {
            // Strictly better channel found: discard the previous group
            highest.clear();
            curPriority = priority;
            curVisibility = visibility;
        }
        if (compare >= 0) {
            highest.add(channel);
        }
    }
    return highest;
}

This implements a composite priority strategy:

First considering the channel's visibility (private channels take precedence over public channels)
Then considering the channel's priority setting (high > medium > low)
For channels with the same priority, keeping all channels for subsequent load balancing
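The compare helper called by pickMaxPriority is not shown above. One way it could implement the ordering just described is sketched below. The method signature mirrors the call site, but the body, the lowercase string values, and the rank tables are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the comparison used by pickMaxPriority; not the project's actual code.
class PriorityCompareSketch {

    // Assumed rank order, lowest first. An unknown value gets index -1 and ranks below "low".
    private static final List<String> PRIORITIES = Arrays.asList("low", "medium", "high");

    private static int visibilityRank(String visibility) {
        return "private".equals(visibility) ? 1 : 0;
    }

    private static int priorityRank(String priority) {
        return PRIORITIES.indexOf(priority);
    }

    // Positive if the candidate outranks the current best, zero if equal, negative otherwise.
    static int compare(String priority, String curPriority, String visibility, String curVisibility) {
        int byVisibility = Integer.compare(visibilityRank(visibility), visibilityRank(curVisibility));
        if (byVisibility != 0) {
            return byVisibility; // visibility decides first
        }
        return Integer.compare(priorityRank(priority), priorityRank(curPriority));
    }

    public static void main(String[] args) {
        // A private low-priority channel against a public high-priority one
        System.out.println(compare("low", "high", "private", "public") > 0);
    }
}
```

Under this ordering a private channel with low priority still outranks a public channel with high priority, which matches the "visibility first, then priority" rule in the list above.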

4. Load Balancing

After determining the list of highest priority channels, the system implements load balancing through a simple but effective random strategy:

private ChannelDB random(List<ChannelDB> list) {
    if (list.size() == 1) {
        return list.get(0);
    }
    int rand = random.nextInt(list.size());
    return list.get(rand);
}

Because every channel in the group is chosen with equal probability, traffic spreads roughly evenly across channels of the same priority over many requests, avoiding excessive pressure on any single point.
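To make the "evenly in the long run" behaviour concrete, the following sketch simulates uniform random selection over a group of equal-priority channels and counts how often each one is picked. The class name, the seed, and the request count are arbitrary choices for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative simulation only; channel indexes stand in for real channels.
class RandomBalanceSketch {

    // Returns how many of the simulated requests landed on each channel index.
    static int[] simulate(int channelCount, int requests, long seed) {
        Random random = new Random(seed);
        int[] hits = new int[channelCount];
        for (int i = 0; i < requests; i++) {
            hits[random.nextInt(channelCount)]++;
        }
        return hits;
    }

    public static void main(String[] args) {
        // With three channels and 30,000 requests, each count should land near 10,000.
        System.out.println(Arrays.toString(simulate(3, 30_000, 42L)));
    }
}
```

The spread is even only statistically: over a handful of requests one channel can easily receive more than its share, and the selection does not take per-channel capacity into account.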

Real-time Monitoring and Dynamic Adjustment

Bella OpenAPI's traffic distribution is not just a set of static rules but is continuously optimized through real-time monitoring and dynamic adjustment. This is primarily implemented through two components: MetricsManager and LimiterManager.

1. Channel Health Monitoring

MetricsManager is responsible for collecting and analyzing the health status of each channel. The system tracks several key metrics:

Error rates and "too many requests" (HTTP 429) responses
Number and proportion of completed requests
Endpoint-specific performance metrics (customized through resolvers)

When a channel's error rate exceeds the threshold or its performance significantly decreases, the system marks it as temporarily unavailable, implementing automatic circuit breaking.
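The circuit-breaking rule can be pictured with a small in-memory sketch. It borrows the names minCompletedThreshold, errorRateThreshold, and unavailableSeconds from the record excerpt in this article, but the class and its logic are assumptions: the real system keeps this state in Redis, and this sketch has no sliding time window.

```java
// Simplified, single-process illustration of error-rate based circuit breaking.
class ChannelHealthSketch {
    private final int minCompletedThreshold;   // minimum sample size before judging
    private final double errorRateThreshold;   // e.g. 0.5 means more than 50% errors trips the breaker
    private final long unavailableSeconds;     // how long the channel stays excluded
    private int completed;
    private int errors;
    private long unavailableUntil;             // epoch seconds; 0 means never tripped

    ChannelHealthSketch(int minCompletedThreshold, double errorRateThreshold, long unavailableSeconds) {
        this.minCompletedThreshold = minCompletedThreshold;
        this.errorRateThreshold = errorRateThreshold;
        this.unavailableSeconds = unavailableSeconds;
    }

    // Records one finished request; 5xx responses count as errors.
    void record(int httpCode, long nowSeconds) {
        completed++;
        if (httpCode >= 500) {
            errors++;
        }
        boolean enoughSamples = completed >= minCompletedThreshold;
        if (enoughSamples && (double) errors / completed > errorRateThreshold) {
            unavailableUntil = nowSeconds + unavailableSeconds;
            completed = 0; // start counting afresh once the breaker has tripped
            errors = 0;
        }
    }

    boolean isAvailable(long nowSeconds) {
        return nowSeconds >= unavailableUntil;
    }

    public static void main(String[] args) {
        ChannelHealthSketch health = new ChannelHealthSketch(10, 0.5, 60);
        for (int i = 0; i < 10; i++) {
            health.record(i < 4 ? 200 : 500, 100); // 4 successes, then 6 server errors
        }
        System.out.println(health.isAvailable(120)); // inside the 60-second exclusion period
    }
}
```

The minimum-sample check matters: without it, a single failed request on an idle channel would yield a 100% error rate and trip the breaker immediately.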

2. Fine-grained Traffic Control

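LimiterManager implements a Redis-based distributed rate limiting mechanism. The record method excerpted below shows the closely related metrics path: each finished request is written to Redis by a script (ScriptType.metrics), using the same executor that runs the rate-limiting scripts: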

public void record(EndpointProcessData processData) throws IOException {
    // Excerpt: resolver, httpCode, key, the thresholds, and the metrics list are not shown here

    // Calculate unavailable time
    int unavailableSeconds = resolver == null ? 0 : resolver.resolveUnavailableSeconds(processData);

    // Record various metrics
    metrics.add(minCompletedThreshold);
    metrics.add(errorRateThreshold);
    metrics.add(httpCode);
    metrics.add(unavailableSeconds);
    metrics.add(DateTimeUtils.getCurrentSeconds());
    metrics.add("errors");
    metrics.add(httpCode < 500 ? 0 : 1);    // 5xx responses count as errors
    metrics.add("request_too_many");
    metrics.add(httpCode == 429 ? 1 : 0);   // 429 responses are tracked separately
    metrics.add("completed");
    metrics.add(1);

    // Execute metrics recording script
    executor.execute(processData.getEndpoint(), ScriptType.metrics, key, metrics);
}

This rate limiting mechanism provides control in two key dimensions:

RPM (Requests Per Minute): Controls request frequency in short periods
Concurrency control: Limits the number of simultaneous requests

Notably, the system sets stricter limits for trial accounts:

if (freeAkOverload(EndpointContext.getProcessData().getAkCode(), entityCode)) {
    throw new ChannelException.RateLimitException("Currently using trial quota, maximum of " + freeRpm
            + " requests per minute, and parallel requests cannot exceed " + freeConcurrent);
}

This tiered rate limiting strategy both protects system resources and implements differentiated service for different user types.
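The two limits, RPM and concurrency, can be illustrated with a single-process sketch. It is an assumption-laden simplification: Bella OpenAPI enforces these limits in Redis across instances, whereas the class below keeps counters in memory and uses a fixed one-minute window. All names are hypothetical.

```java
// In-memory illustration of combined RPM and concurrency limiting; not the Redis-based implementation.
class RateLimiterSketch {
    private final int maxRpm;
    private final int maxConcurrent;
    private long windowMinute = -1;   // index of the minute the current count belongs to
    private int requestsInWindow;
    private int inFlight;

    RateLimiterSketch(int maxRpm, int maxConcurrent) {
        this.maxRpm = maxRpm;
        this.maxConcurrent = maxConcurrent;
    }

    // Returns true and reserves a slot if both limits allow the request.
    synchronized boolean tryAcquire(long nowMillis) {
        long minute = nowMillis / 60_000L;
        if (minute != windowMinute) {
            windowMinute = minute;    // a new minute started: reset the request count
            requestsInWindow = 0;
        }
        if (requestsInWindow >= maxRpm || inFlight >= maxConcurrent) {
            return false;
        }
        requestsInWindow++;
        inFlight++;
        return true;
    }

    // Must be called when a request finishes so its concurrency slot is freed.
    synchronized void release() {
        if (inFlight > 0) {
            inFlight--;
        }
    }

    public static void main(String[] args) {
        // A stricter limiter, as a trial account might get; the numbers are made up.
        RateLimiterSketch trial = new RateLimiterSketch(3, 2);
        System.out.println(trial.tryAcquire(0));
        System.out.println(trial.tryAcquire(1));
        System.out.println(trial.tryAcquire(2)); // two requests are already in flight
    }
}
```

Tiering then amounts to constructing limiters with different limits per account type, for example smaller values for trial keys than for regular ones.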

Data Security Features: Security Levels and Data Flow Direction

Bella OpenAPI's routing system also incorporates data security considerations, ensuring data compliance through matching security levels and data flow directions.

Scenario Three: Automatic Switching for Priority Channel Failures

When a high-priority channel fails, the system doesn't immediately downgrade to a low-priority channel but first tries other channels of the same priority. Only when all channels in a priority group are unavailable does it consider downgrading, ensuring service quality while enhancing system resilience.
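This behaviour appears to follow from the order of operations in the route method shown earlier: unavailable channels are filtered out first, and only then is the highest remaining priority group selected. A minimal sketch of that ordering, using a hypothetical Channel class with a numeric priority, might look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: health filtering followed by highest-priority selection.
class FailoverSketch {

    static class Channel {
        final String code;
        final int priority;       // assumed numeric form: higher value = higher priority
        final boolean available;

        Channel(String code, int priority, boolean available) {
            this.code = code;
            this.priority = priority;
            this.available = available;
        }
    }

    // Skips unavailable channels, then keeps only those sharing the highest remaining priority.
    static List<Channel> candidates(List<Channel> channels) {
        List<Channel> best = new ArrayList<>();
        int bestPriority = Integer.MIN_VALUE;
        for (Channel c : channels) {
            if (!c.available) {
                continue;
            }
            if (c.priority > bestPriority) {
                best.clear();
                bestPriority = c.priority;
            }
            if (c.priority == bestPriority) {
                best.add(c);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Channel> picked = candidates(Arrays.asList(
                new Channel("high-a", 2, false),
                new Channel("high-b", 2, true),
                new Channel("medium-a", 1, true)));
        for (Channel c : picked) {
            System.out.println(c.code);
        }
    }
}
```

With high-a down, traffic stays on high-b. Only when every high-priority channel is unavailable does medium-a become the candidate, with no separate downgrade logic required.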

Deep Technical Implementation Considerations

Bella OpenAPI's traffic distribution mechanism reflects several important considerations in its technical implementation:

1. Performance First

Through the combination of Redis and Lua scripts, it achieves high-performance distributed rate limiting and metrics collection:

executor.execute("/rpm", ScriptType.limiter, keys, params);
executor.execute(processData.getEndpoint(), ScriptType.metrics, key, metrics);

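Because Redis runs each Lua script atomically on the server, a check-and-update completes in a single round trip instead of several separate network calls, which significantly improves processing efficiency.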

2. Fault Isolation

The system manages and monitors channels for different endpoints and different models independently, ensuring that a failure in one component does not affect the entire system. Only truly unavailable channels are isolated, while other channels continue to provide normal service.

3. Flexibility

Routing strategies can be adjusted immediately in production by modifying a channel's priority attribute. Operations teams can therefore respond to actual production needs through simple configuration changes rather than code modifications:

When a specific model is under high load, its priority can be temporarily lowered to direct traffic to other models
During the initial period of a new channel's launch, a lower priority can be set for gradual testing
During specific business peak periods, dedicated channel priorities can be increased to ensure critical business gets resources first

This flexibility is crucial for managing large-scale production environments, allowing the system to adapt to changing business needs without downtime.

4. Scalability

Through the combination of MetricsManager and custom Lua scripts, the system supports customized monitoring metrics for different channels. This design allows:

Defining specific health metrics for different types of AI services (such as focusing on latency for voice services and throughput for text services)
Implementing complex metric calculation logic through custom Lua scripts without modifying Java code
Customizing monitoring thresholds and unavailability determination rules based on specific service provider characteristics

For example, for channels prone to 429 (too many requests) errors, specific Lua scripts can be written to implement more sensitive overload detection; while for channels with occasional high latency but overall stability, more tolerant availability determination logic can be designed.

This highly customized monitoring mechanism ensures that the system can accurately identify the health status of various types of services, providing accurate bases for dynamic routing decisions.
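The record excerpt earlier calls resolver.resolveUnavailableSeconds(processData) and falls back to 0 when no resolver is present. The sketch below illustrates that extension point in plain Java. The interface shape, its two parameters, the endpoint paths, and every numeric value are assumptions for illustration; the real resolver receives the full process data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-endpoint resolver registry; not the project's actual interfaces.
class ResolverSketch {

    // Decides how long a channel should be excluded after a given response.
    interface UnavailableResolver {
        int resolveUnavailableSeconds(int httpCode, long latencyMillis);
    }

    // More sensitive to overload: a 429 response excludes the channel longer than a server error.
    static final UnavailableResolver RATE_LIMIT_SENSITIVE =
            (httpCode, latencyMillis) -> httpCode == 429 ? 60 : (httpCode >= 500 ? 30 : 0);

    // More tolerant: ignores latency entirely and reacts only to server errors.
    static final UnavailableResolver LATENCY_TOLERANT =
            (httpCode, latencyMillis) -> httpCode >= 500 ? 30 : 0;

    private final Map<String, UnavailableResolver> byEndpoint = new HashMap<>();

    void register(String endpoint, UnavailableResolver resolver) {
        byEndpoint.put(endpoint, resolver);
    }

    // Mirrors the null check in the excerpt: no resolver means no forced unavailability.
    int resolve(String endpoint, int httpCode, long latencyMillis) {
        UnavailableResolver resolver = byEndpoint.get(endpoint);
        return resolver == null ? 0 : resolver.resolveUnavailableSeconds(httpCode, latencyMillis);
    }

    public static void main(String[] args) {
        ResolverSketch registry = new ResolverSketch();
        registry.register("/v1/chat/completions", RATE_LIMIT_SENSITIVE);
        registry.register("/v1/audio/speech", LATENCY_TOLERANT);
        System.out.println(registry.resolve("/v1/chat/completions", 429, 120));
        System.out.println(registry.resolve("/v1/audio/speech", 200, 5000));
    }
}
```

Keeping the decision behind a small interface is what lets each endpoint type carry its own availability rule while the recording path stays unchanged.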

Through these two design aspects, flexibility and scalability, Bella OpenAPI adapts to complex production needs without changes to the core code, greatly reducing operational costs and risks.

Conclusion: The Art of Balance

Bella OpenAPI's dynamic routing and flexible scheduling mechanism is essentially an art of balance, seeking the optimal balance across multiple dimensions:

Balance between service quality and system load
Balance between user experience and cost control
Balance between functional richness and system complexity
Balance between security compliance and open convenience

Through carefully designed multi-level routing strategies, real-time health monitoring, and intelligent rate limiting mechanisms, Bella OpenAPI has successfully built a powerful yet flexible traffic distribution system capable of supporting 150 million API calls daily, providing a solid foundation for enterprise-level AI applications.

For teams planning to build their own AI capability gateway, Bella OpenAPI's routing design provides a battle-tested architectural reference.

If you are interested in Bella OpenAPI's dynamic routing and traffic distribution mechanism, visit the GitHub repository to study the implementation in depth, or try the mechanism firsthand through the online demo.
