Canopy 10.2

Get Started

  • What is Canopy?
    • How Far Can Canopy Scale?
  • When to Use Canopy
    • Multi-Tenant Database
    • Real-Time Analytics
    • Considerations for Use
    • When Canopy is Inappropriate
  • Quick Tutorials
    • Multi-tenant Applications
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries
    • Real-time Analytics
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries

Install

  • Single-Node Canopy
    • CentOS or Red Hat
  • Multi-Node Canopy
    • CentOS or Red Hat
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node

Use-Case Guides

  • Multi-tenant Applications
    • Let’s Make an App – Ad Analytics
    • Scaling the Relational Data Model
    • Preparing Tables and Ingesting Data
      • Try it Yourself
    • Integrating Applications
    • Sharing Data Between Tenants
    • Online Changes to the Schema
    • When Data Differs Across Tenants
    • Scaling Hardware Resources
    • Where to Go From Here
  • Real-Time Dashboards
    • Data Model
    • Rollups
    • Expiring Old Data
    • Approximate Distinct Counts
    • Unstructured Data with JSONB
  • Timeseries Data
    • Scaling Timeseries Data on Canopy
    • Automating Partition Creation
    • Archiving with Columnar Storage
      • Archiving a Row Partition to Columnar Storage

Architecture

  • Concepts
    • Nodes
      • Coordinator and Workers
    • Distributed Data
      • Table Types
        • Type 1: Distributed Tables
        • Type 2: Reference Tables
        • Type 3: Local Tables
      • Shards
        • Shard Placements
      • Co-Location
      • Parallelism
    • Query Execution

Develop

  • Determining Application Type
    • At a Glance
    • Examples and Characteristics
  • Choosing Distribution Column
    • Multi-Tenant Apps
      • Best Practices
    • Real-Time Apps
      • Best Practices
    • Timeseries Data
      • Best Practices
    • Table Co-Location
      • Data co-location in Canopy for hash-distributed tables
      • A practical example of co-location
      • Using Regular LightDB Tables
      • Distributing tables by ID
      • Distributing tables by tenant
      • Co-location means better feature support
      • Query Performance
  • Migrating an Existing App
    • Identify Distribution Strategy
      • Pick distribution key
      • Identify types of tables
    • Prepare Source Tables for Migration
      • Add distribution keys
      • Backfill newly created columns
    • Prepare Application for Canopy
      • Set up Development Canopy Cluster
        • Include distribution column in keys
      • Add distribution key to queries
        • Ruby on Rails
        • Django
        • ASP.NET
        • Other (SQL Principles)
      • Enable Secure Connections
      • Check for cross-node traffic
    • Migrate Production Data
      • Small Database Migration
  • SQL Reference
    • Creating and Modifying Distributed Tables (DDL)
      • Creating And Distributing Tables
        • Reference Tables
        • Distributing Coordinator Data
      • Co-Locating Tables
      • Dropping Tables
      • Modifying Tables
        • Adding/Modifying Columns
        • Adding/Removing Constraints
        • Using NOT VALID Constraints
        • Adding/Removing Indices
        • Manual Modification
    • Ingesting, Modifying Data (DML)
      • Inserting Data
        • “From Select” Clause (Distributed Rollups)
        • COPY Command (Bulk load)
    • Caching Aggregations with Rollups
      • Updates and Deletion
      • Maximizing Write Performance
    • Querying Distributed Tables (SQL)
      • Aggregate Functions
        • Count (Distinct) Aggregates
        • Estimating Top N Items
        • Percentile Calculations
      • Limit Pushdown
      • Views on Distributed Tables
      • Joins
        • Co-located joins
        • Reference table joins
        • Repartition joins
    • Query Processing
      • Distributed Query Planner
      • Distributed Query Executor
        • Subquery/CTE Push-Pull Execution
      • LightDB planner and executor
    • Manual Query Propagation
      • Running on all Workers
      • Running on all Shards
      • Running on all Placements
      • Limitations
    • SQL Support and Workarounds
      • Workarounds
        • Work around limitations using CTEs
        • Temp Tables: the Workaround of Last Resort
  • Canopy API
    • Canopy Utility Functions
      • Table and Shard DDL
        • create_distributed_table
        • truncate_local_data_after_distributing_table
        • undistribute_table
        • alter_distributed_table
        • alter_table_set_access_method
        • remove_local_tables_from_metadata
        • create_reference_table
        • mark_tables_colocated
        • update_distributed_table_colocation
        • create_distributed_function
        • master_create_empty_shard
        • alter_columnar_table_set
        • create_time_partitions
        • drop_old_time_partitions
        • alter_old_partitions_set_access_method
      • Table and Shard DML
        • master_append_table_to_shard
        • master_apply_delete_command
      • Metadata / Configuration Information
        • canopy_add_node
        • canopy_update_node
        • canopy_set_node_property
        • canopy_add_inactive_node
        • canopy_activate_node
        • canopy_disable_node
        • canopy_add_secondary_node
        • canopy_remove_node
        • canopy_get_active_worker_nodes
        • canopy_set_coordinator_host
        • master_get_table_metadata
        • get_shard_id_for_distribution_column
        • column_to_column_name
        • canopy_relation_size
        • canopy_table_size
        • canopy_total_relation_size
      • Cluster Management And Repair Functions
        • canopy_copy_shard_placement
        • canopy_move_shard_placement
        • rebalance_table_shards
        • get_rebalance_table_shards_plan
        • get_rebalance_progress
        • canopy_add_rebalance_strategy
        • canopy_set_default_rebalance_strategy
        • canopy_remote_connection_stats
        • canopy_drain_node
        • replicate_table_shards
        • canopy_create_restore_point
    • Canopy Tables and Views
      • Coordinator Metadata
        • Partition table
        • Shard table
        • Shard information view
        • Shard placement table
        • Worker node table
        • Distributed object table
        • Canopy tables view
        • Time partitions view
        • Co-location group table
        • Rebalancer strategy table
        • Distributed Query Activity
    • Configuration Reference
      • General configuration
        • canopy.max_worker_nodes_tracked (integer)
        • canopy.use_secondary_nodes (enum)
        • canopy.cluster_name (text)
        • canopy.enable_version_checks (boolean)
        • canopy.log_distributed_deadlock_detection (boolean)
        • canopy.distributed_deadlock_detection_factor (floating point)
        • canopy.node_connection_timeout (integer)
        • canopy.node_conninfo (text)
        • canopy.local_hostname (text)
      • Data Loading
        • canopy.multi_shard_commit_protocol (enum)
        • canopy.shard_replication_factor (integer)
        • canopy.shard_count (integer)
        • canopy.shard_max_size (integer)
        • canopy.replicate_reference_tables_on_activate (boolean)
      • Planner Configuration
        • canopy.local_table_join_policy (enum)
        • canopy.limit_clause_row_fetch_count (integer)
        • canopy.count_distinct_error_rate (floating point)
        • canopy.task_assignment_policy (enum)
      • Intermediate Data Transfer
        • canopy.binary_worker_copy_format (boolean)
        • canopy.max_intermediate_result_size (integer)
      • DDL
        • canopy.enable_ddl_propagation (boolean)
        • canopy.enable_local_reference_table_foreign_keys (boolean)
      • Executor Configuration
        • General
        • Explain output
    • Append Distribution
      • Creating and Distributing Tables
      • Expiring Data
      • Deleting Data
      • Dropping Tables
      • Data Loading
        • Bulk load using \copy
        • Incremental loads by appending to existing shards
        • Increasing data loading performance
      • Scaling Data Ingestion
        • Coordinator Node Bulk Ingestion (100k/s-200k/s)
        • Worker Node Bulk Ingestion (100k/s-1M/s)
        • Pre-processing Data in Canopy
  • External Integrations
    • Ingesting Data from Kafka
      • Caveats
    • Ingesting Data from Spark
    • Business Intelligence with Tableau

Administer

  • Cluster Management
    • Choosing Cluster Size
      • Shard Count
        • Multi-Tenant SaaS Use-Case
        • Real-Time Analytics Use-Case
    • Initial Hardware Size
      • Multi-Tenant SaaS Use-Case
      • Real-Time Analytics Use-Case
    • Scaling the cluster
      • Add a worker
      • Rebalance Shards
      • Adding a coordinator
    • Dealing With Node Failures
      • Worker Node Failures
      • Coordinator Node Failures
    • Resource Conservation
      • Limiting Long-Running Queries
    • Security
      • Connection Management
      • Setup Certificate Authority signed certificates
      • Increasing Worker Security
    • LightDB extensions
    • Creating a New Database
  • Table Management
    • Determining Table and Relation Size
    • Vacuuming Distributed Tables
    • Analyzing Distributed Tables
    • Columnar Storage
      • Usage
      • Measuring compression
      • Example
      • Gotchas
      • Limitations

Troubleshoot

  • Query Performance Tuning
    • Table Distribution and Shards
    • LightDB tuning
    • Scaling Out Performance
    • Distributed Query Performance Tuning
      • General
        • Subquery/CTE Network Overhead
      • Advanced
        • Connection Management
        • Task Assignment Policy
        • Intermediate Data Transfer Format
        • Binary protocol
    • Scaling Out Data Ingestion
      • Real-time Insert and Updates
        • Insert Throughput
        • Update Throughput
        • Insert and Update: Throughput Checklist
        • Insert and Update: Latency
      • Staging Data Temporarily
      • Bulk Copy (250K - 2M/s)
  • Useful Diagnostic Queries
    • Finding which shard contains data for a specific tenant
    • Finding the distribution column for a table
    • Detecting locks
    • Querying the size of your shards
    • Querying the size of all distributed tables
    • Determining Replication Factor per Table
    • Identifying unused indices
    • Monitoring client connection count
    • Viewing system queries
      • Active queries
      • Why are queries waiting
    • Index hit rate
    • Cache hit rate
  • Common Error Messages
    • Could not receive query results
      • Resolution
    • Canceling the transaction since it was involved in a distributed deadlock
      • Resolution
    • Could not connect to server: Cannot assign requested address
      • Resolution
    • SSL error: certificate verify failed
      • Resolution
    • Could not connect to any active placements
      • Resolution
    • Remaining connection slots are reserved for non-replication superuser connections
      • Resolution
    • PgBouncer cannot connect to server
      • Resolution
    • Cannot create uniqueness constraint
      • Resolution
    • Function create_distributed_table does not exist
      • Resolution
    • STABLE functions used in UPDATE queries cannot be called with column references
      • Resolution

FAQ

  • Frequently Asked Questions
    • Can I create primary keys on distributed tables?
    • How do I add nodes to an existing Canopy cluster?
    • How does Canopy handle failure of a worker node?
    • How does Canopy handle failover of the coordinator node?
    • Are there any LightDB features not supported by Canopy?
    • How do I choose the shard count when I hash-partition my data?
    • How do I change the shard count for a hash partitioned table?
    • How does Canopy support count(distinct) queries?
    • In which situations are uniqueness constraints supported on distributed tables?
    • How do I create database roles, functions, extensions etc in a Canopy cluster?
    • What if a worker node’s address changes?
    • Which shard contains data for a particular tenant?
    • I forgot the distribution column of a table, how do I find it?
    • Can I distribute a table by multiple keys?
    • Why does pg_relation_size report zero bytes for a distributed table?
    • Why am I seeing an error about max_intermediate_result_size?
    • Can I shard by schema on Canopy for multi-tenant applications?

Articles

  • Related Articles
    • LightDB Parallel Indexing in Canopy
    • Real-time Event Aggregation at Scale Using LightDB with Canopy
    • How Distributed Outer Joins on LightDB with Canopy Work
      • Distributed Outer Joins with Canopy
    • Designing your SaaS Database for Scale with LightDB
    • Sharding a Multi-Tenant App with LightDB
      • Tenancy
      • Multi-tenancy and co-location, a perfect pair
      • In conclusion
    • Sharding LightDB with Semi-Structured Data and Its Performance Implications
      • One large table, without joins
      • Enter Canopy
      • The query workload
      • Every distribution has its thorns
    • Scalable Real-time Product Search using LightDB with Canopy

© Copyright 2019, Hundsun Technologies Inc.