Firestore Data Modeling
Design efficient Firestore data models with subcollections, denormalization, and security rules.
Prerequisites
- GCP project with Firestore access
- Basic understanding of NoSQL database concepts
- Familiarity with document-oriented data models
Firestore Fundamentals
Cloud Firestore is a serverless, NoSQL document database that scales automatically and provides real-time synchronization for mobile and web applications. It stores data as documents organized into collections. Unlike relational databases, Firestore has no fixed schema: each document can contain different fields, and the structure is entirely determined by your application code. This flexibility is both its greatest strength and its biggest pitfall: without careful planning, Firestore data models can become unmaintainable and expensive to query.
Firestore operates in two modes: Native mode (the full-featured default with real-time listeners, offline support, and mobile SDKs) and Datastore mode (backward-compatible with the legacy Cloud Datastore API, supporting higher write throughput but lacking real-time features). Once you choose a mode for a project database, you cannot change it. For new projects, always choose Native mode unless you have existing Datastore applications to migrate.
Firestore automatically replicates data across multiple zones (or regions, in multi-region configurations), providing strong consistency for all reads. This is a significant advantage over eventually consistent NoSQL databases: you never read stale data from Firestore.
Firestore vs. Other GCP Databases
Firestore is ideal for mobile/web apps needing real-time sync and offline support. For analytical queries over massive datasets, use BigQuery. For relational data with strong consistency at global scale, use Cloud Spanner. For traditional PostgreSQL/MySQL workloads, use Cloud SQL or AlloyDB. Firestore is not a general-purpose database. It excels at specific patterns (hierarchical data, real-time sync, offline-first) and struggles with others (ad-hoc queries, aggregations, joins).
Document and Collection Design
The fundamental principle of Firestore data modeling is to structure your data to match your queries. Firestore does not support joins or complex aggregation across collections (with limited exceptions via aggregation queries added in 2023). Instead, you denormalize data so each query reads from a single document or collection. This is the opposite of relational database design, where normalization eliminates data duplication.
Firestore's data hierarchy:
- Collection: A container of documents. Collections cannot contain other collections directly. Collection names must be unique within their parent.
- Document: A record with fields (key-value pairs). Maximum 1 MB per document. Identified by a unique ID within its collection. Documents can contain nested maps and arrays, but these cannot be queried independently.
- Subcollection: A collection nested under a document. This creates a hierarchy: users/alice/orders/order123. Subcollections are independent of their parent document. Deleting a parent document does not delete its subcollections.
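To make the alternating collection/document hierarchy concrete, here is a small sketch. The path strings mirror what you would pass to the client SDK's doc() and collection() helpers; the isDocumentPath helper is our own illustration, not part of the SDK.

```typescript
// Firestore paths alternate collection and document segments:
//   users                        -> collection
//   users/alice                  -> document
//   users/alice/orders           -> subcollection
//   users/alice/orders/order123  -> document
// With the client SDK you would write, e.g.:
//   doc(db, "users", "alice", "orders", "order123")
//   collection(db, "users", "alice", "orders")

// Illustration only: an even number of segments means a document path,
// an odd number means a collection path.
function isDocumentPath(path: string): boolean {
  return path.split("/").length % 2 === 0;
}

isDocumentPath("users/alice");                 // document
isDocumentPath("users/alice/orders");          // collection
isDocumentPath("users/alice/orders/order123"); // document
```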
Document Size Considerations
The 1 MB document size limit sounds generous, but it can be reached quickly with arrays or nested maps. A document with a 10,000-element array of objects can easily exceed 1 MB. More importantly, Firestore charges for each byte read, so large documents cost more per read even if you only need one field. Design documents to contain only the data needed for a single view or query.
| Design Choice | Pros | Cons | When to Use |
|---|---|---|---|
| Small, focused documents | Fast reads, low per-read cost | More documents, more writes to update | High-read, low-write workloads |
| Large, denormalized documents | Single read gets all data | Wastes bandwidth if you only need some fields | Display-heavy screens, read-once patterns |
| Subcollections | Independent scaling, parent-scoped queries | Cannot easily query across parents | Data naturally scoped to a parent |
| Root collections | Flexible cross-entity queries | Manual parent tracking, no hierarchy | Data queried independently of any parent |
// users/{userId}
interface UserDocument {
displayName: string;
email: string;
photoUrl: string;
createdAt: Timestamp;
// Denormalized counts for display (avoid querying subcollections)
orderCount: number;
totalSpent: number;
// Preferences stored directly (small, frequently accessed)
preferences: {
theme: "light" | "dark";
currency: string;
notifications: boolean;
};
}
// users/{userId}/orders/{orderId}
interface OrderDocument {
status: "pending" | "processing" | "shipped" | "delivered";
items: OrderItem[];
subtotal: number;
tax: number;
total: number;
shippingAddress: Address;
createdAt: Timestamp;
updatedAt: Timestamp;
// Denormalized user info (avoid extra read to fetch user)
userName: string;
userEmail: string;
}
// products/{productId}
interface ProductDocument {
name: string;
description: string;
price: number;
category: string;
tags: string[];
inventory: number;
images: string[];
averageRating: number;
reviewCount: number;
// Denormalized category info
categoryName: string;
categorySlug: string;
}

Key Modeling Patterns
Pattern 1: Denormalization
In Firestore, reads are cheap but joins are impossible. The primary strategy is to duplicate data across documents so that each screen in your application can be served with a single query. The tradeoff is that writes become more complex because you need to update duplicated fields in multiple places. This is the inverse of relational database design, and it is the most important mental shift when working with Firestore.
Denormalize data that is:
- Read frequently but written infrequently
- Stable (e.g., user display names change rarely)
- Needed alongside the document's own data for display
Do not denormalize data that:
- Changes frequently (you will need to fan out updates constantly)
- Is sensitive (PII duplicated across documents increases breach impact)
- Is large (duplicating images or blobs wastes storage)
import {
doc, setDoc, getDoc, updateDoc, collection,
serverTimestamp, increment
} from "firebase/firestore";
async function createOrder(userId: string, items: OrderItem[]) {
// Read user once to get denormalized fields
const userSnap = await getDoc(doc(db, "users", userId));
const user = userSnap.data();
if (!user) throw new Error("User not found");
const orderRef = doc(collection(db, "users", userId, "orders"));
const orderTotal = items.reduce(
(sum, item) => sum + item.price * item.quantity, 0
);
await setDoc(orderRef, {
items,
total: orderTotal,
status: "pending",
createdAt: serverTimestamp(),
// Denormalized from user document
userName: user.displayName,
userEmail: user.email,
});
// Update denormalized count on user document
await updateDoc(doc(db, "users", userId), {
orderCount: increment(1),
totalSpent: increment(orderTotal),
});
}

Pattern 2: Subcollections vs. Root Collections
Choose between subcollections and root collections based on your query patterns. This is one of the most impactful decisions in Firestore modeling:
| Approach | Structure | Best For | Limitation |
|---|---|---|---|
| Subcollection | users/alice/orders/order1 | Data naturally scoped to a parent (user's orders) | Cannot easily query across all users' orders |
| Root collection | orders/order1 with userId field | Cross-user queries (all orders by status) | Requires composite indexes for filtered queries |
| Collection group query | Subcollection + collectionGroup("orders") | Best of both: hierarchical storage, cross-parent queries | Requires collection group indexes |
Prefer Collection Group Queries
Collection group queries let you query across all subcollections with the same name. Store orders as users/{userId}/orders/{orderId} and use collectionGroup("orders") to query all orders across all users. This gives you the organizational benefit of subcollections with the query flexibility of root collections. Just remember to create the required collection group indexes and write appropriate security rules for collection group access.
Pattern 3: Aggregation Documents
Firestore charges per document read. If you need to display a count or sum, reading thousands of documents to compute it client-side is both slow and expensive. Instead, maintain aggregation documents that are updated on each write using Cloud Functions triggers or batched writes.
import { onDocumentCreated } from "firebase-functions/v2/firestore";
import { getFirestore, FieldValue } from "firebase-admin/firestore";
const db = getFirestore();
// Triggered when a new review is created
export const updateProductRating = onDocumentCreated(
"products/{productId}/reviews/{reviewId}",
async (event) => {
const review = event.data?.data();
if (!review) return;
const productRef = db.doc(`products/${event.params.productId}`);
// Use a transaction to safely update the aggregate
await db.runTransaction(async (txn) => {
const product = await txn.get(productRef);
const data = product.data()!;
const oldTotal = data.averageRating * data.reviewCount;
const newCount = data.reviewCount + 1;
const newAverage = (oldTotal + review.rating) / newCount;
txn.update(productRef, {
averageRating: newAverage,
reviewCount: newCount,
});
});
}
);

Pattern 4: Distributed Counters
A single Firestore document can sustain approximately 1 write per second. For hot counters (like page views, likes, or inventory decrements), this limit is quickly reached. The solution is distributed counters: shard the count across multiple documents and sum them on read.
import {
  doc, collection, getDocs, updateDoc, writeBatch, increment
} from "firebase/firestore";
const NUM_SHARDS = 10;
// Initialize counter shards
async function initCounter(counterId: string) {
const batch = writeBatch(db);
for (let i = 0; i < NUM_SHARDS; i++) {
const shardRef = doc(db, "counters", counterId, "shards", String(i));
batch.set(shardRef, { count: 0 });
}
await batch.commit();
}
// Increment: randomly pick a shard to distribute writes
async function incrementCounter(counterId: string) {
const shardId = String(Math.floor(Math.random() * NUM_SHARDS));
const shardRef = doc(db, "counters", counterId, "shards", shardId);
await updateDoc(shardRef, { count: increment(1) });
}
// Read: sum all shards
async function getCount(counterId: string): Promise<number> {
const shardsSnap = await getDocs(
collection(db, "counters", counterId, "shards")
);
let total = 0;
shardsSnap.forEach((shard) => {
total += shard.data().count;
});
return total;
}

Watch the 1 Write per Second per Document Limit
A single Firestore document can sustain approximately 1 write per second. Exceeding this rate causes contention errors. Use distributed counters for any field that receives frequent updates. For example, a global “page views” counter that updates on every request needs at least 10 shards to support 10 writes per second, or 100 shards for 100 writes per second.
Pattern 5: Data Bucketing for Time-Series
For time-series data (events, logs, metrics), storing each event as a separate document creates high read costs when you need to display aggregated views. Instead, bucket events into time-windowed documents that contain arrays of events.
import {
  doc, getDoc, setDoc, updateDoc, arrayUnion, increment, Timestamp
} from "firebase/firestore";
// Instead of: events/{eventId} (one doc per event)
// Use: analytics/{YYYY-MM-DD_HH} (one doc per hour, containing an array of events)
async function logEvent(event: AppEvent) {
const bucket = new Date().toISOString().slice(0, 13).replace("T", "_");
const bucketRef = doc(db, "analytics", bucket);
await updateDoc(bucketRef, {
events: arrayUnion({
type: event.type,
userId: event.userId,
timestamp: Timestamp.now(),
data: event.data,
}),
eventCount: increment(1),
}).catch(async () => {
// Document doesn't exist yet, create it
await setDoc(bucketRef, {
events: [{
type: event.type,
userId: event.userId,
timestamp: Timestamp.now(),
data: event.data,
}],
eventCount: 1,
bucketStart: Timestamp.now(),
});
});
}
// Read a day of events: only 24 document reads instead of thousands
async function getEventsForDay(date: string): Promise<AppEvent[]> {
const events: AppEvent[] = [];
for (let hour = 0; hour < 24; hour++) {
const bucket = `${date}_${String(hour).padStart(2, "0")}`;
const snap = await getDoc(doc(db, "analytics", bucket));
if (snap.exists()) {
events.push(...snap.data().events);
}
}
return events;
}

Indexing Strategy
Firestore automatically creates single-field indexes for every field in every document. Composite indexes (queries with multiple filters or ordering) must be created explicitly. Firestore will return an error with a direct link to create the missing index when you run a query that needs one. This is one of Firestore's most developer-friendly features.
Index Types
- Single-field indexes: Created automatically. Exempt fields you never query on (like large text fields or blob data) to save storage and write costs. Each indexed field adds to write latency and storage.
- Composite indexes: Required for queries with multiple equality filters, range filters combined with equality, or orderBy on a different field than the filter.
- Collection group indexes: Required for collection group queries. Must be explicitly created.
Index Limits
Index limits apply per database, not per document: a database can have up to 200 composite indexes, and single-field index exemptions are similarly capped. Per document, Firestore limits the number of index entries (40,000) and the total index entry size (approximately 8 MiB). For documents with many fields or large array fields, you may need to exempt some fields from indexing.
{
"indexes": [
{
"collectionGroup": "orders",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "status", "order": "ASCENDING" },
{ "fieldPath": "createdAt", "order": "DESCENDING" }
]
},
{
"collectionGroup": "products",
"queryScope": "COLLECTION",
"fields": [
{ "fieldPath": "category", "order": "ASCENDING" },
{ "fieldPath": "price", "order": "ASCENDING" }
]
},
{
"collectionGroup": "orders",
"queryScope": "COLLECTION_GROUP",
"fields": [
{ "fieldPath": "status", "order": "ASCENDING" },
{ "fieldPath": "createdAt", "order": "DESCENDING" }
]
}
],
"fieldOverrides": [
{
"collectionGroup": "products",
"fieldPath": "description",
"indexes": []
},
{
"collectionGroup": "analytics",
"fieldPath": "events",
"indexes": []
}
]
}

# Deploy indexes from the configuration file
firebase deploy --only firestore:indexes
# List existing indexes
gcloud firestore indexes composite list --database="(default)"
# Create an index via gcloud
gcloud firestore indexes composite create \
--database="(default)" \
--collection-group=products \
--field-config=field-path=category,order=ascending \
--field-config=field-path=price,order=ascending

Exempt Large Fields from Indexing
Fields containing large text (descriptions, content), arrays with many elements, or data that is never queried should be exempted from single-field indexing. Each indexed field adds to write latency and storage cost. A document with 50 indexed fields takes significantly longer to write than one with 10. Use the fieldOverrides section of your indexes file to exempt fields.
Querying Best Practices
Firestore queries are fundamentally different from SQL queries. Understanding what Firestore can and cannot do helps you design your data model correctly from the start.
What Firestore Queries Can Do
- Filter on one or more fields with equality (==) and range (<, >, <=, >=) operators
- Filter on array membership (array-contains, array-contains-any)
- Filter on equality for multiple values (in, not-in)
- Order results by one or more fields
- Limit results (limit, limitToLast)
- Paginate using startAt/startAfter cursors
- Count documents matching a query (aggregation query)
- Sum and average fields (aggregation queries)
What Firestore Queries Cannot Do
- Join across collections (no foreign keys)
- Full-text search (use Algolia, Typesense, or Vertex AI Search)
- Range or inequality filters on multiple different fields (e.g., price > 10 AND rating > 4) in older SDKs; Firestore added support for multiple inequality filters in 2024, so check your SDK version
- OR conditions across different fields in older SDKs; the or() query operator added in 2023 covers many of these cases
- Query on computed values (no expressions in queries)
import {
  collection, query, where, orderBy, limit, startAfter,
  getDocs, getCountFromServer, DocumentSnapshot
} from "firebase/firestore";
// Simple filtered query
const pendingOrders = query(
collection(db, "users", userId, "orders"),
where("status", "==", "pending"),
orderBy("createdAt", "desc"),
limit(20)
);
// Array-contains query (products with a specific tag)
const saleProducts = query(
collection(db, "products"),
where("tags", "array-contains", "sale"),
orderBy("price", "asc")
);
// Pagination with cursors
async function getNextPage(lastDoc: DocumentSnapshot) {
const nextPage = query(
collection(db, "products"),
where("category", "==", "electronics"),
orderBy("createdAt", "desc"),
startAfter(lastDoc),
limit(20)
);
return getDocs(nextPage);
}
// Aggregation query (count without reading all documents)
const orderCount = await getCountFromServer(
query(
collection(db, "users", userId, "orders"),
where("status", "==", "delivered")
)
);
console.log("Delivered orders:", orderCount.data().count);

Security Rules
Firestore Security Rules control access at the document level. They run on every read and write from client SDKs (server SDKs using a service account bypass rules). Well-designed rules are essential because Firestore is often accessed directly from mobile and web clients without a backend API layer; the security rules ARE your authorization layer.
Security Rule Principles
- Default deny: Rules default to denying all access. You must explicitly allow each access pattern.
- Authentication first: Almost every rule should check request.auth != null as a baseline.
- Validate data: Validate field types, required fields, and value ranges in write rules. The client can send any data; rules are your validation layer.
- Limit document size: Use request.resource.data.keys().size() to prevent clients from adding unexpected fields.
- Test thoroughly: Security rule bugs are security vulnerabilities. Use the Firestore emulator and unit testing.
rules_version = '2';
service cloud.firestore {
match /databases/{database}/documents {
// Helper function: check if user is authenticated
function isAuthenticated() {
return request.auth != null;
}
// Helper function: check if user is the resource owner
function isOwner(userId) {
return isAuthenticated() && request.auth.uid == userId;
}
// Helper function: check if user is an admin
function isAdmin() {
return isAuthenticated() && request.auth.token.admin == true;
}
// Users can only read/write their own profile
match /users/{userId} {
allow read: if isAuthenticated();
allow create: if isOwner(userId)
&& request.resource.data.keys().hasAll(["displayName", "email"])
&& request.resource.data.displayName is string
&& request.resource.data.displayName.size() <= 100;
allow update: if isOwner(userId)
&& !request.resource.data.diff(resource.data).affectedKeys()
.hasAny(["orderCount", "totalSpent"]);
// Users can only access their own orders
match /orders/{orderId} {
allow read: if isOwner(userId);
allow create: if isOwner(userId)
&& request.resource.data.total is number
&& request.resource.data.total > 0
&& request.resource.data.status == "pending";
allow update: if false; // Orders are immutable from client
}
}
// Products are readable by anyone, writable by admins only
match /products/{productId} {
allow read: if true;
allow write: if isAdmin();
match /reviews/{reviewId} {
allow read: if true;
allow create: if isAuthenticated()
&& request.resource.data.rating is int
&& request.resource.data.rating >= 1
&& request.resource.data.rating <= 5
&& request.resource.data.keys().hasAll(["rating", "text", "userId"])
&& request.resource.data.userId == request.auth.uid;
allow update, delete: if request.auth.uid == resource.data.userId;
}
}
// Collection group rule for admin order queries
match /{path=**}/orders/{orderId} {
allow read: if isAdmin();
}
}
}

Security Rules Are Not Filters
A common mistake is trying to use security rules as query filters. Security rules evaluate whether a query could return unauthorized data, not whether it does. If your rule allows reading orders only for the authenticated user, the query must include a where("userId", "==", currentUser.uid) filter. A query without this filter will be denied even if no unauthorized documents exist. The rule and the query must be consistent.
Testing Security Rules
import {
initializeTestEnvironment,
assertFails,
assertSucceeds,
} from "@firebase/rules-unit-testing";
const testEnv = await initializeTestEnvironment({
projectId: "test-project",
firestore: {
rules: readFileSync("firestore.rules", "utf8"),
},
});
// Test: authenticated user can read their own profile
test("user can read own profile", async () => {
const db = testEnv.authenticatedContext("alice").firestore();
await assertSucceeds(getDoc(doc(db, "users", "alice")));
});
// Test: user cannot read another user's orders
test("user cannot read other user orders", async () => {
const db = testEnv.authenticatedContext("alice").firestore();
await assertFails(getDoc(doc(db, "users", "bob", "orders", "order1")));
});
// Test: unauthenticated user cannot write
test("unauthenticated user cannot create user", async () => {
const db = testEnv.unauthenticatedContext().firestore();
await assertFails(
setDoc(doc(db, "users", "hacker"), { displayName: "Hacker" })
);
});
// Test: review rating must be 1-5
test("review rating must be valid", async () => {
const db = testEnv.authenticatedContext("alice").firestore();
await assertFails(
setDoc(doc(db, "products", "p1", "reviews", "r1"), {
rating: 6, text: "Great!", userId: "alice"
})
);
});

Test Your Security Rules
Use the Firestore emulator and the @firebase/rules-unit-testing library to write automated tests for your security rules. Rules bugs are security vulnerabilities. Test both allowed and denied scenarios for every collection, including edge cases like missing fields, wrong data types, and boundary values. Run these tests in CI on every PR that touches security rules.
Performance Optimization
Firestore performance depends primarily on data model design. A well-designed model makes queries fast and cheap; a poorly designed one creates hot spots and excessive reads.
Read Optimization
- Use real-time listeners wisely: Listeners keep a WebSocket connection open and stream changes. They are efficient for data that changes frequently and is displayed on screen. Do not use listeners for data you read once and discard.
- Paginate large result sets: Never read unbounded collections. Always use limit() and cursor-based pagination.
- Use offline persistence: Enable offline persistence for mobile apps to reduce network reads and provide offline functionality.
- Bundle frequently-read data: Use Firestore Data Bundles to pre-package common queries and serve them from CDN.
Write Optimization
- Batch writes: Group related writes into batched writes (up to 500 operations) for atomicity and reduced latency.
- Avoid hot spots: Do not create documents with sequential IDs (timestamps, auto-incrementing counters) at high rates. Use Firestore's auto-generated IDs for distributed writes.
- Use server timestamps: Always use serverTimestamp() instead of client-side timestamps for consistent ordering.
Firestore Pricing and Cost Control
Firestore charges for document reads, writes, deletes, and stored data. Understanding the pricing model is essential for designing cost-effective data models.
| Operation | Price (example; varies by region) | Cost Driver |
|---|---|---|
| Document reads | $0.036 per 100K | Queries, listeners, get operations |
| Document writes | $0.108 per 100K | Create, update, set operations |
| Document deletes | $0.012 per 100K | Delete operations |
| Stored data | $0.108/GB/month | Total data + index storage |
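A back-of-envelope cost model using the example rates above (the helper is ours, and actual rates vary by region and change over time):

```typescript
// Example per-100K rates from the table above (not authoritative).
const RATES = { read: 0.036, write: 0.108, delete: 0.012 };

function monthlyOpCost(ops: {
  reads: number; writes: number; deletes: number;
}): number {
  return (
    (ops.reads / 100_000) * RATES.read +
    (ops.writes / 100_000) * RATES.write +
    (ops.deletes / 100_000) * RATES.delete
  );
}

// 10M reads + 1M writes + 100K deletes:
// 100 * 0.036 + 10 * 0.108 + 1 * 0.012 = $4.692 / month
const estimate = monthlyOpCost({
  reads: 10_000_000, writes: 1_000_000, deletes: 100_000,
});
```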
Cost control strategies:
- Use aggregation queries (count, sum) instead of reading all documents to compute totals
- Maintain denormalized aggregation documents to avoid repeated full-collection reads
- Exempt large, unqueried fields from indexing to reduce storage costs
- Use offline persistence to reduce redundant reads from listeners
- Set appropriate TTLs on documents that expire (Firestore TTL is a built-in feature)
# Set TTL policy to auto-delete documents with an 'expiresAt' field
gcloud firestore fields ttls update expiresAt \
--collection-group=sessions \
--database="(default)"
# In your application code, set the TTL field:
# await setDoc(doc(db, "sessions", sessionId), {
# userId: user.uid,
# createdAt: serverTimestamp(),
# expiresAt: Timestamp.fromDate(new Date(Date.now() + 24 * 60 * 60 * 1000)),
# });

Migration and Data Management
As your application evolves, you will need to migrate data between schemas, export data for analysis, and manage backups.
# Export all data to Cloud Storage
gcloud firestore export gs://my-backup-bucket/firestore-backup \
--database="(default)"
# Export specific collections
gcloud firestore export gs://my-backup-bucket/firestore-backup \
--database="(default)" \
--collection-ids=users,products,orders
# Import data from a previous export
gcloud firestore import gs://my-backup-bucket/firestore-backup \
--database="(default)"
# Schedule daily backups with Cloud Scheduler
gcloud scheduler jobs create http firestore-daily-backup \
--schedule="0 2 * * *" \
--uri="https://firestore.googleapis.com/v1/projects/my-project/databases/(default):exportDocuments" \
--http-method=POST \
--message-body='{"outputUriPrefix":"gs://my-backup-bucket/daily"}' \
--oauth-service-account-email=backup-sa@my-project.iam.gserviceaccount.com

Key Takeaways
1. Firestore organizes data in documents within collections with no fixed schema required.
2. Subcollections create hierarchical data structures with independent query capabilities.
3. Denormalization (duplicating data) is expected and improves read performance.
4. Security rules validate reads and writes at the document level, so design for them early.
5. Composite indexes must be created for queries with multiple field filters.
6. Use batch writes and transactions for atomic multi-document operations.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.