Hello there,
I am starting a series on AI system design questions. Each question is based on an existing real-world AI system. The answers will not be posted here; if you are interested in collaborative learning, refer to the closing note.
Introduction
NotebookLM is a research and note-taking tool developed by Google Labs. It relies on Google's Gemini language models to help users understand and interact with their uploaded documents. The system must balance latency, scalability, and user experience while maintaining security and privacy.
Clarifications
Functionality: Which features are prioritized (e.g., code completion, real-time collaboration, data visualization)?
Answer: The core features are uploading one or more documents, creating projects, generating summaries, and generating podcast transcripts. Collaboration is a stretch goal.
Scale: Expected number of concurrent users and data size?
Answer: Target 1M+ users with <500ms latency for critical features (summaries and transcripts).
AI Models: Are we using existing models (Gemini) or custom ones?
Answer: Leverage Google's existing models (Gemini), with the flexibility to switch to other models or modes for tasks such as deep research.
Data Sources: Can the LM access external data (e.g., web, private databases)?
Answer: Initially, only user-uploaded content. Later, integrate with Google Drive and Google Sheets.
Security: Any compliance requirements (e.g., GDPR)?
Answer: Yes. Data must be encrypted, and user consent is required before any data is used for training.
User Limits: Are we planning to limit the number of questions, answers, and documents for free users vs. paid users?
Answer: Yes. We may limit the size of uploaded documents by page or token count, and free and paid users will have different limits.
Input Formats: Are there any limitations in the format of input documents?
Answer: Yes. The allowed formats are PDF, Microsoft Word, OpenDocument Format, and Google Docs.
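The format and size answers above can be treated as one validation rule applied at upload time. The following Python sketch shows one way to express them. It rests on several assumptions made for illustration. The extension list is chosen by hand. Google Docs are recognized by the Google Drive MIME type `application/vnd.google-apps.document`. The tier names are invented, and so is every page and token limit, because the requirements give no numbers.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumed mapping of the allowed formats to file extensions
# (PDF, Microsoft Word, OpenDocument text).
ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".odt"}
# Google Docs have no file extension; they are detected by their Drive MIME type.
GOOGLE_DOCS_MIME = "application/vnd.google-apps.document"


@dataclass(frozen=True)
class TierLimits:
    max_pages: int
    max_tokens: int


# Hypothetical placeholder limits; the requirements only say free and paid differ.
TIER_LIMITS = {
    "free": TierLimits(max_pages=100, max_tokens=200_000),
    "paid": TierLimits(max_pages=1_000, max_tokens=2_000_000),
}


def validate_upload(filename: str, mime_type: str, pages: int, tokens: int, tier: str) -> list[str]:
    """Return every rejection reason; an empty list means the upload is accepted."""
    errors = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS and mime_type != GOOGLE_DOCS_MIME:
        errors.append(f"unsupported format: {ext or mime_type}")
    limits = TIER_LIMITS[tier]
    if pages > limits.max_pages:
        errors.append(f"too many pages: {pages} > {limits.max_pages}")
    if tokens > limits.max_tokens:
        errors.append(f"too many tokens: {tokens} > {limits.max_tokens}")
    return errors


if __name__ == "__main__":
    # The same 120-page PDF is rejected on the free tier but accepted on the paid tier.
    print(validate_upload("notes.pdf", "application/pdf", pages=120, tokens=50_000, tier="free"))
    # → ['too many pages: 120 > 100']
    print(validate_upload("notes.pdf", "application/pdf", pages=120, tokens=50_000, tier="paid"))
    # → []
```

The function returns a list of reasons and does not raise on the first failure. This lets a client report every problem with an upload at once.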
Problem Framing
Input: The user uploads documents to NotebookLM, where they are stored in a user-specific file vault.
Output: Generate the requested outputs (e.g., summaries and podcast transcripts) from either the user's own prompts or predetermined prompts.
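The input/output framing can be expressed as a small interface. The following Python sketch covers that contract only and does not answer the design question. It makes these assumptions:

- `FileVault`, `Preset`, `build_prompt` and `generate` are hypothetical names.
- The vault lives in memory, not in encrypted storage.
- The wording of the preset prompts is invented.
- `model` can be any callable that stands in for a Gemini call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Preset(Enum):
    # Hypothetical predetermined prompts; the wording is a placeholder.
    SUMMARY = "Summarize the following documents."
    PODCAST_TRANSCRIPT = "Write a two-host podcast transcript discussing the following documents."


@dataclass
class FileVault:
    """In-memory stand-in for per-user storage: documents are keyed by owner."""
    _docs: dict[str, dict[str, str]] = field(default_factory=dict)

    def upload(self, user_id: str, doc_id: str, text: str) -> None:
        self._docs.setdefault(user_id, {})[doc_id] = text

    def documents(self, user_id: str) -> dict[str, str]:
        # Return a copy so callers cannot modify another request's view of the vault.
        return dict(self._docs.get(user_id, {}))


def build_prompt(vault: FileVault, user_id: str,
                 preset: Optional[Preset] = None, custom: Optional[str] = None) -> str:
    """Combine exactly one instruction (preset or custom) with only this user's documents."""
    if (preset is None) == (custom is None):
        raise ValueError("provide exactly one of preset or custom")
    instruction = preset.value if preset is not None else custom
    context = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in vault.documents(user_id).items())
    return f"{instruction}\n\n{context}"


def generate(vault: FileVault, user_id: str, model: Callable[[str], str],
             preset: Optional[Preset] = None, custom: Optional[str] = None) -> str:
    """Send the assembled prompt to the model callable (a placeholder for a Gemini call)."""
    return model(build_prompt(vault, user_id, preset, custom))
```

The model is passed in as a plain callable, so the framing does not depend on any particular Gemini client. Under this sketch, a user's prompt can only ever see that user's documents.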
Closing Note: 
💡 Want to explore further? Share your solution or improvements as a pull request to the repository https://github.com/jaganadhg/ai-system-desgin
Collaborate, iterate, and learn from the community’s creativity! Happy coding! 🚀
