SWE 411
Software Design Document (SDD)
Group 02
The Presenter
Members
Ammar Alhmmad
Ibrahim Mohsen
Muhammad Ullah
Nawaf Almalki
Thabet Aljbreen
Yazeed Alsaeed
1. Introduction
1.1. Purpose
This document presents the design artifacts for The Presenter web application, which consists of a platform and a video creating system.
1.2. Scope
This Software Design Document outlines the development of a system that converts PowerPoint slide content into an explanatory video. The system encompasses several key modules and functionalities:
The platform: a web application that provides the user with a friendly interface for browsing, creating, and watching videos.
Video creating system: this subsystem processes the user's video creation requests and includes the converting, text-to-speech, and rendering modules.
AI system: this subsystem classifies and parses the slides. It also generates a prompt, which the same subsystem sends to an external LLM API.
1.3. References
- The Presenter SRS document
1.4. Overview
The document provides a detailed technical specification for a software system, starting with a system overview that includes use case diagrams and descriptions for a high-level understanding. The second section covers the deployment architecture and package diagram, detailing how internal subsystems interact. This is followed by a technology selection table, specifying the technologies for each system part. The main part of the document consists of four design artifacts: class diagrams, data flow diagrams, sequence diagrams, and activity diagrams, along with pseudocode, offering an in-depth understanding of system components and functionalities. The data design section includes an Enhanced Entity-Relationship (EER) diagram and a data dictionary, explaining database component relationships and constraints. The human interface design is then detailed, focusing on user interaction and experience. The document concludes with a requirements matrix and resource estimates, aligning system requirements with resource management.
1.5. Constraints
• The system's frontend shall be developed using the React library.
• The system's backend shall be developed using Bun.js.
• The system's AI shall be developed using Python.
• The system shall be developed within 10 months.
• The system shall be developed following the Waterfall method.
• The system shall support the English language.
2. System Overview
The Presenter is a software solution designed to change the way students, educators and content consumers interact with presentations and notes. The system offers a user-friendly platform in which presentation files are automatically converted into engaging video content.
Using a combination of AI technologies (Large Language Models, text-to-speech, natural language processing, and a proprietary machine learning model), “The Presenter” interprets the content of PowerPoint slides and automatically converts them into a video with an appropriate narration. The AI-driven approach lets each request be handled automatically and quickly according to the user's specific needs.
The system's tech stack consists of MongoDB for database management, Bun.js for server-side operations, React.js for the front-end user interface, and Python for handling backend logic and AI integration.
Background
The growing demand for engaging educational content in digital formats was one of the core reasons behind the creation of “The Presenter”. The project aims to change the way users interact with presentations and notes by converting static presentation materials into video content tailored to each user's specifications. As AI tools become easier to access and consumer-ready applications built on Artificial Intelligence continue to mature, the project has become both more viable and more useful.
The project is developed over an 8-month period using the Waterfall methodology. It is designed with a user base spanning all levels of technical proficiency in mind, ensuring ease of use without compromising output quality. The primary target audience is students, educators and teachers, content creators, business professionals, and general users with presentation needs.

3. System Architecture
3.1. Deployment Architectural Description

The deployment diagram illustrates the structure of a web-based platform designed to transform PowerPoint presentations into video presentations using AI technologies. It consists of a Database Server for storing user data and generated videos, a Web Server running a JavaScript application (Bun.js) for handling requests and serving content, and a Client (Frontend) built with React.js for user interactions. An External API is integrated to leverage AI capabilities such as Large Language Models, while an AWS Server hosts an API Gateway that manages the video creation system, where the AI processing occurs. The diagram reflects a system designed for ease of use, scalability, and efficient processing of multimedia content.
3.2. Alternative Deployment Architectural Design Description
3.2.1. First Alternative

The first alternative deployment diagram presents a microservices-oriented architecture for a platform that transforms PowerPoint presentations into video presentations. The client side is a React.js application that communicates with a Node.js server. The server side, hosted on AWS, employs an API Gateway built with Bun.js to direct requests to various specialized VM microservices. These include a GPT-3 Turbo service for scripting, a local TTS model for speech generation, image processing for media handling, user and content management services for interaction and data operations, and a video rendering service for final output creation. Data is managed by a separate Database Server featuring Bunny Storage for videos and MongoDB for structured data. This architecture allows services to be scaled and updated independently, giving it the flexibility and resilience of microservices designs, although it is more complex and may be better suited to later stages than to initial deployment.
3.2.2. Second Alternative

The second alternative deployment diagram is a classic three-tier web application structure. The client tier, composed of PCs and mobile devices, interfaces with the system via HTTP/HTTPS requests. These requests are managed by a web server that either delivers content directly to the client or forwards requests to the application server. The application server handles the core logic, including communication with specialized AI services to create the final product. Data storage and retrieval are handled by a dedicated database server.
3.3. Design Rationale
The main deployment diagram for the Presenter platform offers a well-rounded approach, focusing on user experience, scalability, and efficient multimedia processing. Central to this design are the Database Server for data storage, a Bun.js Web Server for handling requests and content delivery, and a React.js Client for user interactions. Additionally, the integration of an External AI API and an AWS-hosted API Gateway for video processing highlights the system's advanced AI capabilities and scalability. This design is chosen for its balance between functionality, ease of use, and potential for future expansion, making it a solid foundation for the platform.
In contrast, the alternative designs present different trade-offs. The first alternative suggests a microservices-oriented architecture, offering flexibility and independent service scaling, but its complexity makes it better suited to later development stages. The second alternative, a classic three-tier web application, provides a straightforward and traditional structure with a clear separation of concerns. However, it may lack the AI integration and scalability of the main design. Therefore, the main deployment diagram is preferred for its comprehensive approach, aligning with the platform's goals of user-friendly, scalable, and AI-enhanced video creation.
3.4. Component Decomposition Description
The Presenter application has two packages: one is responsible for creating the video, and the other manages all the AI-based processes.
The video creating system is composed of 5 classes:
Video creator: the main class, which receives requests and uses the services of the other classes.
Converter: responsible for converting slides to PNG and PDF.
Render: responsible for creating a clip from the slides and syncing the audio to it.
TTS_model: the module responsible for converting the textual script into voiceovers.
Slide: an object that stores all the slide attributes.
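The collaboration between these five classes can be sketched as follows. This is only an illustrative outline, not the final design: the method names (`to_png`, `synthesize`, `render`, `create`), the `Slide` fields, and the placeholder file names are assumptions made for the example, and the real conversion, speech, and rendering work is replaced by stubs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slide:
    """Stores the attributes of one slide (field names are assumed)."""
    index: int
    image_path: Optional[str] = None  # PNG produced by the Converter
    script: str = ""                  # narration text for this slide
    audio_path: Optional[str] = None  # voiceover produced by the TTS model


class Converter:
    def to_png(self, pptx_path: str, slide_count: int) -> List[Slide]:
        # Stub: a real converter would render every slide to an image file.
        return [Slide(index=i, image_path=f"{pptx_path}.slide{i}.png")
                for i in range(1, slide_count + 1)]


class TTSModel:
    def synthesize(self, slide: Slide) -> str:
        # Stub: a real model would write an audio file for slide.script.
        return f"slide{slide.index}.wav"


class Render:
    def render(self, slides: List[Slide]) -> str:
        # Stub: a real renderer would sync each slide image with its audio.
        missing = [s.index for s in slides
                   if not s.image_path or not s.audio_path]
        if missing:
            raise ValueError(f"slides missing image or audio: {missing}")
        return "output.mp4"


class VideoCreator:
    """Main class: receives a request and delegates to the other classes."""

    def __init__(self, converter: Converter, tts: TTSModel, renderer: Render):
        self.converter = converter
        self.tts = tts
        self.renderer = renderer

    def create(self, pptx_path: str, scripts: List[str]) -> str:
        slides = self.converter.to_png(pptx_path, len(scripts))
        for slide, script in zip(slides, scripts):
            slide.script = script
            slide.audio_path = self.tts.synthesize(slide)
        return self.renderer.render(slides)
```

Passing the collaborators into `VideoCreator` keeps each class replaceable, so a stub such as the ones above can stand in for the real converter or TTS model during testing.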
The AI system is composed of 9 classes:
Façade interface: the system interface that communicates with the other systems.
layout_classfier: uses an AI model to determine the layout class of a slide.
parser: responsible for extracting text information from the slides, using its three child classes (regular_parse, OCR_parser, image_processor).
regular_parse: designed to parse PDF files with a known encoding.
OCR_parser: designed to extract text from images. For a PDF with an unknown encoding, this class extracts the text from the PNG version.
image_processor: after confirming that an image exists, this class uses AI image recognition to describe the image.
prompt_genrator: once all the other classes have collected their information, this class uses a special AI model to create a prompt asking for a script based on the collected information.
API_requester: this class is responsible for communication with the ChatGPT API.
Slide: an object that stores all the slide attributes.
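The relationship between the parser and its three child classes can be outlined as below. This is a minimal sketch under stated assumptions: the class names are Python-style renderings of the names listed above, the `Slide` fields and the `extract` function are hypothetical, and the OCR and image recognition steps are replaced by placeholders.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Slide:
    """Assumed slide attributes used only for this illustration."""
    pdf_text: str = ""       # text layer; empty when the PDF encoding is unknown
    png_text: str = ""       # stands in for what OCR would read from the PNG
    has_image: bool = False  # whether the slide contains a picture


class Parser(ABC):
    @abstractmethod
    def parse(self, slide: Slide) -> str:
        """Return the information this parser extracts from the slide."""


class RegularParser(Parser):
    def parse(self, slide: Slide) -> str:
        return slide.pdf_text


class OCRParser(Parser):
    def parse(self, slide: Slide) -> str:
        # Placeholder: a real implementation would run OCR on the PNG version.
        return slide.png_text


class ImageProcessor(Parser):
    def parse(self, slide: Slide) -> str:
        # Placeholder: a real implementation would call an image recognition model.
        return "[image description]"


def extract(slide: Slide) -> List[str]:
    """Pick the text parser by encoding, then describe the image if one exists."""
    text_parser: Parser = RegularParser() if slide.pdf_text else OCRParser()
    parts = [text_parser.parse(slide)]
    if slide.has_image:
        parts.append(ImageProcessor().parse(slide))
    return parts
```

For example, `extract(Slide(png_text="Scanned", has_image=True))` falls back to the OCR placeholder and then appends the image description, which is the information a prompt generator would later receive.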

4. Technology selection
Layer | Technology | Justification
Frontend | React.js | Widely used in industry; easily turns the website into a web app; fast development; easy to maintain and understand
Backend | Bun.js | High performance; easy to learn; fully compatible with Node.js packages
Database | MongoDB, Bunny video storage | High performance; high community support; cheap; scalable
AI development | Python | Easy to learn; plenty of AI libraries; high community support for AI
External API | ChatGPT 4 | Can be customized easily to suit our project
Source control | GitHub | Makes development seamless between members
5. Component Design
5.1. System Components: [Detailed Class Diagram]

5.3.13. Creating the video system DFD-12

5.4. Pseudo code and Activity Diagram
Detailed pseudo code and activity diagram for the non-trivial methods; {Set, Get, default constructors, etc.} are considered trivial methods where pseudo code is not required.
5.4.1. AI model:




5.4.1.3. Prompt generating


5.4.2. Video creating:
5.4.2.1. Convert


5.4.2.2. Split



5.4.2.3. Json wrapper


5.4.2.4. Generate audio


5.4.2.5. Render video


5.4.3. Backend:
5.4.3.1. Request creating video


5.4.3.2. Sign in


5.4.3.3. Create account




5.4.3.5. Store video to DB


5.4.3.6. Retrieve video


5.4.3.7. Update account information




6. Data Design
6.1. Database Description

6.2. Data Dictionary
6.2.1. Person
Attributes | Relations | Types | Description of the attributes
id | PK | Integer | Unique user ID
email | | String | Unique email
Password | | String | Hashed password
createdAt | | Timestamp | Account creation timestamp
updatedAt | | Timestamp | Last update timestamp
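The Password attribute stores a hash, not the plain text. The document does not name a hashing algorithm, so the following is only a sketch assuming a salted PBKDF2-HMAC-SHA256 scheme from the Python standard library; the function names, the iteration count, and the `salt$digest` storage format are assumptions for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 200_000  # assumed work factor, to be tuned for the target server


def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt$digest' (both hex) for storage in the Password attribute."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

Storing the salt next to the digest lets the sign-in step verify a password without keeping anything that reveals the original text.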
6.2.2. User
Attributes | Relations | Types | Description of the attributes
id | FK | Integer | Unique user ID
Verified | | Integer | Registration status of the user: 1 indicates a verified email, 0 indicates an unverified email, and -1 indicates a blocked email
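The three Verified values can be given names in code so that the integers 1, 0, and -1 are not scattered through the backend. The sketch below uses Python's standard `IntEnum`; the enum and member names are assumptions chosen for readability, while the integer values are those defined in the table.

```python
from enum import IntEnum


class VerificationStatus(IntEnum):
    """Named values for the Verified attribute of the User table."""
    BLOCKED = -1     # blocked email
    UNVERIFIED = 0   # email not verified
    VERIFIED = 1     # verified email


def status_from_db(value: int) -> VerificationStatus:
    # Raises ValueError for any integer outside the three documented values.
    return VerificationStatus(value)
```

Because `IntEnum` members compare equal to their integers, a value read from the database can be converted with `status_from_db` and written back unchanged.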
6.2.3. Admin
Attributes | Relations | Types | Description of the attributes
id | FK | Integer | Unique user ID
6.2.4. Playlist
Attributes | Relations | Types | Description of the attributes
id | | |
URL | | String | The URL that leads to the playlist
visibility | | Boolean | Whether the playlist is public or private
createdAt | | Timestamp | Playlist creation timestamp
updatedAt | | Timestamp | Last update timestamp
6.2.5. Video

7.1.2. Play video page

7.1.3. Library page



7.1.5. Login

7.1.6. Sign up

7.1.7. Create video










9. Resource Estimates
9.1. RAM (Random Access Memory):
Server: Estimated 16-32 GB, to efficiently handle simultaneous requests, data processing, and AI operations.
Client: 4-8 GB, sufficient for smooth running of the client application and handling multimedia content.
9.2. Storage:
Database Server: Approximately 1-2 TB, scalable based on user data and video content growth. Includes space for backups and logs.
Web Server: 500 GB - 1 TB, for application data, logs, and temporary processing files.
Client: Minimal, primarily for temporary files and cache.
9.3. CPU (Central Processing Unit):
Server: Multi-core (8 cores or more) processors, to efficiently handle multiple threads for AI processing and data handling.
Client: Standard modern CPU, capable of handling multimedia content and basic computing tasks.
9.4. Bandwidth:
High bandwidth (1 Gbps or more) is essential for seamless data transfer, especially for video uploading and downloading, and API communications.
9.5. Graphics Processing Unit (GPU):
For servers handling AI and video processing, a high-performance GPU is recommended for accelerated computing tasks.
9.6. Power Consumption:
To be estimated based on server specifications and usage patterns. Efficient power use is crucial for sustainability.
9.7. Network Resources:
Robust network infrastructure to ensure low latency and high throughput, especially for cloud-based components and external API communications.
10. Definitions, Acronyms, and Abbreviations