Senior Project SDD by Ammar Alhmmad

SWE 411

Software Design Document (SDD)

Group 02

The Presenter

Members

Ammar Alhmmad

Ibrahim Mohsen

Muhammad Ullah

Nawaf Almalki

Thabet Aljbreen

Yazeed Alsaeed

Table of Contents

1. Introduction
1.1. Purpose
1.2. Scope
1.3. References
1.4. Overview
1.5. Constraints
2. System Overview
3. System Architecture
3.1. Deployment Architectural Description
3.2. Alternative Deployment Architectural Design Description
3.2.1. First Alternative
3.2.2. Second Alternative
3.3. Design Rationale
3.4. Component Decomposition Description
4. Technology selection
5. Component Design
5.1. System Components: [Detailed Class Diagram]
5.2. Function
5.2.1. Create account
5.2.2. Log in
5.2.3. Manage users
5.2.4. Delete content
5.2.5. Upload file
5.2.6. Create video
5.2.7. Manage playlist
5.2.8. Manage library
5.2.9. Play video
5.2.10. Share video/playlist
5.2.11. Search content
5.2.12. Explore content
5.2.13. Save playlist
5.2.14. Like video / Add video to playlist
5.2.15. Manage videos
5.3. Interfaces
5.3.1. DFD0
5.3.2. Generic DFD-1
5.3.2.1. Log-in
5.3.2.2. Create account
5.3.3. Admin DFD-2
5.3.4. Creating video DFD-3
5.3.5. Video DFD-4
5.3.6. Playlist DFD-5
5.3.7. Library DFD-6
5.3.8. Play videos DFD-7
5.3.9. Share video and playlist DFD-8
5.3.10. Explore DFD-9
5.3.11. Like video and playlist DFD-10
5.3.12. AI model DFD-11
5.3.13. Creating the video system DFD-12
5.4. Pseudo code and Activity Diagram
5.4.1. AI model:
5.4.1.1. Classify
5.4.1.2. Parse
5.4.1.3. Prompt generating
5.4.2. Video creating:
5.4.2.1. Convert
5.4.2.2. Split
5.4.2.3. Json wrapper
5.4.2.4. Generate audio
5.4.2.5. Render video
5.4.3. Backend:
5.4.3.1. Request creating video
5.4.3.2. Sign in
5.4.3.3. Create account
5.4.3.4. Regeneration
5.4.3.5. Store video to DB
5.4.3.6. Retrieve video
5.4.3.7. Update account information
5.4.3.8. Verify account
6. Data Design
6.1. Database Description
6.2. Data Dictionary
6.2.1. Person
6.2.2. User
6.2.3. Admin
6.2.4. Playlist
6.2.5. Video
7. Human Interface Design
7.1. Screen Images
7.1.1. Home page
7.1.2. Play video page
7.1.3. Library page
7.1.4. Playlist view
7.1.5. Login
7.1.6. Sign up
7.1.7. Create video
7.1.8. Admin views
7.2. Report Formats
7.2.1. Admin reports
8. Requirements Matrix
9. Resource Estimates
9.1. RAM (Random Access Memory):
9.2. Storage:
9.3. CPU (Central Processing Unit):
9.4. Bandwidth:
9.5. Graphics Processing Unit (GPU):
9.6. Power Consumption:
9.7. Network Resources:
10. Definitions

1. Introduction

1.1. Purpose

This document presents the design artifacts for The Presenter web application, which consists of a platform and a video creating system.

1.2. Scope

This Software Design Document outlines the development of a comprehensive system designed to convert PowerPoint slide content to an explanation video. The system encompasses several key modules and functionalities:

The platform: a web application that provides a user-friendly interface for browsing, creating, and watching videos.

Video creating system: this subsystem is responsible for processing the user's request to create a video, and includes the converting, text-to-speech, and rendering modules.

AI system: this subsystem is responsible for classifying and parsing the slides. It also generates a prompt, which the same subsystem sends to an external LLM API.

1.3. References

- The Presenter SRS document

1.4. Overview

The document provides a detailed technical specification for a software system, starting with a system overview that includes use case diagrams and descriptions for a high-level understanding. The second section covers the deployment architecture and package diagram, detailing how internal subsystems interact. This is followed by a technology selection table, specifying the technologies for each system part. The main part of the document consists of four design artifacts: class diagrams, data flow diagrams, sequence diagrams, and activity diagrams, along with pseudocode, offering an in-depth understanding of system components and functionalities. The data design section includes an Enhanced Entity-Relationship (EER) diagram and a data dictionary, explaining database component relationships and constraints. The human interface design is then detailed, focusing on user interaction and experience. The document concludes with a requirements matrix and resource estimates, aligning system requirements with resource management.

1.5. Constraints


• The system's frontend shall be developed using the React library.

• The system's backend shall be developed using Bun.js.

• The system's AI shall be developed using Python.

• The system shall be developed within 10 months.

• The system shall be developed following the Waterfall methodology.

• The system shall support the English language.

2. System Overview


The Presenter is an innovative software solution designed to change the way students, educators, and content consumers interact with presentations and notes. The system offers a user-friendly platform that enhances the users' experience by automatically converting presentation files into engaging video content.

Utilizing a combination of advanced AI technologies, including Large Language Models, text-to-speech, natural language processing, and a proprietary machine learning model, “The Presenter” interprets the content of PowerPoint slides and automatically converts them into a video with appropriate narration. The AI-driven approach tailors each request to the user's specific needs through an automatic and speedy procedure.

The system architecture uses a tech stack consisting of MongoDB for database management, Bun.js for server-side operations, React.js for the front-end user interface, and Python for handling backend logic and AI integration.

Background

The growing demand for engaging educational content in digital formats was one of the core reasons behind the creation of “The Presenter”. The project aims to change the way users interact with presentations and notes by converting static presentation materials into video content tailored to each user's specifications. As AI tools become easier to access and consumer-ready applications built on Artificial Intelligence become more refined, the project has become both more viable and more relevant.

The project is developed over an 8-month period using the Waterfall methodology. It is designed for a user base spanning all levels of technical proficiency, ensuring ease of use with no compromise in the quality of the output. The primary target audience, however, is students, educators and teachers, content creators, business professionals, and general users with presentation needs.

3. System Architecture

3.1. Deployment Architectural Description

The deployment diagram illustrates the structure of a web-based platform designed to transform PowerPoint presentations into video presentations using AI technologies. It consists of a Database Server for storing user data and generated videos, a Web Server running a JavaScript application (Bun.js) for handling requests and serving content, and a Client (Frontend) built with React.js for user interactions. An External API is integrated to leverage AI capabilities such as Large Language Models, while an AWS Server hosts an API Gateway that manages the video creation system, where the AI processing occurs. The diagram reflects a system designed for ease of use, scalability, and efficient processing of multimedia content.

3.2. Alternative Deployment Architectural Design Description

3.2.1. First Alternative

The first alternative deployment diagram presents a microservices-oriented architecture for a platform that transforms PowerPoint presentations into video presentations. The client side is a React.js application that communicates with a Node.js server. The server side, hosted on AWS, employs an API Gateway built with Bun.js to direct requests to various specialized VM microservices. These include GPT-3 Turbo for scripting, a local TTS model for speech generation, image processing for media handling, user and content management services for interaction and data operations, and a video rendering service for creating the final output. Data is managed by a separate Database Server featuring Bunny Storage for videos and MongoDB for structured data. This architecture allows services to be scaled and updated independently, which gives it the flexibility and resilience typical of microservices designs. It is more complex, however, and may be better suited to later stages than to the initial deployment.

3.2.2. Second Alternative

This last alternative deployment diagram is a classic three-tier web application structure. The client tier, composed of PCs and mobile devices, interfaces with the system via HTTP/HTTPS requests. These requests are managed by a web server that either delivers content directly to the client or makes requests to the application server. The application server handles the core logic, including communication with specialized AI services to create the final product. Data storage and retrieval are handled by a dedicated database server.

3.3. Design Rationale

The main deployment diagram for the Presenter platform offers a well-rounded approach, focusing on user experience, scalability, and efficient multimedia processing. Central to this design are the Database Server for data storage, a Bun.js Web Server for handling requests and content delivery, and a React.js Client for user interactions. Additionally, the integration of an External AI API and an AWS-hosted API Gateway for video processing highlights the system's advanced AI capabilities and scalability. This design is chosen for its balance between functionality, ease of use, and potential for future expansion, making it a solid foundation for the platform.

In contrast, the alternative designs present different trade-offs. The first alternative suggests a microservices-oriented architecture, offering flexibility and independent service scaling, but its complexity might be more suited for later developmental stages. The second alternative, a classic three-tier web application, provides a straightforward and traditional structure with clear separations of concerns. However, it may lack the advanced AI integration and scalability of the main design. Therefore, the main deployment diagram is preferred for its comprehensive approach, aligning with the platform's goals of user-friendly, scalable, and AI-enhanced video creation.

3.4. Component Decomposition Description

The Presenter application has two packages: one is responsible for creating the video, and the other manages all the AI-based processes.

The video creating system is composed of 5 classes:

Video creator: the main class, which receives the requests and uses the services of the other classes.

Converter: responsible for converting slides to PNG and PDF.

Render: responsible for creating a clip from the slides and syncing the audio to it.

TTS_model: the module responsible for converting the textual script to voice-overs.

Slide: an object that stores all the attributes of a slide.
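The way these five classes could cooperate can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's implementation: only the class roles come from the list above, while the method names (`to_png`, `synthesize`, `render`, `create`), the `Slide` fields, and the placeholder return values are hypothetical, and the class names are adapted to Python naming conventions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slide:
    """Stores the attributes of one slide (field names are hypothetical)."""
    number: int
    script: str = ""
    image_path: Optional[str] = None
    audio_path: Optional[str] = None


class Converter:
    """Converts a slide to PNG; the real conversion is stubbed out."""
    def to_png(self, slide: Slide) -> str:
        return f"slide_{slide.number}.png"  # placeholder path, nothing is written


class TTSModel:
    """TTS_model: turns the textual script into a voice-over (stubbed out)."""
    def synthesize(self, slide: Slide) -> str:
        return f"slide_{slide.number}.wav"  # placeholder path, nothing is written


class Render:
    """Builds one clip per slide and pairs it with its audio (stubbed out)."""
    def render(self, slides: List[Slide]) -> List[str]:
        return [f"{s.image_path}+{s.audio_path}" for s in slides]


class VideoCreator:
    """Main class: receives a request and uses the other classes' services."""
    def __init__(self) -> None:
        self.converter = Converter()
        self.tts = TTSModel()
        self.renderer = Render()

    def create(self, scripts: List[str]) -> List[str]:
        slides = [Slide(number=i + 1, script=text) for i, text in enumerate(scripts)]
        for slide in slides:
            slide.image_path = self.converter.to_png(slide)
            slide.audio_path = self.tts.synthesize(slide)
        return self.renderer.render(slides)


clips = VideoCreator().create(["Welcome", "Agenda"])
```

Keeping `VideoCreator` as the only entry point means the backend needs to know a single class, and each helper can be replaced without touching the others.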

The AI system is composed of 9 classes:

Façade interface: the system interface that communicates with other systems.

layout_classfier: uses an AI model to determine the layout class of the slide.

parser: responsible for extracting text information from the slides using its 3 child classes (regular_parse, OCR_parser, image_processor).

regular_parse: designed to parse PDF files with a known encoding.

OCR_parser: designed to extract text from images. For PDF files with an unknown encoding, this class extracts the text from the PNG version.

image_processor: after confirming that an image exists, this class uses AI image recognition to describe the image.

prompt_genrator: once all the other classes have collected their information, this class uses a special AI model to create a prompt asking for a script based on the collected information.

API_requester: responsible for communication with the ChatGPT API.

Slide: an object that stores all the attributes of a slide.
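The choice between the three parser child classes can be sketched in Python as follows. This is a minimal sketch, assuming a `known_encoding` flag on the slide and stubbed extraction results: the field names, the `extract` helper, and the string outputs are hypothetical, and the real OCR and image recognition calls are omitted. Class names are adapted to Python naming conventions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Slide:
    """Slide attributes used by the parsers (field names are hypothetical)."""
    pdf_text: str = ""            # text read from the PDF when its encoding is known
    png_text: str = ""            # stand-in for what OCR would read from the PNG
    images: List[str] = field(default_factory=list)
    known_encoding: bool = True


class Parser:
    """Parent class: each child extracts one kind of information."""
    def parse(self, slide: Slide) -> str:
        raise NotImplementedError


class RegularParser(Parser):
    """regular_parse: handles PDF files with a known encoding."""
    def parse(self, slide: Slide) -> str:
        return slide.pdf_text


class OCRParser(Parser):
    """OCR_parser: fallback that reads text from the PNG version."""
    def parse(self, slide: Slide) -> str:
        return slide.png_text  # a real OCR call would go here


class ImageProcessor(Parser):
    """image_processor: describes the images found on the slide."""
    def parse(self, slide: Slide) -> str:
        return "; ".join(f"[image: {name}]" for name in slide.images)


def extract(slide: Slide) -> str:
    """Pick the text parser by encoding, then add image descriptions if any."""
    text_parser: Parser = RegularParser() if slide.known_encoding else OCRParser()
    parts = [text_parser.parse(slide)]
    if slide.images:  # only run image recognition when an image exists
        parts.append(ImageProcessor().parse(slide))
    return "\n".join(parts)


scanned = Slide(png_text="Intro", images=["chart.png"], known_encoding=False)
collected = extract(scanned)
```

The collected text is what the prompt-generating class would receive as input before the request is sent to the external API.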

4. Technology selection


Layer: Frontend
Technology: React.js
Justification:
• The most used in industry
• Easily turns the website into a web app
• Fast development
• Easy to maintain and understand

Layer: Backend
Technology: Bun.js
Justification:
• High performance
• Easy to learn
• Fully compatible with Node.js packages

Layer: Database
Technology: MongoDB, Bunny video storage
Justification:
• High performance
• High community support
• Cheap
• Scalable

Layer: AI development
Technology: Python
Justification:
• Easy to learn
• Plenty of AI libraries
• High community support for AI

Layer: External API
Technology: ChatGPT 4
Justification:
• Can be customized easily to suit our project

Layer: Source Control
Technology: GitHub
Justification:
• Makes development seamless between members

5. Component Design

5.1. System Components: [Detailed Class Diagram]

5.3.13. Creating the video system DFD-12

5.4. Pseudo code and Activity Diagram

Detailed pseudo code and activity diagram for the non-trivial methods; {Set, Get, default constructors, etc.} are considered trivial methods where pseudo code is not required.

5.4.1. AI model:

5.4.1.1. Classify

5.4.1.2. Parse

5.4.1.3. Prompt generating

5.4.2. Video creating:

5.4.2.1. Convert

5.4.2.2. Split

5.4.2.3. Json wrapper

5.4.2.4. Generate audio

5.4.2.5. Render video

5.4.3. Backend:

5.4.3.1. Request creating video

5.4.3.2. Sign in

5.4.3.3. Create account

5.4.3.4. Regeneration

5.4.3.5. Store video to DB

5.4.3.6. Retrieve video

5.4.3.7. Update account information

5.4.3.8. Verify account

6. Data Design

6.1. Database Description

6.2. Data Dictionary

6.2.1. Person

Attributes | Relations | Types | Description of the attributes
id | PK | Integer | Unique user ID
email | unique | String | Email
Password |  | String | Hashed password
createdAt |  | Timestamp | Account creation timestamp
updatedAt |  | Timestamp | Last update timestamp

6.2.2. User

Attributes | Relations | Types | Description of the attributes
id | FK | Integer | Unique user ID
Verified |  | Integer | Registration status of the user: 1 indicates a verified email, 0 indicates an unverified email, and -1 indicates a blocked email

6.2.3. Admin

Attributes | Relations | Types | Description of the attributes
id | FK | Integer | Unique user ID
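The Person, User, and Admin entries above can be sketched as Python data classes. This is a minimal sketch, assuming that the foreign-key `id` in User and Admin expresses a specialization of Person. The enum name `VerificationStatus`, the `email` attribute name, and the sample values are hypothetical; only the integer codes 1, 0, and -1 come from the data dictionary.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class VerificationStatus(IntEnum):
    """Integer codes stored in User.Verified."""
    BLOCKED = -1
    NOT_VERIFIED = 0
    VERIFIED = 1


@dataclass
class Person:
    id: int
    email: str
    password: str          # holds the hashed password, never the plain text
    created_at: datetime
    updated_at: datetime


@dataclass
class User(Person):
    # New accounts start unverified (an assumption, not stated in the dictionary).
    verified: VerificationStatus = VerificationStatus.NOT_VERIFIED


@dataclass
class Admin(Person):
    """Admin adds no attributes of its own in the data dictionary."""


now = datetime(2024, 1, 1)
user = User(id=1, email="student@example.com", password="<hash>",
            created_at=now, updated_at=now)
# Decoding a stored integer back into a named status:
status = VerificationStatus(-1)
```

Using an `IntEnum` keeps the stored value an integer, as the dictionary specifies, while giving each code a readable name in the application code.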

6.2.4. Playlist

Attributes | Relations | Types | Description of the attributes
id |  | Integer | Unique playlist ID
User_id |  | Integer | The owner user ID
name |  | String | The playlist name
URL |  | String | The URL that leads to the playlist
visibility |  | Boolean | Whether the playlist is public or private
createdAt |  | Timestamp | Playlist creation timestamp
updatedAt |  | Timestamp | Last update timestamp

6.2.5. Video

Attributes | Relations | Types | Description of the attributes
id | PK | Integer | Unique video ID
playlist_id | FK | Integer | ID of the playlist that includes the video
name |  | String | The video name
URL |  | String | The URL that leads to the video
visibility |  | Boolean | Whether the video is public or private
createdAt |  | Timestamp | Video creation timestamp
updatedAt |  | Timestamp | Last update timestamp

7. Human Interface Design

7.1. Screen Images

7.1.1. Home page

7.1.2. Play video page

7.1.3. Library page

7.1.4. Playlist view

7.1.5. Login

7.1.6. Sign up

7.1.7. Create video

7.1.8. Admin views

7.2. Report Formats

7.2.1. Admin reports

8. Requirements Matrix

9. Resource Estimates

9.1. RAM (Random Access Memory):

Server: Estimated 16-32 GB, to efficiently handle simultaneous requests, data processing, and AI operations.

Client: 4-8 GB, sufficient for smooth running of the client application and handling multimedia content.

9.2. Storage:

Database Server: Approximately 1-2 TB, scalable based on user data and video content growth. Includes space for backups and logs.

Web Server: 500 GB - 1 TB, for application data, logs, and temporary processing files.

Client: Minimal, primarily for temporary files and cache.

9.3. CPU (Central Processing Unit):

Server: Multi-core (8 cores or more) processors, to efficiently handle multiple threads for AI processing and data handling.

Client: Standard modern CPU, capable of handling multimedia content and basic computing tasks.

9.4. Bandwidth:

High bandwidth (1 Gbps or more) is essential for seamless data transfer, especially for video uploading and downloading, and API communications.
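As a rough illustration of what the 1 Gbps figure means for video transfer, the ideal transfer time of a file can be computed as below. The 500 MB file size is a hypothetical example, not a measurement from the project, and the calculation ignores protocol overhead and network congestion.

```python
def transfer_seconds(file_size_mb: float, link_gbps: float = 1.0) -> float:
    """Ideal time to move a file over a link, in seconds."""
    # 1 Gbps = 1000 megabits per second = 125 megabytes per second (decimal units)
    link_mb_per_s = link_gbps * 1000 / 8
    return file_size_mb / link_mb_per_s


# Hypothetical 500 MB video over the 1 Gbps link
seconds = transfer_seconds(500)
print(seconds)  # 4.0
```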

9.5. Graphics Processing Unit (GPU):

For servers handling AI and video processing, a high-performance GPU is recommended for accelerated computing tasks.

9.6. Power Consumption:

To be estimated based on server specifications and usage patterns. Efficient power use is crucial for sustainability.

9.7. Network Resources:

Robust network infrastructure to ensure low latency and high throughput, especially for cloud-based components and external API communications.

10. Definitions

Acronyms, and Abbreviations - provide definitions of all terms, acronyms and abbreviations needed for the SDD.