
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072
![]()

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072
K. Anusha1 , S. Santhi Rupa2 , G. Kiranmayi3, G. Gayatri Akshitha4, B. Jagadeeswari Nagavalli5, P.
Vasantha 6
1Student, Department of CST, Sri Vasavi Engineering College(A), Pedatadepalli, Tadepalligudem-534101
2Assistant Professor, Department of CST, Sri Vasavi Engineering College(A), Pedatadepalli, Tadepalligudem-534101
3Student, Department of CST, Sri Vasavi Engineering College(A), Pedatadepalli, Tadepalligudem-534101
4Student, Department of CST, Sri Vasavi Engineering College(A), Pedatadepalli, Tadepalligudem-534101
5Student, Department of CST, Sri Vasavi Engineering College(A), Pedatadepalli, Tadepalligudem-534101
6Student, Department of CST, Sri Vasavi Engineering College(A), Pedatadepalli, Tadepalligudem-534101
Abstract - Legal research is fundamental to the judicial process, but remains slow, labor-intensive, and inefficient. Analyzing vast legal documents, identifying precedents, and predicting case outcomes require extensive manual effort, delaying decision-making. To address these challenges, an AI-driven legal research system has been developed to automate precedent analysis, case prediction, and argument evaluation. The system is trained on a dataset containing 46,000 Supreme Court cases, enabling intelligent legal analysis. Sentence Transformers with FAISS facilitate semantic case retrieval, allowing users to locate relevant legal precedents efficiently. LegalBERT, a specialized NLP model, is utilized for predicting case outcomes, leveraging past judicial decisions to enhance accuracy. Additionally, a large language model provides argument scoring, evaluating the strength of legal reasoning on a 0–100 scale and offering insights for refinement. By automating key aspects of legal research, this system enhances efficiency, reduces errors, and provides data-driven decision support. Future work includes further integrating real-time legal databases and expanding applicability across diverse legal domains, reinforcing AI’s transformative role in the legal field.
Keywords - Legal AI, NLP, Machine Learning, Case Outcome Prediction, Precedent Analysis, Argument Scoring
1. INTRODUCTION
The practice of law depends significantly on thorough research in the judicial process, shaping decisions, strengtheningcourtroomarguments,andensuringjustice. However, traditional research methods are slow, laborintensive, and inefficient. Lawyers must sift through extensive legal databases, analyze complex rulings, and extract key insights a process that consumes significant timeandeffort.Manualresearchalsoincreasestherisk of errors,suchasmissing relevantprecedentsor misjudging their importance, which can weaken legal strategies and delaycaseresolutions.
As court decisions accumulate, even experienced professionals struggle to manage the vast amount of legal
information. Predicting case outcomes remains intuitive mainly, with no concrete evidence-based support. Argument construction requires substantial effort, yet lawyers lack reliable tools to assess their effectiveness before presenting them in court. These inefficiencies strain the judicial system, increasing costs and slowing caseprogress.
ThisprojecttacklesthesechallengesbyintroducinganAIpowered legal assistant designed to revolutionize the researchprocess.Itreducestheworkloadforattorneysby rapidlyidentifyingrelevantcourtcasesandextractingkey insights, significantly cutting down research time. Unlike traditional methods, it leverages data to predict case outcomeswithgreateraccuracy,replacingguessworkwith informed, evidence-based forecasts. Additionally, it evaluates arguments by assigning scores and providing recommendations for improvement. By minimizing uncertaintiesanddeliveringreliableinsightsinstantly,this systemtransformslegalresearchintoamoreefficientand data-drivenprocess,ultimatelyenhancinglegalpractice.
This project aims to streamline legal research by automating case retrieval, analyzing past rulings, and predicting case outcomes. By reducing manual effort, it enhances the efficiency and accuracy of legal decisionmaking. The system allows legal professionals to quickly access relevant precedents, extract crucial insights, and evaluateargumentstrength.Additionally,itprovidesdatadriven predictions to help in legal strategy formulation. The goal is to improve research accuracy, minimize timeconsuming processes, and support lawyers, researchers, andlegalscholarsinmakinginformeddecisions.
1. Automated Case Retrieval: Enables quick access to relevantcaseprecedents,reducingmanualsearchefforts.
2. Legal Insight Extraction: Extracts crucial legal details such as rulings, arguments, and court decisions from precedentcases.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072
3. Case Outcome Prediction: Utilizes historical judgments to predict possible case outcomes, aiding in legal strategy formulation.
4 Argument Evaluation: Assesses user-submitted arguments,providing scoresandconstructive suggestions forimprovement.
5. User-Friendly Interface: Features an intuitive webbasedplatformthatallowsseamlesscasesearching,result visualization,andreportgeneration,ensuringaccessibility andeaseofuseforlegalprofessionals.
The most important step in the software development process is the literature review. This will describe some preliminary research that was carried out by several authorsonthisappropriatework,andwearegoingtotake some important articles into consideration and further extendourwork.
Shreyas Shetty, Aditya Mhatre, Ashok Singh, and Prof. Manasi Kulkarni present the “Extraction of Legal Documents for Assistance to Lawyers” . This system leverages TF-IDF and cosine similarity to extract and retrieve relevant legal documents based on keyword queries, aiding lawyers by simplifying the analysis of unstructured case texts. It utilizes a dataset of 180 case documents sourced from IndianKanoon.org, a relatively smallcorpusthatsupportsitsfocusonretrievalefficiency. A key advantage is its ability to process large document corpora quickly,though it maystruggle withgenericlegal terms that reduce retrieval precision, highlighting its aim toenhanceproductivityinlegalresearch.
Kavita Shirsat, Aditya Keni, Pooja Chavan, and Manasi Gosavi present the “Legal Judgement Prediction System.” This system applies Naive Bayes, Random Forest, and SVM to predict applicable IPC sections for murdercasesusingcasefacttexts. Itreliesonadatasetof 100 Supreme Court murder cases from India. It benefits legal professionals and non-experts by providing insights into penalties and provisions with a web interface that enhancesusability.However,itsperformanceislimitedby a small number of 100 cases and manual annotation efforts, suggesting challenges in handling broader, unprocessedlegaldata.
Prof. Priyanka Bhilare, Neha Parab, Namrata Soni, and Bhakti Thakur present the “Predicting Outcome of Judicial Cases and Analysis using Machine Learning.” This system utilizes multiple machine learning models Logistic Regression, K-Nearest Neighbors, Naive Bayes, Random Forest, and Support Vector Machine to predict judicial case outcomes based on textual case descriptions, achieving the highest accuracy of 78% with SVM. Its strengthliesinaidingciviliansandlawyersbyforecasting
win/loss outcomes, reducing legal uncertainties, though its reliance on limited corporate case data and manual feature extraction may constrain its scalability and adaptabilitytodiverselegalcontexts.
Md. Shahin Kabir and Mohammad Nazmul Alam present “The Role of AI Technology for Legal Research and Decision Making.” ThisstudyexploresAI’simpacton legal research and decision-making, leveraging machine learning and NLP tools like ROSS Intelligence to enhance efficiency in tasks such as document analysis and predictive analytics. Its strength lies in highlighting AI’s potential to streamline legal processes, yet it notes challenges like algorithmic bias and the need for human oversight,emphasizingethicalconsiderations
Suma R., Nayan Prakash Lal, Nethravathi K., Poreddy Sasi Vardhan Reddy, and N.V. Ajay Kumar present the “Review and Approaches to Develop Legal Assistance for Lawyers and Legal Professionals using Artificial Intelligence and Machine Learning.” Thisstudyreviews AI and ML techniques like NLP, CNN, KNN, and decision treestoassistlawyerswithcasepreparation,achievingup to85%accuracyintextclassification.ItreferencesIndian judicial data but uses a modest dataset of 250 legal documents for experimentation, reflecting a constrained scope. Its value lies in reducing preparation time and improving case references, yet the small dataset and lack of mainstream adoption limit its practical impact on India’svastlegalsystem.
Traditional legal research systems primarily rely on keyword-based searches, often leading to irrelevant or incomplete results.Legal professionalsmustmanuallysift through vast legal databases, making the process timeconsuming and inefficient. Additionally, existing systems lack intelligent comparison mechanisms for identifying relevant precedents based on contextual similarities. Moreover, there is no automated method to assess the strength of legal arguments, and predictive analysis of case outcomes is largely absent. This results in a lack of strategic insights, making legal decision-making more challengingandpronetohumanbias.
Keyword-Based Search: Mostlegalresearchsystemsrely on basic keyword searches, often retrieving irrelevant or incompleteresults.
Lack of Intelligent Case Comparison: There is no AIdriven mechanism to compare legal cases based on contextual meaning, making it difficult to find truly relevantprecedents.
No Argument Strength Evaluation: There are no automatedtoolstoassessthequalityandstrengthoflegal arguments, leaving legal professionals without clear guidanceforimprovingtheircasestrategies.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072
Manual Report Generation: Lawyers and researchers must manually compile case summaries and arguments instead of generating them through an AI-assisted interface.
The proposed system is an AI-driven legal research and case prediction tool that automates case search, legal documentanalysis,andoutcomepredictionusingNLPand machine learning. It leverages LegalBERT for extracting key legal insights, identifying relevant precedents, and assessing argument strength through case embeddings and evidence weighting. By integrating predictive analytics, the system provides data-driven case outcome estimations, reducing manual effort and improving researchefficiency.Additionally,itfeaturesauser-friendly interface for seamless navigation, structured legal report generation, and strategic decision-making support, enhancingtheaccuracyandspeedoflegalanalysis.
Automated Case Search: Leverages LegalBERT and case embeddings to retrieve relevant precedents based on contextualsimilarities,reducingirrelevantresults.
Legal Insight Extraction:UsesNLPtoidentifykeydetails like arguments, rulings, and evidence, minimizing manual analysiseffort.
Outcome Prediction:Providesdata-drivenestimationsof caseoutcomesusingpredictiveanalytics.
Argument Strength Assessment: Evaluates legal arguments with scores and feedback, offering guidance to improvecasestrategies.
User-Friendly Interface: Features seamless navigation and structured report generation, streamlining legal researchandpresentation.
4.1 System Architecture:
Thesystemarchitectureisdesignedasamodularpipeline to streamline legal research, case prediction, and argument analysis. It consists of five main layers, each responsible for a specific function in the end-to-end processingoflegaldocumentsandqueries:

1. Data Acquisition & Preprocessing Layer
This layer collects legal documents in PDF format and extracts their content using text extraction tools. It removes unnecessary symbols and formatting, then splits longdocumentsintomanageablechunkswhilepreserving contextforaccuratemodelinput.
2. Embedding & Indexing Layer
Preprocessed text is converted into dense semantic embeddings using Sentence Transformers. These embeddings are indexed with FAISS to enable fast and accurate retrieval of similar past cases based on contextualsimilarityratherthanjustkeywords.
3. Prediction & Scoring Layer
Using models like LegalBERT, this layer predicts case outcomes (Win or Loss) by analyzing past rulings. It also scores user-submitted legal arguments on factors like clarity and legal strength, offering structured insights for strategyimprovement.
4. Application Logic Layer
This component manages the workflow between user inputs and backend processes. It ensures that queries, predictions, and scoring results are processed correctly androutedbetweenthemodelsandtheinterface.
5. User Interface Layer
Built with Streamlit, the interface allows users to search cases, predict outcomes, and evaluate arguments through a simple and accessible layout. It also supports exporting reportsforlegaldocumentationandpresentation.
5. METHODOLOGY
5.1 Data Collection and Preprocessing
The system utilizes a comprehensive dataset of approximately 46,000 Supreme Court case documents sourced from the Hugging Face repository. Each case file

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072
contains detailed information, including facts, rulings, arguments,andcitations.
Data preprocessing involves several steps:
Text Extraction: Legal documents in PDF format are parsed using PyMuPDF, ensuring accurate extraction of legal texts while maintaining structural integrity such as headings,sections,andcitations.
Noise Removal: Redundant symbols, non-textual elements, and formatting inconsistencies are cleaned to ensuresmoothprocessing.
Chunking: Dueto theinherentlengthoflegal documents, the extracted text is divided into manageable overlapping chunks (typically 512 tokens) to fit within the input size limitations of transformer models without losing contextualcontinuity.
Tokenization and Lowercasing: Standard NLP preprocessing techniques like tokenization and lowercasingareappliedtomakethedatacompatiblewith transformerinputlayers.
5.2 Sentence Transformer with FAISS
To enable accurate and efficient retrieval of legally relevant documents, the system integrates Sentence TransformerswithFacebookAISimilaritySearch(FAISS).
Traditional keyword-based search methods often fail to capture the complex semantic relationships present in legal texts. To address this limitation, Sentence Transformers are employed to encode legal documents into high-dimensional dense embeddings, where semantically similar cases are located closer in the vector spaceregardlessofkeywordoverlaps.
The all-MiniLM-L6-v2 model is used for embedding generation, selected for its balance between computational efficiency and semantic expressiveness. Eachcasedocumentistransformedintoafixed-sizevector representing its contextual meaning rather than just surfacetextfeatures.
These embeddings are then indexed using FAISS, an optimized library for high-speed approximate nearest neighbor (ANN) search. FAISS structures the embeddings into an efficient index, allowing rapid similarity queries evenoverlarge-scaledatasets.
During retrieval, a user query is first embedded into the same vector space. FAISS then identifies and returns the most semantically relevant cases based on cosine similarity scores between the query embedding and the indexeddocumentembeddings.

Table 5.2 PerformanceMetricsforSimilarCaseRetrieval
LegalBERTisapre-trainedtransformermodelspecifically designed for legal text processing. It has been fine-tuned on a dataset of legal cases to classify case outcomes into WinorLosscategories.Duetotheextensivelengthoflegal documents,thesystememploystextchunking,embedding generation, classification, and aggregation techniques to determine the final case result accurately. The methodology for case outcome prediction using LegalBERTfollowsthesekeysteps:
Legal documents are segmented into 512-token overlapping chunks to fit within the model’s constraints whilepreservingcontext.Thisensuresthatnocruciallegal informationislost.
Each chunk is converted into a contextual embedding usingLegalBERT’s transformer layers, capturingsemantic meaningforbetterclassification.

ThemodelassignsaWinorLosslabeltoeachchunkusing the Softmax function, which computes probability scores andselectsthehighestone.


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072
LegalBERT is trained with cross-entropy loss, improving accuracy by penalizing incorrect classifications and refiningpredictions.

Final Case Outcome: Majority Voting
The final decision is determined using majority voting, selecting the most frequent prediction among chunks to ensureconsistency

To advance the evaluation of legal arguments, the system integrates Mistral-Saba-24B, a state-of-the-art large language model (LLM) optimized for complex textual reasoningandlegalanalysis.
Mistral-Saba-24Bisemployedtoautomatetheassessment of user-submitted legal arguments, generating both a quantitative strength score and qualitative improvement feedback. This functionality assists legal professionals in objectively evaluating and refining their arguments prior tolitigationoracademicsubmission.
Each argument is systematically analysed across three criticaldimensions:
Clarity: Assessment of linguistic coherence, logicalflow,andstructuralprecision.
Legal Grounding: Evaluation of the argument's alignment with statutory frameworks, precedent usage,andjurisprudentialsupport.
Persuasiveness: Measurement of logical consistency, evidentiary integration, and rhetoricaleffectiveness.
Based on these dimensions, the model generates a strength score (0–100) and actionable recommendations for improvement. Optimized embedding pipelines ensure real-time evaluation without compromising depth. Mistral-Saba-24B enhances legal reasoning and supports moreeffectivelegalstrategyformulation.
5.5
The graph below illustrates the training and validation loss trends for the LegalBERT model during the case outcome prediction task. A consistent decrease in both
losses over the epochs indicates effective learning and modelconvergence.

Fig - 5.5: TrainingvsValidationLossacrossdifferent epochsfortheOutcomePredictionModel
6.1
The homepage acts as the central hub of the Legal Case Analysis System, providing users with an overview of its core functionalities. It offers quick access to key sections like Case Prediction and Search Cases through a streamlined navigation sidebar. Designed with Streamlit, the intuitive interface ensures seamless user engagement and efficient interaction. This central hub aligns with the project’s aim to enhance accessibility and usability for legalprofessionals.

Fig - 6.1: Displaysthewelcomescreenwithanoverviewof featuresforlegalprofessionals.
The Case Prediction page enables users to input case details, like a slip-and-fall scenario, and displays a "Win" prediction with a progress bar, powered by Legal-BERT for accuracy. Provides a detailed explanation, listing key arguments for both parties, such as negligence and causation, alongside relevant legal precedents, using LangChainforstructuredinsights.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072

- 6.2: Showstheinputfieldforcasedetailsandthe predictedoutcomewithaprogressbarandanalysis
The Search Cases page serves as an intuitive entry point for legal professionals and researchers to explore a comprehensive repository of past legal cases. Through a simple text input field, users can submit natural language queries or keywords related to specific legal matters, parties, or case types. The system leverages advanced retrieval mechanisms to return the most relevant precedents based on semantic similarity and contextual understanding. This significantly reduces research time and ensures high-quality results. Once a query is submitted,theinterfacedisplaysaconfirmationmessage

- 6.3: Illustratesthesearchqueryinputfieldfor findingsimilarcases.
This functionality enables legal professionals to select a specificcasefromadropdownmenuonthe"CaseDetails" page. The dropdown lists cases retrieved from a prior search,suchas"Case1-SupremeCourtofIndia,"allowing users to choose a case for in-depth review. This feature simplifies the process of accessing relevant case information by providing a user-friendly interface to navigate through search results, ensuring quick access to detailedlegaldataforanalysis.

Fig - 6.4: Showsthedropdownmenuforselectingacase toviewitsdetails.
The system further enriches the research experience by offering a comprehensive legal analysis of each selected case. This includes structured insights such as witness testimony summaries, an evidence impact score that quantifies the strength and relevance of submitted evidence,andcitationsofapplicablelawsandregulations. Additionally, the platform highlights the court’s decision, includingtherulingandthepartyinfavor.A'Downloadas DOCX'featureallowsuserstoexportthefullanalysis

Fig - 6.5: Displaystheinitialbreakdownofaselectedcase, includingpartiesandlegalissues.
The "Argument Strength Analysis" functionality allows legal professionals to input legal arguments and receive a quantified strength score ranging from 0 to 100. This score is computed based on three core criteria: clarity, legalgrounding,andpersuasiveness.Thesystemprovides detailed, component-wise feedback along with actionable suggestionsfor enhancing theargument suchasspecific evidence,citingrelevantlegalprecedents.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072

Fig - 6.6: Showstheinputfieldforargumentsandtheir strengthscoresforPlaintiffandDefendant
In conclusion, this AI-driven system for legal precedent analysis and case outcome prediction represents a transformative advancement in the legal industry. By harnessing the power of advanced machine learning and natural language processing, it offers a reliable, scalable solution for automating traditionally time-intensive legal research tasks. The system effectively analyses historical case data, predicts outcomes with high accuracy, and retrieves semantically relevant precedents, thereby reducing reliance on manual effort and enhancing the strategiccapabilitiesoflegalprofessionals.
Furthermore, the integrated argument strength analysis, which scores user-submitted arguments on clarity, legal grounding, and persuasiveness, provides valuable feedback to improve legal reasoning. This feature encourages legal professionals to refine their arguments, helping them present stronger, more compelling cases. The system also offers actionable suggestions for improvement, which guide users in enhancing their arguments based on specific legal criteria. Additionally, it allowsforamorestructuredevaluationoflegalreasoning, aiding in more confident and persuasive courtroom presentations.
By automating repetitive processes and offering intelligent, data-driven insights, the system minimizes human error, accelerates legal workflows, and empowers users to make more informed decisions. Beyond immediate practical benefits, this project underscores the immense potential of AI in revolutionizing legal practice, setting the foundation for future innovations that will redefine how legal professionals interact with data, formulate strategies, and deliver justice efficiently and equitablyacrossdiversejurisdictions.
Looking ahead, the system offers promising opportunities for expansion. It can be adapted to domains beyond law, such as medical research or finance, for document
retrieval and outcome prediction. Integration with realtime legal databases will ensure access to up-to-date case law, enhancing prediction accuracy. Jurisdiction-specific customization can tailor the system to diverse legal systems, while multi-language support will enable processing of cases in various languages, broadening its globalapplicability.
The authors would like to thank the faculty and project guide from Sri Vasavi Engineering College for their guidanceandcontinuoussupportthroughoutthecourseof this project. Their valuable feedback and expertise significantly contributed to the successful completion of thiswork.
[1] Drashti Shah1, Jai Vasi2, Tanik Gandhi3 published a paper titled " AI & ML Based Legal Assistant " in the International Research Journal of Engineering and Technology(IRJET)inJuly2024.
[2]Md.ShahinKabir1,MohammadNazmu,Alam2,Roleof AI Technology for Legal Research and Decision Making, Volume:10Issue:07|July2023
[3] Masha Medvedeva, Martijin Wieling and Michel Vols, Rethinking the field of automatic prediction of court decisions,pp.195-212,2023.
[4] Lex Machina, "Lex Machina: Legal analytics and data driveninsights,"Lexmachina.com,2023.
[5] Min Zheng, Bo Liu and Le Sun, Legal Text Analysis Based on Automated Recommendation of Legal Provisions,vol.2022.
[6]G.SukanyaandJPriyadarshini,"Ameta-analysisof the attention models on legal judgment prediction system", Int.J.Adv.Comput.Sci.Appl.,vol.12,no.2,2021.
[7] G. Sukanya and J Priyadarshini, "A meta-analysis of attention models on legal judgment prediction system", Int.J.Adv.Comput.Sci.Appl.,vol.12,no.2,2021.
[8] C. Xiao, X. Hu, Z. Liu, C. Tu and M. Sun, "Lawformer: A pretrained language model for Chinese legal long documents",AIOpen,vol.2,pp.79-84,2021.
[9] Aashka Trivedi, Anya Trivedi, Sourabh Varshney, Vidhey Joshipura, Rupa Mehta, Jenish Dhanani, “Extracted SummaryBasedRecommendationSystemforIndianLegal Documents”,11thInternationalConferenceonComputing, Communication, and Networking Technologies (ICCCNT), 2020.
[10] Tirath Prasad Sahu, Rafe Athar Shaikh and Veena Anand, Forecasting Results of Court Cases based on Legal VariablesUsingClassifiers,January2020.