
Using OCCRP’s Aleph for Dark Web Data Analysis


Land Acknowledgment

Humber College is located within the traditional and treaty lands of the Mississaugas of the Credit. Known as Adoobiigok [A-doe-bee-goke], the “Place of the Alders” in Michi Saagiig [Mi-Chee Saw-Geeg] language, the region is uniquely situated along Humber River Watershed, which historically provided an integral connection for Anishinaabe [Ah-nish-nah-bay], Haudenosaunee [Hoeden-no-shownee], and Wendat [Wine-Dot] peoples between the Ontario Lakeshore and the Lake Simcoe/Georgian Bay regions. Now home to people of numerous nations, Adoobiigok continues to provide a vital source of interconnection for all.

Listen to an audio recording of Humber’s Land Acknowledgment (humber.ca/indigenous/truth-reconciliation-audio-video)

Contents

Land Acknowledgment

Introduction
  Finding the Right Backend Solution: Why We Chose OCCRP’s Aleph
  Positives of Aleph
  Considerations of Aleph
  Conclusion
  About Humber College’s Office of Research and Innovation
  About StoryLab
  About the Toronto Star
  Acknowledgements

Installation Guide
  Creating Your Own Aleph Instance
  Installation
    Prerequisites
    Setting up a Portable Volume
    Installing Docker
    Configuring Docker
    Configuring Aleph’s Docker Environment
    Importing Data
      Aleph CLI
      Web UI
    Setting up Keycloak
  Maintaining Your Aleph Instance
    Updating Aleph
  Additional Notes
    Ingests Getting Stuck

Appendix
  Appendix 1: A conversation with Distributed Denial of Secrets (DDoSecrets)
    About DDoSecrets
  Appendix 2: Basic Malware Awareness
    Introduction
    Understanding Malware
    Symptoms of Malware Can Be Detected
    Using VirusTotal
    Preventive Measures
    How To Set Up Test Environment For Testing Malware In Windows 11
    Conclusion

Contact

Introduction

This publication was made possible through an Applied Research and Technology Partnership Grant undertaken by The Humber College Institute of Technology and Advanced Learning, and Toronto Star Newspapers Limited.

The grant proposal, “Development of a Data Server Framework,” was undertaken with the purpose of creating a relatively secure, cost-effective, and collaborative solution for analyzing large troves of data situated on the un-indexed internet, colloquially referred to as the “Dark Web.”

While journalists are no strangers to working with leaked data (see: Iraq War Logs, Panama Papers, The Troika Laundromat, et al.), dark web data dumps can pose unique ethical and security concerns. We encourage reporters and researchers interested in exploring leaked data to engage fully with the relevant ethical, legal, and editorial authorities within their organizations before beginning an investigation focused on the Dark Web.

Information leaked to the Dark Web is often procured illegally as the result of a targeted cyber-attack against a government, corporation, or institution. The perpetrators run the gamut from state-sponsored actors to ransomware groups, or any combination in between.

As mentioned previously, working with leaked information is a known practice within journalism, dating back to when information was exchanged using manila envelopes rather than digital packets. Then as now, journalists must weigh the ethical implications of reporting on or publishing leaked data.

The Dark Web adds more complexities to consider. Leaked data dumps can be chaotic, disorganized tranches of information, often rife with personally identifiable information (PII). This means that journalists must take extra care to protect the information of private citizens.


Ethical Considerations for Working With Leaked Data

• Is this information in the public interest?

• Who benefits from the release of this information? Who suffers?

• As a steward of this data, am I storing it responsibly?

About OCCRP’s Aleph

The Organized Crime and Corruption Reporting Project (OCCRP) is a non-profit investigative newsroom founded in 2007. Its large, decentralized newsroom focuses on reporting on and minimizing the threat of crime and corruption around the world. As part of this mission, the OCCRP develops novel technological solutions to aid investigations.

Aleph is an open-source data management tool created to manage and make sense of the enormous, varied caches of documents that are part and parcel of crime and corruption investigations. One of Aleph’s greatest strengths is serving as a relational database. The OCCRP maintains a massive store of data composed of open-source and leaked files that journalists can use to find leads, map out complex relationships, and even ingest and cross-reference their own files against Aleph’s database.


Finding the Right Backend Solution: Why We Chose OCCRP’s Aleph

We knew our solution for the Toronto Star had to meet several key requirements:

• Allow journalists with minimal cybersecurity training to safely view documents that may contain malware

• Allow for secure collaboration between multiple stakeholders

• Be relatively inexpensive

We initially envisioned setting up a physical server that would run a virtual machine, with remote access via Virtual Private Network (VPN). Students would log onto the virtual machine, clean the documents using Python scripts, and then upload them to a “safe” folder for download by the Toronto Star.

Implementation was another story. Accessing a Windows virtual machine via remote VPN was slow and unreliable, and that was before any document cleaning was attempted. By the time I travelled to the National Institute for Computer-Assisted Reporting (NICAR) conference in Nashville, TN, in March 2023, we were running out of ideas.

Then I attended a session on OCCRP’s Aleph led by their data editor, Jan Strozyk, and the pieces started falling into place.

While most journalists are drawn to Aleph for its relational capabilities, Aleph is also a powerful document viewer, capable of ingesting and presenting many different file types, from spreadsheets, PDFs and images, through to audio and video.

This capability inadvertently made it perfect for our use case. As Strozyk describes:

“If you use Aleph as a document viewer to look at, for example, emails or .docx files, the way that the malware for those specific files is written is to execute when you open it with the intended software.

“So when you open it with word and you activate the macros, then the malware is going to do something, right? But if you open it in Aleph—which basically converts it to an image and then just shows you the [file]—that just doesn’t work.”


Positives of Aleph

Aleph allowed us to achieve the parameters we had set forth at the outset of the grant.

By uploading our Dark Web documents into a custom instance of Aleph, we made the files safe for an average journalist to view and analyze, with no virtual machine or data cleaning necessary.

Collaborators could share links to specific files within our Aleph instance, but only those granted authorization to the server could access them. We also added a two-factor authentication layer using the open-source software service Keycloak to limit the number of people who could theoretically have access to the data.

And finally, data storage costs scaled with the amount of data and compute power required, with 1 terabyte of storage costing several hundred dollars a month (CAD) to host. Not necessarily cheap, but not exorbitantly expensive, either.

Considerations of Aleph

As an open-source platform under constant development, Aleph is a work in progress. Ingesting data, particularly larger datasets, could be finicky, requiring us to manually tweak ingest parameters to skip problematic files that would otherwise hang up the system for hours. The user experience could use some refinement, and bugs are possible (such as one that deleted file previews when you added them to Lists).

Thankfully, the OCCRP Aleph development team is extremely responsive and helpful. They maintain an invite-only Slack channel responding to users as well as a Discourse group for reporting and tracking bugs.


Conclusion

Cyberattacks are a growing challenge in an increasingly online world, posing threats to government and corporate institutions all the way down to private citizens.

Navigating data leaks is fraught with both technical hurdles and ethical considerations. Journalists must become adept at navigating these murky waters if they are to fulfill their obligation to keep their readers informed.

With the completion of this ARTP-1 Grant and resulting documentation, Humber College’s StoryLab aims to provide an entry point for newsrooms to delve into data leaked onto the Dark Web.

About Humber College’s Office of Research and Innovation

Humber’s Office of Research and Innovation (ORI) provides valuable applied research and innovation services to partner organizations.

ORI’s role is to help faculty and student research teams engage in applied research with industry and community partners, solving specific, real-world problems. As a result, industry and community partners gain solutions and students gain valuable experience.

About StoryLab

Humber College’s StoryLab was created to bridge the innovation gap between education and industry in the fields of journalism, data science and storytelling writ large.

By collaborating with StoryLab, industry partners can workshop new and ambitious projects by working with Humber students specializing in the area of data-driven storytelling.

StoryLab is a division of Humber Press, the publishing arm of Humber College’s Office of Research and Innovation.


About the Toronto Star

Founded in 1892, the Toronto Star has long been Canada’s largest daily newspaper. Now a multi-platform news organization, the Star publishes a newspaper seven days a week in the Greater Toronto Area and publishes ongoing news and information to a global audience on thestar.com on web and mobile applications.

The Star is owned by Toronto Star Newspapers Ltd., a wholly owned subsidiary of Torstar Corporation. Torstar is a broadly based, progressive media company with a long history in daily and community newspapers, book publishing and digital businesses. Built on the foundation of the Toronto Star, Torstar has grown into a diversified media company with a growing portfolio of businesses and investments that reach consumers in Canada, the United States and around the world.

Acknowledgements

Thanks to: Ariana Rydzkowski, Dr. Timothy Wong, Ali Owayid, Janice Saji, Emma, Lorax Horne, Milo Trujillo, Emma Best, Francis Syms, Daniel Schwartz, Jan Strozyk, Alex Ștefănescu, Ginger Grant, Shyama Patel, Daniel Alvarado.


Installation Guide

Creating Your Own Aleph Instance

This document is targeted towards someone with basic knowledge of Unix system administration, a surface-level understanding of Docker, and ideally an understanding of your organization’s identity management system, if it has one.

Installation

Prerequisites

If your organization doesn’t have a preferred hosting provider or on-site hosting, DigitalOcean or Hetzner are good, cost-effective providers.

• A Linux system capable of running Docker, with at least 16 GB of RAM and 4 CPU cores. Data ingests and indexing will be faster with more RAM and CPU.

• Ability to add or remove volumes / hard drives

• A domain for SSL certificates and public-facing IP addresses. A subdomain like aleph.humberstorylab.com will work. You do not need to buy a new domain.
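The hardware minimums above can be checked with a short helper. This is a minimal sketch: the function name and the 15 GiB memory threshold are our own assumptions (a server sold as "16 GB" usually reports slightly less than 16 GiB), so adjust them to your situation.

```shell
# Check a machine against the guide's minimums: 4 CPU cores and 16 GB of RAM.
# $1 = CPU core count, $2 = total memory in kB (the unit used by /proc/meminfo).
# MIN_MEM_KB is an assumed threshold of 15 GiB, leaving slack for memory
# the kernel reserves on a nominal 16 GB machine.
MIN_CPUS=4
MIN_MEM_KB=15728640

meets_minimum() {
  if [ "$1" -ge "$MIN_CPUS" ] && [ "$2" -ge "$MIN_MEM_KB" ]; then
    echo "ok"
  else
    echo "below minimum"
    return 1
  fi
}

# Usage on the server itself:
#   meets_minimum "$(nproc)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```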

Debian 11 will be used in this guide; it’s a popular, free, stable Linux distribution with long-term support. If you choose to use a different Linux distribution, be aware of possible differences in your setup.

This document refrains from documenting the basic server setup process. You should probably create yourself an admin user, enable ssh, change your security settings, etc. to your liking. If you’re not comfortable with that, the defaults your hosting provider gives are usually good enough.

Setting up a Portable Volume

*If you wish to skip this step, just create a directory at /data.

To make it easier to migrate data in the event of a server upgrade, or to retain data after deleting a server instance, we’ll set up an external volume. Most hosting providers have different options for creating mountable volumes, so you’ll need to do your own research on how to create one.

After creating the volume and attaching it to your VM, it should show up in the output of lsblk. The device name will vary, so ensure you’re using the right disk before formatting. A good way to tell is to check that the volume is the size you created and that it has no partitions.
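The "no partitions" check can be automated. The sketch below uses a hypothetical helper of our own (not part of lsblk) that reads lsblk output and prints only the disks that have no partitions yet. Treat its answer as a hint and still confirm the size by eye before formatting anything.

```shell
# Print disks that have no partitions.
# Expects the output of `lsblk -rn -o NAME,TYPE,PKNAME` on stdin:
# one device per line, with PKNAME naming the parent disk of each partition.
unpartitioned_disks() {
  awk '
    $2 == "disk" { disks[$1] = 1 }      # remember every whole disk
    $2 == "part" { parents[$3] = 1 }    # remember disks that own a partition
    END { for (d in disks) if (!(d in parents)) print d }
  ' | sort
}

# Usage on the server:
#   lsblk -rn -o NAME,TYPE,PKNAME | unpartitioned_disks
```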

Format the disk

Replace /dev/sda in the commands below with the device name you identified with lsblk.

sudo fdisk /dev/sda

Press “n”, then press Enter to accept each default until the partition is created, then press “w” to write the changes.

Create an ext4 filesystem on the new partition and mount it at /data. Make sure it mounts automatically on boot by adding it to your fstab.

sudo mkdir /data

sudo mkfs.ext4 /dev/sda1

sudo mount /dev/sda1 /data

echo "UUID=$(lsblk /dev/sda1 -n -o UUID) /data ext4 rw,noatime 0 1" | sudo tee -a /etc/fstab
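Before rebooting, it is worth confirming that the fstab entry actually landed. The helper below is a small sketch with a name of our own choosing. It only checks that an uncommented line for the mount point exists, not that the UUID is correct.

```shell
# Succeed if the given fstab file has an uncommented entry for a mount point.
# $1 = path to the fstab file, $2 = mount point to look for.
fstab_has_mount() {
  awk -v mp="$2" '
    $1 !~ /^#/ && $2 == mp { found = 1 }
    END { exit !found }
  ' "$1"
}

# Usage on the server: check the entry, then let mount process fstab
# and show the resulting mount.
#   fstab_has_mount /etc/fstab /data && sudo mount -a && findmnt /data
```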


Installing Docker

This section closely follows Docker’s own installation guide for Debian. The commands below, copied directly from that guide, add the Docker repository to your system.

# Add Docker’s official GPG key:

sudo apt-get update

sudo apt-get install ca-certificates curl gnupg

sudo install -m 0755 -d /etc/apt/keyrings

curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:

echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update

And after successfully adding the Docker repo, install Docker:

sudo apt install docker-compose-plugin docker-ce docker-ce-cli docker-compose

Configuring Docker

To have our Docker data stored in /data instead of the default /var/lib/docker directory, we need to edit /etc/docker/daemon.json to contain "data-root": "/data/docker/docker-data"

See Configure the daemon with systemd (https://docs.docker.com/config/daemon/systemd/) for more info.

/etc/docker/daemon.json

{
  "data-root": "/data/docker/docker-data"
}

Then create the new docker-data folder and set its ownership. The Docker daemon runs as root, and a standard install creates a docker group but no docker user, so the folder should belong to root.

sudo mkdir -p /data/docker/docker-data # -p creates all folders recursively

sudo chown -R root:root /data/docker # -R applies the ownership to all files and folders in that path

Now that Docker is configured, enable it at boot and restart it with systemctl so the new data-root takes effect:

sudo systemctl enable docker

sudo systemctl restart docker
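To confirm the daemon picked up the new location, you can compare the configured path with what Docker reports. This is a sketch: check_data_root is a hypothetical helper of ours, and it relies on the DockerRootDir field that docker info exposes.

```shell
# Compare the data root Docker reports with the one we configured.
# $1 = expected path
# $2 = reported path (optional; defaults to asking the running daemon)
check_data_root() {
  expected="$1"
  actual="${2:-$(docker info --format '{{ .DockerRootDir }}')}"
  if [ "$actual" = "$expected" ]; then
    echo "OK: Docker data root is $actual"
  else
    echo "MISMATCH: expected $expected, Docker reports $actual" >&2
    return 1
  fi
}

# Usage on the server:
#   check_data_root /data/docker/docker-data
```

A mismatch usually means daemon.json contains a typo or Docker was not restarted after the edit.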

Configuring Aleph’s Docker Environment

This section closely follows Aleph’s own installation guide.

To start, download Aleph’s own Docker Compose file and template env file to /data/aleph/:

mkdir /data/aleph

curl https://github.com/alephdata/aleph/raw/main/docker-compose.yml -Lo /data/aleph/docker-compose.yml

curl https://github.com/alephdata/aleph/raw/main/aleph.env.tmpl -Lo /data/aleph/aleph.env

Generate a secret key using openssl rand -hex 24 and use the result as the value of the secret key setting in /data/aleph/aleph.env.

Set the value of ALEPH_UI_URL to the domain chosen for Aleph. For example: https://aleph.humberstorylab.ca/
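If you would rather script these edits than open the file in an editor, a small helper can replace a KEY=VALUE line or append it when missing. This is a sketch: set_env_var is our own hypothetical function, and the ALEPH_SECRET_KEY name in the usage comment is our assumption about the template, so confirm it against your copy of aleph.env.

```shell
# Set KEY=VALUE in an env file: replace the existing line, or append one.
# $1 = env file, $2 = key, $3 = value (should not contain backslashes)
set_env_var() {
  file="$1"; key="$2"; value="$3"
  tmp="$(mktemp)"
  awk -F= -v k="$key" -v v="$value" '
    $1 == k { print k "=" v; done = 1; next }   # replace the existing line
    { print }                                   # keep every other line as-is
    END { if (!done) print k "=" v }            # append if the key was absent
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage on the server (ALEPH_SECRET_KEY is an assumed variable name):
#   set_env_var /data/aleph/aleph.env ALEPH_UI_URL https://aleph.humberstorylab.ca/
#   set_env_var /data/aleph/aleph.env ALEPH_SECRET_KEY "$(openssl rand -hex 24)"
```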

If you do not trust other processes on the system (for example, if you’re using the same machine for multiple services), or are hosting containers across multiple machines, you may wish to enable mTLS, documented here (https://docs.aleph.occrp.org/developers/installation/#sentry). We chose not to document this.

In order to keep Aleph running after the system restarts, you should edit docker-compose.yml and add restart: always to each of the services, except for shell, ingest-file and worker.

After the above steps, you’re ready to start Aleph. Run the following from /data/aleph, where the compose file lives:

docker-compose up -d

For systems with more resources, allocating more resources to containers while ingesting documents will speed up the process. You should experiment with the numbers to see what works best for your system.

docker-compose up --scale ingest-file=8 --scale convert-document=4 --scale worker=2 -d


Importing Data

There are three main ways of importing data:

• Directly using Aleph’s CLI

• Using the web UI

• Using Aleph Client

Aleph CLI

Before starting the import, create a new investigation or choose an existing one, and copy its foreign ID.

Copy the data for the import to the container, and run the import:

docker cp <path to your dataset> aleph_api_1:/data/<your folder>

docker exec -it aleph_api_1 /bin/bash

# In docker container

cd /data

aleph crawldir -f <foreign id> <your folder>

# Clean up, as /data is in the persistent data directory in the docker container

rm -rf <your folder>

Tip: If you’re connected over ssh, use screen to prevent the command from being cancelled if you get disconnected. If you do get disconnected, just reconnect and run screen -r to re-attach your last session.
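The copy, crawl and clean-up steps can be wrapped into one function for repeated imports. This is a sketch under assumptions: aleph_import and run are hypothetical helpers of ours, and we assume the aleph command can be invoked directly through docker exec in the same way it runs inside the interactive shell above. Setting DRY_RUN=1 prints the commands instead of running them, which is a safe way to check the paths first.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = local dataset path, $2 = investigation foreign ID,
# $3 = API container name (defaults to aleph_api_1, as in the commands above).
aleph_import() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: aleph_import <dataset path> <foreign id> [container]" >&2
    return 2
  fi
  container="${3:-aleph_api_1}"
  dest="/data/$(basename "$1")"
  # Each step runs only if the previous one succeeded.
  run docker cp "$1" "$container:$dest" &&
    run docker exec "$container" aleph crawldir -f "$2" "$dest" &&
    run docker exec "$container" rm -rf "$dest"
}

# Usage on the server (preview first, then run for real):
#   DRY_RUN=1 aleph_import /home/admin/leak-2023 my-investigation
#   aleph_import /home/admin/leak-2023 my-investigation
```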


Web UI

The Web UI can be very unreliable when uploading large datasets. We recommend using Aleph’s CLI or Aleph Client to upload large swaths of data.

Setting up Keycloak

For the purposes of this document we’re going to only document the connection between Aleph and Keycloak. It’s recommended to follow Keycloak’s own guide to set something up that best suits your use case. https://www.keycloak.org/guides

Your organization may already have an access control system compatible with Aleph, such as Google Workspace or Okta. In that case, you may wish to contact your IT team or sysadmins for assistance in linking your existing system.

Once you have your Keycloak instance set up, create a Client for Aleph. The following settings should work.


Be sure to change the URLs to point to your Aleph instance.


And set the following options in your aleph.env file

ALEPH_PASSWORD_LOGIN=false

ALEPH_OAUTH=true

ALEPH_OAUTH_KEY=aleph

ALEPH_OAUTH_SECRET=CHANGEME

ALEPH_OAUTH_METADATA_URL=https://YOUR-KEYCLOAK-ENDPOINT/realms/master/.well-known/openid-configuration

You can get your OAuth Secret key from Client Details > Credentials.
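A quick way to catch a wrong metadata URL before restarting Aleph is to build it from its parts and fetch it. This is a sketch: oidc_metadata_url is a hypothetical helper of ours, and the /realms/<realm>/ path follows the pattern shown above. Older Keycloak releases served everything under an extra /auth prefix, so adjust the base URL if yours does.

```shell
# Build the OpenID Connect discovery URL for a Keycloak realm.
# $1 = Keycloak base URL (a trailing slash is tolerated), $2 = realm name.
oidc_metadata_url() {
  base="${1%/}"
  echo "$base/realms/$2/.well-known/openid-configuration"
}

# Usage: a correct URL returns a JSON document that includes an "issuer" field.
#   curl -fsS "$(oidc_metadata_url https://YOUR-KEYCLOAK-ENDPOINT master)"
```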


Maintaining Your Aleph Instance

Updating Aleph

It is incredibly important to take backups, as any kind of upgrade is slightly risky.

Before beginning, make note of any changes you’ve made to the compose file, such as restart: always, as those will be overwritten.

cd /data/aleph

# Save the old config in case you need to reference the old docker compose file for changes you made.

### Do not rollback the config after running aleph upgrade as it will probably cause database corruption ###

cp docker-compose.yml "docker-compose-backup-$(date '+%F-%H-%M-%S').yml"

# Download the new production docker-compose file

curl https://github.com/alephdata/aleph/raw/develop/docker-compose.yml -Lo docker-compose.yml

After downloading the latest compose file, make any changes necessary from your own configuration, such as adding restart: always to some of the containers.
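To see exactly which of your changes need to be carried over, you can diff the newest backup against the freshly downloaded file. This is a sketch: latest_compose_backup is a hypothetical helper of ours that relies on the timestamped backup names created by the cp command above.

```shell
# Print the most recent compose backup in a directory (default: current one).
# Relies on the docker-compose-backup-YYYY-MM-DD-HH-MM-SS.yml naming used above,
# which sorts chronologically when sorted as plain text.
latest_compose_backup() {
  ls "${1:-.}"/docker-compose-backup-*.yml 2>/dev/null | sort | tail -n 1
}

# Usage from /data/aleph: lines starting with "-" exist only in your old file.
#   diff -u "$(latest_compose_backup)" docker-compose.yml
```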

Following that, you can restart and update Aleph with the following snippet (taken from Aleph’s FAQ on updating)

docker-compose pull

# Terminate the existing install (enter downtime!):

docker-compose down

docker-compose up -d redis postgres elasticsearch

# Wait a minute or so while services boot up...

# Run upgrade:

docker-compose run --rm shell aleph upgrade

# Restart prod system:

docker-compose up -d

Small footnote: If Postgres ever gets updated to a new major version, special procedures will be needed to upgrade the database. If this happens, it’s very likely the Aleph team will write a guide on how to update. If they don’t, you should look for a Postgres migration guide that dumps the database into SQL commands, which you then re-import once upgraded.


Additional Notes

Ingests Getting Stuck

There are a few ways that ingests can get stuck, and the Aleph team is working on it. In the meantime, here are a few workarounds.

Starting off, check if CPU is stuck at 100% with a single soffice.bin process. If that’s the case, it’s likely a document is stuck being converted to PDF.

We’ve opened an issue to have the timeout for conversion made configurable (https://github.com/alephdata/ingest-file/issues/573). Until this is fixed, a temporary bypass is to kill the LibreOffice conversion process to skip the file that it’s stuck on.

We've made an automated script that checks every two minutes and kills any conversion process that was already running at the previous check.

# Terrible hack, needs to run as root
LAST_SEEN_PIDS=()
while true; do
  # PIDs of all LibreOffice conversion processes running right now
  mapfile -t PGREP_OUT < <(pgrep soffice.bin)
  for i in "${PGREP_OUT[@]}"; do
    for j in "${LAST_SEEN_PIDS[@]}"; do
      # Seen in two consecutive checks: assume the conversion is stuck
      if [[ "$i" == "$j" ]]; then
        kill -9 "$i"
        echo "Killed $i"
      fi
    done
  done
  mapfile -t LAST_SEEN_PIDS < <(pgrep soffice.bin)
  sleep 120
done
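The heart of that script is a comparison between two snapshots of running PIDs. The sketch below isolates that step in a hypothetical helper of ours, which makes the logic easy to try out on sample data without killing anything.

```shell
# Print every PID in the current snapshot ($2) that was also in the
# previous snapshot ($1). Both are newline-separated lists, as printed by pgrep.
pids_seen_twice() {
  # Nothing was running last time, so nothing can be stuck.
  [ -n "$1" ] || return 0
  # -F: fixed strings, -x: whole-line match; each line of $1 is one pattern.
  printf '%s\n' "$2" | grep -Fx -e "$1" || true
}

# Usage on the server (prints the PIDs the watchdog would kill):
#   prev="$(pgrep soffice.bin)"; sleep 120
#   pids_seen_twice "$prev" "$(pgrep soffice.bin)"
```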

If CPU is idle, try restarting the Aleph processes. We've found that this can sometimes kick-start things again.

docker-compose down

docker-compose up <your arguments> -d


If all else fails, you may need to cancel the ingest and start again.


Appendix

Appendix 1: A conversation with Distributed Denial of Secrets (DDoSecrets)

About DDoSecrets

Distributed Denial of Secrets is a journalist non-profit devoted to publishing and archiving leaks and hacked data. They work with other newsrooms and are focused on ensuring access to information and a safe environment for sources. To learn more about DDoSecrets, you can visit https://ddosecrets.com/wiki/Distributed_Denial_of_Secrets

DDoSecrets provided our research team with the initial test data for our Aleph server, which they had procured and archived from the Dark Web.

The following is an abridged conversation with DDoSecrets, edited for length and clarity.

StoryLab: Why do you archive and publicize data released on the dark web?

DDoSecrets: We go where the data is, and there's a lot on the dark web. Much of it isn't fit for republication, but there's often valuable data. So-called malicious cybercriminals that target extremely profitable entities have a tendency to target many unethical entities along the way, because unethical practices are some of the most profitable.

Can you describe your relationship with journalists and academics?

We work with other journalists and with academics to help process the data and understand it, and a lot of the data is only available to journalists and other researchers. Some datasets are simply too large to properly redact to protect the innocent, and others can't be properly redacted without neutering the data.


How can reporting on ransomware (and other leaks) aid journalism?

These companies and governments will never hand over most of this information willingly, even when there's a legal compulsion. A ransomware gang compromised a law firm that had represented, among other clients, the City of Chicago. Because of this, the data included tens of thousands of emails from senior city officials—emails revealing a previously secret drone program paid for with off the books funds, and that a majority of Chicago PD car chases ended in accidents.

What are some of the major challenges of navigating leaked data obtained from the dark web?

People react to announcements and claims much more than they do data, especially when there's a layer of opacity—and that's true for journalists, too. I see a lot of journalists get excited about data without understanding what it is (and more importantly what it isn't), often thinking that something has been released that hasn't. And when faced with actually analyzing the data, many journalists balk because they're accustomed to press releases and easy quotes. The patience and skills needed to interrogate data from the dark web—ranging from the motives, to the (possible lack of) context—are something that few journalists have developed or are willing to spend the handful of hours needed to.

What has your experience been like using OCCRP's Aleph, if any?

We use Aleph to power Hunter because it's a very powerful tool in many ways, but it's also very fickle and has too many improvements that you can't turn off. For instance, no amount of telling the search engine you want exact terms will stop it from returning what it thinks is a match for a translation and/or transliteration of those terms. Things like that are features that become bugs by not being able to be disabled—it's as if Cruise Control were always on no matter what.

The problem comes down to, OCCRP didn't design the codebase for external use. It's an internal tool that they made public. Because of that, it's great at serving the OCCRP's needs, and has a lot to offer anyone else who wants to wear a really nice coat that's both too big and too small.


Appendix 2: Basic Malware Awareness

Introduction

Malware poses a significant threat to individuals and organizations. In this modern age where cyber-attacks are on the rise, understanding the basics of malware analysis becomes crucial for every person who handles sensitive information. This guide aims to provide basic insight into malware analysis, enabling readers to recognize and respond to potential threats.

Understanding Malware

Malware refers to any software designed to harm or exploit computer systems, networks, or users. There are several types of malware, and each type behaves differently.

Types of Malware

• Viruses - Replicate by attaching themselves to other programs or files, spreading from one computer to another.

• Worms - Self-replicate and spread independently across networks, exploiting vulnerabilities.

• Trojans - Disguise themselves as legitimate software to trick users into installing them, often opening a backdoor for other malicious activities.

• Ransomware - Encrypts files on a victim's system and demands a ransom for their release.

• Spyware - Monitors and collects information about a user's activities without their knowledge.

• Adware - Displays unwanted advertisements, often bundled with legitimate software.

Example: Many data breaches begin with a malware infection, which gives attackers many possible avenues for further penetration.


Symptoms of Malware Can Be Detected

A. Unusual System Behaviour

Slow Performance

If your computer is taking much longer than usual to respond to commands, open programs, or complete tasks, it could indicate various issues. This slowness might be due to resource-intensive applications, hardware problems, or the presence of malware consuming system resources.

Frequent Crashes

When a program or the entire operating system crashes frequently, it means they are unexpectedly shutting down or becoming unresponsive. This instability can result from software bugs, hardware failures, or the interference of malicious software attempting to disrupt normal operations.

Unexpected Pop-ups

Pop-up messages appearing on your screen without your initiation can be a sign of intrusive activities. Malware often uses pop-ups to deliver unwanted advertisements, fake alerts, or phishing attempts. These pop-ups may attempt to deceive users or trick them into downloading malicious content, compromising the system's security.

Unexpected CMD Box Pop-up

This is specific to Windows environments. An unexpected Command Prompt (CMD) window that pops up on your screen and closes quickly may be an indication of malware.

B. Network Anomalies

Unusual Network Traffic

Unusual patterns or volumes of data flowing through a network can be a sign of abnormal activity. This may include a sudden increase in data transfers, unusual protocols being used, or unexpected communication between devices. Monitoring for these anomalies helps detect potential security breaches or unauthorized access.

Unauthorized Access Attempts

Network anomalies may involve repeated or unexpected attempts to access a network, system, or specific services without proper authorization. This could be a sign of a malicious actor trying to gain unauthorized entry, to extract sensitive information or cause disruption.


C. Changes in Files and Settings

Altered or Missing Files

Any modifications to existing files or unexpected disappearance of files can be indicative of unauthorized activity or potential malware presence. Malicious actors often alter or delete files to conceal their actions, compromise system functionality, or manipulate data.

Changes to System Settings

System settings include configurations that dictate how the operating system and software applications behave. Unauthorized changes to these settings can lead to system instability, performance issues, or security vulnerabilities. Malware may attempt to modify settings to persist on the system or evade detection.

Malware Detection and Analysis

Hackers use different tools to perform various actions to gather or steal information. Likewise, security professionals use different tools and techniques that can be grouped into two categories:

• Static Analysis: Examining the characteristics and code of a malicious program without executing it, providing insights into its structure, functionality, and potential threats.

• Dynamic Analysis: Executing a malicious program in a controlled environment to observe its behaviour, helping to understand its actions and potential impact on a system.

This section is not designed to give detailed instructions on how to do static analysis or dynamic analysis. However, individuals can use free services available on the internet to check a file or URL before opening it.


Using VirusTotal

Using VirusTotal to check attachments or URLs in emails is a good practice to ensure that they are not malicious or potentially harmful. VirusTotal is a free online service that analyzes files and URLs for viruses, worms, trojans, and other types of malicious content.

Here is a step-by-step guide on how to use VirusTotal when you receive an email with an attachment or URL:

For Attachments:

• Save the Attachment: If you receive an email with an attachment, save the attachment to your computer.

• Visit VirusTotal Website: Open your web browser and go to the VirusTotal website.

• Upload the File:

    ◦ Select the ‘Choose file’ button on the VirusTotal homepage.

    ◦ Navigate to the location where you saved the attachment and select it.

• Scan the File:

    ◦ Once the file is selected, click on the “Scan it!” button.

    ◦ VirusTotal will then analyze the file using multiple antivirus engines.


• Review Results:

    ◦ After the analysis is complete, you will see a detailed report showing the results from various antivirus scanners.

    ◦ Review the report to see if any antivirus engines detected the file as malicious.
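VirusTotal's search box also accepts a file hash, which makes it possible to check whether an attachment is already known without uploading the file itself. That can matter when the attachment might contain confidential material. The sketch below computes the SHA-256 hash of a saved file; the function name and chunk size are illustrative choices.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large attachments need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting 64-character value can be pasted into the VirusTotal search box. If no report exists for that hash, the file has not been submitted before and would need to be uploaded to be scanned.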

For URLs:

• Copy the URL: If you receive an email with a URL, copy the URL to your clipboard.

• Visit VirusTotal Website: Open your web browser and go to the VirusTotal website.

• Paste the URL: On the VirusTotal homepage, you will find a search bar. Paste the URL into the search bar.

• Submit the URL: Press Enter or click on the search icon to submit the URL for analysis.

• Review Results:

    ◦ VirusTotal will provide a report displaying information about the URL, including any detections or warnings from various sources.

    ◦ Analyze the report to determine if the URL is considered safe or malicious.
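Before submitting a URL, it can help to look at where the link really points, since the text shown in an email may differ from the destination. The sketch below parses a URL and lists a few common warning signs. The checks and their wording are assumptions for illustration, not a complete phishing test, and the example address is from a range reserved for documentation.

```python
import ipaddress
from urllib.parse import urlsplit

def url_warning_signs(url):
    """Return the real host of a URL and a list of simple warning signs."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    signs = []
    if parts.scheme != "https":
        signs.append("not HTTPS")
    if parts.username is not None:
        signs.append("text before '@' hides the real host")
    if host.startswith("xn--") or ".xn--" in host:
        signs.append("punycode (possible look-alike domain)")
    try:
        ipaddress.ip_address(host)
        signs.append("raw IP address instead of a domain name")
    except ValueError:
        pass  # The host is a name, not an IP address.
    return host, signs

# Looks like humber.ca at a glance, but the real host is the IP address.
print(url_warning_signs("http://humber.ca@203.0.113.9/login"))
```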


Additional Tips

• Browser Extensions: Consider installing the VirusTotal browser extension for quick and easy checking of URLs directly from your browser.

• API Integration: For advanced users, VirusTotal provides an API that can be integrated into security workflows or applications.

• Regular Checks: Make it a habit to use VirusTotal for checking suspicious attachments or URLs regularly, especially if you are in a position where you frequently deal with emails and files.
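For the API integration mentioned above, the following is a minimal sketch against version 3 of the VirusTotal API, which expects a personal API key in the `x-apikey` header. The helper names are hypothetical, the key is something you would obtain from your own VirusTotal account, and only the fields used here are assumed to be present in a report.

```python
import base64
import json
import urllib.request

API_ROOT = "https://www.virustotal.com/api/v3"

def url_identifier(url):
    """VirusTotal v3 identifies a URL by its unpadded URL-safe base64 form."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")

def build_request(resource, api_key):
    """Prepare (but do not send) a GET request, e.g. for 'files/<sha256>'."""
    return urllib.request.Request(f"{API_ROOT}/{resource}", headers={"x-apikey": api_key})

def detection_summary(report):
    """Return (malicious, suspicious) engine counts from a decoded report."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0), stats.get("suspicious", 0)

def fetch_report(resource, api_key):
    """Send the request; needs network access and a valid API key."""
    with urllib.request.urlopen(build_request(resource, api_key)) as response:
        return json.load(response)
```

A file lookup would pass `"files/" + sha256_hash` as the resource, and a URL lookup would pass `"urls/" + url_identifier(url)`. The free API tier is rate limited, so this suits occasional checks rather than bulk scanning.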

Check Digital Certificates

Digital certificates are like online ID cards that help keep our information safe on the internet. They let your browser confirm that it is talking to the website named in the address bar. When you see a padlock symbol in your web browser, the site has presented a certificate that the browser accepted, and the connection is encrypted. A padlock does not by itself prove that a site is legitimate, so also check that the domain name is the one you expect.
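The same check a browser performs can be sketched with Python's standard `ssl` module. In the example below, `fetch_certificate` connects using the system's default trust settings, so an untrusted or mismatched certificate raises an error instead of returning. `days_until_expiry` then reads the expiry date from the result. Both function names are illustrative.

```python
import socket
import ssl
import time

def days_until_expiry(cert, now=None):
    """Days left on a certificate dict as returned by SSLSocket.getpeercert()."""
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    now = time.time() if now is None else now
    return (expires - now) / 86400

def fetch_certificate(hostname, port=443, timeout=5):
    """Connect with default verification and return the server certificate.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted
    or does not match `hostname`. Needs network access.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

For example, `days_until_expiry(fetch_certificate("humber.ca"))` would report how long the site's current certificate remains valid.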

Use Updated Browser

Using an updated browser is crucial for enhancing online security, as it helps protect against known vulnerabilities, ensures compatibility with modern web technologies, and often includes the latest security features to safeguard users from potential cyber threats.

Updated browsers can provide warnings about dangerous and deceptive content. Phishing and malware detection is turned on by default, which helps to identify sites containing malware or other harmful files or programs.

Preventive Measures

Maintaining regular backups is a fundamental preventive measure against data loss and system failure. This involves copying critical data at scheduled intervals, so that redundant copies of essential files are available after an accidental deletion, a hardware failure, or a cyber threat such as ransomware. Equally important is storing these backups in a secure, isolated location, for example on a drive that is disconnected when not in use. This reduces the risk of the backups themselves being compromised and keeps them available for restoration. Together, these practices significantly strengthen the data protection strategy of both organizations and individuals.
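As a small sketch of the backup practice described above, the function below copies a file to a backup folder under a timestamped name and confirms that the copy matches the original. The function name and naming scheme are assumptions for the example. A real backup routine would also cover whole folders, scheduling, and storage that is isolated from the source machine.

```python
import filecmp
import shutil
from datetime import datetime
from pathlib import Path

def back_up(source, backup_dir):
    """Copy `source` into `backup_dir` under a timestamped name and verify it."""
    source = Path(source)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    # A backup is only useful if it matches the original byte for byte.
    if not filecmp.cmp(source, target, shallow=False):
        raise OSError(f"backup of {source} does not match the original")
    return target
```

Timestamped names keep earlier versions instead of overwriting them, which matters if a file was already damaged or encrypted when the most recent backup ran.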


How To Set Up a Test Environment for Testing Malware in Windows 11

There is a lesser-known feature in Windows 11 Pro (also in Windows 10 Pro) called Windows Sandbox that can be used to build a safe environment to test potential malware. Windows Sandbox is an isolated virtual environment that has its own self-contained Operating System (OS).

Windows Sandbox is not enabled in Windows 11 Pro by default.


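To turn it on, open the Windows 11 search bar at the bottom of the screen and type ‘turn Windows features on or off’. This opens the Windows Features dialog from the Control Panel. Check the box next to Windows Sandbox, select OK, and restart the computer if prompted.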

To start the Windows Sandbox, open the Windows 11 search bar at the bottom of the screen and type ‘Windows Sandbox’, then select it from the results. The sandbox can take a significant amount of time to load, so be prepared to wait.


Once the Windows Sandbox opens, you will see a clean Windows 11 desktop running in its own window.

Now you can copy the software you want to test, including the suspected malware files, from the host machine and paste it into the Windows Sandbox environment.
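As an alternative to copy and paste, Windows Sandbox can be started from a configuration file with the `.wsb` extension that maps a host folder into the sandbox. The sketch below generates such a file with networking disabled and the folder mapped read-only. The folder path is hypothetical, and only a few of the available settings are shown.

```python
import xml.etree.ElementTree as ET

def sandbox_config(host_folder, networking=False):
    """Build the XML for a .wsb file that maps one host folder read-only."""
    root = ET.Element("Configuration")
    ET.SubElement(root, "Networking").text = "Enable" if networking else "Disable"
    mapped = ET.SubElement(ET.SubElement(root, "MappedFolders"), "MappedFolder")
    ET.SubElement(mapped, "HostFolder").text = host_folder
    ET.SubElement(mapped, "ReadOnly").text = "true"
    return ET.tostring(root, encoding="unicode")

# Hypothetical folder holding the files to examine.
print(sandbox_config(r"C:\Quarantine"))
```

Saving the output as, for example, `malware-test.wsb` and opening that file starts the sandbox with these settings. Disabling networking keeps a sample from contacting outside servers, and the read-only mapping keeps it from changing the host's copy of the files.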

Once you have the software you need to test inside the sandbox, you can operate the software as you wish.

When you have finished testing, close the Windows Sandbox. Closing it discards the sandbox and everything inside it. For extra assurance, you can return to ‘Turn Windows features on or off’, uncheck the Windows Sandbox feature, and restart the host machine. These steps help ensure that no traces of the potential malware remain on your system.

Further Reading on Windows Sandbox: https://learn.microsoft.com/en-us/windows/security/application-security/application-isolation/windows-sandbox/windows-sandbox-overview


Conclusion

Understanding and analyzing malware is crucial in a digital age where cyber threats continue to evolve. By recognizing the varied types of malware and their respective behaviours, individuals and organizations can take proactive measures to protect their systems and sensitive information. Identifying signs of malware infection, conducting basic analysis, and implementing preventive measures all contribute to a robust defence against cyber threats.

Contact humberstorylab.ca david.weisz@humber.ca
