Projects

Applications

Dozent (Pre-Alpha)

PyPI Link

Docker Image

Source Code

Dozent is a powerful downloader for collecting large amounts of Twitter data from the Internet Archive.

It is built on top of PySmartDL and uses multithreading, much like traditional download accelerators such as axel, aria2c, and aws s3, so that the only real bottlenecks are your network and your hardware.
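To illustrate the general idea of multithreaded downloading, here is a minimal sketch using only the Python standard library. It is not Dozent's actual implementation and does not use PySmartDL; `download_all`, `fetch`, and `fake_fetch` are hypothetical names, and the URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Fetch every URL on a thread pool and return the bytes received per URL.

    `fetch` is a hypothetical callable (URL in, payload out). A real
    downloader would stream to disk instead of holding data in memory.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch concurrently and yields results in input order.
        sizes = [len(payload) for payload in pool.map(fetch, urls)]
    return dict(zip(urls, sizes))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a network request, so the sketch runs offline."""
    return url.encode("utf-8")


result = download_all(["a://one", "a://three"], fake_fetch)
print(result)
```

Because downloads spend most of their time waiting on the network, threads can overlap that waiting, which is why this style of tool tends to be limited by bandwidth and disk speed rather than by Python itself.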

The downloaded data is already heavily compressed, which reduces download times and saves local storage. Uncompressed, it can easily add up to many terabytes, depending on the timeframe being collected. Fortunately, you do not have to decompress the data to analyze it! We are working on a separate big data tool named Murpheus that uses Dask to analyze the data without decompressing it first.
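As a rough illustration of analyzing compressed data without first writing a decompressed copy to disk, the sketch below streams records straight out of a compressed file. It is not Murpheus (which uses Dask), and it rests on two assumptions that the text above does not confirm: that the file is bz2-compressed and that it holds one JSON object per line. The field name passed to `count_by_field` is also hypothetical.

```python
import bz2
import json
from collections import Counter
from typing import Iterator


def iter_tweets(path: str) -> Iterator[dict]:
    """Yield one record per line, decompressing on the fly.

    Assumes bz2 compression and one JSON object per line (both assumptions).
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_by_field(path: str, field: str) -> Counter:
    """Count how often each value of `field` occurs across all records."""
    counts: Counter = Counter()
    for tweet in iter_tweets(path):
        counts[tweet.get(field)] += 1
    return counts
```

Only one line is held in memory at a time, so the memory footprint stays small regardless of how large the uncompressed data would be.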

Run as a Pip module

# Install Pip Package
$ python3.6 -m pip install dozent

# Download all tweets from May 12th, 2020 to May 15th, 2020
$ python3.6 -m dozent -s 2020-05-12 -e 2020-05-15

Run as a Docker image

# Pull Docker image
$ docker pull socialmediapublicanalysis/dozent:latest

# Download all tweets from May 12th, 2020 to May 15th, 2020
$ docker run -it socialmediapublicanalysis/dozent:latest python3.6 -m dozent -s 2020-05-12 -e 2020-05-15

About the Data

  • Only English-language tweets are collected
  • Tweets are stored in JSON format
  • Each day's data is a single compressed file of roughly 2.5 GB (about 32 GB uncompressed)
  • Each tweet has accompanying metadata about the tweet and user
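
The per-day figures above make it easy to estimate how much space a date range will need. The following is a back-of-the-envelope sketch; it assumes the end date is inclusive, which is an assumption about how the date range is interpreted, and the sizes are the rough figures from the list rather than exact values.

```python
from datetime import date, timedelta
from typing import List

COMPRESSED_GB_PER_DAY = 2.5   # rough figure from the list above
UNCOMPRESSED_GB_PER_DAY = 32  # rough figure from the list above


def days_in_range(start: date, end: date) -> List[date]:
    """Every calendar day from start to end, inclusive (an assumption)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


days = days_in_range(date(2020, 5, 12), date(2020, 5, 15))
compressed_gb = len(days) * COMPRESSED_GB_PER_DAY
uncompressed_gb = len(days) * UNCOMPRESSED_GB_PER_DAY
print(f"{len(days)} days: ~{compressed_gb:.0f} GB compressed, "
      f"~{uncompressed_gb:.0f} GB uncompressed")
# prints "4 days: ~10 GB compressed, ~128 GB uncompressed"
```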

Sample Data

Interested in seeing what the data that Dozent collects looks like?

Check it out!

https://dozent-tests.s3.amazonaws.com/sample_data.json

Murpheus (Alpha)

PyPI Link

Source Code

Murpheus is a powerful analysis tool, written in Python with Dask, for processing large amounts of Twitter data from the Internet Archive.

Lagrange Interpolater

Source Code

A Java app, written as a personal project, that computes Lagrange interpolating polynomials. Given a set of (x, y) points, it computes the lowest-degree polynomial that passes through all of them.
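
For readers unfamiliar with the technique, here is a small Python sketch of Lagrange interpolation (the project itself is written in Java, and this is not its code). The function name `lagrange_coefficients` is hypothetical; the sketch assumes integer inputs with distinct x values and uses exact fractions to avoid rounding error.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def lagrange_coefficients(points: Sequence[Tuple[int, int]]) -> List[Fraction]:
    """Return [c0, c1, ...] such that c0 + c1*x + c2*x^2 + ... passes
    through every given point. The x values must be distinct."""
    n = len(points)
    result = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        # Build the i-th basis polynomial: product over j != i of
        # (x - xj) / (xi - xj), kept as numerator coefficients plus a denominator.
        basis = [Fraction(1)]
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            shifted = [Fraction(0)] + basis                     # x * basis
            scaled = [-xj * c for c in basis] + [Fraction(0)]   # -xj * basis
            basis = [a + b for a, b in zip(shifted, scaled)]
            denom *= xi - xj
        # Add yi times the basis polynomial to the running total.
        for k, c in enumerate(basis):
            result[k] += yi * c / denom
    return result


coeffs = lagrange_coefficients([(0, 1), (1, 3), (2, 7)])
print(coeffs)  # coefficients of 1 + x + x^2
```

If the points happen to lie on a polynomial of lower degree, the trailing coefficients simply come out as zero.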

Live Websites

Fictional Camp

Link

Source Code

A business web application for an overnight summer camp. It is written in PHP, JavaScript, jQuery, HTML, and CSS, and is fully integrated with the WordPress content management system and the WP REST API.

Greece Blog Theme

Link

Source Code

Custom WordPress theme for a study abroad student. This was built primarily as an exercise to learn HTML, CSS, JavaScript, and PHP.

WordPress Plugins

American to English Autocorrect

Source Code

WordPress plugin that replaces common American English spellings with their British English equivalents.
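
The core idea of such a plugin can be sketched in a few lines. The example below is in Python rather than PHP and is not the plugin's code; it assumes "English" here means British English, and the three-word mapping is a hypothetical sample, not the plugin's actual word list.

```python
import re
from typing import Dict

# Hypothetical sample mapping; the plugin's real word list is not shown here.
AMERICAN_TO_BRITISH: Dict[str, str] = {
    "color": "colour",
    "center": "centre",
    "organize": "organise",
}


def replace_spellings(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word matches from `mapping`, ignoring case."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, mapping)) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        word = match.group(0)
        replacement = mapping[word.lower()]
        # Keep a leading capital; other capitalization patterns are not preserved.
        return replacement.capitalize() if word[0].isupper() else replacement

    return pattern.sub(swap, text)


print(replace_spellings("Color the center.", AMERICAN_TO_BRITISH))
```

The word-boundary anchors mean that longer words such as "colorful" are left alone unless they are added to the mapping explicitly.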

English to American Autocorrect

Source Code

WordPress plugin that replaces common British English spellings with their American English equivalents.

Miscellaneous

HamSandwichViz

Source Code

A Python visualization of the Ham Sandwich Theorem and an O(n)-time algorithm for the two-dimensional case, built for my graduate-level computational geometry course. Given two sets of points in the plane, the program computes and visualizes a line that simultaneously bisects both sets. Implemented in a Jupyter notebook.
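
To make the problem concrete, here is a brute-force sketch of finding such a cut. It is deliberately not the O(n) algorithm from the project: it tries every pair of points and runs in cubic time. It assumes both sets have an odd number of points, in which case a bisecting line must pass through at least one point of each set. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Cross product: > 0 if p is left of the directed line a->b, < 0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def bisects(a: Point, b: Point, points: List[Point]) -> bool:
    """True if neither open side of the line a-b holds more than half the points."""
    left = sum(1 for p in points if side(a, b, p) > 0)
    right = sum(1 for p in points if side(a, b, p) < 0)
    half = len(points) // 2
    return left <= half and right <= half


def ham_sandwich_cut(
    red: List[Point], blue: List[Point]
) -> Optional[Tuple[Point, Point]]:
    """Return two points defining a line that bisects both sets, or None.

    Brute force: tries each line through one red and one blue point.
    """
    for r in red:
        for b in blue:
            if r != b and bisects(r, b, red) and bisects(r, b, blue):
                return r, b
    return None


red = [(0, 0), (4, 1), (1, 5)]
blue = [(6, 6), (7, 2), (3, 8)]
cut = ham_sandwich_cut(red, blue)
print(cut)  # ((0, 0), (6, 6))
```

In this example the line through (0, 0) and (6, 6) leaves one red point and one blue point on each side.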

Bioinformatics Algorithm Implementations

Source Code

Python 3 implementations of the bioinformatics algorithms presented in Coursera’s Bioinformatics Specialization.