Dozent is a powerful downloader that is used to collect large amounts of Twitter data from the internet archives.
It is built on top of PySmartDL and multithreading, similar to how traditional download accelerators like axel, aria2c and aws s3 work, ensuring that the biggest bottlenecks are your network and your hardware.
The data that is downloaded is already heavily compressed to reduce download times and save local storage. When uncompressed, the data can easily add up to several petabytes depending on the timeframe of data being collected. Fortunately, you do not have to decompress the data to analyze it! We are working on a separate big data tool named Murpheus that uses Dask to analyze the data without needing to decompress it.
Run as a Pip module
# Install Pip Package $ python3.6 -m pip install dozent # Download all tweets from May 12th, 2020 to May 15th, 2020 $ python3.6 -m dozent -s 2020-05-12 -e 2020-05-15
Run as a Docker image
# Pull Docker image $ docker pull socialmediapublicanalysis/dozent:latest # Download all tweets from May 12th, 2020 to May 15th, 2020 $ docker run -it socialmediapublicanalysis/dozent:latest python3.6 -m dozent -s 2020-05-12 -e 2020-05-15
Interested in seeing what the data that Dozent collects looks like?
Check it out!
Murpheus is a powerful analysis tool written in Python and Dask that analyzes large amounts of Twitter data from the internet archive.
Java app that I programmed as a personal project that computes Lagrange Interpolating Polynomials. Given a set of (x, y) points, this program will compute the lowest degree polynomial that passes through each of these points.
WordPress Plugin that replaces common American English spellings of words with their respective English spellings.
WordPress Plugin that replaces common English spellings of words with their respective American English spellings.
Python visualization of the Ham Sandwich Theorem and algorithm for the 2 dimensional plane in O(n) time for my graduate-level computational geometry course. Given two sets of points in 2D space, this program will calculate and visualize the hyperplane that splits these two sets. Implemented in a Jupyter notebook.
Python 3 implementations of the bioinformatics algorithms presented in Coursera’s Bioinformatics Specialization.