Week 1
After setting up my virtual host with a firewall and running it for 1 day and 100 minutes, I found that there had been 4518 attempts to connect to my machine (and the number is growing by about 5 per second as I write this post!). I pulled the information for the last 100 attempts and replaced all spaces with tabs using the command sudo tail -100 /var/log/ufw.log | sed -e 's/\s/\t/g'. I copied and pasted the results into a spreadsheet and sorted the source IP address column. I noticed that the IP address 89.248.165.99 appeared most frequently, so I looked it up with the whois command. In the remarks, it says this IP address belongs to the Recyber Project and that they are “only scanning me for legit purposes.” I went to their website, which states: "The Recyber project assists researchers, universities, and other educational institutions. Partnered institutions use our platform to conduct their research."

This got me really curious: what kind of research are they doing? Why do they need to scan me so frequently? I searched for the Recyber Project on Google, but there isn't much information, and a few other people seem to have the same question. Their website also has an opt-out form to have an IP address excluded from the project, but it asks for an email address (which is kind of fishy), so I ended up not filling it out. Anyway, this experience of digging information out of the firewall log was really interesting: it gave me a taste of what it's like to be a hacker😎
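Instead of a spreadsheet, the same ranking could probably be done in one pipeline. Here's a sketch, assuming the standard UFW log format where each blocked packet is logged with a SRC=<ip> field:

```bash
# Pull every SRC=<ip> field out of the UFW log, then count and rank the source IPs
# (assumes the default UFW/iptables log format).
sudo grep -oE 'SRC=[0-9.]+' /var/log/ufw.log | sort | uniq -c | sort -rn | head -10
```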
Question:
When I was upgrading the Linux system, these two windows popped up. I wasn't sure what these services were or which of them should be restarted.
Week 2
First, I downloaded my browser history and used the command cat ~/Desktop/BrowserHistory.json | grep ' \"url":' | grep '\"http' > http.txt to grab the URLs into a new text file. (Question: I'm a bit confused about this line - the result still contains "url": , so what does the grep '\"http' do?)
Inside VS Code, I selected and deleted all occurrences of "url": "https:// using the shortcut Command+Shift+L. Then I did a regex find-and-replace, replacing ^(.*)\/(.*)$ with $1 several times, to get rid of everything after the first slash in each URL. The URLs are finally clean now.
Next, I used sort -u http.txt > unique.txt to keep only the unique addresses in my browser history, which shrank the number of lines significantly, to 590. I then turned them into IP addresses with the command cat unique.txt | nslookup | grep 'Address:' > sihanip.txt. (Question: This gave me 1265 lines of IP addresses - how can I sort out which one or more IP addresses belong to which website?)
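One way to keep the mapping might be to resolve each hostname separately, so the domain and its addresses end up on the same line. A sketch (it assumes dig is installed; nslookup would work too, and CNAMEs may show up mixed in with the IPs):

```bash
# Resolve each domain in unique.txt one at a time and write
# "domain ip1 ip2 ..." on a single line per domain.
while read -r host; do
  printf '%s %s\n' "$host" "$(dig +short "$host" | tr '\n' ' ')"
done < unique.txt > domain_ip.txt
```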
To find out which domains I visit the most, I copied and pasted the web addresses from the http.txt file into a spreadsheet, and used the Remove Duplicates, COUNTIF, and RANK functions to count how many times I visited each website and rank them by that frequency. No surprise: Google services took the first four places, followed by several NYU websites. What's interesting is that the Unity Asset Store is also in the top ten.
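The same ranking could also be done in the shell. A sketch, assuming http.txt already holds one bare hostname per line (as it does after the VS Code cleanup):

```bash
# Count how many times each hostname appears, then list the ten most frequent.
sort http.txt | uniq -c | sort -rn | head -10
```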
Only a few IP addresses could be reached by traceroute. Here's the visualization of the connections using Traceroute Mapper (the command itself is sketched after the observations below).
www.bilibili.com (148.153.56.163)
www.shadertoy.com (34.223.157.0)
store.moma.org (23.227.38.74)
asset.unity.com (35.238.75.111)
beta.film.ai (64.32.28.234)
www.zhihu.com (211.152.149.12)
1. The first 6 or 7 hops are usually within the New York and New Jersey areas.
2. Most IPs are located on the west coast.
3. To get to China, the data can either travel to the west coast then through the North Pacific Ocean or head east to Sweden and then go all the way across the Asian continent.
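For reference, each map came from pasting the output of a plain traceroute run into Traceroute Mapper. A sketch with one of the hosts from the list above (-n skips reverse DNS lookups so the trace finishes faster):

```bash
# Trace the route to one host; copy the whole output into Traceroute Mapper.
traceroute -n www.bilibili.com
```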
Week 5
I reused the game controller panel I made last semester for the New Arcade class. It's a pretty good fit for this game.
Week 6
After setting up Nginx on my virtual host and configuring it for HTTP and HTTPS requests, I pulled up the access log using sudo nano /var/log/nginx/access.log. Here's what it looks like:
There were lots of requests coming from the IP address 213.91.182.165, so I looked it up with the whois command and found it belongs to a Bulgarian telecommunications company called Vivacom. I'm wondering what all the strings after the GET command mean (they look like file paths). And what is the difference between this log and the firewall log we looked at a few weeks ago? I know these are only HTTP/HTTPS requests, so what are the other request/traffic types in the firewall log?
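To see which clients show up the most, the access log can be summarized from the shell. A sketch, assuming the default "combined" log format where the client IP is the first field on each line:

```bash
# Rank client IPs by how many requests they made to Nginx.
sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
```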
Week 7
I finally bought a domain name and connected it to my virtual host. This tutorial was helpful.
I also set up a server block for my domain and got HTTPS certificates from Let's Encrypt. It's nice to see the little lock icon next to my domain name in the address bar.
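The certificate step boils down to the certbot Nginx plugin. A sketch (it assumes certbot is already installed and the server block for the domain exists):

```bash
# Request a certificate and let certbot edit the Nginx server block for HTTPS.
sudo certbot --nginx -d sihanzhang.xyz -d www.sihanzhang.xyz
# Confirm that automatic renewal will work.
sudo certbot renew --dry-run
```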
Week 8
I only had a vague understanding of REST, Node.js, and all the network concepts, but luckily I found this tutorial that guided me through creating and deploying an Express REST API on my server. Now I have a better understanding of how everything works.
Step 1 - Creating a Node.js & Express REST API on my local machine
- Adding GET Routes to my API
- Adding POST Routes to my API
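To make sure the routes worked locally, they can be hit with curl. A sketch - the port 3000 and the JSON field are assumptions, not necessarily what the tutorial uses:

```bash
# GET all users from the local API (port and route are assumptions).
curl http://localhost:3000/users

# POST a new user as JSON (the field name is hypothetical).
curl -X POST http://localhost:3000/users \
     -H "Content-Type: application/json" \
     -d '{"name": "test user"}'
```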
Step 2 - Uploading the application to GitHub
- Adding a .gitignore file
- Pushing code to GitHub via the SSH URL
GitHub no longer accepts account passwords for authenticating Git operations, so I created SSH keys for my computer.
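Generating the key and grabbing the public half to paste into GitHub (Settings → SSH and GPG keys) looks roughly like this; the email is just a label for the key:

```bash
# Generate an Ed25519 key pair, then print the public key to copy into GitHub.
ssh-keygen -t ed25519 -C "you@example.com"
cat ~/.ssh/id_ed25519.pub
```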
Step 3 - Configuring and deploying my Node.js application
- Pulling the application from GitHub + running the application
- Installing and configuring PM2 (keeps the application running)
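The PM2 part is a couple of commands. A sketch - the entry file name app.js and the process name are assumptions:

```bash
# Start the API under PM2 and keep it alive across crashes and reboots.
pm2 start app.js --name rest-api
pm2 startup   # prints a command to run so PM2 starts on boot
pm2 save      # saves the current process list for that startup script
```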
Step 4 - Setting up Nginx as a reverse proxy
I updated the /etc/nginx/sites-available/sihanzhang.xyz file with the following configuration:
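Roughly, the relevant part is a server block that proxies requests to the Node app. This is only a sketch, not my exact file: the port 3000 and the proxy headers are assumptions, and the SSL lines certbot added are left out:

```bash
# Write a minimal reverse-proxy server block (a sketch; adjust port and headers).
sudo tee /etc/nginx/sites-available/sihanzhang.xyz > /dev/null <<'EOF'
server {
    listen 80;
    server_name sihanzhang.xyz www.sihanzhang.xyz;

    location / {
        proxy_pass http://localhost:3000;        # hand requests to the Node app
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx     # validate the config, then reload Nginx
```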
Now when I go to sihanzhang.xyz/users I can get all the user data.
Week 9
I connected an MQ-135 air quality sensor to an Arduino and printed the analog readings. The values were quite high in my room; I'm not sure if the sensor was working correctly.
Week 10
I created an undnet/sihan subtopic and sent my sensor data to the broker once per minute. My internet wasn't working well, so I only managed to send data for about an hour.
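A single test reading can also be published from the command line. A sketch - the broker hostname, port, credentials, and JSON payload are all placeholders, not the class broker's real details:

```bash
# Publish one hypothetical air-quality reading to the undnet/sihan topic.
mosquitto_pub -h broker.example.com -p 1883 \
              -u username -P password \
              -t "undnet/sihan" -m '{"airQuality": 412}'
```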
Question: I noticed there's a history section under my topic. Does the broker store each subtopic's history?
Week 11
With Wireshark open, I went to my website, sihanzhang.xyz, and then to sihanzhang.xyz/users. A lot of packets were captured, so I filtered the capture down to packets with my website's IP address.
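The same capture and filter can be done with tshark, Wireshark's command-line twin. A sketch - en0 is the macOS Wi-Fi interface, so the name will differ on other machines:

```bash
# Capture only traffic to/from my site and show HTTP or TLS packets.
sudo tshark -i en0 -f "host sihanzhang.xyz" -Y "http or tls"
```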
I noticed that there were two HTTP packets: one was a GET request and the other was the response with code 301. This didn't match what I saw in the browser (user data in JSON format) when I went to sihanzhang.xyz/users. I'm wondering if this has to do with Nginx forwarding the request to Node?
Using the protocol hierarchy statistics tool, I found that 75% of the packets sent from my computer are IPv6 packets, and the endpoints of these packets are mainly Google services. It's interesting to find out that Grammarly and Pinterest are also running in the background.
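The same protocol hierarchy breakdown is available from the command line too. A sketch - capture.pcapng is a hypothetical saved capture file:

```bash
# Print protocol hierarchy statistics for a saved capture.
tshark -r capture.pcapng -q -z io,phs
```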
My own website is pretty free of user-tracking technologies. In contrast, a site I often visit (www.vrscout.com) has many ad trackers and third-party cookies.
I didn't know before that my browser could expose so much of my information. Installing the uBlock extension helps protect my privacy.