Week 1
After setting up my virtual host with a firewall and running it for 1 day and 100 minutes, I found that there had been 4518 attempts to connect to my machine (and the number is growing by about 5 per second as I write this post!). I pulled the information for the last 100 attempts and replaced all spaces with tabs using the command sudo tail -100 /var/log/ufw.log | sed -e 's/\s/\t/g'. I copied and pasted the results into a spreadsheet and sorted the source IP address column. I noticed that the IP address 89.248.165.99 appeared most frequently, so I looked it up with the whois command. In the remarks, it says this IP address belongs to the Recyber Project and that they are “only scanning me for legit purposes.” I went to their website, which states: "The Recyber project assists researchers, universities, and other educational institutions. Partnered institutions use our platform to conduct their research."

This got me really curious: what kind of research are they doing? Why do they need to scan me so frequently? I searched for the Recyber Project on Google, but there isn't much information, and a few other people seem to have the same question. Their website also has an opt-out form to have an IP address excluded from the project, but it asks for an email address (which is kind of fishy), so I ended up not filling it out. Anyway, this experience of digging information out of the firewall log was really interesting: it gave me a taste of what it's like to be a hacker😎
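Instead of a spreadsheet, the same ranking could probably be done in one pipeline. Here's a sketch, assuming the standard UFW log format where each blocked packet is logged with a SRC=<ip> field:

```bash
# Pull every SRC=<ip> field out of the UFW log, then count and rank the source IPs
# (assumes the default UFW/iptables log format).
sudo grep -oE 'SRC=[0-9.]+' /var/log/ufw.log | sort | uniq -c | sort -rn | head -10
```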
Question:
When I was upgrading the Linux system, these two windows popped up. I wasn't sure what these services were or which of them should be restarted.
Week 2
First, I downloaded my browser history and used the command cat ~/Desktop/BrowserHistory.json | grep ' \"url":' | grep '\"http' > http.txt to grab the URLs into a new text file. (Question: I'm a bit confused about this line - the result still contains "url": , so what does the grep '\"http' do?)
Inside VS Code, I selected and deleted all occurrences of "url": "https:// using the shortcut Command+Shift+L. Then I did a regex find-and-replace, replacing ^(.*)\/(.*)$ with $1 several times, to get rid of everything after the first slash in each URL. The URLs are finally clean now.
Next, I used sort -u http.txt > unique.txt to keep only the unique addresses in my browser history, which shrank the number of lines significantly, to 590. I then turned them into IP addresses with the command cat unique.txt | nslookup | grep 'Address:' > sihanip.txt. (Question: This gave me 1265 lines of IP addresses - how can I sort out which one or more IP addresses belong to which website?)
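One way to keep the mapping might be to resolve each hostname separately, so the domain and its addresses end up on the same line. A sketch (it assumes dig is installed; nslookup would work too, and CNAMEs may show up mixed in with the IPs):

```bash
# Resolve each domain in unique.txt one at a time and write
# "domain ip1 ip2 ..." on a single line per domain.
while read -r host; do
  printf '%s %s\n' "$host" "$(dig +short "$host" | tr '\n' ' ')"
done < unique.txt > domain_ip.txt
```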
To find out which domains I visit the most, I copied and pasted the web addresses from the http.txt file into a spreadsheet, and used the Remove Duplicates, COUNTIF, and RANK functions to count how many times I visited each website and rank them by that frequency. No surprise: Google services took the first four places, followed by several NYU websites. What's interesting is that the Unity Asset Store is also in the top ten.
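The same ranking could also be done in the shell. A sketch, assuming http.txt already holds one bare hostname per line (as it does after the VS Code cleanup):

```bash
# Count how many times each hostname appears, then list the ten most frequent.
sort http.txt | uniq -c | sort -rn | head -10
```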
Only a few IP addresses could be reached by traceroute. Here's the visualization of the connections using Traceroute Mapper (the command itself is sketched after the observations below).
www.bilibili.com (148.153.56.163)
www.shadertoy.com (34.223.157.0)
store.moma.org (23.227.38.74)
asset.unity.com (35.238.75.111)
beta.film.ai (64.32.28.234)
www.zhihu.com (211.152.149.12)
1. The first 6 or 7 hops are usually within the New York and New Jersey areas.
2. Most IPs are located on the west coast.
3. To get to China, the data can either travel to the west coast then through the North Pacific Ocean or head east to Sweden and then go all the way across the Asian continent.
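For reference, each map came from pasting the output of a plain traceroute run into Traceroute Mapper. A sketch with one of the hosts from the list above (-n skips reverse DNS lookups so the trace finishes faster):

```bash
# Trace the route to one host; copy the whole output into Traceroute Mapper.
traceroute -n www.bilibili.com
```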
Week 5
I reused the game controller panel I made last semester for the New Arcade class. It's a pretty good fit for this game.
Week 6
After setting up Nginx on my virtual host and configuring it for HTTP and HTTPS requests, I pulled up the access log using sudo nano /var/log/nginx/access.log. Here's what it looks like:
There were lots of requests coming from the IP address 213.91.182.165, so I looked it up with the whois command and found it belongs to a Bulgarian telecommunications company called Vivacom. I'm wondering what all the strings after the GET command mean (they look like file paths). And what is the difference between this log and the firewall log we looked at a few weeks ago? I know these are only HTTP/HTTPS requests, so what are the other request/traffic types in the firewall log?
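To see which clients show up the most, the access log can be summarized from the shell. A sketch, assuming the default "combined" log format where the client IP is the first field on each line:

```bash
# Rank client IPs by how many requests they made to Nginx.
sudo awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
```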
Week 7
I finally bought a domain name and connected it to my virtual host. This tutorial was helpful.
I also set up a server block for my domain and got HTTPS certificates from Let's Encrypt. It's nice to see the little lock icon next to my domain name in the address bar.
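The certificate step boils down to the certbot Nginx plugin. A sketch (it assumes certbot is already installed and the server block for the domain exists):

```bash
# Request a certificate and let certbot edit the Nginx server block for HTTPS.
sudo certbot --nginx -d sihanzhang.xyz -d www.sihanzhang.xyz
# Confirm that automatic renewal will work.
sudo certbot renew --dry-run
```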
Week 8
I only had a vague understanding of REST, Node.js, and all the network concepts, but luckily I found this tutorial that guided me through creating and deploying an Express REST API on my server. Now I have a better understanding of how everything works.
Step 1 - Creating a Node.js & Express REST API on my local machine
- Adding GET Routes to my API
- Adding POST Routes to my API
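To make sure the routes worked locally, they can be hit with curl. A sketch - the port 3000 and the JSON field are assumptions, not necessarily what the tutorial uses:

```bash
# GET all users from the local API (port and route are assumptions).
curl http://localhost:3000/users

# POST a new user as JSON (the field name is hypothetical).
curl -X POST http://localhost:3000/users \
     -H "Content-Type: application/json" \
     -d '{"name": "test user"}'
```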
Step 2 - Uploading the application to GitHub
- Adding a .gitignore file
- Pushing code to GitHub via the SSH URL
GitHub no longer accepts account passwords for authenticating Git operations, so I created SSH keys for my computer.
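Generating the key and grabbing the public half to paste into GitHub (Settings → SSH and GPG keys) looks roughly like this; the email is just a label for the key:

```bash
# Generate an Ed25519 key pair, then print the public key to copy into GitHub.
ssh-keygen -t ed25519 -C "you@example.com"
cat ~/.ssh/id_ed25519.pub
```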
Step 3 - Configuring and deploying my Node.js application
- Pulling the application from GitHub + running the application
- Installing and configuring PM2 (keeps the application running)
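The PM2 part is a couple of commands. A sketch - the entry file name app.js and the process name are assumptions:

```bash
# Start the API under PM2 and keep it alive across crashes and reboots.
pm2 start app.js --name rest-api
pm2 startup   # prints a command to run so PM2 starts on boot
pm2 save      # saves the current process list for that startup script
```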
Step 4 - Setting up Nginx as a reverse proxy
I updated the /etc/nginx/sites-available/sihanzhang.xyz file with the following configuration:
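Roughly, the relevant part is a server block that proxies requests to the Node app. This is only a sketch, not my exact file: the port 3000 and the proxy headers are assumptions, and the SSL lines certbot added are left out:

```bash
# Write a minimal reverse-proxy server block (a sketch; adjust port and headers).
sudo tee /etc/nginx/sites-available/sihanzhang.xyz > /dev/null <<'EOF'
server {
    listen 80;
    server_name sihanzhang.xyz www.sihanzhang.xyz;

    location / {
        proxy_pass http://localhost:3000;        # hand requests to the Node app
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
sudo nginx -t && sudo systemctl reload nginx     # validate the config, then reload Nginx
```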
Now when I go to sihanzhang.xyz/users I can get all the user data.
Week 9
I connected an MQ-135 air quality sensor to an Arduino and printed the analog readings. The values were quite high in my room; I'm not sure if the sensor was working correctly.
Week 10
I created an undnet/sihan subtopic and sent my sensor data to the broker once per minute. My internet wasn't working well, so I only managed to send data for about an hour.
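A single test reading can also be published from the command line. A sketch - the broker hostname, port, credentials, and JSON payload are all placeholders, not the class broker's real details:

```bash
# Publish one hypothetical air-quality reading to the undnet/sihan topic.
mosquitto_pub -h broker.example.com -p 1883 \
              -u username -P password \
              -t "undnet/sihan" -m '{"airQuality": 412}'
```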
Question: I noticed there's a history section under my topic. Does the broker store each subtopic's history?
Week 11
With Wireshark open, I went to my website, sihanzhang.xyz, and then to sihanzhang.xyz/users. A lot of packets were captured, so I filtered the capture down to packets with my website's IP address.
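The same capture and filter can be done with tshark, Wireshark's command-line twin. A sketch - en0 is the macOS Wi-Fi interface, so the name will differ on other machines:

```bash
# Capture only traffic to/from my site and show HTTP or TLS packets.
sudo tshark -i en0 -f "host sihanzhang.xyz" -Y "http or tls"
```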
I noticed that there were two HTTP packets: one was a GET request and the other was the response with code 301. This didn't match what I saw in the browser (user data in JSON format) when I went to sihanzhang.xyz/users. I'm wondering if this has to do with Nginx forwarding the request to Node?
Using the protocol hierarchy statistics tool, I found that 75% of the packets sent from my computer are IPv6 packets, and the endpoints of these packets are mainly Google services. It's interesting to find out that Grammarly and Pinterest are also running in the background.
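The same protocol hierarchy breakdown is available from the command line too. A sketch - capture.pcapng is a hypothetical saved capture file:

```bash
# Print protocol hierarchy statistics for a saved capture.
tshark -r capture.pcapng -q -z io,phs
```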
My own website is pretty free of user-tracking technologies. In contrast, a site I often visit (www.vrscout.com) has many ad trackers and third-party cookies.
I didn't know before that my browser could expose so much of my information. Installing the uBlock extension helps protect my privacy.