Thar Multidiscipline Journal

Deep Web – The Hidden side of Internet

Synopsis

The Deep Web (also called the Deepnet, the Invisible Web, the Undernet or the hidden Web) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
Mike Bergman, founder of BrightPlanet and credited with coining the phrase, said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed. Most of the Web’s information is buried far down on dynamically generated sites, and standard search engines do not find it. The deep Web is several orders of magnitude larger than the surface Web.
Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, a page must be static and linked to other pages. Traditional search engines cannot “see” or retrieve content in the deep Web: those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers cannot probe beneath the surface, the deep Web has so far remained hidden.

“Deep web” is also the name given to the technology of surfacing hidden value that standard search engines cannot easily detect. The deep web is the content that cannot be indexed and searched by search engines; for this reason it is also called the invisible web.

Table of Contents

  1. Introduction
  2. Deep Web
  3. Size of Deep Web
  4. Deep Resources
  5. Crawling the Deep Web
  6. Levels of Internet
  7. Conclusion
  8. References

Introduction


WHAT IS DEEP WEB?

Searching on the Internet today can be compared to dragging a net across the surface of the ocean. While a great deal may be caught in the net, there is still a wealth of information that is deep and therefore missed. The reason is simple: most of the Web’s information is buried far down on dynamically generated sites, and standard search engines never find it.

Traditional search engines create their indices by spidering or crawling surface Web pages. To be discovered, a page must be static and linked to other pages. Traditional search engines cannot “see” or retrieve content in the deep Web: those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers cannot probe beneath the surface, the deep Web has so far remained hidden.

“Deep web” is also the name given to the technology of surfacing hidden value that standard search engines cannot easily detect. The deep web is the content that cannot be indexed and searched by search engines; for this reason it is also called the invisible web.

IMPORTANCE OF DEEP WEB

  • “Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.”
  • “The deep Web contains 7,500 terabytes of information, compared to nineteen terabytes of information in the surface Web.”
  • “More than 200,000 deep Web sites presently exist.”
  • “Sixty of the largest deep-Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times.”
  • “On average, deep Web sites receive fifty per cent greater monthly traffic than surface sites and are more highly linked to than surface sites; however, the typical (median) deep Web site is not well known to the Internet-searching public.”
  • “The deep Web is the largest growing category of new information on the Internet.”
  • “Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.”
  • “Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.”

Deep Web

The Deep Web is the content that resides in searchable databases, the results from which can only be discovered by a direct query. Without the directed query, the database does not publish the result. When queried, Deep Web sites post their results as dynamic Web pages in real-time. Though these dynamic pages have a unique URL address that allows them to be retrieved again later, they are not persistent.

The invisible web consists of files, images and web sites that, for a variety of reasons, cannot be indexed by popular search engines. The deep web is qualitatively different from the surface web. Deep web sources store their content in searchable databases that only produce results dynamically in response to a direct request. But a direct query is a “one at a time” laborious way to search. Deep web’s search technology automates the process of making dozens of direct queries simultaneously using multiple-thread technology.
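The “dozens of direct queries simultaneously” approach can be sketched with standard threading. The query function and source names below are hypothetical placeholders, not a real deep-web API; the sketch only shows how multiple direct queries are issued at once instead of one at a time:

```python
from concurrent.futures import ThreadPoolExecutor

def query_database(source, term):
    # Placeholder for a real HTTP form submission to one deep-web source.
    return f"{source}: results for '{term}'"

def federated_search(sources, term, max_threads=8):
    # Issue the same direct query to every source concurrently.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(query_database, s, term) for s in sources]
        return [f.result() for f in futures]

results = federated_search(["patents-db", "court-records", "library-catalog"],
                           "deep web")
```

In a real system, `query_database` would fill in and submit each source’s search form over HTTP; the thread pool is what turns a laborious one-at-a-time process into a parallel one.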

The Deep Web is made up of hundreds of thousands of publicly accessible databases and is approximately 500 times bigger than the surface Web.

SIZE OF DEEP WEB

Estimates based on extrapolations from a study done at the University of California, Berkeley in 2001 speculate that the deep Web consists of about 7,500 terabytes. More accurate estimates are available for the number of resources in the deep Web: He et al. detected around 300,000 deep web sites in the entire Web in 2004, and, according to Shestakov, around 14,000 deep web sites existed in the Russian part of the Web in 2006.

  • 400 to 550 times larger than the commonly defined WWW
  • around 7,500 TB of information, compared to 19 TB in the surface Web
  • nearly 550 billion individual documents, compared to the 1 billion of the surface Web
  • more than 200,000 deep Web sites presently exist
  • 50 per cent greater monthly traffic than surface sites
  • growing rapidly
  • narrower, with deeper content


NAMING:

Bergman, in a seminal paper on the deep Web published in the Journal of Electronic Publishing, mentioned that Jill Ellsworth used the term invisible Web in 1994 to refer to websites that were not registered with any search engine. Bergman cited a January 1996 article by Frank Garcia:
“It would be a site that’s possibly reasonably designed, but they didn’t bother to register it with any of the search engines. So, no one can find them! You’re hidden. I call that the invisible Web.”
Another early use of the term Invisible Web was by Bruce Mount and Matthew B. Koll of Personal Library Software, in a description of the @1 deep Web tool found in a December 1996 press release.
The first use of the specific term deep Web, now generally accepted, occurred in the aforementioned 2001 Bergman study.

DEEP RESOURCES

Deep Web resources may be classified into one or more of the following categories:

  1. Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
  2. Unlinked content: pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
  3. Private Web: sites that require registration and login (password-protected resources).
  4. Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
  5. Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies).
  6. Scripted content: pages that are only accessible through links produced by JavaScript, as well as content dynamically downloaded from Web servers via Flash or Ajax.
  7. Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
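Category 5 mentions the Robots Exclusion Standard. As an illustration, Python’s standard library can parse a robots.txt file and report which paths a compliant crawler must skip; the robots.txt content below is a made-up example:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides one whole section from all crawlers.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "http://example.com/index.html"))  # public page
print(parser.can_fetch("*", "http://example.com/private/db"))  # excluded page
```

A compliant crawler checks `can_fetch` before requesting each URL, so everything under `/private/` stays out of the index and, by this article’s definition, in the deep Web.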

CRAWLING THE DEEP WEB

Accessing

To discover content on the Web, search engines use web crawlers that follow hyperlinks from already-known pages. This technique is ideal for discovering resources on the surface Web but is often ineffective at finding deep Web resources. For example, these crawlers do not attempt to find dynamic pages that are the result of database queries, due to the infinite number of queries that are possible. It has been noted that this can be (partially) overcome by providing links to query results, but this could unintentionally inflate the popularity of a member of the deep Web. In 2005, Yahoo! made a small part of the deep Web searchable by releasing Yahoo! Subscriptions. This search engine searches through a few subscription-only Web sites. Some subscription websites display their full content to search engine robots so they will show up in user searches, but then show users a login or subscription page when they follow a link from the search engine results page.
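The limitation described above is easy to demonstrate: a basic crawler only discovers what appears in anchor tags, so a search form contributes nothing to its frontier. A minimal sketch using only the Python standard library:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets of <a> tags -- all a basic crawler can follow."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# A page with one static link and one search form.  The crawler sees the
# link, but the pages generated by submitting the form are invisible to it.
page = """
<a href="/about.html">About</a>
<form action="/search"><input name="q"></form>
"""
extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # ['/about.html']
```

Every result page behind `/search` exists only after a query is submitted, which is exactly why such content ends up in the deep Web.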

Crawling the deep Web

Researchers have been exploring how the deep Web can be crawled in an automatic fashion. In 2001, Sriram Raghavan and Hector Garcia-Molina presented an architectural model for a hidden-Web crawler that used key terms provided by users or collected from the query interfaces to query a Web form and crawl the deep Web resources. Alexandros Ntoulas, Petros Zerfos, and Junghoo Cho of UCLA created a hidden-Web crawler that automatically generated meaningful queries to issue against search forms. Several form query languages (e.g., DEQUEL) have been proposed that, besides issuing a query, also allow extraction of structured data from result pages.

Commercial search engines have begun exploring alternative methods to crawl the deep Web. The Sitemap Protocol (first developed by Google) and mod_oai are mechanisms that allow search engines and other interested parties to discover deep Web resources on particular Web servers. Both mechanisms allow Web servers to advertise the URLs that are accessible on them, thereby allowing automatic discovery of resources that are not directly linked to the surface Web. Google’s deep Web surfacing system pre-computes submissions for each HTML form and adds the resulting HTML pages into the Google search engine index. The surfaced results account for a thousand queries per second to deep Web content. In this system, the pre-computation of submissions is done using three algorithms:

(1) selecting input values for text search inputs that accept keywords,

(2) identifying inputs which accept only values of a specific type (e.g., date), and

(3) selecting a small number of input combinations that generate URLs suitable for inclusion into the Web search index.
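Algorithm (3) can be illustrated as follows. This is a simplified sketch, and the form fields, values, and URL are invented for the example, not taken from Google’s actual system:

```python
from itertools import islice, product
from urllib.parse import urlencode

def surface_form(action, candidate_values, limit=5):
    """Pre-compute a handful of form submissions, capping the number of
    generated URLs so only a small set enters the search index."""
    fields = sorted(candidate_values)
    combos = product(*(candidate_values[f] for f in fields))
    urls = []
    for combo in islice(combos, limit):       # keep the combination count small
        query = urlencode(dict(zip(fields, combo)))
        urls.append(f"{action}?{query}")
    return urls

# Hypothetical used-car search form; algorithm (2) would have detected
# that 'year' only accepts values of a specific type.
urls = surface_form("/cars/search",
                    {"make": ["ford", "honda"], "year": ["2001", "2002"]})
print(urls[0])  # /cars/search?make=ford&year=2001
```

Each generated URL denotes a dynamic result page that a crawler can now fetch and index like any static page, which is the essence of “surfacing”.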

THE LEVELS OF INTERNET

Level 0 Web – Common Web

Level 1 Web – Surface Web

Level 2 Web – Bergie Web

Level 3 Web - Deep Web

Level 4 Web - Charter Web

Level 5 Web - Marianas Web

The Unknown Levels - L6 L7 L8

Level 0: Common Web

The first level of the internet is the common web, which we use in our daily life. It contains the part of the internet where we carry out routine activities such as browsing social networks, downloading movies and songs, and using mail clients; most of our daily internet applications are confined to this level. Most websites and data in databases can be easily retrieved at this level using a simple web search. Search engines such as Google carry out an indexed search of the given keyword and retrieve web pages to the user in real time. Users enter data into forms such as logins, and pages are displayed with dynamic URLs; these URLs can be used again to retrieve the same page whenever we like.

Some examples of common websites are:
Facebook
Google
Twitter
Instagram
Apart from these websites there are thousands of other webpages in this level, most of which are mail clients, search engines, social networks, non-government organisation websites, etc.

LEVEL 1: Surface Web

The next level of the internet is called the surface web. It resides just below the common web and is almost the same as the common web: it also contains mostly easily available content and documents. Most pages confined to this area of the internet are foreign social networks, temporary email services, MySQL databases, etc.
It is not difficult to reach this portion of the web; a simple web search can crawl to these depths easily. We do not need any specific query or application to venture into the surface web.

Some of the common websites of the surface web are:
Reddit
Digg
Blogger
Other than these websites there are tons of other foreign social networks; most web hosting services are in this level, as are most MySQL databases, along with human intel tasking, college campuses, etc.

Level 2: Bergie Web

The second level of the internet is called the Bergie Web. It contains most of the FTP servers, Google-locked results, honeypots, and chans.
Most of the internet resides in this part; we can access this level easily with our browser, without any third-party software, bridge, or proxy.

FTP Server :

An FTP server is a software application running the File Transfer Protocol (FTP), which is the protocol for exchanging files over the Internet.

Honeypots :

In computer terminology, a honeypot is a trap set to detect, deflect, or in some manner counteract attempts at unauthorized use of information systems. Generally it consists of a computer, data, or a network site that appears to be part of a network, but is actually isolated and monitored, and which seems to contain information or a resource of value to attackers.

4Chan:

4chan is an English-language imageboard website. Users generally post anonymously, with the most recent posts appearing above the rest. 4chan is split into various boards with their own specific content and guidelines. Registration is not required, nor is it possible (except for staff).

RSC (Reconfigurable Super Computing) :

High-Performance Reconfigurable Computing (HPRC) is a computer architecture combining reconfigurable computing-based accelerators like field-programmable gate arrays (FPGAs) with CPUs, manycore microprocessors, or other parallel computing systems. This heterogeneous systems technique is used in computing research and especially in supercomputing. A 2008 paper reported speed-up factors of more than four orders of magnitude and energy-saving factors of up to almost four orders of magnitude. Some supercomputer firms offer heterogeneous processing blocks including FPGAs as accelerators. One research area is the twin-paradigm programming tool flow productivity obtained for such heterogeneous systems. The US National Science Foundation has a center for high-performance reconfigurable computing (CHREC). In April 2011 the fourth Many-core and Reconfigurable Supercomputing Conference was held in Europe.

Level 3: Deep Web

The actual Deep Web begins from the third level. Beyond this point we require a direct query to the databases, and the URL of a page in this level is not a general string of characters but rather a random string which, unlike www.facebook.com or www.google.com, appears somewhat like http://kpvz7ki2v5agwt35.onion/wiki/index.php/Main_Page
To access the top levels we just need a proxy server; to access the bottom levels we need The Onion Router (Tor).

Proxy:

In computer networks, a proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server and the proxy server evaluates the request as a way to simplify and control its complexity. Today, most proxies are web proxies, facilitating access to content on the World Wide Web.
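As a concrete sketch, Python’s standard library can route requests through such an intermediary. The proxy address here is a hypothetical local instance, not a real service:

```python
import urllib.request

# Route all plain-HTTP traffic through a (hypothetical) proxy on port 8080.
proxy = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
# Subsequent urllib.request.urlopen(...) calls now go via the proxy,
# so the destination server sees the proxy's address, not the client's.
```

The client talks only to the proxy; the proxy forwards the request and relays the response back, which is what makes it an intermediary in the sense described above.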

TOR:


Tor (originally short for The Onion Router) is free software for enabling online anonymity. Tor directs Internet traffic through a free, worldwide volunteer network consisting of thousands of relays to conceal a user’s location or usage from anyone conducting network surveillance or traffic analysis. Using Tor makes it more difficult to trace Internet activity, including “visits to Web sites, online posts, instant messages and other communication forms”, back to the user, and is intended to protect users’ personal privacy, freedom, and ability to conduct confidential business by keeping their internet activities from being monitored. “Onion routing” refers to the layers of encryption used. The original data, including its destination, are encrypted and re-encrypted multiple times and sent through a virtual circuit comprising successive, randomly selected Tor relays. Each relay decrypts a “layer” of encryption to reveal only the next relay in the circuit, in order to pass the remaining encrypted data on to it. The final relay decrypts the last layer of encryption and sends the original data, without revealing or even knowing its sender, to the destination. This method reduces the chance of the original data being understood in transit and, more notably, conceals the routing of it.

This level of the internet contains mostly illegal content, ranging from child-abuse material, gore content, sex tapes, celebrity scandals, VIP gossip, hackers, virus information, suicides, FTP-specific servers, computer security, XSS worms, mathematical research, supercomputing, hacking information, node transfer, and data analysis to black markets and drug stores.
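The peel-one-layer-per-relay structure described above can be illustrated with a toy example. This uses ROT13 in place of real cryptography, purely to show how each relay learns only the next hop, never the whole route or the sender:

```python
import codecs

# Toy onion: each "relay" layer is one ROT13 pass plus a routing header.
# Real Tor uses per-hop symmetric encryption; this only illustrates the
# layered structure.

def wrap(message, relays):
    data = message
    for relay in reversed(relays):        # innermost layer built first
        data = codecs.encode(f"{relay}|{data}", "rot13")
    return data

def peel(data):
    layer = codecs.decode(data, "rot13")  # each relay removes one layer
    next_hop, rest = layer.split("|", 1)
    return next_hop, rest

onion = wrap("hello", ["relay1", "relay2", "relay3"])
hop1, onion = peel(onion)   # relay1 learns only that relay2 comes next
hop2, onion = peel(onion)
hop3, plaintext = peel(onion)  # the exit relay finally sees the payload
```

Only after the last layer is removed does the plaintext appear, and by then the message has passed through every relay, hiding its origin.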

Level 4: Charter Web

The next level after the third is the Charter Web, the fourth level of the internet. As with the third level, the top of this portion can be accessed by Tor; to access the bottom levels we need a closed shell system, which requires a heavy amount of shell computing.
Most of the content found here consists of illegal and banned items.
Sites in this area include, for example, Hardcandy, Onion IB, The Hidden Wiki, and Silk Route.
Silk Route is a drug store providing illegal and banned drugs; one can easily buy drugs anonymously from it via Bitcoins.

Other than these, one can also find a great amount of banned videos, banned books, banned movies, and questionable visual materials.
Most of the assassination networks are found in these levels; there are sites which provide assassins, bounty hunters, etc.
There are also different trades taking place in this level, including the rare animal trade, human trafficking, corporate deals, multibillion-dollar deals, and most of the black market.

Apart from this illegal content there are also documents of hidden experiments and ongoing research; for example, one can find content on Tesla experiment plans, crystalline power metrics, Gadolinium Gallium Garnet Quantum Electronic Processors (GGGQEP), artificial superintelligence, and various other experiment and research details.

Crystalline Power Metrics

Materials science is an interdisciplinary field that studies the relationship between the structure of materials at atomic scales and their macroscopic properties. In this science, a crystal is defined as ‘a solid substance in which the atoms, molecules or ions are arranged in an orderly repeating pattern that extends in all three spatial dimensions’. But crystalline energy is an omnipotent power source that has implications far beyond the understanding of modern science.
At the onset of World War II, a theoretical physicist named Dr. J. Robert Oppenheimer was working as a professor of physics at the University of California, Berkeley. At that time in history, there was an increasing demand for a certain type of crystal called quartz, a naturally occurring form of silicon dioxide. It was quartz crystals that were used in radios that could be tuned to specific frequencies, and the production of quartz frequency-control crystals rated as one of the highest U.S. military priorities, second only to atomic energy.

Gadolinium Gallium Garnet

Gadolinium gallium garnet (GGG, Gd3Ga5O12) is a synthetic crystalline material of the garnet group with good mechanical, thermal, and optical properties. It is typically colorless. It has a cubic lattice, a density of 7.08 g/cm³, and a Mohs hardness variously noted as 6.5 and 7.5. Its crystals are made by the Czochralski method and can be made with the addition of various dopants for modification of color. It is used in the fabrication of various optical components and as a substrate material for magneto-optical films (magnetic bubble memory). You only need three lasers to write the data and one to read; it is the same technology as 3D glass etching. If you double its size to 10 cm³, then you have eight times the storage, and even if you are not atomically precise, you could hold two internets.

Tesla Experiment Plans

Nikola Tesla (10 July 1856 – 7 January 1943) was a Serbian-American inventor, electrical and mechanical engineer, physicist, and futurist, best known for his contributions to the design of the modern alternating current (AC) electricity supply system. He did research on free energy, which would provide a better power supply at lower cost.
Copper wire alternating with iron bits works better than fiber optics for data transfer up to a certain point, and it is much easier to make. Its shape is similar to that of a chain, but with a certain difference. It is hidden because it could pose a threat to the oil industry.
Other than this information there is a sea of other material in this level, mostly illegal, with a few research- and invention-based items such as CAIMEO, an artificial superintelligence.

Level 5: Marianas Web


Most of the information which can affect us directly resides in this level. Unlike the previous levels, which can be reached by proxy, Tor, or a closed shell system, we cannot reach this level so easily; it is believed that this level is locked and that we need to solve quantum equations to break that lock.
In plain terms, to access this level we need a quantum computer. Content in this level ranges from heavily illegal material, such as snuff and child-abuse recordings, to numerous research plans and blueprints. Information on the human experiments conducted by Nazi scientists during World War II, including the work of Josef Mengele, along with the research papers and documents of these experiments, can be found in this level.

Quantum Computation:

A quantum computer is a computation device that makes direct use of quantum mechanical phenomena, such as superposition and entanglement, to perform operations on data. Quantum computers are different from digital computers based on transistors. Whereas digital computers require data to be encoded into binary digits (bits), quantum computation uses quantum properties to represent data and perform operations on these data. A theoretical model is the quantum Turing machine, also known as the universal quantum computer. Quantum computers share theoretical similarities with non-deterministic and probabilistic computers. One example is the ability to be in more than one state simultaneously. The field of quantum computing was first introduced by Yuri Manin in 1980 and Richard Feynman in 1981. A quantum computer with spins as quantum bits was also formulated for use as a quantum space-time in 1969.
Although quantum computing is still in its infancy, experiments have been carried out in which quantum computational operations were executed on a very small number of qubits (quantum bits).Both practical and theoretical research continues, and many national government and military funding agencies support quantum computing research to develop quantum computers for both civilian and national security purposes, such as cryptanalysis.
Large-scale quantum computers will be able to solve certain problems much faster than any classical computer using the best currently known algorithms, like integer factorization using Shor’s algorithm or the simulation of quantum many-body systems. There exist quantum algorithms, such as Simon’s algorithm, which run faster than any possible probabilistic classical algorithm. Given sufficient computational resources, a classical computer could be made to simulate any quantum algorithm; quantum computation does not violate the Church–Turing thesis. However, the computational basis of 500 qubits, for example, would already be too large to be represented on a classical computer, because it would require 2^500 complex values to be stored.
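The storage argument is easy to verify: an n-qubit state vector holds 2^n complex amplitudes, so the count doubles with each added qubit. A minimal sketch in pure Python (no quantum library assumed) that also applies a Hadamard gate to put one qubit into equal superposition:

```python
from math import sqrt

# A state of n qubits is a vector of 2**n complex amplitudes -- this
# exponential growth is why 500 qubits (2**500 amplitudes) cannot be
# represented on a classical computer.
def zero_state(n):
    state = [0j] * (2 ** n)
    state[0] = 1 + 0j          # the all-zeros basis state |00...0>
    return state

def hadamard_qubit0(state):
    """Apply H to qubit 0: mixes amplitude pairs differing in the lowest bit."""
    h = 1 / sqrt(2)
    out = state[:]
    for i in range(0, len(state), 2):
        a, b = state[i], state[i + 1]
        out[i], out[i + 1] = h * (a + b), h * (a - b)
    return out

state = hadamard_qubit0(zero_state(1))  # equal superposition of |0> and |1>
print(len(zero_state(10)))              # 1024 amplitudes at just 10 qubits
```

Even this toy simulator makes the scaling visible: each extra qubit doubles the list, so classical simulation collapses long before 500 qubits.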


Fig: The Bloch sphere is a representation of a qubit, the fundamental building block of quantum computers.

The Unknown Levels

Beyond the fifth level it is not yet properly known whether there are any more levels in the hierarchy of the internet; however, it is believed that beyond the fifth level there are three more: the 6th, 7th and 8th.
Most people think that there are 8 layers in total, the last one being the primarch system. You need quantum computing to get past the 6th layer, and this is where things get really tough. The 7th layer is where the big players are; they are all trying to stop each other.
Basically, there are hundreds of million- (or billion-) dollar operations gunning for control.

Level 8 is impossible to access directly. The primarch system is literally the thing controlling the internet at the moment. No government has control of it; in fact, nobody even knows what it is. It is an anomaly that was discovered by super-deep web scans in the early 2000s.
The system is unresponsive, but it sends out unalterable commands to the entire net at random.

Conclusion

The Deep Web, as we know, is that part of the Internet which cannot be indexed by regular search engines. Most of the internet is hidden from us, and to access it we need a proxy and Tor up to a certain depth, beyond which we need more refined and advanced technologies such as closed shell systems and quantum computers.

The Deep Web contains regular files such as text, images, and videos, like those of the surface web; however, much of this material is illegal, ranging from hacking to illegal underage material, both audio and visual, and from terrorist networks to underground black markets and assassination networks.

Apart from the most illegal of content, there is also a vast resource of information in books and other documents such as research papers, blueprints, experiment details, and so on.

References

  1. Bergman, Michael K. (July 2000). "The Deep Web: Surfacing Hidden Value". BrightPlanet LLC.
  2. He, Bin; Patel, Mitesh; Zhang, Zhen; Chang, Kevin Chen-Chuan (May 2007). "Accessing the Deep Web: A Survey". Communications of the ACM (CACM) 50 (2): 94–101. doi:10.1145/1230819.1241670.
  3. Garcia, Frank (January 1996). "Business and Marketing on the Internet". Masthead 15 (1). Archived from the original on 1996-12-05. Retrieved 2009-02-24.
  4. "PLS introduces AT1, the first 'second generation' Internet search service" (press release). Personal Library Software. December 1996. Retrieved 2009-02-24. (@1 started with 5.7 terabytes of content, estimated to be 30 times the size of the nascent World Wide Web; PLS was acquired by AOL in 1998 and @1 was abandoned.)
  5. Raghavan, Sriram; Garcia-Molina, Hector (2001). "Crawling the Hidden Web" (PDF). Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). pp. 129–138.
  6. The Hidden Wiki [http://kpvz7ki2v5agwt35.onion/wiki/index.php/Main_Page]
  7. TOR Directory [http://dppmfxaacucguzpc.onion]

Durlabh Gogoi


