-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathrobots.txt
78 lines (60 loc) · 2.66 KB
/
robots.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
#
# robots.txt for your eepsite
#
# You can use this file to control how web crawling robots (for example search
# engine robots like the Googlebot) index your site. Only well-behaving robots
# will abide the rules you set in this file, thankfully those are a majority.
# Robots that do not abide the robots.txt rules will be able to index anything
# they find. So be aware that this file does not allow you to actually lock
# anyone out. It is voluntary.
# Keep this file in the root of your site, ie: myeepsite.i2p/robots.txt
# Remove the # in front of lines you want to use (uncomment). By default robots
# are allowed to index your whole site.
##### Syntax:
# User-agent: Botname
# Disallow: /directory/to/disallow/
# You can use a * in the User-agent field to select all robots:
# User-agent: *
# You can not use * as a wildcard in the Disallow string.
# To allow indexing your whole site you leave the Disallow field empty:
# Disallow:
##### Examples:
# At the time of writing there are only two active search engines in the
# I2P network: http://eepsites.i2p and http://yacysearch.i2p
# Because eepsites.i2p does abide robots.txt but not the User-agent string, the
# Yacybot is used in these examples.
# To control the eepsites.i2p robot you can use the HTML <meta> tag instead.
# Example: <META name="ROBOTS" content="NOINDEX, NOFOLLOW">
# If the robot sees above line it will neither index that url not will it
# follow links on it to further pages.
# Options for the content attribut are: INDEX or NOINDEX and FOLLOW or NOFOLLOW.
# You can also use <meta name="robots" content="noarchive"> to disable caching.
# To allow Yacy to access anything but disallow all other robots:
# User-agent: yacybot
# Disallow:
# User-agent: *
# Disallow: /
# To disallow Yacy from accessing anything:
# User-agent: yacybot
# Disallow: /
# To disallow Yacy from accessing the /stuff/ directory, eg me.i2p/stuff/ :
# User-agent: yacybot
# Disallow: /stuff/
# If Google was crawling I2P and you would not want them to index your site
# User-agent: Googlebot
# Disallow: /
# To disallow any well-behaving robots from accessing your /secret/ and
# /private/ directories:
# Keep in mind that this is NOT blocking anyone else. Use proper authentication
# if you want your private and secret things to stay private and secret. Also
# everyone can read the robots.txt file and see what things you want to hide.
# User-agent: *
# Disallow: /secret/
# Disallow: /private/
# Disallow robots to index a specific file:
# User-agent: *
# Disallow: /not/thisfile.html
# Allow everyone to access everything. This rule is active by default.
# Comment it with # at the start of the line to disable it.
User-agent: *
Disallow: