A Search Log Analysis of a Portuguese Web Search Engine
Miguel Costa, Mário J. Silva
LaSIGE @ Faculty of Sciences, University of Lisbon
Foundation for National Scientific Computing
INFORUM 2010, Braga, Portugal
Problem
Do Portuguese users search in the same way as other users?
Does search behavior influence web search engine design?
Applications
• Speed
– e.g. special indexes, cache
• Quality of results
– e.g. better ranking
• Web design
– e.g. highlight the most used functionalities
Summary
• Introduction
• Methodology & Dataset
• Results
• Conclusions
Search Log Analysis
PROS:
• Large and varied
• Less bias
• Cheap
• Non-intrusive
CONS:
• Lack of context
• Lack of control
Dataset
• Tumba – http://www.tumba.pt
• 2 full years – 2003 & 2004
– comparable with several studies from the same period
– a baseline for future work
• 90% of the IP addresses → Portugal
• 98% of the interactions → Portuguese interface
How do users search?
Figure: Session Duration (min); y-axis in %; duration bins [0,1[, [1,5[, [5,10[, [10,15[, [15,30[, [30,60[, [60,120[, [120,180[, [180,240[, [240,inf[
Fast and short sessions
• Fast
• Few queries
• Few terms
• Few result pages
• Few clicks
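The session statistics above presuppose that raw log entries have been grouped into sessions. The slides do not say how Tumba sessions were delimited, so the following is only a minimal sketch under stated assumptions: a hypothetical `LogEntry` record keyed by a user identifier, and a common inactivity-timeout heuristic with an assumed 30-minute threshold. Each resulting session is then assigned to the half-open duration bins used in the session-duration figure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log record; the actual Tumba log format is not described."""
    user_id: str          # e.g. IP address or cookie (assumption)
    timestamp: datetime
    query: str


# Assumed inactivity gap that closes a session (not taken from the slides).
SESSION_TIMEOUT = timedelta(minutes=30)

# Bin edges in minutes, matching the figure: [0,1[, [1,5[, ..., [180,240[, [240,inf[
BIN_EDGES = [0, 1, 5, 10, 15, 30, 60, 120, 180, 240]


def sessionize(entries, timeout=SESSION_TIMEOUT):
    """Split each user's time-ordered entries into sessions at gaps longer than timeout."""
    sessions = []
    ordered = sorted(entries, key=lambda e: (e.user_id, e.timestamp))
    # groupby needs its input sorted by the grouping key, which `ordered` is.
    for _, user_entries in groupby(ordered, key=lambda e: e.user_id):
        current = []
        for entry in user_entries:
            if current and entry.timestamp - current[-1].timestamp > timeout:
                sessions.append(current)
                current = []
            current.append(entry)
        sessions.append(current)  # every group has at least one entry
    return sessions


def duration_bin(session):
    """Label of the half-open bin that contains the session duration in minutes."""
    minutes = (session[-1].timestamp - session[0].timestamp).total_seconds() / 60
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        if low <= minutes < high:
            return f"[{low},{high}["
    return f"[{BIN_EDGES[-1]},inf["


def duration_histogram(sessions):
    """Percentage of sessions per duration bin (only non-empty bins are listed)."""
    counts = {}
    for session in sessions:
        label = duration_bin(session)
        counts[label] = counts.get(label, 0) + 1
    return {label: 100 * n / len(sessions) for label, n in counts.items()}


t0 = datetime(2004, 1, 1, 10, 0)
log = [
    LogEntry("a", t0, "praias algarve"),
    LogEntry("a", t0 + timedelta(seconds=40), "praias algarve hotel"),
    LogEntry("a", t0 + timedelta(hours=2), "euro 2004"),  # gap > timeout: new session
    LogEntry("b", t0, "universidade lisboa"),
]
sessions = sessionize(log)
print(len(sessions))                  # 3
print(duration_histogram(sessions))   # {'[0,1[': 100.0}
```

The timeout is the main design choice here: a different threshold changes the number of sessions and therefore every per-session statistic, so it is worth reporting alongside any such analysis.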
Figure (four panels, y-axis in %): Queries per Session (1 to ≥10); # Terms Changed (≤-5 to ≥5); Result Pages Viewed (1 to ≥10); Terms per Query (1 to ≥10)
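The panels above summarize each session by a few counts. The sketch below shows one way to derive them from a session's interactions. It assumes a hypothetical `Interaction` record (query string plus requested result page), whitespace tokenization, and that "# Terms Changed" means the difference in term count between consecutive queries; none of these definitions is stated on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """Hypothetical log line: a submitted query and the result page requested (1 = first)."""
    query: str
    page: int


def terms(query):
    """Assumption: terms are whitespace-separated tokens."""
    return query.split()


def session_metrics(session):
    """Per-session measures analogous to the figure panels (definitions are assumptions)."""
    queries = []          # queries in submission order; page requests for the same query are merged
    pages_per_query = []  # number of distinct result pages viewed for each query
    seen_pages = set()
    for interaction in session:
        if not queries or interaction.query != queries[-1]:
            if queries:
                pages_per_query.append(len(seen_pages))
            queries.append(interaction.query)
            seen_pages = set()
        seen_pages.add(interaction.page)
    if queries:
        pages_per_query.append(len(seen_pages))

    term_counts = [len(terms(q)) for q in queries]
    # Change in the number of terms between consecutive queries, clipped to
    # [-5, 5] so that the extremes fall into the "≤-5" and "≥5" bins.
    terms_changed = [max(-5, min(5, b - a)) for a, b in zip(term_counts, term_counts[1:])]
    return {
        "queries": len(queries),
        "terms_per_query": term_counts,
        "terms_changed": terms_changed,
        "pages_per_query": pages_per_query,
    }


session = [
    Interaction("praias algarve", 1),
    Interaction("praias algarve", 2),               # same query, second result page
    Interaction("praias algarve hotel barato", 1),  # reformulation with two extra terms
]
m = session_metrics(session)
print(m)
# {'queries': 2, 'terms_per_query': [2, 4], 'terms_changed': [2], 'pages_per_query': [2, 1]}
```

Aggregating these per-session values over all sessions (and per year) would give distributions of the kind plotted in the four panels.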
Evolution from 2003 to 2004
• -½ term in average query length
• +10% of sessions lasting less than 1 minute
• +9% of sessions with only one query
• +8% of sessions in which only the first result page was viewed
Less data submitted, fewer results seen
What do users search for?
Top Search Queries
Topic Categories (% of queries; Δ = 2004 minus 2003, in percentage points)
 #  Category                                  2003   2004      Δ
 1  Commerce, Travel, Employment or Economy   22.4   20.3   -2.1
 2  People, Places or Things                  14.8   17.7   +2.9
 3  Health or Sciences                        10.5   11.8   +1.3
 4  Education or Humanities                    7.2   10.5   +3.3
 5  Society, Culture, Ethnicity or Religion    5.6    6.1   +0.5
 6  Computers or Internet                      6.4    5.9   -0.5
 7  Sex or Pornography                         4.9    5.8   +0.9
 8  Entertainment or Recreation                8.7    5.1   -3.6
 9  Government                                 7.0    4.2   -2.8
10  Performing or Fine arts                    1.6    1.6    0.0
11  Unknown or Other                          11.2   11.3   +0.1
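The Δ column of the topic-category table is simply the 2004 share minus the 2003 share. The short sketch below recomputes it from the reported percentages, checks that each year's column sums to roughly 100% (the published values add up to 100.3% because of rounding), and picks out the largest gain and the largest drop. The dictionary layout is an illustrative choice, not part of the original analysis.

```python
# Share of queries per topic category (%), 2003 and 2004, copied from the table above.
SHARES = {
    "Commerce, Travel, Employment or Economy": (22.4, 20.3),
    "People, Places or Things": (14.8, 17.7),
    "Health or Sciences": (10.5, 11.8),
    "Education or Humanities": (7.2, 10.5),
    "Society, Culture, Ethnicity or Religion": (5.6, 6.1),
    "Computers or Internet": (6.4, 5.9),
    "Sex or Pornography": (4.9, 5.8),
    "Entertainment or Recreation": (8.7, 5.1),
    "Government": (7.0, 4.2),
    "Performing or Fine arts": (1.6, 1.6),
    "Unknown or Other": (11.2, 11.3),
}


def deltas(shares):
    """Change from 2003 to 2004 in percentage points, rounded to one decimal."""
    return {cat: round(y2004 - y2003, 1) for cat, (y2003, y2004) in shares.items()}


def column_total(shares, year_index):
    """Sum of one year's column (0 = 2003, 1 = 2004); close to 100 up to rounding."""
    return sum(pair[year_index] for pair in shares.values())


d = deltas(SHARES)
print(max(d, key=d.get))                        # Education or Humanities
print(min(d, key=d.get))                        # Entertainment or Recreation
print(max(SHARES, key=lambda c: SHARES[c][1]))  # Commerce, Travel, Employment or Economy
```

Rounding each difference to one decimal hides floating-point noise such as 20.3 - 22.4 evaluating to slightly more than -2.1.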
Comparison
world region          U.S.        Europe      Portugal
search engine         Excite      FAST        Tumba!
single-term queries   20%-30%     25%-35%     40%
terms per query       2.6         2.3         2.2
result pages viewed   1.7         2.2         1.4
queries per session   2.3         2.9         2.49-2.94
topic most seen       Commerce,   People,     Commerce,
                      Travel      Places      Travel
Less data submitted, fewer results seen
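The takeaway "less data submitted, fewer results seen" can be read directly off the comparison table. The sketch below encodes the published figures in a hypothetical `EngineStats` record and checks that Tumba! has the fewest terms per query and the fewest result pages viewed, and that its 40% single-term share exceeds even the upper bounds of the other two ranges. It only restates the table; it does not test statistical significance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineStats:
    region: str
    single_term_pct: tuple[float, float]  # reported range (low, high); a single value v is stored as (v, v)
    terms_per_query: float
    result_pages_viewed: float


# Values copied from the comparison table above (queries per session and topics omitted).
ENGINES = {
    "Excite": EngineStats("U.S.", (20, 30), 2.6, 1.7),
    "FAST": EngineStats("Europe", (25, 35), 2.3, 2.2),
    "Tumba!": EngineStats("Portugal", (40, 40), 2.2, 1.4),
}


def lowest(metric):
    """Name of the engine with the smallest value of `metric`."""
    return min(ENGINES, key=lambda name: metric(ENGINES[name]))


def exceeds_all_others(name):
    """True if this engine's lowest single-term share beats every other engine's highest."""
    low = ENGINES[name].single_term_pct[0]
    return all(low > stats.single_term_pct[1] for other, stats in ENGINES.items() if other != name)


print(lowest(lambda s: s.terms_per_query))      # Tumba!
print(lowest(lambda s: s.result_pages_viewed))  # Tumba!
print(exceeds_all_others("Tumba!"))             # True
```

Queries per session is left out on purpose: Tumba!'s range (2.49-2.94) is not the lowest, so the "less data" reading rests on query length rather than on the number of queries.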
Conclusions
• Portuguese users
– spend little time and effort on individual searches
– tend to submit less data and see fewer results
– search differently from other users
– these specificities can be used to tune web search engines
Future Work
• Updated characterization of Portuguese users
• Characterization of Portuguese users from web archives
Portuguese Web Archive
http://archive.pt
80% of the web documents are unavailable after 1 year
Questions
Thank you.
