Features/crawler #90

Closed
wants to merge 57 commits into from
Conversation

steffencruz
Collaborator

@steffencruz steffencruz commented Feb 4, 2024

Introduces a Markov chain-based framework that enables multi-turn conversations with coherence and continuity between rounds.

  • Adds TransitionMatrix which wraps the Markov process
  • Adds ContextChain

This is still a work in progress, but it has been tested ad hoc with Wikipedia-based tasks.
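For readers unfamiliar with the idea, here is a minimal sketch of how a transition matrix can drive the choice of the next task in a multi-turn conversation. It is not the code in this PR: the class body, the `next_state` and `walk` methods, the `seed` parameter, and the probabilities in `PROBS` are all assumptions made for illustration. Only the name `TransitionMatrix` and the task labels (QA, Summarization) come from the description and the trial logs.

```python
import random


class TransitionMatrix:
    """Hypothetical sketch of a first-order Markov chain over task types."""

    def __init__(self, probs, seed=None):
        # probs maps each state to a dict of {next_state: probability}.
        for state, row in probs.items():
            total = sum(row.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"row for {state!r} sums to {total}, expected 1.0")
        self.probs = probs
        self.rng = random.Random(seed)

    def next_state(self, state):
        # Sample the next task given only the current task (Markov property).
        row = self.probs[state]
        states = list(row)
        weights = [row[s] for s in states]
        return self.rng.choices(states, weights=weights, k=1)[0]

    def walk(self, start, max_steps):
        # Build a chain of tasks, starting from `start`, of length `max_steps`.
        chain = [start]
        while len(chain) < max_steps:
            chain.append(self.next_state(chain[-1]))
        return chain


# Illustrative probabilities only; these are NOT taken from the PR.
PROBS = {
    "QA": {"QA": 0.6, "Summarization": 0.4},
    "Summarization": {"QA": 0.5, "Summarization": 0.5},
}

matrix = TransitionMatrix(PROBS, seed=0)
tasks = matrix.walk("QA", max_steps=5)
print(tasks)
```

In a sketch like this, each sampled task would then pick its context (for example a Wikipedia article related to the previous step), which is presumably the role of something like `ContextChain`, though that is a guess from the name alone.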

@steffencruz steffencruz changed the base branch from main to staging February 4, 2024 22:41
@steffencruz steffencruz linked an issue Feb 4, 2024 that may be closed by this pull request
@steffencruz steffencruz marked this pull request as draft February 5, 2024 17:22
@steffencruz
Collaborator Author

steffencruz commented Feb 8, 2024

conversation.csv

Trial 1

Step 1 (Summarization)

  • title: Gonçalo Cardoso (footballer, born 2000)
  • topic: Career
  • subtopic: International career

Step 2 (Summarization)

  • title: 2020–21 FC Basel season
  • topic: Overview
  • subtopic: Winter break transfer window

Step 3 (QA)

  • title: 2020–21 Swiss Super League
  • topic: Awards
  • subtopic: Annual awards

Step 4 (QA)

  • title: 2004–05 Swiss Super League
  • topic: Overview
  • subtopic: Overview

Step 5 (QA)

  • title: 2004 in association football
  • topic: Winners of national club championships
  • subtopic: South America

Step 6 (QA)

  • title: 2004 in association football
  • topic: Events
  • subtopic: Events

Step 7 (QA)

  • title: 1830s in association football
  • topic: Events
  • subtopic: Events

Step 8 (Summarization)

  • title: 1820s in football
  • topic: Events
  • subtopic: 1823

Step 9 (QA)

  • title: 1830s
  • topic: Science and technology
  • subtopic: Electricity

Trial 2

Step 1 (QA)

  • title: Humpolec
  • topic: Geography
  • subtopic: Geography

Step 2 (Summarization)

  • title: Havlíčkův Brod
  • topic: History
  • subtopic: History

Step 3 (Summarization)

  • title: Havlíčkův Brod
  • topic: Demographics
  • subtopic: Economy

Step 4 (QA)

  • title: Czech Republic national football team
  • topic: History
  • subtopic: 2000s

Step 5 (QA)

  • title: FIFA
  • topic: Structure
  • subtopic: Video replay and goal-line technology

Trial 3

Step 1 (Summarization)

  • title: Eve's Hangout
  • topic: Legacy
  • subtopic: Legacy

Step 2 (Summarization)

  • title: Eve's Hangout
  • topic: Police raid and closure
  • subtopic: Police raid and closure

Trial 4

Step 1 (DateQA)

  • title: March_18
  • topic: Events
  • subtopic: 1852

Step 2 (Summarization)

  • title: 1852 United States presidential election
  • topic: General election
  • subtopic: Records

Step 3 (QA)

  • title: 1852 United States presidential election
  • topic: Nominations
  • subtopic: Native American (Know-Nothing) Party nomination

Step 4 (QA)

  • title: 1852 United States presidential election
  • topic: General election
  • subtopic: Results

Step 5 (Summarization)

  • title: 2020 United States presidential election in Delaware
  • topic: Primary elections
  • subtopic: Primary elections

Step 6 (QA)

  • title: 2020 United States presidential election
  • topic: Texas v. Pennsylvania
  • subtopic: Suggestion to have state legislatures choose Electoral College voters

Step 7 (Summarization)

  • title: Timeline of the 2020 United States presidential election (November 2020–January 2021)
  • topic: January 2021
  • subtopic: January 2021

Trial 5

Step 1 (QA)

  • title: Battersby railway station
  • topic: History
  • subtopic: History

Step 2 (QA)

  • title: Battersby railway station
  • topic: Services
  • subtopic: Services

Step 3 (Summarization)

  • title: Reddish South railway station
  • topic: History
  • subtopic: History

Step 4 (QA)

  • title: Atherton station
  • topic: History
  • subtopic: History

Step 5 (QA)

  • title: Southern Pacific Railroad
  • topic: Presidents
  • subtopic: See also

Step 6 (QA)

  • title: Southern Pacific Railroad
  • topic: Diesel locomotives
  • subtopic: Notable accidents

Trial 6

Step 1 (Debugging)

  • title: lexek/chat
  • topic: Java
  • subtopic: src/main/java/lexek/wschat/db/model/form/UsernameForm.java

Step 2 (Summarization)

  • title: Final Fantasy
  • topic: Development and history
  • subtopic: Music

Step 3 (QA)

  • title: Final Fantasy VI
  • topic: Development
  • subtopic: Creation

Step 4 (QA)

  • title: Final Fantasy III
  • topic: Plot
  • subtopic: Story

Step 5 (Summarization)

  • title: Final Fantasy III
  • topic: Gameplay
  • subtopic: Gameplay

Step 6 (Summarization)

  • title: Music of Final Fantasy III
  • topic: Final Fantasy III Original Soundtrack
  • subtopic: Legacy

Step 7 (QA)

  • title: Final Fantasy VII
  • topic: Synopsis
  • subtopic: Later releases

Step 8 (QA)

  • title: Final Fantasy
  • topic: Other media
  • subtopic: Characters

Step 9 (QA)

  • title: Gameplay of Final Fantasy
  • topic: Development and history
  • subtopic: Reception

Step 10 (Summarization)

  • title: Eidos Interactive
  • topic: History
  • subtopic: Parent Eidos taken over by SCi (2005–2009)

Trial 7

Step 1 (Mathematics)

  • title: misc
  • topic: misc
  • subtopic: signum_function

Step 2 (Mathematics)

  • title: algebra
  • topic: algebra
  • subtopic: vector_dot

Step 3 (Mathematics)

  • title: basic_math
  • topic: basic_math
  • subtopic: addition

Step 4 (Mathematics)

  • title: misc
  • topic: misc
  • subtopic: prime_factors

Step 5 (Mathematics)

  • title: computer_science
  • topic: computer_science
  • subtopic: modulo_division

Step 6 (Mathematics)

  • title: geometry
  • topic: geometry
  • subtopic: pythagorean_theorem

Trial 8

Step 1 (DateQA)

  • title: June_24
  • topic: Births
  • subtopic: 1898

Step 2 (DateQA)

  • title: July_02
  • topic: Deaths
  • subtopic: 1864

Step 3 (DateQA)

  • title: May_13
  • topic: Births
  • subtopic: 1972

Step 4 (DateQA)

  • title: July_04
  • topic: Births
  • subtopic: 1960

Step 5 (QA)

  • title: Solar eclipse of August 21, 2017
  • topic: Views outside of the US
  • subtopic: Solar eclipses ascending node 2015–2018

Trial 9

Step 1 (Mathematics)

  • title: geometry
  • topic: geometry
  • subtopic: perimeter_of_polygons

Step 2 (Mathematics)

  • title: basic_math
  • topic: basic_math
  • subtopic: cube_root

Step 3 (Summarization)

  • title: Solid geometry
  • topic: History
  • subtopic: History

Step 4 (Summarization)

  • title: Euclidean space
  • topic: Definition
  • subtopic: Distance and length

Step 5 (QA)

  • title: Affine transformation
  • topic: Properties
  • subtopic: Groups

Step 6 (Summarization)

  • title: Real projective space
  • topic: Basic properties
  • subtopic: Construction

Step 7 (QA)

  • title: Projective space
  • topic: Projective transformation
  • subtopic: Classification

Step 8 (Summarization)

  • title: Antilinear map
  • topic: Citations
  • subtopic: References

Step 9 (QA)

  • title: Bijection
  • topic: Examples
  • subtopic: Inverses

Step 10 (QA)

  • title: Binary relation
  • topic: Examples
  • subtopic: Heterogeneous relation

Trial 10

Step 1 (DateQA)

  • title: December_16
  • topic: Births
  • subtopic: 1944

Step 2 (DateQA)

  • title: January_27
  • topic: Events
  • subtopic: 1999

Step 3 (DateQA)

  • title: September_25
  • topic: Births
  • subtopic: 1957

Step 4 (Summarization)

  • title: 1957
  • topic: Births
  • subtopic: September
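One way to check logs like these against the intended Markov process is to count how often each task type follows another. The sketch below does this for the task sequences of Trial 1 and Trial 10, copied from the log above. The helper names `transition_counts` and `row_normalize` are hypothetical and not part of the PR.

```python
from collections import Counter

# Task sequences copied from Trial 1 and Trial 10 in the log above.
trial_1 = [
    "Summarization", "Summarization", "QA", "QA", "QA",
    "QA", "QA", "Summarization", "QA",
]
trial_10 = ["DateQA", "DateQA", "DateQA", "Summarization"]


def transition_counts(trials):
    # Count (current_task, next_task) pairs across all trials.
    counts = Counter()
    for tasks in trials:
        counts.update(zip(tasks, tasks[1:]))
    return counts


def row_normalize(counts):
    # Turn counts into empirical probabilities P(next | current).
    totals = Counter()
    for (src, _), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


counts = transition_counts([trial_1, trial_10])
probs = row_normalize(counts)

for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

With only two trials the estimates are very noisy; the same functions could be run over every trial in `conversation.csv` once the file's column layout is known.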

@steffencruz
Collaborator Author

Debugging examples after most recent commit

Trial 1

Step 1 (Debugging)

  • title: jgb11/design-patterns-tutorial
  • topic: Java
  • subtopic: src/main/java/com/jgb/designpatterns/factorymethod/parser/impl/ResponseXMLParser.java

Step 2 (QA)

  • title: Java syntax
  • topic: java.lang.Throwable
  • subtopic: Packages

Step 3 (Summarization)

  • title: Java syntax
  • topic: Type inference
  • subtopic: Comments

Trial 2

Step 1 (Debugging)

  • title: Altoterras/TheHeartOfSourcerer
  • topic: C++
  • subtopic: game/src/sourcerer/es/EsObjectBox.cpp

Trial 3

Step 1 (Debugging)

  • title: ycabon/presentations
  • topic: JavaScript
  • subtopic: 2018-user-conference/arcgis-js-api-road-ahead/demos/gamepad/api-snapshot/esri/renderers/smartMapping/creators/support/utils.js

Step 2 (QA)

  • title: Futures and promises
  • topic: History
  • subtopic: History

Step 3 (Summarization)

  • title: Dataflow programming
  • topic: Languages
  • subtopic: Languages

Trial 4

Step 1 (Debugging)

  • title: bamsdev/meanjsstack
  • topic: HTML
  • subtopic: modules/timetables/client/views/timetable.client.view.html

Step 2 (Debugging)

  • title: sunqm/mpi4pyscf
  • topic: Python
  • subtopic: tests/scf/test_hf.py

Step 3 (Summarization)

  • title: History of Python
  • topic: Version 3
  • subtopic: Version 3

@p-ferreira
Contributor

Is this PR planned to be in the next release?

@steffencruz steffencruz marked this pull request as ready for review February 12, 2024 22:31
@steffencruz steffencruz changed the base branch from staging to pre-staging February 16, 2024 18:55
@p-ferreira p-ferreira added v1.1.1 and removed v1.1.0 labels Feb 20, 2024
@p-ferreira p-ferreira added backlog and removed v1.1.1 labels Feb 28, 2024
@steffencruz steffencruz changed the base branch from pre-staging to staging March 1, 2024 20:28
@steffencruz steffencruz changed the base branch from staging to pre-staging March 1, 2024 20:29
@steffencruz
Collaborator Author

Tracked experiment

@Hollyqui
Collaborator

Closed at the request of @steffencruz because it is no longer relevant.

@Hollyqui Hollyqui closed this Jul 25, 2024
Successfully merging this pull request may close these issues.

Improve wikipedia retrieval mechanism
3 participants